Drive faster breakthroughs through faster code: Get more results on your hardware today and carry your code forward to the future with code modernization.
This work uses the k-means algorithm to assess the performance portability of one of the most advanced implementations in the literature, He-Vialle, across different programming models (DPC++, CUDA, OpenMP) and multi-vendor CPU-GPU architectures.
This project demonstrates the visualization of a direction field with Python using the differential equation of a falling object as a case study. The effectiveness of Heterogeneous Computing is also shown by exploring optimized libraries & added functionalities in Intel® Distribution for Python*.
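As a rough illustration of what a direction field is, the sketch below evaluates the slope of a falling-object equation on a grid and normalizes each slope into a unit arrow. The blurb does not state the exact equation, so the model dv/dt = g - (c/m)·v (linear drag) and the constants `G` and `DRAG` are assumptions, and the helper names are hypothetical. It uses only the Python standard library; the arrows could then be handed to any plotting tool.

```python
# Assumed model (not taken from the project): dv/dt = g - (c/m) * v,
# i.e. a falling object with linear air resistance.
G = 9.8      # gravitational acceleration in m/s^2
DRAG = 0.2   # assumed drag-to-mass ratio c/m in 1/s


def slope(t, v):
    """Right-hand side of the assumed ODE; it does not depend on t."""
    return G - DRAG * v


def direction_field(t_values, v_values):
    """Return one (t, v, dt, dv) tuple per grid point.

    (dt, dv) is a unit-length arrow pointing along the solution
    curve that passes through (t, v).
    """
    arrows = []
    for t in t_values:
        for v in v_values:
            s = slope(t, v)
            norm = (1.0 + s * s) ** 0.5
            arrows.append((t, v, 1.0 / norm, s / norm))
    return arrows


# Grid: t = 0, 2, ..., 10 seconds and v = 0, 10, ..., 70 m/s.
t_grid = [float(i) for i in range(0, 11, 2)]
v_grid = [float(j) for j in range(0, 71, 10)]
field = direction_field(t_grid, v_grid)

# The slope vanishes where G - DRAG * v = 0: the equilibrium
# (terminal) velocity of this assumed model.
terminal_velocity = G / DRAG
```

Under this assumed model, arrows below the terminal velocity point upward and arrows above it point downward, so every solution curve is drawn toward the same horizontal line.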
This project specifically targets the acceleration of the pairwise alignment and pattern matching algorithms contained in the SeqAn library.
A library-based programming model for C, C++ and Fortran based on Managed Abstract Memory Arrays, aiming to deliver simplified and efficient usage of diverse memory systems to application developers in a performance portable way.
A systolic, non-deterministic, intercrossed communication support that makes dataflow pipeline applications more reliable and more flexible for high-performance computing. The basic idea behind this work is to obtain a dynamically reconfigurable interconnection structure for FPGA dataflow applica
Dpctl provides Python SYCL bindings and a SYCL-based Python Array API library. It simplifies building Python native extensions that use oneAPI DPC++ to implement portable data-parallel functions, and it implements such extensions for its own array library.
The phase-field technique is used to simulate microstructure evolution during materials processing, including 3D printing and additive manufacturing as well as traditional manufacturing techniques such as welding and casting. These non-linear PDE solvers are both compute-intensive and memory-intensive.
This is a program I designed to plot the Mandelbrot set. It can run on one thread, on all threads, or with GPU acceleration using SYCL via Intel's DPC++ compiler, which is included in the oneAPI toolkit.
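For readers unfamiliar with how such a plot is computed, here is a minimal single-threaded sketch of the usual escape-time method in plain Python. It is not the project's code: the function names, the viewport bounds, and the iteration limits are all assumptions chosen for illustration.

```python
def escape_time(c, max_iter=100):
    """Count iterations of z -> z*z + c before |z| exceeds 2.

    Returns max_iter if the orbit stays bounded that long, in which
    case c is treated as belonging to the set.
    """
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render(width=60, height=24, max_iter=50):
    """Return the set as text rows: '#' inside, '.' outside.

    The viewport (real axis -2..1, imaginary axis -1.2..1.2) is an
    arbitrary choice for this sketch.
    """
    rows = []
    for j in range(height):
        y = -1.2 + 2.4 * j / (height - 1)
        row = ""
        for i in range(width):
            x = -2.0 + 3.0 * i / (width - 1)
            inside = escape_time(complex(x, y), max_iter) == max_iter
            row += "#" if inside else "."
        rows.append(row)
    return rows


if __name__ == "__main__":
    print("\n".join(render()))
```

Each pixel is computed independently of every other pixel, which is why the workload splits naturally across CPU threads or GPU work-items.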
A novel algorithm to aggressively reduce on-chip block RAM (BRAM) and off-chip DRAM utilisation of stencil codes running on FPGAs.
The algorithm extracts memory accesses from computational pipelines and removes all redundant intermediate arrays, including those used for stencil buffering.
XTASK enables extremely fine-grained parallelism across modern many-core architectures with hundreds of cores by implementing a novel lock-less, multiple-producer multiple-consumer, out-of-order queuing mechanism for managing parallel tasks.
We propose the PLSSVM library, which efficiently brings SVMs to massively parallel accelerators. It implements the basic functionality of the most widely used SVM library, LIBSVM, and can target different hardware from various vendors by using our backends: OpenMP, CUDA, HIP, OpenCL, and SYCL.
Gavin AI is a project created by Scot_Survivor (Joshua Shiells) & ShmarvDogg that aims to hold human-like conversations in English through the use of AI and ML. Gavin is built on the Transformer architecture; however, Performer & FNet architectures are being investigated for better scaling.
In the age of AI, algorithms must efficiently cope with vast data sets. We propose a performance-portable implementation of Locality-Sensitive Hashing (LSH), an approximate k-nearest neighbors algorithm, using different SYCL implementations—ComputeCpp, hipSYCL, DPC++—supporting multiple GPUs.
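To make the idea of LSH-based approximate nearest-neighbor search concrete, the following is a small CPU-only sketch using random-hyperplane hashing. The blurb does not say which hash family the project uses, so this choice is an assumption, and every name here (`make_hyperplanes`, `signature`, `build_index`, `query`) is hypothetical rather than part of the project's API.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals with Gaussian entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(num_bits)]


def signature(vec, planes):
    """Hash a vector to an integer: one bit per hyperplane side."""
    bits = 0
    for plane in planes:
        dot = sum(p * x for p, x in zip(plane, vec))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def build_index(points, planes):
    """Group point indices into buckets keyed by their signature."""
    buckets = {}
    for idx, point in enumerate(points):
        buckets.setdefault(signature(point, planes), []).append(idx)
    return buckets


def query(q, points, planes, buckets, k):
    """Return up to k candidate indices from q's bucket, nearest first."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(points[i], q))

    candidates = buckets.get(signature(q, planes), [])
    return sorted(candidates, key=sq_dist)[:k]


# Tiny illustrative data set (made up for this sketch).
points = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, 1.0]]
planes = make_hyperplanes(num_bits=4, dim=2, seed=42)
buckets = build_index(points, planes)
nearest = query(points[0], points, planes, buckets, k=2)
```

Only the query's own bucket is searched, so the result is approximate: a true neighbor that falls on the other side of any hyperplane is missed. Practical implementations usually reduce that risk by querying several independent hash tables.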