Accelerate The “Stable” Three-Way QuickSort Performance Leveraging The Modern Nvidia GPGPUs
Arthur V. Ratz
Lviv, Lviv Oblast
An alternative approach to optimizing the performance of the classical “stable” three-way quicksort, using the Nvidia CUDA Development Toolkit, OpenMP 4.5/5.0 and Intel’s open-source Clang/LLVM compiler distribution.
Project status: Published/In Market
Intel Technologies
Other
Overview / Usage
This project is another implementation of the parallel “stable” three-way quicksort previously introduced in my project https://devmesh.intel.com/projects/parallel-stable-sort-performance-optimization-using-intel-parallel-studio-xe-and-intel-oneapi-hpc-toolkit. Its main goal is to achieve an even greater speed-up by offloading specific workloads to Nvidia GPUs, rather than running them on the host CPU or other acceleration targets. The result is about 36x faster than sequential quicksort execution. Unlike the previous project, I used the OpenMP 4.5/5.0 library with offloading capabilities and the open-source distribution of Intel’s Clang/LLVM compiler (https://github.com/llvm/llvm-project) to deliver modern code implementing the parallel three-way quicksort.
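As a rough illustration of what offloading a workload with OpenMP can look like, the sketch below counts the elements smaller than, equal to, and greater than a pivot on the target device. It is not taken from the project repository: `PivotCounts` and `count_around_pivot` are hypothetical names, and the choice of this particular workload is an assumption made only for the example.

```cpp
#include <vector>

// Hypothetical helper type (not from the project repository): sizes of the
// three partitions that a three-way split around a pivot would produce.
struct PivotCounts {
    long long less;
    long long equal;
    long long greater;
};

PivotCounts count_around_pivot(const std::vector<int>& data, int pivot) {
    const int* p = data.data();
    const long long n = static_cast<long long>(data.size());
    long long less = 0, equal = 0, greater = 0;
    if (n == 0) return PivotCounts{0, 0, 0};

    // Copy the input to the device, spread the loop over teams and threads,
    // and combine the per-thread counters with a reduction. The counters are
    // mapped explicitly so the result is copied back under OpenMP 4.5 rules.
    // If the code is built without OpenMP, the pragma is ignored and the
    // loop runs sequentially on the host with the same result.
    #pragma omp target teams distribute parallel for \
        map(to: p[0:n]) map(tofrom: less, equal, greater) \
        reduction(+: less, equal, greater)
    for (long long i = 0; i < n; ++i) {
        if (p[i] < pivot)       ++less;
        else if (p[i] == pivot) ++equal;
        else                    ++greater;
    }
    return PivotCounts{less, equal, greater};
}
```

With Clang, an offloading build is typically requested with something like `clang++ -O2 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda`; the exact flags depend on the Clang and CUDA versions installed.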
Methodology / Approach
The parallel “stable” three-way quicksort algorithm was introduced in:
- "An Efficient Parallel Three-Way Quicksort Using Intel C++ Compiler And OpenMP 4.5 Library" - https://software.intel.com/en-us/articles/an-efficient-parallel-three-way-quicksort-using-intel-c-compiler-and-openmp-45-library
- "How To Implement A Parallel "Stable" Three-Way Quicksort Using Intel C++ Compiler and OpenMP 4.5 library" - https://software.intel.com/en-us/articles/how-to-implement-a-parallel-stable-three-way-quicksort-using-intel-c-compiler-and-openmp-45
- "How To Implement The Parallel "Stable" Sort Using Intel® MPI Library And Deploy It To A Multi-Node Computational Cluster" - https://software.intel.com/en-us/articles/how-to-implement-a-multi-node-parallel-stable-sort-using-intel-mpi-library
- "How To Optimize A Parallel Stable Sort Performance Using The Revolutionary Intel® oneAPI HPC Toolkit" - https://software.intel.com/en-us/articles/how-to-optimize-the-parallel-stable-sort-performance-using-intel-oneapi-hpc-toolkit
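For readers unfamiliar with the underlying technique, here is a minimal sequential sketch of a generic three-way quicksort (the "Dutch national flag" partition). It is a textbook version, not the code from the articles listed above: the names `quicksort3` and `sort3` are hypothetical, and this plain form is neither parallel nor stable on its own.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorts data[lo..hi] (inclusive bounds) using a three-way partition.
void quicksort3(std::vector<int>& data, std::ptrdiff_t lo, std::ptrdiff_t hi) {
    if (lo >= hi) return;
    const int pivot = data[lo + (hi - lo) / 2];

    // Loop invariant:
    //   data[lo..lt-1]  <  pivot
    //   data[lt..i-1]   == pivot
    //   data[gt+1..hi]  >  pivot
    std::ptrdiff_t lt = lo, i = lo, gt = hi;
    while (i <= gt) {
        if (data[i] < pivot) {
            std::swap(data[lt++], data[i++]);
        } else if (data[i] > pivot) {
            std::swap(data[i], data[gt--]);
        } else {
            ++i;
        }
    }

    // Elements equal to the pivot are already in place, so only the
    // "less" and "greater" partitions need to be sorted recursively.
    quicksort3(data, lo, lt - 1);
    quicksort3(data, gt + 1, hi);
}

// Convenience wrapper that sorts the whole vector.
void sort3(std::vector<int>& data) {
    if (!data.empty()) {
        quicksort3(data, 0, static_cast<std::ptrdiff_t>(data.size()) - 1);
    }
}
```

Because the two recursive calls work on disjoint ranges, they are natural candidates for running concurrently, which is the kind of independence a parallel variant can exploit.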
Technologies Used
Hardware:
· 2 x Nvidia GeForce GTX 1070 graphics cards in SLI, 8 GiB GDDR5 each;
Software:
· Nvidia CUDA Development Toolkit;
· Intel’s Open-Source Clang/LLVM compiler distribution;
· OpenMP 4.5/5.0 Library with offloading capabilities;
Repository
https://github.com/arthurratz/parallel_stable_sort_nvptx64