A DPC++ Backend for the OCCA Portability Framework

Kris Rowe

Lemont, Illinois


OCCA (an open source, portable, and vendor-neutral framework for parallel programming on heterogeneous platforms) is used by mission-critical computational science and engineering applications of public and private sector organizations, including the U.S. Department of Energy and Shell.

Project status: Published/In Market

oneAPI, HPC

Intel Technologies
oneAPI, DPC++, Intel Iris Xe, Intel Iris Xe MAX, Intel VTune


Overview / Usage

OCCA is an open source, portable, and vendor neutral framework for parallel programming on heterogeneous platforms. The framework consists of several orthogonal components which can be used together or individually: the OCCA API and runtime, the OCCA kernel language, and the OCCA command line tool.

The OCCA API provides unified models for concepts (such as a device, memory, or kernel) which are common to other programming models. The OCCA runtime provides several backends, including DPC++, CUDA, HIP, OpenMP, OpenCL, and Metal, which implement the API as a set of lightweight wrappers. Language support is provided for applications written in C, C++, and Fortran.

The OCCA Kernel Language (OKL) enables the creation of portable device kernels using a directive-based extension to the C language. At run time, the OCCA Jitter translates OKL code to the programming language of the chosen backend and then generates the device binary using that backend's stack. Alternatively, kernels can be written directly as backend-specific code (e.g., OpenCL or CUDA).
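As a rough illustration of what the directive-based style expresses: OKL kernels annotate ordinary C loops with attributes such as @kernel, @outer, @inner, and @tile, and on a GPU backend the outer iterations are typically distributed across groups of work while the inner iterations map to the items within a group. The sketch below is plain C++ rather than OKL, so that it compiles without OCCA; the function name add_vectors_tiled and the tile_size parameter are hypothetical, and the OKL form appears only in a comment for orientation.

```cpp
#include <cassert>
#include <cstddef>

// Rough OKL counterpart, shown for orientation only (not compiled here):
//
//   @kernel void addVectors(const int n, const float *a,
//                           const float *b, float *ab) {
//     for (int i = 0; i < n; ++i; @tile(4, @outer, @inner)) {
//       ab[i] = a[i] + b[i];
//     }
//   }

// Hypothetical plain C++ emulation of the loop structure the annotations
// describe: the outer loop walks over groups of work, the inner loop over
// the items within one group.
void add_vectors_tiled(std::size_t n, const float* a, const float* b,
                       float* ab, std::size_t tile_size) {
  assert(tile_size > 0);
  const std::size_t num_groups = (n + tile_size - 1) / tile_size;
  for (std::size_t group = 0; group < num_groups; ++group) {   // "outer"
    for (std::size_t local = 0; local < tile_size; ++local) {  // "inner"
      const std::size_t i = group * tile_size + local;
      if (i < n) ab[i] = a[i] + b[i];  // guard the last, partial group
    }
  }
}
```

Calling add_vectors_tiled with five elements and a tile size of 4 runs two groups, the second of which is only partially filled; the bounds check plays the role that the generated guard plays when a loop is tiled.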

Mission-critical computational science and engineering applications from the U.S. Department of Energy and Shell rely on OCCA. For example, NekRS, a new computational fluid dynamics solver from the Nek5000 team, is used to simulate coolant flow inside small modular reactors and to design more efficient combustion engines. The development of a DPC++ backend for OCCA was jointly undertaken by Argonne Leadership Computing Facility and Intel in order to support these applications on platforms utilizing Intel Xe GPUs, including the Aurora exascale supercomputer.

Methodology / Approach

Because OCCA supports device kernels written in backend-specific code, the API could be implemented and tested before any OKL translation existed. The OCCA DPC++ backend was therefore developed in three phases.

1. The OCCA API was implemented. Device kernels written in DPC++ were used to verify correctness.
2. The logic for OKL to DPC++ translation was implemented. To verify correctness, OKL kernels corresponding to the DPC++ kernels in step 1 were translated using the OCCA command line tool.
3. Combined usage of the OCCA API and OKL kernels was validated using OCCA's internal test harness, microbenchmark kernels and mini-apps, and the full NekRS application.

The OCCA runtime uses the PIMPL design pattern. An abstract interface is provided via a core collection of base classes. To create a new OCCA backend, developers extend these base classes and implement their virtual functions.

Memory management in OCCA applications is handled through C-style malloc and free functions; the allocated memory is then passed to device kernels as pointer arguments. The DPC++ Unified Shared Memory model was used in the implementation of the OCCA DPC++ backend since it most closely aligns with OCCA's API. In contrast, the DPC++ buffer/accessor approach would have required a significant redesign of OCCA's internal API and would likely have affected many downstream projects. Finally, OKL kernels are translated to extern "C" functions which invoke a DPC++ kernel, defined as a lambda expression, using the nd_range flavor of parallel_for.
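The backend structure described above can be sketched in standard C++. This is a minimal illustration under stated assumptions, not OCCA's actual class hierarchy: the names DeviceImpl, HostDeviceImpl, Device, and scale_kernel are hypothetical, and a host allocator stands in for a real device backend so that the example builds without a DPC++ toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <string>
#include <utility>

// Hypothetical abstract interface: each backend derives from this class
// and implements the virtual functions.
class DeviceImpl {
 public:
  virtual ~DeviceImpl() = default;
  virtual std::string mode() const = 0;
  virtual void* malloc(std::size_t bytes) = 0;
  virtual void free(void* ptr) = 0;
};

// Stand-in backend that allocates host memory. A DPC++ backend could
// instead forward to USM allocation functions such as sycl::malloc_device.
class HostDeviceImpl final : public DeviceImpl {
 public:
  std::string mode() const override { return "Host"; }
  void* malloc(std::size_t bytes) override { return std::malloc(bytes); }
  void free(void* ptr) override { std::free(ptr); }
};

// Public handle (the PIMPL part): application code sees only this class
// and never depends on a concrete backend type.
class Device {
 public:
  explicit Device(std::unique_ptr<DeviceImpl> impl) : impl_(std::move(impl)) {
    assert(impl_ != nullptr);
  }
  std::string mode() const { return impl_->mode(); }
  void* malloc(std::size_t bytes) { return impl_->malloc(bytes); }
  void free(void* ptr) { impl_->free(ptr); }

 private:
  std::unique_ptr<DeviceImpl> impl_;
};

// Kernels receive plain pointers, which is why a pointer-based memory
// model fits more naturally than buffers and accessors. The extern "C"
// linkage mirrors the entry points described in the text; the body here
// is an ordinary host loop rather than a parallel_for.
extern "C" void scale_kernel(float* x, std::size_t n, float alpha) {
  for (std::size_t i = 0; i < n; ++i) x[i] *= alpha;
}
```

Application code would construct a Device from a concrete implementation, for example Device device(std::make_unique<HostDeviceImpl>()), allocate with device.malloc, pass the resulting pointer to scale_kernel, and release it with device.free. Swapping the backend changes only the object handed to the Device constructor.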

Technologies Used

Software

Intel oneAPI Base Toolkit
- Intel oneAPI DPC++ Compiler
- Intel Distribution for GDB
- Intel Advisor
- Intel VTune Profiler
CMake
Visual Studio Code
OCCA
NekRS

Hardware

Intel Iris Xe (Gen9) GPU
Intel Iris Xe MAX GPU
Intel Xe-HP GPU
Argonne National Laboratory's JLSE Testbeds

Acknowledgments

This work was supported by Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357 and by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative.

Collaborators

Anoop Madhusoodhanan Prabha

Hillsboro, Oregon
