Code Generation
OP2 uses a code translator to transform a user’s sequential OP2 source files into parallelised variants targeting specific hardware backends. The current OP2 translator uses:
libclang to parse C/C++ source files.
fparser2 (the fparser PyPI package) to parse Fortran source files.
Jinja2 to render backend-specific kernel code from templates.
The current translator is the recommended tool for all projects. A legacy translator, consisting of a collection of standalone Python scripts, is also retained for compatibility.
Note
For most users the translator runs automatically. Any application that uses the standard OP2 Makefiles (makefiles/common.mk + makefiles/c_app.mk or f_app.mk) will invoke the translator transparently when you run make <app_name>_<variant>. You only need to be aware of the translator’s options if you are setting up a custom build system, running the translator in isolation for debugging, or need to override its behaviour.
OP2 Translator
Requirements
The translator and its dependencies are bundled inside translator-v2/ and are set up automatically by the OP2 Makefiles. If you need to run the translator outside the Makefile (e.g., in a custom CI pipeline), install the dependencies manually:
Python >= 3.8
Python packages:
jinja2, fparser (fparser2 API), libclang, pcpp, sympy
cd translator-v2
pip install -r requirements.txt
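After a manual install, a quick check can confirm that the requirements listed above are met. The following is a minimal sketch, not part of OP2: the helper names are hypothetical, and the import names are assumptions (in particular, the libclang wheel is assumed to be importable as clang).

```python
import sys
from importlib.util import find_spec

# Assumed top-level import names for the packages in requirements.txt.
# The libclang wheel is assumed to install a module named "clang".
REQUIRED_MODULES = ["jinja2", "fparser", "clang", "pcpp", "sympy"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


def python_is_supported(version_info=sys.version_info):
    """Check the Python >= 3.8 requirement stated above."""
    return tuple(version_info[:2]) >= (3, 8)


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

An empty list of missing modules suggests the environment is ready. It does not verify the pinned versions in requirements.txt.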
Note
No system Clang installation is required. The libclang PyPI wheel (pinned to 18.1.1 in requirements.txt) is a self-contained manylinux wheel that bundles its own libclang.so — no apt install libclang-dev or equivalent is needed. The fparser package provides the fparser.two (fparser2) API used to parse Fortran source files.
Manual Usage
When invoked manually, the translator is called as a Python module:
python3 op2-translator [options] <source files ...>
Key options:
| Option | Description |
|---|---|
| -t | Code-generation target (see the C/C++ Targets and Fortran Targets sections below). Can be specified multiple times to generate several targets in a single invocation. |
| | Output directory for generated files. Defaults to the directory of the first input source file. |
| -soa | Force Struct of Arrays data layout for all datasets. Equivalent to setting |
| | Add a directory to the include search path. |
| -D | Add a preprocessor define. |
| | Pass a JSON object of target-specific configuration options (can be repeated). |
| | Enable verbose output. |
Example — generate OpenMP and JIT CUDA variants:
python3 op2-translator -t openmp -t c_cuda airfoil.cpp
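In a custom build script, the same invocation can be assembled programmatically. This sketch uses only the -t and -soa flags that appear in this document. The function names are hypothetical, and the translator path is assumed to be reachable as op2-translator from the working directory.

```python
import subprocess


def build_translator_command(sources, targets, force_soa=False,
                             translator="op2-translator"):
    """Assemble the argument list for a manual translator run."""
    cmd = ["python3", translator]
    if force_soa:
        cmd.append("-soa")
    for target in targets:
        cmd += ["-t", target]  # -t is repeated once per requested target
    cmd += list(sources)
    return cmd


def run_translator(sources, targets, force_soa=False, cwd=None):
    """Run the translator; raises CalledProcessError if it exits non-zero."""
    cmd = build_translator_command(sources, targets, force_soa)
    return subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    # Prints: python3 op2-translator -t openmp -t c_cuda airfoil.cpp
    print(" ".join(build_translator_command(["airfoil.cpp"],
                                            ["openmp", "c_cuda"])))
```

Passing the arguments as a list avoids shell quoting problems when source paths contain spaces.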
The translator produces a generated/ subdirectory in the output directory containing one subdirectory per target (e.g. generated/airfoil/openmp/, generated/airfoil/c_cuda/). Each subdirectory contains the generated op2_kernels.* file(s) ready to be compiled as part of the application build.
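A build script may want to know where the generated files will appear before compiling them. The sketch below derives the expected directories from the example layout above (generated/airfoil/openmp/). It assumes that the application name is the source file's stem and that the output directory defaults to the directory of the first source file. The helper name is hypothetical.

```python
from pathlib import PurePosixPath


def expected_output_dirs(source_file, targets, out_dir=None):
    """List the per-target directories expected under generated/."""
    src = PurePosixPath(source_file)
    # Assumption: without an explicit output directory, the directory of
    # the (first) source file is used.
    base = PurePosixPath(out_dir) if out_dir is not None else src.parent
    app = src.stem  # assumption: "airfoil.cpp" gives application "airfoil"
    return [str(base / "generated" / app / target) for target in targets]


if __name__ == "__main__":
    for path in expected_output_dirs("airfoil.cpp", ["openmp", "c_cuda"]):
        print(path)
```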
C/C++ Targets
The following code-generation targets are available for C/C++ applications (.cpp source files):
| Target name | Build variant | Description |
|---|---|---|
| | | Code-generated sequential CPU implementation. Produces a single |
| openmp | | Multi-threaded CPU implementation using OpenMP. Produces |
| cuda | <app>_cuda | Ahead-of-time compiled NVIDIA GPU implementation using CUDA. Produces |
| hip | | Ahead-of-time compiled AMD GPU implementation using HIP. Produces |
| c_cuda | | JIT-compiled NVIDIA GPU implementation. Produces |
| c_hip | | JIT-compiled AMD GPU implementation. Produces |
All targets have a corresponding distributed-memory MPI variant built automatically when an MPI-enabled OP2 library is available (e.g. the cuda target produces both <app>_cuda and <app>_mpi_cuda).
Note
The c_cuda and c_hip JIT targets use the same CUDA/HIP runtime libraries as the cuda and hip AOT targets, and both also require nvcc/hipcc at build time. The key difference is that device kernel source is embedded in the binary and compiled at application launch via NVRTC/HIP RTC, allowing the GPU architecture to be selected at runtime rather than fixed at build time.
Fortran Targets
The language is detected automatically from the file extension (.F90 or .f90). The same target name strings are used as for C/C++, and the translator selects the appropriate Fortran code-generation scheme.
Example — generate Fortran OpenMP and C_CUDA variants:
python3 op2-translator -t openmp -t c_cuda myapp.F90
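The extension-based language detection described above can be pictured as a simple lookup. This is an illustrative sketch rather than the translator's actual code. The mapping covers only the extensions named in this document, and the function name and language labels are hypothetical.

```python
from pathlib import PurePath

# Extensions mentioned in this document; the real translator may accept more.
EXTENSION_TO_LANGUAGE = {
    ".cpp": "c++",
    ".F90": "fortran",
    ".f90": "fortran",
}


def detect_language(source_file):
    """Map a source file name to a language label by its extension."""
    suffix = PurePath(source_file).suffix
    try:
        return EXTENSION_TO_LANGUAGE[suffix]
    except KeyError:
        raise ValueError(f"unrecognised source extension: {suffix!r}") from None


if __name__ == "__main__":
    print(detect_language("myapp.F90"))    # fortran
    print(detect_language("airfoil.cpp"))  # c++
```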
The following targets are available for Fortran applications:
| Target name | Description |
|---|---|
| | Code-generated sequential Fortran implementation. Produces a |
| openmp | Multi-threaded Fortran implementation using OpenMP directives. Includes optional SIMD vectorisation of indirect loops. |
| cuda | Native Fortran CUDA implementation using CUDA Fortran ( |
| | Fortran interop with C sequential kernels. Generates both a Fortran host file ( |
| c_cuda | Fortran interop with CUDA kernels using JIT compilation via NVRTC. Generates Fortran host code ( |
| c_hip | Fortran interop with HIP kernels using JIT compilation via the HIP RTC library. Generates Fortran host code ( |
Note
For Fortran applications, the c_cuda and c_hip JIT targets are the primary recommended GPU backends. The native Fortran cuda target (CUDA Fortran) is also available but requires the NVHPC compiler.
Choosing Between AOT and JIT GPU Targets
This applies to both C/C++ and Fortran applications:
| Consideration | AOT (cuda, hip) | JIT (c_cuda, c_hip) |
|---|---|---|
| Build-time GPU compiler required | Yes (nvcc/hipcc) | Yes (nvcc/hipcc) |
| Target architecture fixed at build time | Yes (via | No; resolved at application start-up |
| Application start-up overhead | None | Small (kernel compilation on first run and when constant values change at runtime) |
| Native GPU Fortran available | Yes ( | N/A; uses the C interop layer |
| Recommended for | Deployments with a known GPU | When a large number of constant values are determined at runtime and used in a kernel |
SoA Data Layout
By default, OP2 stores datasets in Array of Structs (AoS) layout. Struct of Arrays (SoA) layout can improve GPU memory access patterns and is often beneficial for CUDA and HIP targets.
To enable SoA layout, choose one of:
Append :soa to individual type strings in op_decl_dat() calls in your source for per-dataset control. For example, to store a double dataset in SoA layout:
op_dat p_K = op_decl_dat(cells, 16, "double:soa", K, "p_K");
Without the suffix the dataset uses the default AoS layout. Note that the data supplied by the user should remain in AoS layout regardless; OP2 performs the conversion internally.
Pass the -soa (--force_soa) flag to the translator to force SoA layout for all datasets. When using the OP2 Makefiles, append it to the TRANSLATOR variable in the application Makefile before including c_app.mk:
TRANSLATOR += --force_soa
include ../../../../../makefiles/common.mk
include ../../../../../makefiles/c_app.mk
Or when invoking the translator manually:
python3 op2-translator -soa -t cuda myapp.cpp
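To make the difference between the two layouts concrete, the sketch below reorders a flat AoS buffer into SoA using the textbook index mapping: element e, component c moves from position e * dim + c to position c * n + e. It only illustrates the concept. OP2's internal conversion may differ (for example in padding or stride), and the function name is hypothetical.

```python
def aos_to_soa(aos, dim):
    """Reorder a flat AoS list (element-major) into SoA (component-major)."""
    if dim <= 0 or len(aos) % dim != 0:
        raise ValueError("data length must be a positive multiple of dim")
    n = len(aos) // dim  # number of set elements
    soa = [None] * len(aos)
    for elem in range(n):
        for comp in range(dim):
            soa[comp * n + elem] = aos[elem * dim + comp]
    return soa


if __name__ == "__main__":
    # Three elements (a, b, c) with two components each, stored as AoS.
    aos = ["a0", "a1", "b0", "b1", "c0", "c1"]
    print(aos_to_soa(aos, 2))  # ['a0', 'b0', 'c0', 'a1', 'b1', 'c1']
```

In the SoA result, consecutive entries hold the same component of neighbouring elements, which is the pattern that lets adjacent GPU threads read adjacent memory locations.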
Makefile Integration Variables
When using the OP2 Makefiles (makefiles/c_app.mk / makefiles/f_app.mk), the following Make variables can be set in your application Makefile before the include of the OP2 Makefile fragment to customise the build:
| Variable | Description |
|---|---|
| APP_EXTRA_TRANSLATOR_FLAGS | Extra command-line flags appended to every translator invocation for the application. Useful for passing additional |
| VARIANT_FILTER | A Make pattern (default |
| | A Make pattern used to exclude matching build variants from the set of targets printed and built. For example, |
Example — restrict an application to CUDA variants only and pass an extra define:
VARIANT_FILTER := %cuda%
APP_EXTRA_TRANSLATOR_FLAGS := -DUSE_FEATURE_X
include path/to/makefiles/common.mk
include path/to/makefiles/c_app.mk
Legacy Translator
The v1 translator is a collection of standalone Python scripts located in translator/c/ (C/C++) and translator/fortran/ (Fortran). Each script targets a single parallelisation strategy.
Note
The legacy translator is retained for compatibility. For new projects, use the OP2 Translator.
C/C++ Targets
Scripts are located in translator/c/. To use, uncomment the desired generator inside op2.py and invoke:
cd translator/c
python3 op2.py path/to/myapp.cpp
Available generators:
| Script | Target |
|---|---|
| | Sequential (reference) |
| | OpenMP multi-threaded CPU |
| | OpenMP with SIMD vectorisation |
| | CUDA (Fermi) |
| | CUDA (Kepler and later, optimised) |
| | Hybrid OpenMP + CUDA |
| | MPI + SIMD vectorisation |
| | OpenACC |
| | OpenMP 4.0 device offload |
Fortran Targets
Scripts are located in translator/fortran/. To use, uncomment the desired generator inside op2_fortran.py and invoke:
cd translator/fortran
python3 op2_fortran.py path/to/myapp.F90
Available generators:
| Script | Target |
|---|---|
| | MPI + sequential host stubs |
| | MPI + sequential with Intel vectorisation |
| | OpenMP variants |
| | OpenMP with INC staging |
| | CUDA variants |
| | OpenACC |
| | OpenMP 4.0 device offload |