Code Generation

OP2 uses a code translator to transform a user’s sequential OP2 source files into parallelised variants targeting specific hardware backends. The current OP2 translator uses:

libclang to parse C/C++ source files.
fparser2 (the fparser PyPI package) to parse Fortran source files.
Jinja2 to render backend-specific kernel code from templates.

This translator is the recommended tool for all projects. A legacy translator, consisting of a collection of standalone Python scripts, is also retained for compatibility.

Note

For most users the translator runs automatically. Any application that uses the standard OP2 Makefiles (makefiles/common.mk + makefiles/c_app.mk or f_app.mk) will invoke the translator transparently when you run make <app_name>_<variant>. You only need to be aware of the translator’s options if you are setting up a custom build system, running the translator in isolation for debugging, or need to override its behaviour.

OP2 Translator

Requirements

The translator and its dependencies are bundled inside translator-v2/ and are set up automatically by the OP2 Makefiles. If you need to run the translator outside the Makefile (e.g., in a custom CI pipeline), install the dependencies manually:

Python >= 3.8
Python packages: jinja2, fparser (fparser2 API), libclang, pcpp, sympy

```
cd translator-v2
pip install -r requirements.txt
```

Note

No system Clang installation is required. The libclang PyPI wheel (pinned to 18.1.1 in requirements.txt) is a self-contained manylinux wheel that bundles its own libclang.so — no apt install libclang-dev or equivalent is needed. The fparser package provides the fparser.two (fparser2) API used to parse Fortran source files.
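Before running the translator outside the Makefiles, it can help to confirm that the environment matches the requirements listed above. The following is a minimal sketch, not part of OP2: the helper names `missing_modules` and `python_ok` are hypothetical, and the module list simply mirrors the packages named in this section (the libclang wheel is imported as `clang`).

```python
import importlib.util
import sys

# Import names for the packages listed in this section. The libclang wheel
# is imported as "clang" (clang.cindex), not "libclang".
REQUIRED_MODULES = ["jinja2", "fparser", "clang", "pcpp", "sympy"]


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    if not python_ok():
        print("Python >= 3.8 is required")
    for name in missing_modules(REQUIRED_MODULES):
        print(f"missing module: {name}")
```

If the script prints nothing, the interpreter version and all five modules were found; it does not check the pinned versions in requirements.txt.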

Manual Usage

When invoked manually, the translator is called as a Python module:

```
python3 op2-translator [options] <source files ...>
```

Key options:

Option	Description
`-t <target>`, `--target <target>`	Code-generation target (see `Targets`_ below). Can be specified multiple times to generate several targets in a single invocation.
`-o <dir>`, `--out <dir>`	Output directory for generated files. Defaults to the directory of the first input source file.
`-soa`, `--force_soa`	Force Struct of Arrays data layout for all datasets. Equivalent to setting `OP_AUTO_SOA=1`.
`-I <dir>`	Add a directory to the include search path.
`-D <define>`	Add a preprocessor define.
`-c <json>`, `--config <json>`	Pass a JSON object of target-specific configuration options (can be repeated).
`-v`, `--verbose`	Enable verbose output.

Example — generate OpenMP and JIT CUDA variants:

```
python3 op2-translator -t openmp -t c_cuda airfoil.cpp
```
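For a custom build system or CI pipeline, the command line can be assembled programmatically. The sketch below is an illustration only: `build_translator_cmd` is a hypothetical helper, it uses just the options listed in the table above, and it assumes the command is run from inside translator-v2/ as in the manual invocation.

```python
import json
import shlex


def build_translator_cmd(sources, targets, out_dir=None, includes=(),
                         defines=(), force_soa=False, config=None,
                         verbose=False):
    """Assemble an argv list for the translator from the documented options."""
    cmd = ["python3", "op2-translator"]
    for target in targets:            # -t may be given several times
        cmd += ["-t", target]
    if out_dir is not None:
        cmd += ["-o", str(out_dir)]
    if force_soa:
        cmd.append("--force_soa")
    for inc in includes:
        cmd += ["-I", inc]
    for define in defines:
        cmd += ["-D", define]
    if config is not None:            # -c takes a JSON object
        cmd += ["-c", json.dumps(config)]
    if verbose:
        cmd.append("-v")
    return cmd + list(sources)


# Reproduces the example invocation shown above.
cmd = build_translator_cmd(["airfoil.cpp"], ["openmp", "c_cuda"])
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. No configuration keys are shown for `-c` because this section does not document them.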

The translator creates a generated/ subdirectory in the output directory. Within it, the generated code is placed in one subdirectory per target, grouped under the application name (e.g. generated/airfoil/openmp/, generated/airfoil/c_cuda/). Each target subdirectory contains the generated op2_kernels.* file(s), ready to be compiled as part of the application build.
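A build script often needs to know where the generated files will land. The following sketch computes the expected directories under two assumptions inferred from the example paths above rather than confirmed by the translator source: the application directory is named after the source file's stem, and the default output directory is the directory of the first source file. `expected_output_dirs` is a hypothetical helper.

```python
from pathlib import Path


def expected_output_dirs(source, targets, out_dir=None):
    """Return the per-target directories expected under generated/.

    Assumes the layout <out_dir>/generated/<source stem>/<target>/,
    as suggested by the example paths in this section.
    """
    src = Path(source)
    base = Path(out_dir) if out_dir is not None else src.parent
    return [base / "generated" / src.stem / target for target in targets]


for path in expected_output_dirs("airfoil.cpp", ["openmp", "c_cuda"]):
    print(path.as_posix())
```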

C/C++ Targets

The following code-generation targets are available for C/C++ applications (.cpp source files):

Target name	Build variant	Description
`seq`	`genseq`	Code-generated sequential CPU implementation. Produces a single `op2_kernels.cpp`. Preferable to the development sequential build for accurate benchmarking.
`openmp`	`openmp`	Multi-threaded CPU implementation using OpenMP. Produces `op2_kernels.cpp` with OpenMP pragmas. Set `OMP_NUM_THREADS` at runtime to control thread count.
`cuda`	`cuda`	Ahead-of-time compiled NVIDIA GPU implementation using CUDA. Produces `op2_kernels.cu` compiled offline by `nvcc` during the application build. Requires `CUDA_INSTALL_PATH` and `NV_ARCH` to be set.
`hip`	`hip`	Ahead-of-time compiled AMD GPU implementation using HIP. Produces `op2_kernels.cpp` compiled by `hipcc` during the application build. Requires `HIP_INSTALL_PATH` and `HIP_ARCH` to be set.
`c_cuda`	`c_cuda`	JIT-compiled NVIDIA GPU implementation. Produces `op2_kernels.cu` compiled by `nvcc` during the application build. Device kernel source strings are embedded in the binary via INCBIN and compiled at application start-up by NVRTC, so the GPU architecture is selected at runtime. Requires `CUDA_INSTALL_PATH` and `nvcc`.
`c_hip`	`c_hip`	JIT-compiled AMD GPU implementation. Produces `op2_kernels.cpp` compiled by `hipcc` during the application build. Device kernel source strings are embedded in the binary and compiled at application start-up by the HIP RTC library, so the GPU architecture is selected at runtime. Requires `HIP_INSTALL_PATH` and `hipcc`.

All targets have a corresponding distributed-memory MPI variant built automatically when an MPI-enabled OP2 library is available (e.g. the cuda target produces both <app>_cuda and <app>_mpi_cuda).
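The relationship between target names, build variants, and executable names can be sketched as a small lookup. The mapping below is copied from the table above; the `<app>_mpi_<variant>` naming for targets other than cuda is extrapolated from the single cuda example and should be treated as an assumption. `executables` is a hypothetical helper.

```python
# Target name -> build variant, as listed in the C/C++ targets table.
BUILD_VARIANT = {
    "seq": "genseq",
    "openmp": "openmp",
    "cuda": "cuda",
    "hip": "hip",
    "c_cuda": "c_cuda",
    "c_hip": "c_hip",
}


def executables(app, target, mpi=True):
    """Return the executable names expected for one translator target.

    The MPI name follows the <app>_mpi_<variant> pattern of the cuda
    example; applying it to other targets is an assumption.
    """
    variant = BUILD_VARIANT[target]
    names = [f"{app}_{variant}"]
    if mpi:
        names.append(f"{app}_mpi_{variant}")
    return names


print(executables("airfoil", "cuda"))
```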

Note

The c_cuda and c_hip JIT targets use the same CUDA/HIP runtime libraries as the cuda and hip AOT targets, and both also require nvcc/hipcc at build time. The key difference is that device kernel source is embedded in the binary and compiled at application launch via NVRTC/HIP RTC, allowing the GPU architecture to be selected at runtime rather than fixed at build time.

Fortran Targets

The language is detected automatically from the file extension (.F90 or .f90). The same target name strings are used as for C/C++, and the translator selects the appropriate Fortran code-generation scheme.
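Extension-based detection can be pictured as a simple lookup on the file suffix. This is a sketch of the idea rather than the translator's actual logic: `detect_language` is a hypothetical helper, and the table contains only the extensions named in this document (the translator may accept others).

```python
from pathlib import Path

# Only the extensions mentioned in this document; the suffix is case-sensitive.
EXTENSION_LANGUAGE = {
    ".cpp": "c++",
    ".F90": "fortran",
    ".f90": "fortran",
}


def detect_language(source):
    """Return 'c++' or 'fortran' based on the file extension of `source`."""
    suffix = Path(source).suffix
    if suffix not in EXTENSION_LANGUAGE:
        raise ValueError(f"unrecognised source extension: {suffix!r}")
    return EXTENSION_LANGUAGE[suffix]


print(detect_language("myapp.F90"))
print(detect_language("airfoil.cpp"))
```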

Example (Fortran OpenMP and JIT CUDA variants):

```
python3 op2-translator -t openmp -t c_cuda myapp.F90
```

The following targets are available for Fortran applications:

Target name	Description
`seq`	Code-generated sequential Fortran implementation. Produces a `master_kernel.F90` containing all loop host subroutines.
`openmp`	Multi-threaded Fortran implementation using OpenMP directives. Includes optional SIMD vectorisation of indirect loops.
`cuda`	Native Fortran CUDA implementation using CUDA Fortran (`.CUF`). Device kernels are written in Fortran with `attributes(device)` annotations. Requires a CUDA Fortran-capable compiler (NVHPC).
`c_seq`	Fortran interop with C sequential kernels. Generates both a Fortran host file (`.F90`) and a C kernel file (`.cpp`). Useful for transitioning Fortran applications to portable C kernel implementations.
`c_cuda`	Fortran interop with CUDA kernels using JIT compilation via NVRTC. Generates Fortran host code (`.F90`) and CUDA device kernel source (`.cu`). Device kernels are compiled at application start-up. This is the primary recommended GPU target for Fortran applications.
`c_hip`	Fortran interop with HIP kernels using JIT compilation via the HIP RTC library. Generates Fortran host code (`.F90`) and HIP device kernel source (`.hip.cpp`). Device kernels are compiled at application start-up.

Note

For Fortran applications, the c_cuda and c_hip JIT targets are the primary recommended GPU backends. The native Fortran cuda target (CUDA Fortran) is also available but requires the NVHPC compiler.

Choosing Between AOT and JIT GPU Targets

This applies to both C/C++ and Fortran applications:

Consideration	AOT (`cuda` / `hip`)	JIT (`c_cuda` / `c_hip`)
Build-time GPU compiler required	Yes (`nvcc` / `hipcc`)	Yes (`nvcc` / `hipcc`)
Target architecture fixed at build time	Yes (via `NV_ARCH` for NVIDIA; `HIP_ARCH` for AMD)	No — resolved at application start-up
Application start-up overhead	None	Small (kernel compilation on first run and when constant values change at runtime)
Native GPU Fortran available	Yes (`cuda` target with NVHPC)	N/A (uses the C interop layer)
Recommended for	Deployments with a known GPU	When a large number of constant values are determined at runtime and used in a kernel
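The "Recommended for" row can be read as a simple decision rule. The sketch below encodes one possible reading of the table and nothing more: `choose_gpu_target` and its parameters are hypothetical, and treating an unknown build-time architecture as a reason to prefer JIT is an interpretation of the "fixed at build time" row.

```python
# (AOT target, JIT target) per GPU vendor, as named in the table above.
GPU_TARGETS = {
    "nvidia": ("cuda", "c_cuda"),
    "amd": ("hip", "c_hip"),
}


def choose_gpu_target(vendor, many_runtime_constants=False,
                      arch_known_at_build=True):
    """Pick a translator target name following the comparison table.

    JIT is chosen when kernels depend on many constants only known at
    runtime, or when the GPU architecture is not known at build time.
    """
    aot, jit = GPU_TARGETS[vendor]
    if many_runtime_constants or not arch_known_at_build:
        return jit
    return aot


print(choose_gpu_target("nvidia"))
print(choose_gpu_target("amd", many_runtime_constants=True))
```

This rule covers only the trade-offs in the table; for Fortran applications, any separate recommendation given for the Fortran targets takes precedence.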

SoA Data Layout

By default, OP2 stores datasets in Array of Structs (AoS) layout. Struct of Arrays (SoA) layout can improve GPU memory access patterns and is often beneficial for CUDA and HIP targets.
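To make the two layouts concrete, the sketch below maps an (element, component) pair to a flat index under each scheme and converts a small AoS array to SoA. It illustrates the general technique only: the function names are hypothetical, and the stride OP2 uses internally for SoA (for example, whether it includes halo or padding elements) is not specified here and may differ from the plain set size assumed below.

```python
def aos_index(elem, comp, dim):
    """AoS: all components of one element are adjacent in memory."""
    return elem * dim + comp


def soa_index(elem, comp, set_size):
    """SoA: the same component of consecutive elements is adjacent."""
    return comp * set_size + elem


def aos_to_soa(data, set_size, dim):
    """Return a copy of flat AoS `data` reordered into SoA layout."""
    out = [None] * (set_size * dim)
    for elem in range(set_size):
        for comp in range(dim):
            out[soa_index(elem, comp, set_size)] = data[aos_index(elem, comp, dim)]
    return out


# Three elements with two components each: (x0, y0), (x1, y1), (x2, y2).
aos = ["x0", "y0", "x1", "y1", "x2", "y2"]
print(aos_to_soa(aos, set_size=3, dim=2))
# -> ['x0', 'x1', 'x2', 'y0', 'y1', 'y2']
```

On a GPU, neighbouring threads typically process neighbouring elements, so in SoA layout they read adjacent memory locations for a given component, which is the access pattern that benefits from coalescing.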

SoA layout can be enabled either per dataset or for all datasets at once. Choose one of:

Append :soa to individual type strings in op_decl_dat() calls in your source for per-dataset control. For example, to store a double dataset in SoA layout:
```
op_dat p_K = op_decl_dat(cells, 16, "double:soa", K, "p_K");
```
Without the suffix the dataset uses the default AoS layout. Note that the data supplied by the user should remain in AoS layout regardless — OP2 performs the conversion internally.
To apply SoA layout to all datasets, pass the -soa (--force_soa) flag to the translator. When using the OP2 Makefiles, append it to the TRANSLATOR variable in the application Makefile before including the OP2 Makefiles:
```
TRANSLATOR += --force_soa
include ../../../../../makefiles/common.mk
include ../../../../../makefiles/c_app.mk
```
Or when invoking the translator manually:
```
python3 op2-translator -soa -t cuda myapp.cpp
```

Makefile Integration Variables

When using the OP2 Makefiles (makefiles/c_app.mk / makefiles/f_app.mk), the following Make variables can be set in your application Makefile before the include of the OP2 Makefile fragment to customise the build:

Variable	Description
`APP_EXTRA_TRANSLATOR_FLAGS`	Extra command-line flags appended to every translator invocation for the application. Useful for passing additional `-I` include paths or `-D` defines that the translator needs to parse your source correctly, without altering the shared `TRANSLATOR` variable.
`VARIANT_FILTER`	A Make pattern (default `%`, matches everything) used to keep only the matching build variants. For example, set `VARIANT_FILTER := %cuda%` to build only CUDA-related variants.
`VARIANT_FILTER_OUT`	A Make pattern used to exclude matching build variants from the set of targets printed and built. For example, `VARIANT_FILTER_OUT := %hip%` suppresses all HIP variants.

Example — restrict an application to CUDA variants only and pass an extra define:

```
VARIANT_FILTER := %cuda%
APP_EXTRA_TRANSLATOR_FLAGS := -DUSE_FEATURE_X
include path/to/makefiles/common.mk
include path/to/makefiles/c_app.mk
```

Legacy Translator

The v1 translator is a collection of standalone Python scripts located in translator/c/ (C/C++) and translator/fortran/ (Fortran). Each script targets a single parallelisation strategy.

Note

The legacy translator is retained for compatibility. For new projects, use the OP2 Translator.

C/C++ Targets

Scripts are located in translator/c/. To use, uncomment the desired generator inside op2.py and invoke:

```
cd translator/c
python3 op2.py path/to/myapp.cpp
```

Available generators:

Script	Target
`op2_gen_seq.py`	Sequential (reference)
`op2_gen_openmp.py` / `op2_gen_openmp_simple.py`	OpenMP multi-threaded CPU
`op2_gen_omp_vec.py`	OpenMP with SIMD vectorisation
`op2_gen_cuda.py`	CUDA (Fermi)
`op2_gen_cuda_simple.py`	CUDA (Kepler and later, optimised)
`op2_gen_cuda_simple_hyb.py`	Hybrid OpenMP + CUDA
`op2_gen_mpi_vec.py`	MPI + SIMD vectorisation
`op2_gen_openacc.py`	OpenACC
`op2_gen_openmp4.py`	OpenMP 4.0 device offload

Fortran Targets

Scripts are located in translator/fortran/. To use, uncomment the desired generator inside op2_fortran.py and invoke:

```
cd translator/fortran
python3 op2_fortran.py path/to/myapp.F90
```

Available generators:

Script	Target
`op2_gen_mpiseq.py` / `op2_gen_mpiseq2.py` / `op2_gen_mpiseq3.py`	MPI + sequential host stubs
`op2_gen_mpivec.py`	MPI + sequential with Intel vectorisation
`op2_gen_openmp.py` / `op2_gen_openmp2.py` / `op2_gen_openmp3.py`	OpenMP variants
`op2_gen_openmpINC.py`	OpenMP with INC staging
`op2_gen_cuda.py` / `op2_gen_cudaINC.py` / `op2_gen_cuda_color2.py` / `op2_gen_cuda_permute.py`	CUDA variants
`op2_gen_openacc.py`	OpenACC
`op2_gen_openmp4.py`	OpenMP 4.0 device offload