Code Generation

OP2 uses a code translator to transform a user’s sequential OP2 source files into parallelised variants targeting specific hardware backends. The current OP2 translator uses:

  • libclang to parse C/C++ source files.

  • fparser2 (the fparser PyPI package) to parse Fortran source files.

  • Jinja2 to render backend-specific kernel code from templates.

This translator is the recommended tool for all projects. A legacy translator, consisting of a collection of standalone Python scripts, is also retained for compatibility.

Note

For most users the translator runs automatically. Any application that uses the standard OP2 Makefiles (makefiles/common.mk + makefiles/c_app.mk or f_app.mk) will invoke the translator transparently when you run make <app_name>_<variant>. You only need to be aware of the translator’s options if you are setting up a custom build system, running the translator in isolation for debugging, or overriding its behaviour.


OP2 Translator

Requirements

The translator and its dependencies are bundled inside translator-v2/ and are set up automatically by the OP2 Makefiles. If you need to run the translator outside the Makefile (e.g., in a custom CI pipeline), install the dependencies manually:

  • Python >= 3.8

  • Python packages: jinja2, fparser (fparser2 API), libclang, pcpp, sympy

cd translator-v2
pip install -r requirements.txt

Note

No system Clang installation is required. The libclang PyPI wheel (pinned to 18.1.1 in requirements.txt) is a self-contained manylinux wheel that bundles its own libclang.so — no apt install libclang-dev or equivalent is needed. The fparser package provides the fparser.two (fparser2) API used to parse Fortran source files.
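
For a custom CI pipeline it can help to check the environment before running the translator. The following is a minimal sketch, not part of OP2: the function names are hypothetical, and the import names are assumptions (the libclang wheel is assumed to be importable as clang, the other packages under their PyPI names).

```python
import importlib.util
import sys

# Assumed import names for the packages listed above; the libclang wheel
# is assumed to expose a top-level "clang" package.
REQUIRED_MODULES = ["jinja2", "fparser", "clang", "pcpp", "sympy"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info, minimum=(3, 8)):
    """True when the interpreter meets the minimum version stated above."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        print("Python >= 3.8 is required")
    for name in missing_modules(REQUIRED_MODULES):
        print(f"missing module: {name}")
```

Running the script prints nothing when the interpreter and all assumed modules are available.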

Manual Usage

When invoked manually, the translator is called as a Python module:

python3 op2-translator [options] <source files ...>

Key options:

| Option | Description |
| --- | --- |
| -t <target>, --target <target> | Code-generation target (see the C/C++ Targets and Fortran Targets sections below). Can be specified multiple times to generate several targets in a single invocation. |
| -o <dir>, --out <dir> | Output directory for generated files. Defaults to the directory of the first input source file. |
| -soa, --force_soa | Force Struct of Arrays data layout for all datasets. Equivalent to setting OP_AUTO_SOA=1. |
| -I <dir> | Add a directory to the include search path. |
| -D <define> | Add a preprocessor define. |
| -c <json>, --config <json> | Pass a JSON object of target-specific configuration options (can be repeated). |
| -v, --verbose | Enable verbose output. |

Example — generate OpenMP and JIT CUDA variants:

python3 op2-translator -t openmp -t c_cuda airfoil.cpp

The translator writes its output to a generated/ subdirectory of the output directory, with one subdirectory per target beneath the application name (e.g. generated/airfoil/openmp/, generated/airfoil/c_cuda/). Each target subdirectory contains the generated op2_kernels.* file(s), ready to be compiled as part of the application build.
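
When driving the translator from a custom build script, the command line can be assembled programmatically. The sketch below is illustrative only: translator_argv and expected_output_dirs are hypothetical helpers, they use only the options listed above, and the generated/<app>/<target>/ layout is inferred from the example paths rather than taken from the translator's source. It also assumes the command is run from a directory in which op2-translator resolves.

```python
import json
from pathlib import Path


def translator_argv(sources, targets, out_dir=None, includes=(), defines=(),
                    force_soa=False, config=None, verbose=False):
    """Build an argv list using only the options documented above."""
    argv = ["python3", "op2-translator"]
    for target in targets:
        argv += ["-t", target]              # -t may be given several times
    if out_dir is not None:
        argv += ["-o", str(out_dir)]
    for inc in includes:
        argv += ["-I", str(inc)]
    for define in defines:
        argv += ["-D", define]
    if force_soa:
        argv.append("--force_soa")
    if config is not None:
        argv += ["-c", json.dumps(config)]  # JSON object of target options
    if verbose:
        argv.append("-v")
    return argv + [str(s) for s in sources]


def expected_output_dirs(source, targets, out_dir=None):
    """Guess generated/<app>/<target>/ paths (layout assumed from the example)."""
    source = Path(source)
    base = Path(out_dir) if out_dir is not None else source.parent
    return [base / "generated" / source.stem / t for t in targets]
```

The resulting list can be passed to subprocess.run(argv, check=True) so that a translator failure stops the pipeline.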

C/C++ Targets

The following code-generation targets are available for C/C++ applications (.cpp source files):

| Target name | Build variant | Description |
| --- | --- | --- |
| seq | genseq | Code-generated sequential CPU implementation. Produces a single op2_kernels.cpp. Preferable to the development sequential build for accurate benchmarking. |
| openmp | openmp | Multi-threaded CPU implementation using OpenMP. Produces op2_kernels.cpp with OpenMP pragmas. Set OMP_NUM_THREADS at runtime to control thread count. |
| cuda | cuda | Ahead-of-time compiled NVIDIA GPU implementation using CUDA. Produces op2_kernels.cu compiled offline by nvcc during the application build. Requires CUDA_INSTALL_PATH and NV_ARCH to be set. |
| hip | hip | Ahead-of-time compiled AMD GPU implementation using HIP. Produces op2_kernels.cpp compiled by hipcc during the application build. Requires HIP_INSTALL_PATH and HIP_ARCH to be set. |
| c_cuda | c_cuda | JIT-compiled NVIDIA GPU implementation. Produces op2_kernels.cu compiled by nvcc during the application build. Device kernel source strings are embedded in the binary via INCBIN and compiled at application start-up by NVRTC, so the GPU architecture is selected at runtime. Requires CUDA_INSTALL_PATH and nvcc. |
| c_hip | c_hip | JIT-compiled AMD GPU implementation. Produces op2_kernels.cpp compiled by hipcc during the application build. Device kernel source strings are embedded in the binary and compiled at application start-up by the HIP RTC library, so the GPU architecture is selected at runtime. Requires HIP_INSTALL_PATH and hipcc. |

All targets have a corresponding distributed-memory MPI variant built automatically when an MPI-enabled OP2 library is available (e.g. the cuda target produces both <app>_cuda and <app>_mpi_cuda).
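
The naming scheme for the resulting executables can be sketched as follows. This is an illustration, not OP2 code: executable_names is a hypothetical helper, and it assumes every build variant follows the <app>_<variant> / <app>_mpi_<variant> pattern shown for cuda.

```python
# Build-variant names as listed in the C/C++ targets table above.
CPP_BUILD_VARIANTS = ["genseq", "openmp", "cuda", "hip", "c_cuda", "c_hip"]


def executable_names(app, variants=CPP_BUILD_VARIANTS, with_mpi=True):
    """List <app>_<variant> and, optionally, <app>_mpi_<variant> names."""
    names = []
    for variant in variants:
        names.append(f"{app}_{variant}")
        if with_mpi:
            # Assumed pattern, generalised from <app>_cuda / <app>_mpi_cuda.
            names.append(f"{app}_mpi_{variant}")
    return names
```

Each returned name is what would be passed to make, for example make airfoil_mpi_cuda for an application named airfoil.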

Note

The c_cuda and c_hip JIT targets use the same CUDA/HIP runtime libraries as the cuda and hip AOT targets, and both also require nvcc/hipcc at build time. The key difference is that device kernel source is embedded in the binary and compiled at application launch via NVRTC/HIP RTC, allowing the GPU architecture to be selected at runtime rather than fixed at build time.

Fortran Targets

The language is detected automatically from the file extension (.F90 or .f90). The same target name strings are used as for C/C++, and the translator selects the appropriate Fortran code-generation scheme.

Example — generate Fortran OpenMP and C_CUDA variants:

python3 op2-translator -t openmp -t c_cuda myapp.F90
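
The extension-based language detection described above can be pictured with a few lines of Python. This is a sketch of the idea only; detect_language is a hypothetical name and the translator's real logic may accept other extensions or differ in detail.

```python
from pathlib import Path


def detect_language(source):
    """Map a source file name to a language using the extensions named above."""
    suffix = Path(source).suffix
    if suffix in (".F90", ".f90"):
        return "fortran"
    if suffix == ".cpp":
        return "c++"
    raise ValueError(f"unrecognised source extension: {suffix!r}")
```

A build script could use such a check to decide whether to include c_app.mk or f_app.mk before invoking the translator.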

The following targets are available for Fortran applications:

| Target name | Description |
| --- | --- |
| seq | Code-generated sequential Fortran implementation. Produces a master_kernel.F90 containing all loop host subroutines. |
| openmp | Multi-threaded Fortran implementation using OpenMP directives. Includes optional SIMD vectorisation of indirect loops. |
| cuda | Native Fortran CUDA implementation using CUDA Fortran (.CUF). Device kernels are written in Fortran with attributes(device) annotations. Requires a CUDA Fortran-capable compiler (NVHPC). |
| c_seq | Fortran interop with C sequential kernels. Generates both a Fortran host file (.F90) and a C kernel file (.cpp). Useful for transitioning Fortran applications to portable C kernel implementations. |
| c_cuda | Fortran interop with CUDA kernels using JIT compilation via NVRTC. Generates Fortran host code (.F90) and CUDA device kernel source (.cu). Device kernels are compiled at application start-up. This is the primary recommended GPU target for Fortran applications. |
| c_hip | Fortran interop with HIP kernels using JIT compilation via the HIP RTC library. Generates Fortran host code (.F90) and HIP device kernel source (.hip.cpp). Device kernels are compiled at application start-up. |

Note

For Fortran applications, the c_cuda and c_hip JIT targets are the primary recommended GPU backends. The native Fortran cuda target (CUDA Fortran) is also available but requires the NVHPC compiler.

Choosing Between AOT and JIT GPU Targets

The following comparison applies to both C/C++ and Fortran applications:

| Consideration | AOT (cuda / hip) | JIT (c_cuda / c_hip) |
| --- | --- | --- |
| Build-time GPU compiler required | Yes (nvcc / hipcc) | Yes (nvcc / hipcc) |
| Target architecture fixed at build time | Yes (via NV_ARCH for NVIDIA; HIP_ARCH for AMD) | No; resolved at application start-up |
| Application start-up overhead | None | Small (kernel compilation on first run and when constant values change at runtime) |
| Native GPU Fortran available | Yes (cuda target with NVHPC) | N/A; uses C interop layer |
| Recommended for | Deployments with a known GPU | Applications where a large number of constant values are determined at runtime and used in a kernel |
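
The comparison above can be condensed into a small decision helper. This is only one reading of the table, limited to the C/C++ target names: choose_gpu_target and its parameters are hypothetical, and real projects may weigh other factors such as start-up time.

```python
# Target names taken from the C/C++ targets described earlier.
GPU_TARGETS = {
    "nvidia": {"aot": "cuda", "jit": "c_cuda"},
    "amd": {"aot": "hip", "jit": "c_hip"},
}


def choose_gpu_target(vendor, arch_known_at_build, many_runtime_constants=False):
    """Pick AOT when the GPU is known at build time, otherwise JIT.

    JIT is also chosen when many kernel constants are only known at
    runtime, following the "Recommended for" row of the table.
    """
    if vendor not in GPU_TARGETS:
        raise ValueError(f"unknown vendor: {vendor!r}")
    use_jit = many_runtime_constants or not arch_known_at_build
    return GPU_TARGETS[vendor]["jit" if use_jit else "aot"]
```

The returned string is the value that would be passed to the translator's -t option.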

SoA Data Layout

By default, OP2 stores datasets in Array of Structs (AoS) layout. Struct of Arrays (SoA) layout can improve GPU memory access patterns and is often beneficial for CUDA and HIP targets.

To enable SoA layout, choose one of the following:

  • Append :soa to individual type strings in op_decl_dat() calls in your source for per-dataset control. For example, to store a double dataset in SoA layout:

    op_dat p_K = op_decl_dat(cells, 16, "double:soa", K, "p_K");
    

    Without the suffix the dataset uses the default AoS layout. Note that the data supplied by the user should remain in AoS layout regardless — OP2 performs the conversion internally.

  • Pass the -soa (--force_soa) flag to the translator to force SoA layout for all datasets. When using the OP2 Makefiles, append it to the TRANSLATOR variable in the application Makefile before including c_app.mk:

    TRANSLATOR += --force_soa
    include ../../../../../makefiles/common.mk
    include ../../../../../makefiles/c_app.mk
    

    Or when invoking the translator manually:

    python3 op2-translator -soa -t cuda myapp.cpp
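
To make the difference between the two layouts concrete, the sketch below shows the generic index arithmetic for a flat array holding n_elems elements of dim components each. It illustrates the textbook AoS and SoA layouts only; the function names are hypothetical, and the stride OP2 uses internally may differ (for example, to account for halo elements or padding).

```python
def aos_index(elem, comp, dim):
    """Array of Structs: the components of one element are contiguous."""
    return elem * dim + comp


def soa_index(elem, comp, n_elems):
    """Struct of Arrays: one component of all elements is contiguous."""
    return comp * n_elems + elem


def aos_to_soa(data, n_elems, dim):
    """Return a copy of an AoS-ordered flat list rearranged into SoA order."""
    out = [None] * (n_elems * dim)
    for elem in range(n_elems):
        for comp in range(dim):
            out[soa_index(elem, comp, n_elems)] = data[aos_index(elem, comp, dim)]
    return out
```

With SoA, consecutive elements of the same component sit next to each other in memory, which is the access pattern that tends to suit GPU threads processing neighbouring elements.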
    

Makefile Integration Variables

When using the OP2 Makefiles (makefiles/c_app.mk / makefiles/f_app.mk), the following Make variables can be set in your application Makefile before the include of the OP2 Makefile fragment to customise the build:

| Variable | Description |
| --- | --- |
| APP_EXTRA_TRANSLATOR_FLAGS | Extra command-line flags appended to every translator invocation for the application. Useful for passing additional -I include paths or -D defines that the translator needs to parse your source correctly, without altering the shared TRANSLATOR variable. |
| VARIANT_FILTER | A Make pattern (default %, matches everything) used to keep only the matching build variants. For example, set VARIANT_FILTER := %cuda% to build only CUDA-related variants. |
| VARIANT_FILTER_OUT | A Make pattern used to exclude matching build variants from the set of targets printed and built. For example, VARIANT_FILTER_OUT := %hip% suppresses all HIP variants. |

Example — restrict an application to CUDA variants only and pass an extra define:

VARIANT_FILTER := %cuda%
APP_EXTRA_TRANSLATOR_FLAGS := -DUSE_FEATURE_X
include path/to/makefiles/common.mk
include path/to/makefiles/c_app.mk

Legacy Translator

The v1 translator is a collection of standalone Python scripts located in translator/c/ (C/C++) and translator/fortran/ (Fortran). Each script targets a single parallelisation strategy.

Note

The legacy translator is retained for compatibility. For new projects, use the OP2 Translator.

C/C++ Targets

Scripts are located in translator/c/. To use, uncomment the desired generator inside op2.py and invoke:

cd translator/c
python3 op2.py path/to/myapp.cpp

Available generators:

| Script | Target |
| --- | --- |
| op2_gen_seq.py | Sequential (reference) |
| op2_gen_openmp.py / op2_gen_openmp_simple.py | OpenMP multi-threaded CPU |
| op2_gen_omp_vec.py | OpenMP with SIMD vectorisation |
| op2_gen_cuda.py | CUDA (Fermi) |
| op2_gen_cuda_simple.py | CUDA (Kepler and later, optimised) |
| op2_gen_cuda_simple_hyb.py | Hybrid OpenMP + CUDA |
| op2_gen_mpi_vec.py | MPI + SIMD vectorisation |
| op2_gen_openacc.py | OpenACC |
| op2_gen_openmp4.py | OpenMP 4.0 device offload |

Fortran Targets

Scripts are located in translator/fortran/. To use, uncomment the desired generator inside op2_fortran.py and invoke:

cd translator/fortran
python3 op2_fortran.py path/to/myapp.F90

Available generators:

| Script | Target |
| --- | --- |
| op2_gen_mpiseq.py / op2_gen_mpiseq2.py / op2_gen_mpiseq3.py | MPI + sequential host stubs |
| op2_gen_mpivec.py | MPI + sequential with Intel vectorisation |
| op2_gen_openmp.py / op2_gen_openmp2.py / op2_gen_openmp3.py | OpenMP variants |
| op2_gen_openmpINC.py | OpenMP with INC staging |
| op2_gen_cuda.py / op2_gen_cudaINC.py / op2_gen_cuda_color2.py / op2_gen_cuda_permute.py | CUDA variants |
| op2_gen_openacc.py | OpenACC |
| op2_gen_openmp4.py | OpenMP 4.0 device offload |