Version: 6

How to use OpenCL 1.2 on i.MX8 with Torizon

Introduction

Torizon provides a container runtime, Debian container images, and a Debian package repository that together greatly ease the development of embedded applications.

In this article, we show how to install and test OpenCL libraries optimized for the GPUs available on NXP i.MX8/8X/8M Plus SoCs, so that you can integrate them into your application. We also obtain, build, and run an OpenCL information tool and an OpenCL benchmarking tool. These tools check that the software is set up correctly and show the GPU performance.

Prerequisites

Demo explained

The full Dockerfile implementation is available in our samples repository on GitHub. This section explains some of its key aspects.

Dockerfile

Toradex provides Debian Containers for Torizon. These containers are configured with a Toradex-specific Debian package feed, which provides GPU packages that are not available in the Debian community feeds.

You can choose the container image that best suits your project's needs. For example, the sample is configured to use the Debian base image. This is the smallest container variant we provide, and it is well suited for headless applications. A comment in the Dockerfile shows how to switch to a variant that comes with the Weston compositor installed. That variant is recommended if you want to run OpenCL together with a GUI application.

The container image is built in two stages:

  • Build: cross-compiles the clpeak application. The build-time dependencies, such as CMake and the OpenCL headers, are installed in this stage.
  • Deploy: installs the final binary cross-compiled in the Build stage, together with its runtime dependencies, such as the OpenCL libraries. This stage also installs the clinfo application from the Debian feeds.

clpeak

clpeak is a benchmarking tool that measures the peak capabilities of OpenCL devices. It is cross-compiled from source, as explained in the previous section.

clinfo

clinfo is a debugging tool that prints information about all available OpenCL platforms and devices. It is installed from the Debian feeds.

How to use

This section explains how to use the sample container, including how to build it and deploy it to the board. For more detailed instructions and deployment alternatives, read the article Deploying Container Images to TorizonCore.

Build and deploy

Clone the source from our samples GitHub repository onto your PC:

$ git clone -b bookworm https://github.com/toradex/torizon-samples.git
$ cd torizon-samples/opencl

On the host PC, from the opencl directory that contains the Dockerfile, build the image:

$ docker build -t <your-dockerhub-username>/opencl-image .

After the build, push the image to your Docker Hub account:

$ docker push <your-dockerhub-username>/opencl-image
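If you rebuild the image often, you can wrap the build and push steps above in a small shell function. This is a minimal sketch. The function name build_and_push is hypothetical and is not part of the sample repository. The sketch also assumes that it is called from the opencl directory.

```shell
# Hypothetical helper (not part of torizon-samples): build the OpenCL sample
# image and push it to Docker Hub. Call it from the opencl directory, which
# contains the Dockerfile.
build_and_push() {
    user="$1"
    if [ -z "$user" ]; then
        echo "usage: build_and_push <your-dockerhub-username>" >&2
        return 2
    fi
    image="$user/opencl-image"
    # Stop at the first failing step, so a failed build is never pushed.
    docker build -t "$image" . || return 1
    docker push "$image"
}
```

After sourcing the file that contains the function, run it on the host PC:

$ build_and_push <your-dockerhub-username>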

Pull the image from your Docker Hub account to the board. Run the following in the board's terminal:

caution

These instructions assume that the Docker Hub credentials are already set up on the board. If you have not set up your credentials yet, run docker login.

# docker pull <your-dockerhub-username>/opencl-image
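The docker run commands in the following sections map the GPU device node /dev/galcore into the container. docker run reports an error if that node does not exist on the board. The following minimal POSIX shell sketch checks for the node first. The function name check_gpu_device is hypothetical.

```shell
# Hypothetical helper (not part of torizon-samples): check that a GPU device
# node exists before passing it to "docker run --device". It defaults to
# /dev/galcore, the Vivante GPU device used in this article.
check_gpu_device() {
    dev="${1:-/dev/galcore}"
    # -c succeeds only for an existing character device node.
    if [ -c "$dev" ]; then
        echo "GPU device found: $dev"
        return 0
    fi
    echo "GPU device not found: $dev" >&2
    return 1
}
```

You can chain it with && in front of one of the docker run commands below. The container then starts only when the device node is present.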

Run clpeak

The clpeak application is configured as the entrypoint, so it is the command that runs by default when you start the container. To run it, execute the following command:

danger

Note that by executing the following command, you accept the terms and conditions of NXP's End-User License Agreement (EULA).

# docker run -e ACCEPT_FSL_EULA=1 -it --rm --name=clpeak-container --device /dev/galcore:/dev/galcore <your-dockerhub-username>/opencl-image

The above command only allows the container to access the GPU device /dev/galcore from the host system. If you want to run a graphical application alongside it, you need to grant additional permissions. You can find them in our Debian With Weston Wayland Compositor example. The Torizon Best Practices Guide explains more about granting hardware access and other types of permissions.

This is the expected output from an Apalis iMX8 board. The SoC on this board has two GPUs, so clpeak reports results for two devices:

Output
Platform: Vivante OpenCL Platform
Device: Vivante OpenCL Device GC7000XSVX.6009.0000
Driver version : OpenCL 1.2 V6.2.4.p4.190076 (Linux ARM64)
Compute units : 1
Clock frequency : 996 MHz

Global memory bandwidth (GBPS)
float : 5.81
float2 : 9.74
float4 : 10.63
float8 : 9.36
float16 : 8.00

Single-precision compute (GFLOPS)
float : 14.14
float2 : 28.18
float4 : 55.87
float8 : 62.15
float16 : 61.45

No half precision support! Skipped

No double precision support! Skipped

Integer compute (GIOPS)
int : 14.13
int2 : 14.09
int4 : 15.84
int8 : 15.73
int16 : 14.54

Transfer bandwidth (GBPS)
enqueueWriteBuffer : 1.43
enqueueReadBuffer : 0.08
enqueueMapBuffer(for read) : 301.68
memcpy from mapped ptr : 0.08
enqueueUnmap(after write) : 269.08
memcpy to mapped ptr : 1.43

Kernel launch latency : 97.56 us

Device: Vivante OpenCL Device GC7000XSVX.6009.0000
Driver version : OpenCL 1.2 V6.2.4.p4.190076 (Linux ARM64)
Compute units : 1
Clock frequency : 996 MHz

Global memory bandwidth (GBPS)
float : 5.59
float2 : 9.38
float4 : 10.33
float8 : 9.15
float16 : 7.85

Single-precision compute (GFLOPS)
float : 14.14
float2 : 28.18
float4 : 55.87
float8 : 62.15
float16 : 61.45

No half precision support! Skipped

No double precision support! Skipped

Integer compute (GIOPS)
int : 14.13
int2 : 14.09
int4 : 15.84
int8 : 15.73
int16 : 14.54

Transfer bandwidth (GBPS)
enqueueWriteBuffer : 1.42
enqueueReadBuffer : 0.08
enqueueMapBuffer(for read) : 238.44
memcpy from mapped ptr : 0.08
enqueueUnmap(after write) : 207.03
memcpy to mapped ptr : 1.44

Kernel launch latency : 126.82 us
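To compare runs or boards, it can help to extract a single figure from the clpeak report. The following awk-based sketch prints, for each device, the highest value listed under Single-precision compute (GFLOPS). The function name peak_gflops is hypothetical. The parsing assumes the `name : value` layout shown above, with or without leading indentation.

```shell
# Hypothetical helper (not part of torizon-samples): read clpeak output on
# stdin and print, for each device, the highest value listed under
# "Single-precision compute (GFLOPS)".
peak_gflops() {
    awk '
        # Split "name : value" lines on the colon and its surrounding blanks.
        BEGIN { FS = "[ \t]*:[ \t]*" }
        # Print the best value of the section being closed.
        function report() {
            printf "%s: %.2f GFLOPS\n", dev, best
            in_sp = 0
        }
        /^[ \t]*Device:/ { if (in_sp) report(); dev = $2; next }
        /Single-precision compute/ { in_sp = 1; best = 0; next }
        in_sp && NF == 2 { if ($2 + 0 > best) best = $2 + 0; next }
        # Any other line (blank line or next header) ends the section.
        in_sp { report() }
        END { if (in_sp) report() }
    '
}
```

For example, you can pipe the clpeak command from this section into it on the board. Running that command also accepts NXP's EULA:

# docker run -e ACCEPT_FSL_EULA=1 --rm --device /dev/galcore:/dev/galcore <your-dockerhub-username>/opencl-image | peak_gflops

For the Apalis iMX8 output above, it prints 62.15 GFLOPS for each of the two devices.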

Run clinfo

To run the clinfo application, set it as the entrypoint with the --entrypoint option of docker run. Compared to the clpeak command in the previous section, this option is the only change:

danger

Note that by executing the following command, you accept the terms and conditions of NXP's End-User License Agreement (EULA).

# docker run -e ACCEPT_FSL_EULA=1 -it --rm --name=clpeak-container --device /dev/galcore:/dev/galcore --entrypoint=clinfo <your-dockerhub-username>/opencl-image

This is the expected output from a Colibri iMX8X board:

Output
clinfo: /usr/lib/aarch64-linux-gnu/libOpenCL.so.1: no version information available (required by clinfo)
Number of platforms 1
Platform Name Vivante OpenCL Platform
Platform Vendor Vivante Corporation
Platform Version OpenCL 1.2 V6.4.0.p2.234062
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd
Platform Extensions function suffix viv

Platform Name Vivante OpenCL Platform
Number of devices 1
Device Name Vivante OpenCL Device GC7000L.6214.0000
Device Vendor Vivante Corporation
Device Vendor ID 0x564956
Device Version OpenCL 1.2
Driver Version OpenCL 1.2 V6.4.0.p2.234062
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 1
Max clock frequency 850MHz
Device Partition (core)
Max number of sub-devices 0
Supported partition types (n/a)
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 1024
Preferred work group size multiple (kernel) 16
Preferred / native vector sizes
char 4 / 4
short 4 / 4
int 4 / 4
long 4 / 4
half 0 / 0 (cl_khr_fp16)
float 4 / 4
double 0 / 0 (n/a)
Half-precision Floating-point support <printDeviceInfo:86: get CL_DEVICE_HALF_FP_CONFIG : error -30>
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (n/a)
Address bits 32, Little-Endian
Global memory size 268435456 (256MiB)
Error Correction support Yes
Max memory allocation 134217728 (128MiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 2048 bits (256 bytes)
Global Memory cache type Read/Write
Global Memory cache size 8192 (8KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 65536 pixels
Max 1D or 2D image array size 8192 images
Max 2D image size 8192x8192 pixels
Max 3D image size 8192x8192x8192 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Global
Local memory size 32768 (32KiB)
Max number of constant args 9
Max constant buffer size 65536 (64KiB)
Max size of kernel argument 1024
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1000ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
printf() buffer size 1048576 (1024KiB)
Built-in kernels (n/a)
Device Extensions cl_khr_byte_addressable_store cl_khr_gl_sharing cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics

NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [viv]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name Vivante OpenCL Platform
Device Name Vivante OpenCL Device GC7000L.6214.0000
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Vivante OpenCL Platform
Device Name Vivante OpenCL Device GC7000L.6214.0000
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Vivante OpenCL Platform
Device Name Vivante OpenCL Device GC7000L.6214.0000
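The clpeak and clinfo commands differ only in the --entrypoint option, so you can wrap both in one function. You can also filter the long clinfo report down to a few properties. This is a minimal sketch with these assumptions:

  • The function names run_opencl and clinfo_summary, the IMAGE variable, and the selected fields are hypothetical.
  • run_opencl leaves out -it, because a pseudo-TTY is not needed and adds carriage returns to piped output.

```shell
# Hypothetical helpers (not part of torizon-samples).
# IMAGE must hold <your-dockerhub-username>/opencl-image.

# Run the sample container. With no argument, the default entrypoint
# (clpeak) runs; an argument such as "clinfo" overrides the entrypoint.
# Setting ACCEPT_FSL_EULA=1 accepts NXP's EULA, as in the commands above.
run_opencl() {
    if [ -z "$IMAGE" ]; then
        echo "IMAGE is not set" >&2
        return 2
    fi
    if [ -n "$1" ]; then
        set -- "--entrypoint=$1"
    else
        set --
    fi
    docker run -e ACCEPT_FSL_EULA=1 --rm \
        --device /dev/galcore:/dev/galcore "$@" "$IMAGE"
}

# Keep a few key properties from clinfo output read on stdin. Lines from the
# "NULL platform behavior" section onward are dropped, because that section
# repeats the device name. Carriage returns are stripped in case the output
# came through a TTY.
clinfo_summary() {
    tr -d '\r' |
        sed '/NULL platform behavior/,$d' |
        grep -E '^[[:space:]]*(Device Name|Device Version|Max compute units|Max clock frequency|Global memory size)[[:space:]]'
}
```

On the board, after defining both functions:

# IMAGE=<your-dockerhub-username>/opencl-image
# run_opencl clinfo | clinfo_summary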

