Version: 5.0

How to use OpenCL 1.2 in iMX8 on Torizon

Introduction

Torizon features a container runtime, Debian container images, and a deb package repository that greatly ease the development of embedded applications. In this article, we show how to install and test OpenCL libraries optimized for the GPUs on NXP i.MX8/8X SoCs so that you can integrate them into your application. We also obtain, build, and run an OpenCL benchmarking tool to check that the software is set up correctly and to observe GPU performance.

caution

As of February 2022, this demo is known to fail at runtime with the error "clGetPlatformIDs (-1001) no platforms found".
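The code -1001 is CL_PLATFORM_NOT_FOUND_KHR, which an OpenCL ICD loader returns when it cannot find any vendor driver. A minimal sketch of a first check to run inside the container, assuming the loader uses the standard Linux registry directory /etc/OpenCL/vendors. The helper name check_opencl_icd is hypothetical, and a missing registry entry may not be the cause of the failure reported above:

```shell
#!/bin/sh
# Report which vendor drivers an OpenCL ICD loader could discover.
# $1: vendors directory (assumption: defaults to the standard Linux location).
check_opencl_icd() {
    dir=${1:-/etc/OpenCL/vendors}
    found=0
    for icd in "$dir"/*.icd; do
        # The glob stays unexpanded when nothing matches, so test for existence.
        [ -e "$icd" ] || continue
        found=1
        # Each .icd file names the shared library of one vendor driver.
        echo "ICD file: $icd -> $(head -n 1 "$icd")"
    done
    if [ "$found" -eq 0 ]; then
        echo "no .icd files in $dir"
        return 1
    fi
}
```

Calling check_opencl_icd without arguments inspects the default directory. A non-zero return status means the loader has no vendor driver to enumerate.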

Prerequisites

info

Apalis iMX8X is phased out and no longer available for purchase. The latest BSP and TorizonCore version that supports it is 5.4.0.

Dockerfile explained

Image base

Toradex provides a basic Wayland image on its Docker Hub page. Use torizon/wayland-base-vivante as the base of your image: it includes the package repository configuration. You also need to install Vivante's OpenCL Debian package for Torizon.

Building clpeak

clpeak is a benchmarking tool that measures the peak capabilities of OpenCL devices. We build it from source for our system.
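A minimal sketch of such a source build, assuming the upstream clpeak repository on GitHub and its CMake build, with git, CMake, a C++ compiler, and the OpenCL development files already installed. The helper names run and build_clpeak and the checkout directory are hypothetical, and the sample Dockerfile may use different steps:

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clone clpeak and build it out of tree with CMake.
# $1: checkout directory (assumed name, defaults to "clpeak").
build_clpeak() {
    src=${1:-clpeak}
    run git clone https://github.com/krrishnarraj/clpeak.git "$src" &&
    run cmake -S "$src" -B "$src/build" &&
    run cmake --build "$src/build"
}
```

Setting DRY_RUN=1 before calling build_clpeak prints the three commands without touching the network, which is a cheap way to review them before placing equivalent steps in a Dockerfile build stage.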

Run clpeak

In this demo Dockerfile, we take the clpeak binary built in the previous stage and use it as the entry point.

Complete Dockerfile

The complete Dockerfile, combining the stages described above, is in the opencl directory of the samples repository. The next section shows how to build and run it as a container.

Dockerfile instructions

To get the most out of this article, we recommend cloning the source from our samples GitHub repository:

$ git clone -b bullseye https://github.com/toradex/torizon-samples.git
$ cd torizon-samples/opencl

To build

Inside the opencl directory that contains the Dockerfile on the host PC, build the image:

$ docker build -t <your-dockerhub-username>/opencl-image .

After the build, push the image to your Docker Hub account:

$ docker push <your-dockerhub-username>/opencl-image

To run

First, pull the image from your Docker Hub account onto the board. In a terminal on the board:

caution

These instructions assume that your Docker Hub credentials are already set up on the board. If you have not set them up yet, run docker login first.

# docker pull <your-dockerhub-username>/opencl-image

After the pull, run a container based on the image.

danger

Note that by executing the following command you accept the terms and conditions of NXP's End-User License Agreement (EULA).

# docker run -e ACCEPT_FSL_EULA=1 -it --rm --name=clpeak-container --net=host --cap-add CAP_SYS_TTY_CONFIG \
-v /dev:/dev -v /tmp:/tmp -v /run/udev/:/run/udev/ \
--device-cgroup-rule='c 4:* rmw' --device-cgroup-rule='c 13:* rmw' --device-cgroup-rule='c 199:* rmw' --device-cgroup-rule='c 226:* rmw' \
<your-dockerhub-username>/opencl-image
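Each --device-cgroup-rule in the command above has the form "type major:minor permissions": c selects character devices, * matches every minor number, and rmw grants read, mknod, and write access. To see which drivers own those major numbers on a given board, you can consult the kernel's /proc/devices table. A minimal sketch; the helper name char_majors is hypothetical:

```shell
#!/bin/sh
# List the character-device drivers registered for the given major numbers.
# $1: path to a file in /proc/devices format; remaining arguments: majors.
char_majors() {
    file=$1
    shift
    for major in "$@"; do
        awk -v m="$major" '
            /^Character devices:/ { in_char = 1; next }
            /^Block devices:/     { in_char = 0 }
            # A major can be listed more than once; print every match.
            in_char && $1 == m    { print m ": " $2; found = 1 }
            END { if (!found) print m ": (not registered)" }
        ' "$file"
    done
}
```

On the board, char_majors /proc/devices 4 13 199 226 lists the driver behind each rule. Majors 4, 13, and 226 are conventionally the tty, input, and DRM devices; 199 is assumed here to belong to the Vivante GPU driver.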

Expected Output

This is the expected output from an Apalis iMX8 board:

Output
Platform: Vivante OpenCL Platform
Device: Vivante OpenCL Device GC7000XSVX.6009.0000
Driver version : OpenCL 1.2 V6.2.4.p4.190076 (Linux ARM64)
Compute units : 1
Clock frequency : 996 MHz

Global memory bandwidth (GBPS)
float : 5.81
float2 : 9.74
float4 : 10.63
float8 : 9.36
float16 : 8.00

Single-precision compute (GFLOPS)
float : 14.14
float2 : 28.18
float4 : 55.87
float8 : 62.15
float16 : 61.45

No half precision support! Skipped

No double precision support! Skipped

Integer compute (GIOPS)
int : 14.13
int2 : 14.09
int4 : 15.84
int8 : 15.73
int16 : 14.54

Transfer bandwidth (GBPS)
enqueueWriteBuffer : 1.43
enqueueReadBuffer : 0.08
enqueueMapBuffer(for read) : 301.68
memcpy from mapped ptr : 0.08
enqueueUnmap(after write) : 269.08
memcpy to mapped ptr : 1.43

Kernel launch latency : 97.56 us

Device: Vivante OpenCL Device GC7000XSVX.6009.0000
Driver version : OpenCL 1.2 V6.2.4.p4.190076 (Linux ARM64)
Compute units : 1
Clock frequency : 996 MHz

Global memory bandwidth (GBPS)
float : 5.59
float2 : 9.38
float4 : 10.33
float8 : 9.15
float16 : 7.85

Single-precision compute (GFLOPS)
float : 14.14
float2 : 28.18
float4 : 55.87
float8 : 62.15
float16 : 61.45

No half precision support! Skipped

No double precision support! Skipped

Integer compute (GIOPS)
int : 14.13
int2 : 14.09
int4 : 15.84
int8 : 15.73
int16 : 14.54

Transfer bandwidth (GBPS)
enqueueWriteBuffer : 1.42
enqueueReadBuffer : 0.08
enqueueMapBuffer(for read) : 238.44
memcpy from mapped ptr : 0.08
enqueueUnmap(after write) : 207.03
memcpy to mapped ptr : 1.44

Kernel launch latency : 126.82 us
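
When comparing runs, or the two GPU devices reported above, it can help to pull single figures out of the report. A minimal sketch, assuming the layout shown above: a section title line followed by "label : value" rows. The helper name clpeak_metric is hypothetical:

```shell
#!/bin/sh
# Print the value of one row from one section of a clpeak report on stdin.
# $1: section title prefix, e.g. "Single-precision compute"
# $2: row label, e.g. "float4"
clpeak_metric() {
    awk -v section="$1" -v label="$2" '
        { line = $0; sub(/^[ \t]+/, "", line) }   # drop leading indentation
        line == "" { next }                       # blank lines separate sections
        line ~ /^(Platform|Device):/ { in_section = 0; next }
        # A line without a colon is a section title (or a "Skipped" notice).
        line !~ /:/ { in_section = (index(line, section) == 1); next }
        in_section {
            n = index(line, ":")
            key = substr(line, 1, n - 1); sub(/[ \t]+$/, "", key)
            val = substr(line, n + 1);    sub(/^[ \t]+/, "", val)
            if (key == label) print val
        }
    '
}
```

The function prints one value per device block. Fed the report above, clpeak_metric "Global memory bandwidth" float4 prints 10.63 and then 10.33.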