Version: BSP 7.x.y

Building Machine Learning Software with Reference Images for Yocto Project

Introduction

This article shows how to integrate the following AI runtimes into the Toradex Reference Images for Yocto Project:

Toradex BSP Version	meta-imx-ml version	AI Runtimes
Quarterly: 7.0.0	Based on NXP BSP L6.6.36_2.1.0 (documentation download requires login)	TensorFlow Lite v2.16.2; ONNX Runtime 1.17.1 (CPU only); PyTorch 2.0.0 (CPU only); OpenCV 4.6.0 (CPU only)

NXP eIQ

NXP eIQ software provides the basis for machine learning applications optimized for i.MX SoCs. It enables neural network acceleration on the GPU or NPU of NXP SoCs through the OpenVX backend.

When executing inference on Cortex-A cores, NXP eIQ inference engines support multi-threaded execution.

You can find more detailed information on the eIQ features of each specific version in the i.MX Machine Learning User's Guide, available in NXP's Embedded Linux Documentation.

TensorFlow Lite

As stated in the TensorFlow Lite Documentation:

TensorFlow Lite is a set of tools that enables on-device machine learning by helping developers run their models on mobile, embedded, and IoT devices.

In order to execute TensorFlow models with TensorFlow Lite, you need to use the TensorFlow Lite Converter. The TensorFlow Lite version needs to match the TensorFlow version used to design the model.

info

Not every TensorFlow model is directly convertible to TensorFlow Lite, because some TensorFlow operators (ops) do not have a TensorFlow Lite equivalent. In some situations, however, you can use a mix of TensorFlow and TensorFlow Lite ops by enabling the Select TensorFlow Ops feature. See the TensorFlow Lite Documentation for more information about this feature and how to enable it.

Torizon OS

To learn how to use TensorFlow Lite with Torizon, read the following article:

Torizon Sample: Real Time Object Detection with Tensorflow Lite.

Pre-Requisites

One of the following Toradex SoMs:
- Verdin iMX8M Plus (CPU/GPU/NPU support).
- Apalis iMX8 (CPU/GPU support).
- Colibri iMX8QXP (CPU/GPU support).
- Other i.MX-based SoMs may have CPU support but are not tested. Use at your own risk.
A compatible Carrier Board.
Read the Build a Reference Image with Yocto Project article.

Adding eIQ recipes to Reference Images for Yocto Project

Cloning the Toradex BSP repository

In an empty directory, use git-repo to obtain version 7.0.0 of the Toradex BSP, as explained in the First-time Configuration section of the Build a Reference Image with Yocto Project article:

info

For clarity, this article uses a directory named ~/yocto-ml-build as the project folder.

$ mkdir -p ~/yocto-ml-build/bsp-toradex
$ cd ~/yocto-ml-build/bsp-toradex
$ repo init -u git://git.toradex.com/toradex-manifest.git -b refs/tags/7.0.0 -m tdxref/default.xml
$ repo sync

Source the export file to set up the environment. On the first invocation, this also copies a sample configuration to build/conf/*.conf.

$ . export

Getting eIQ

eIQ is provided in a Yocto Project layer called meta-imx/meta-imx-ml.

info

The next steps expect the current directory to be <project-folder>/build.

Clone the meta-imx repository into your project directory:

$ git clone --depth 1 -b scarthgap-6.6.36-2.1.0 https://github.com/nxp-imx/meta-imx.git ../meta-imx

Copying the Recipes to your environment

First, create a layer named meta-imx-ml, add it to your environment and remove the example recipe:

info

This step may fail due to missing packages on your computer:
ERROR: The following required tools (as specified by HOSTTOOLS) appear to be unavailable in PATH, please install them in order to proceed: lz4c
If that is the case, install the required packages and repeat this step.

$ bitbake-layers create-layer ../layers/meta-imx-ml
$ bitbake-layers add-layer ../layers/meta-imx-ml
$ rm -rf ../layers/meta-imx-ml/recipes-example
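The HOSTTOOLS error mentioned above names the missing tool directly. As a minimal sketch, assuming a POSIX shell on the build host, a small helper can report which commands from a list are not in PATH before BitBake is invoked. The function name missing_tools is hypothetical, and the tool names passed to it are only an illustration, not the full HOSTTOOLS set.

```shell
# Print every command name from the arguments that cannot be found in PATH.
# Hypothetical helper: it only reports, it does not install anything.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}
```

For example, `missing_tools lz4c git` prints the name of each listed tool that is absent and prints nothing when both are available.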

Copy the recipes from meta-imx to your layer:

$ cp -r ../meta-imx/meta-imx-ml/recipes-* ../layers/meta-imx-ml/
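A quick way to confirm that the copy worked is to look for a recipe file inside the new layer. The sketch below assumes recipe files follow the usual name_version.bb pattern, as in tensorflow-lite_2.16.2.bb; the function name has_recipe is hypothetical.

```shell
# Succeed if the layer directory ($1) contains a recipe file named
# <recipe>_<version>.bb for the recipe name given in $2.
# Hypothetical helper for a sanity check after the copy step.
has_recipe() {
    layer_dir=$1
    recipe=$2
    [ -n "$(find "$layer_dir" -type f -name "${recipe}_*.bb" 2>/dev/null)" ]
}
```

For example, from the build directory, `has_recipe ../layers/meta-imx-ml tensorflow-lite && echo "recipe found"` prints the message only when a matching recipe file exists.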

Adjust Dependency for OpenCL

OpenCL is used only for GPU-accelerated inference on the i.MX95. The i.MX8 modules do not need it, so add an override that removes this dependency from the tensorflow-lite recipe for every SoC except the i.MX95:

$ sed -i '/^RDEPENDS_OPENCL/s|= "opencl-icd-loader-dev"|= ""\nRDEPENDS_OPENCL:mx95-nxp-bsp = "opencl-icd-loader-dev"|' ../layers/meta-imx-ml/recipes-libraries/tensorflow-lite/tensorflow-lite_2.16.2.bb
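To see what this sed expression does before touching the real recipe, it can be applied to a scratch file. This is a sketch that assumes GNU sed (for \n in the replacement) and assumes the recipe contains a line of the form RDEPENDS_OPENCL = "opencl-icd-loader-dev"; the actual recipe text may differ. The function name apply_opencl_override is hypothetical.

```shell
# Apply the same substitution as the command above to the file given as $1.
# Assumes GNU sed, which expands \n in the replacement to a newline.
apply_opencl_override() {
    sed -i '/^RDEPENDS_OPENCL/s|= "opencl-icd-loader-dev"|= ""\nRDEPENDS_OPENCL:mx95-nxp-bsp = "opencl-icd-loader-dev"|' "$1"
}

# Try it on a scratch file that holds the assumed original line.
scratch=$(mktemp)
printf '%s\n' 'RDEPENDS_OPENCL = "opencl-icd-loader-dev"' > "$scratch"
apply_opencl_override "$scratch"
cat "$scratch"
rm -f "$scratch"
```

With that input, the scratch file ends up with an empty default assignment followed by an mx95-nxp-bsp override that restores the dependency. Note that the substitution is not idempotent: the added override line also starts with RDEPENDS_OPENCL and still contains the original value, so running the command a second time would rewrite that line as well.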

Adding the recipes to your distribution

Add the meta-imx-ml recipes and some image processing libraries to your image:

$ echo 'IMAGE_INSTALL:append = " tensorflow-lite tensorflow-lite-vx-delegate opencv python3-pillow adwaita-icon-theme "' >> conf/local.conf

To build the image a little faster, remove the Qt packages for now. Keep them if you plan to use Qt in your image.

$ echo 'IMAGE_INSTALL:remove = " packagegroup-tdx-qt5 wayland-qtdemo-launch-cinematicexperience "' >> conf/local.conf
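Because echo with >> appends unconditionally, repeating these commands adds duplicate lines to local.conf. As a minimal sketch, a helper can append a line only when an identical one is not already present. The function name append_once is hypothetical.

```shell
# Append the line ($2) to the file ($1) unless an identical line already exists.
# grep -qxF matches the whole line as a fixed string, not as a regex.
append_once() {
    file=$1
    line=$2
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

For example, `append_once conf/local.conf 'IMAGE_INSTALL:remove = " packagegroup-tdx-qt5 wayland-qtdemo-launch-cinematicexperience "'` can be run repeatedly from the build directory without growing the file.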

Configuring the Machine

If you build for an NXP-based SoM, some downloads require you to read and accept the NXP/Freescale EULA, available in <project-folder>/layers/meta-freescale/EULA.

You have to state your acceptance by appending the following line to your <project-folder>/build/conf/local.conf file:

ACCEPT_FSL_EULA = "1"

Select the SoM in <project-folder>/build/conf/local.conf by uncommenting (removing the # in) the line corresponding to your SoM. For example, for the Verdin iMX8MP:

MACHINE ?= "verdin-imx8mp"
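The machine selection can also be scripted instead of edited by hand. The sketch below assumes the sample local.conf lists each machine on a commented line of the form #MACHINE ?= "name", which is what the uncommenting instruction above suggests; verify this against your own file. The function name select_machine is hypothetical.

```shell
# Uncomment the MACHINE line for the given machine name.
# $1 = path to local.conf, $2 = machine name, e.g. verdin-imx8mp.
select_machine() {
    conf=$1
    machine=$2
    # Strip a leading '#' (and any spaces after it) from the matching line only.
    sed -i "s|^#[[:space:]]*\(MACHINE ?= \"${machine}\"\)|\1|" "$conf"
}
```

For example, `select_machine conf/local.conf verdin-imx8mp` leaves the lines for all other machines commented out.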

Building

Build the tdx-reference-multimedia-image image for your target SoM, as explained in the Build a Reference Image with Yocto Project article:

$ bitbake tdx-reference-multimedia-image

info

If your internet connection or the download server is unstable, the build may fail with:
do_fetch: Fetcher failure for URL:
In most cases, retrying the build solves this issue.

Building with reduced RAM usage

Your computer may run out of RAM while compiling some packages. To reduce the RAM usage, limit the number of threads used by BitBake and Make.
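One way to apply such a limit is through the BB_NUMBER_THREADS and PARALLEL_MAKE variables in local.conf. The following is a minimal sketch: the function name limit_build_threads is hypothetical, and the right value depends on how much RAM your build host has.

```shell
# Append parallelism limits to the given local.conf.
# $1 = path to local.conf, $2 = maximum number of parallel tasks and make jobs.
limit_build_threads() {
    conf=$1
    jobs=$2
    {
        # Number of BitBake tasks that may run at the same time.
        printf 'BB_NUMBER_THREADS = "%s"\n' "$jobs"
        # Option passed to make inside each compile task.
        printf 'PARALLEL_MAKE = "-j %s"\n' "$jobs"
    } >> "$conf"
}
```

For example, `limit_build_threads conf/local.conf 4` restricts both BitBake and Make to four parallel jobs; the value 4 is only an illustration.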

Flashing the image

To flash your image to the board, see the Quickstart Guide for your SoM.

Executing Demos

NXP provides an inference example, supporting CPU, GPU, and NPU.

To execute it, cd to the example's directory:

# cd /usr/bin/tensorflow-lite-2.16.2/examples/

This demo takes a sample picture (grace_hopper.bmp) as the input to an image classification neural network based on MobileNet V1 (224x224 input size). See NXP's i.MX Machine Learning User's Guide for more information about this demo.

To run the demo, use the command for your target accelerator and SoM:

NPU

Verdin iMX8M Plus

# USE_GPU_INFERENCE=0 ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt --external_delegate_path=/usr/lib/libvx_delegate.so

iMX95 Verdin EVK

# ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt --external_delegate_path=/usr/lib64/libneutron_delegate.so

GPU

Verdin iMX8M Plus

# USE_GPU_INFERENCE=1 ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt --external_delegate_path=/usr/lib/libvx_delegate.so

iMX95 Verdin EVK

# ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt --use_gpu=true --gpu_backend=cl --gpu_precision_loss_allowed=true --gpu_experimental_enable_quant=true --gpu_inference_for_sustained_speed=false

CPU

# ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt

The following table compares the inference time measured when executing this demo:

SoM	HW accelerator	Inference Time	FPS (1/Inference Time)
Verdin iMX8M Plus	None (CPU)	34.40 ms	29.07 fps
Verdin iMX8M Plus	GPU	169.64 ms	5.89 fps
Verdin iMX8M Plus	NPU	3.05 ms	327.87 fps
Apalis iMX8Q Max	None (CPU)	38.87 ms	25.73 fps
Apalis iMX8Q Max	GPU	14.46 ms	69.16 fps
Colibri iMX8QX Plus	None (CPU)	87.83 ms	11.39 fps
Colibri iMX8QX Plus	GPU	125.93 ms	7.94 fps
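The FPS column is the reciprocal of the inference time, so with the time given in milliseconds it is 1000 divided by that value. As a small sketch, the conversion can be reproduced with awk; the function name fps_from_ms is hypothetical.

```shell
# Convert an inference time in milliseconds ($1) to frames per second,
# rounded to two decimal places.
fps_from_ms() {
    awk -v ms="$1" 'BEGIN { printf "%.2f\n", 1000 / ms }'
}
```

For example, `fps_from_ms 34.40` prints 29.07, which matches the CPU row for the Verdin iMX8M Plus.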

Alternatively, you can run the same example using a Python implementation:

NPU

# USE_GPU_INFERENCE=0 python3 label_image.py -e /usr/lib/libvx_delegate.so

GPU

# USE_GPU_INFERENCE=1 python3 label_image.py -e /usr/lib/libvx_delegate.so

CPU

# USE_GPU_INFERENCE=0 python3 label_image.py

info

As explained in NXP's Application Note AN12964, the i.MX 8M Plus SoC requires a warm-up time of about 7 seconds before delivering its expected high performance. You will observe this extra time when starting an application with NPU support.

Additional Resources

See NXP's version-specific i.MX Machine Learning User's Guide for more information about eIQ.
