
Building Machine Learning Software with Toradex Reference Images for Yocto Project

 

Article updated on 29 Apr 2021

Introduction

NXP eIQ provides enablement software for machine learning applications optimized for i.MX SoCs.

eIQ offers neural network acceleration on NXP SoCs on the GPU or NPU, using OpenVX as the backend. In addition, when executing inference on Cortex-A cores, the NXP eIQ inference engines support multi-threaded execution.

eIQ is provided in a Yocto layer called meta-imx/meta-ml.

In this article, we show how to integrate the following AI runtimes into the Toradex Reference Images for Yocto Project software:

  • Toradex BSP Version: Quarterly 5.2.0, based on NXP BSP L5.4.70-2.3.0 (download documentation: requires login)
  • meta-ml version: zeus-5.4.70-2.3.1 (the branch cloned in the instructions below)
  • AI Runtimes: TensorFlow Lite v2.3.1, ONNX Runtime 1.1.2, OpenCV 4.4.0

The eIQ software based on NXP BSP L5.4.70-2.3.1 also offers support for the following AI frameworks, for which we will add instructions soon:

  • PyTorch 1.6.0
  • Arm Compute Library 20.02.01
  • Arm NN 20.01

All the AI runtimes provided by eIQ (except OpenCV, as documented in the i.MX Machine Learning User's Guide) support OpenVX (GPU/NPU) as their backend.

You can find more detailed information on the features of eIQ for each specific version in the i.MX Machine Learning User's Guide, available in NXP's Embedded Linux Documentation. See the version-specific information in the links above.

You can also adapt the instructions to build with newer versions of the BSP and meta-ml; however, we had not tested that at the time of this writing.

Prerequisites

Adding eIQ recipes to Reference Images for Yocto Project

Clone the Toradex BSP repository

First, create a directory in your home directory named yocto-ml-build and use repo to obtain the Toradex BSP at version 5.2.0, as explained in the First-time Configuration section of the Build a Reference Image with Yocto Project article:

Note: To make this article easier to follow, we create a directory in home called ~/yocto-ml-build. You can, of course, use any name you want.

$ mkdir -p ~/yocto-ml-build/bsp-toradex && cd ~/yocto-ml-build/bsp-toradex
$ repo init -u https://git.toradex.com/toradex-manifest.git -b refs/tags/5.2.0 -m tdxref/default.xml
$ repo sync

Note: At the time of this writing, we tested the build with the latest release of the Toradex BSP Layers and Reference Images for Yocto Project software, version 5.2.0. You can use these instructions to build meta-ml with newer BSP versions; however, we have not tested that.

Getting eIQ

Clone the meta-imx repository into your ~/yocto-ml-build/ directory:

$ git clone -b zeus-5.4.70-2.3.1 git://source.codeaurora.org/external/imx/meta-imx ~/yocto-ml-build/meta-imx

Copying the Recipes to your environment
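
The steps in this section assume that your Yocto build environment is already initialized and that you run the commands from inside the build directory, as described in the Build a Reference Image with Yocto Project article. A minimal sketch of that setup, assuming the standard export script (the exact script name may differ in your setup):

$ cd ~/yocto-ml-build/bsp-toradex
$ . export   # sets up the build environment and enters the build directory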

First, create a layer named meta-ml, add it to your environment and remove the example recipe:

$ bitbake-layers create-layer ../layers/meta-ml
$ bitbake-layers add-layer ../layers/meta-ml
$ rm -rf ../layers/meta-ml/recipes-example

Copy the recipes from meta-imx to your layer:

$ cp -r ../../meta-imx/meta-ml/recipes-* ../layers/meta-ml/
$ cp -r ../../meta-imx/meta-ml/classes/ ../layers/meta-ml/
$ cp -r ../../meta-imx/meta-bsp/recipes-support/opencv ../layers/meta-ml/recipes-libraries/
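
Optionally, you can confirm that BitBake sees the new layer and its recipes. This is just a quick sanity check, not part of the original instructions; the recipe name pattern below is an assumption based on the recipes copied above:

$ bitbake-layers show-layers
$ bitbake-layers show-recipes "tensorflow-lite*"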

This version of meta-ml targets a version of OpenEmbedded slightly different from the one used by Toradex BSP 5.2.0. In order to build the tensorflow-lite Python API with BSP 5.2.0, you need to make some adjustments to the python3native and meson classes:

$ echo 'export _PYTHON_SYSCONFIGDATA_NAME="_sysconfigdata"' >> ../layers/openembedded-core/meta/classes/python3native.bbclass
$ sed -i 's/inherit siteinfo python3native/inherit siteinfo python3native setuptools3/g' ../layers/openembedded-core/meta/classes/meson.bbclass

Adding the recipes to your distribution

Add the meta-ml recipes to your image:

$ echo 'IMAGE_INSTALL_append += "tensorflow-lite onnxruntime "' >> conf/local.conf

Add some image processing libraries so that additional image manipulations, such as resizing and cropping, can be performed:

$ echo 'IMAGE_INSTALL_append += "opencv python3-pillow adwaita-icon-theme "' >> conf/local.conf

To build the image a little faster, we will remove the Qt packages for now. Keep them if you are planning to use Qt in your image.

$ echo 'IMAGE_INSTALL_remove += "packagegroup-tdx-qt5 wayland-qtdemo-launch-cinematicexperience "' >> conf/local.conf
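
After running the commands above, the end of your conf/local.conf should contain lines similar to the following:

IMAGE_INSTALL_append += "tensorflow-lite onnxruntime "
IMAGE_INSTALL_append += "opencv python3-pillow adwaita-icon-theme "
IMAGE_INSTALL_remove += "packagegroup-tdx-qt5 wayland-qtdemo-launch-cinematicexperience "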

Building

Build the tdx-reference-multimedia-image image for your target SoM, as explained in the Build a Reference Image with Yocto Project article.
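
For example, for a Verdin iMX8M Plus module (the MACHINE value is an illustrative choice; adjust it to your SoM or set it in conf/local.conf as described in that article, and make sure the build environment is sourced first):

$ MACHINE=verdin-imx8mp bitbake tdx-reference-multimedia-image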

Note: In case of internet or server instability, the build may fail to fetch some repository with an error similar to do_fetch: Fetcher failure for URL:. In most cases, re-trying the build resolves this issue.

Flashing the image

To flash your image to the board, see the Quickstart Guide for your SoM.

Executing Demos

NXP provides an example for executing inference with and without GPU/NPU support, so you can compare the inference time of each.

To execute it, change to the examples directory on the module:

# cd /usr/bin/tensorflow-lite-2.3.1/examples/

This demo takes a sample picture (grace_hopper.bmp) as input to an image classification neural network based on MobileNet V1 (224x224 input size). See more information about this demo in NXP's i.MX Machine Learning User's Guide.

To execute the demo (the first two commands run hardware-accelerated inference, selecting the GPU or NPU backend with the USE_GPU_INFERENCE variable, while the last one runs on the CPU only):

# USE_GPU_INFERENCE=0 ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt -a 1
# USE_GPU_INFERENCE=1 ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt -a 1
# ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt

Alternatively, you can run the same example using a Python implementation:

# python3 label_image.py

Please be aware of the following limitation of the Python implementation, as stated in NXP's i.MX Machine Learning User's Guide:

The TensorFlow Lite Python API does not contain functions for switching between execution on CPU and GPU/NPU hardware accelerator. By default, GPU/NPU hardware accelerator is used for hardware acceleration. The backend selection depends on the availability of the libneuralnetworks.so or libneuralnetworks.so.1 in the /usr/lib directory. If the library is found by the shared library search mechanism, then the GPU/NPU backend is used.

Therefore, if you want to evaluate the Python script without GPU/NPU support, rename /usr/lib/libneuralnetworks.so.1 to /usr/lib/libneuralnetworks.so.1.backup and execute the script again, as shown below.
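
For example, on the module (this only moves the library out of the shared library search path and restores it afterwards):

# mv /usr/lib/libneuralnetworks.so.1 /usr/lib/libneuralnetworks.so.1.backup
# python3 label_image.py
# mv /usr/lib/libneuralnetworks.so.1.backup /usr/lib/libneuralnetworks.so.1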

Note: As explained in NXP's Application Note AN12964, the i.MX 8M Plus SoC requires a warmup time of about 7 seconds before delivering its expected high performance. You will observe this extra time when starting an application with NPU support.

Additional Resources

See the version-specific i.MX Machine Learning User's Guide from NXP for more information about eIQ enablement.