AI, Computer Vision and Machine Learning on Toradex i.MX 95-based Modules
Introduction
In this article, you will learn how to set up a build of Torizon OS Minimal with Yocto Walnascar to enable NPU acceleration on i.MX 95-based modules. Note that NPU acceleration support is currently available only for Aquila, Verdin, and SMARC iMX95 devices based on B0 silicon, and is limited to the Walnascar release.
Torizon OS does not have a regular release schedule for Walnascar. Therefore, expect that the torizon/default.xml manifest won't be updated regularly.
Prerequisites
Download the NXP eIQ Toolkit using the link below. Note that you must be logged in to access it:
Configure and Build the Image
1. Initialize and synchronize the walnascar manifest using the following commands:

   $ repo init -u git://git.toradex.com/toradex-manifest.git -b scarthgap-7.x.y -m torizon/default.xml
   $ repo sync -j 10
2. Source the setup-environment file using the command below, replacing <family> with the corresponding family name: aquila, verdin, or toradex-smarc:

   $ MACHINE=<family>-imx95 . setup-environment imx95-build
3. Add the NXP Machine Learning dependencies from the vendor package group by appending the following line to conf/local.conf:

   IMAGE_INSTALL:append = " packagegroup-fsl-ml"
4. Add meta-multimedia to conf/bblayers.conf:

   # These layers hold recipe metadata not found in OE-core, but lack any machine or distro content
   BASELAYERS ?= " \
     ${OEROOT}/layers/meta-openembedded/meta-oe \
     ${OEROOT}/layers/meta-openembedded/meta-networking \
     ${OEROOT}/layers/meta-openembedded/meta-filesystems \
     ${OEROOT}/layers/meta-openembedded/meta-python \
     ${OEROOT}/layers/meta-openembedded/meta-perl \
   + ${OEROOT}/layers/meta-openembedded/meta-multimedia \
     ${OEROOT}/layers/meta-virtualization \
     ${OEROOT}/layers/meta-updater \
     ${OEROOT}/layers/meta-cyclonedx \
     "
5. Finally, build your custom image based on the Torizon Minimal image, which includes the eIQ Machine Learning software and its dependencies:
   $ bitbake torizon-minimal
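Optionally, before kicking off (or while waiting for) the long build, you can confirm from inside the build directory that BitBake picks up the new layer and package group. This is a quick sanity-check sketch using standard BitBake tooling rather than a step from the original procedure:

   # List the configured layers; meta-multimedia should now appear
   $ bitbake-layers show-layers

   # List known recipes and confirm the ML package group resolves
   $ bitbake -s | grep packagegroup-fsl-ml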
Optimize Yocto Build RAM and CPU Usage

Your computer may run out of RAM while compiling some packages (such as Qt, in the tdx-reference-multimedia-image). To reduce RAM usage, set the environment variables PARALLEL_MAKE and BB_NUMBER_THREADS to a limited number of threads:

- PARALLEL_MAKE: number of threads used by make.
- BB_NUMBER_THREADS: number of threads used by bitbake. Setting BB_NUMBER_THREADS also limits the number of download and configuration threads.
$ PARALLEL_MAKE="-j 4" BB_NUMBER_THREADS="6" bitbake <image>
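If you prefer not to pass these limits on every invocation, the same variables can be set persistently in conf/local.conf. This is standard Yocto behavior; the values below are only examples to adjust for your host:

   # conf/local.conf -- limit BitBake task and make parallelism
   BB_NUMBER_THREADS = "6"
   PARALLEL_MAKE = "-j 4"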
Warning: If a build fails due to lack of RAM, some files could be corrupted, and trying to build again may not solve the issue. In this situation:

- Delete the corrupted package state
- Delete all temporary files
- Attempt to build the image again
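As a sketch of that recovery, assuming you can identify the failing recipe from the error log (shown here as the placeholder <recipe>), the standard BitBake clean tasks can be used:

   # Remove the recipe's shared-state cache entries and work directory
   $ bitbake -c cleansstate <recipe>

   # Also remove the recipe's downloaded sources and all of its build output
   $ bitbake -c cleanall <recipe>

   # Rebuild the image
   $ bitbake torizon-minimal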
After the build completes, you can flash the generated image to your SoM using Toradex Easy Installer.
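Assuming the default Yocto directory layout, the build artifacts for Toradex Easy Installer end up under the deploy directory of your build; the exact file names depend on the machine and image:

   # Build artifacts, including the Toradex Easy Installer image (path assumes the default layout)
   $ ls tmp/deploy/images/<family>-imx95/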
Convert the TensorFlow Model to Neutron
The i.MX 95 Neutron NPU requires Machine Learning models to be specifically targeted for its architecture. Follow the steps below to convert a TensorFlow Lite model using the eIQ Neutron SDK:
1. Create a new directory and extract the eIQ Neutron SDK:

   $ mkdir ~/eiq
   $ cd ~/eiq
   $ mv ~/Downloads/eiq-neutron-sdk-linux-3.0.1.zip .
   $ unzip eiq-neutron-sdk-linux-3.0.1.zip
2. Boot the image built in the Configure and Build the Image section, and transfer the model mobilenet_v1_1.0_224_quant.tflite from /usr/bin/tensorflow-lite-2.19.0/examples to your host machine with the following command:

   $ scp torizon@<device-ip>:/usr/bin/tensorflow-lite-2.19.0/examples/mobilenet_v1_1.0_224_quant.tflite .
3. Convert the generic TensorFlow Lite model to an i.MX 95-specific model using the eIQ Neutron SDK:

   $ ./bin/neutron-converter --input mobilenet_v1_1.0_224_quant.tflite --output mobilenet_v1_1.0_224_quant_converted.tflite --target imx95
4. Transfer the converted file, mobilenet_v1_1.0_224_quant_converted.tflite, back to the device:

   $ scp mobilenet_v1_1.0_224_quant_converted.tflite torizon@<device-ip>:~/
5. On the device, go to /usr/bin/tensorflow-lite-2.19.0/examples and run the benchmark application to ensure the conversion was successful:

   $ sudo ./benchmark_model --graph=/home/torizon/mobilenet_v1_1.0_224_quant_converted.tflite --num_runs=1000 --external_delegate_path=/usr/lib/libneutron_delegate.so
If the model conversion was successful, you should see an output like the following:
INFO: STARTING!
INFO: Log parameter values verbosely: [0]
INFO: Min num runs: [1000]
INFO: Graph: [/home/torizon/mobilenet_v1_1.0_224_quant_converted.tflite]
INFO: Signature to run: []
INFO: External delegate path: [/usr/lib/libneutron_delegate.so]
INFO: Loaded model /home/torizon/mobilenet_v1_1.0_224_quant_converted.tflite
INFO: EXTERNAL delegate created.
INFO: NeutronDelegate delegate: 1 nodes delegated out of 3 nodes with 1 partitions.
INFO: Neutron delegate version: v1.0.0-f24d08e5, zerocp enabled.
INFO: Explicitly applied EXTERNAL delegate, and the model graph will be partially executed by the delegate w/ 1 delegate kernels.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: The input model file size (MB): 4.33426
INFO: Initialized session in 9.328ms.
INFO: Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
INFO: count=389 first=1225 curr=1282 min=1109 max=1422 avg=1218.38 std=62 p5=1141 median=1195 p95=1327
INFO: Running benchmark for at least 1000 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
INFO: count=1000 first=1224 curr=1325 min=1171 max=1560 avg=1290.35 std=34 p5=1263 median=1281 p95=1338
INFO: Inference timings in us: Init: 9328, First inference: 1225, Warmup (avg): 1218.38, Inference (avg): 1290.35
INFO: Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
INFO: Memory footprint delta from the start of the tool (MB): init=7.17578 overall=7.30078
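Beyond the benchmark, you can run a single classification on the NPU with the label_image example that ships alongside benchmark_model. The sample input files and flags below follow the stock TensorFlow Lite example application and are assumptions about what your image provides; adjust them if the files differ:

   # Classify the stock sample image using the converted model on the Neutron NPU
   $ cd /usr/bin/tensorflow-lite-2.19.0/examples
   $ sudo ./label_image -m /home/torizon/mobilenet_v1_1.0_224_quant_converted.tflite \
         -i grace_hopper.bmp -l labels.txt \
         --external_delegate_path=/usr/lib/libneutron_delegate.so

For a quick comparison, running benchmark_model on the original, unconverted model without the --external_delegate_path option gives CPU-only timings for the same network.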
Next Steps
After following the steps presented in this article, you should have a working setup with hardware acceleration for Machine Learning on Toradex i.MX 95-based modules. For more information on developing and deploying Machine Learning applications, refer to Build Machine Learning Applications with NXP eIQ Software.
Toradex provides additional resources to help you get started with Machine Learning on our System-on-Modules: