
AI, Computer Vision and Machine Learning on Toradex i.MX 95-based Modules

Introduction

In this article, you will learn how to set up a build of Torizon OS Minimal with Yocto Walnascar, for enablement of NPU acceleration on i.MX 95-based modules. Note that NPU acceleration support is currently available for Aquila, Verdin, and SMARC iMX95 devices based on B0 silicon, and limited to the Walnascar release.

warning

Torizon OS does not have a regular release schedule for Walnascar. Therefore, expect that the torizon/default.xml manifest will not be updated regularly.

Prerequisites

Download the NXP eIQ Toolkit using the link below. Note that you must be logged in to access the download:

Configure and Build the Image

  1. Initialize and synchronize the walnascar manifest using the following commands:

    $ repo init -u git://git.toradex.com/toradex-manifest.git -b walnascar-8.x.y -m torizon/default.xml
    $ repo sync -j 10
  2. Source the setup-environment file using the command below, replacing <family> with the corresponding family name among aquila, verdin, or toradex-smarc:

    $ MACHINE=<family>-imx95 . setup-environment imx95-build
  3. Add the NXP Machine Learning dependencies from the vendor package group:

    conf/local.conf
    IMAGE_INSTALL:append = " packagegroup-fsl-ml"
  4. Add meta-multimedia to conf/bblayers.conf:

    conf/bblayers.conf
     # These layers hold recipe metadata not found in OE-core, but lack any machine or distro content
    BASELAYERS ?= " \
    ${OEROOT}/layers/meta-openembedded/meta-oe \
    ${OEROOT}/layers/meta-openembedded/meta-networking \
    ${OEROOT}/layers/meta-openembedded/meta-filesystems \
    ${OEROOT}/layers/meta-openembedded/meta-python \
    ${OEROOT}/layers/meta-openembedded/meta-perl \
    + ${OEROOT}/layers/meta-openembedded/meta-multimedia \
    ${OEROOT}/layers/meta-virtualization \
    ${OEROOT}/layers/meta-updater \
    ${OEROOT}/layers/meta-cyclonedx \
    "
  5. Finally, build your custom image based on the Torizon Minimal image, which includes the eIQ Machine Learning software and its dependencies:

    $ bitbake torizon-minimal
    Optimize Yocto Build RAM and CPU Usage

    Your computer may run out of RAM while compiling some packages (such as Qt, in the tdx-reference-multimedia-image).

    To reduce RAM usage, set the environment variables PARALLEL_MAKE and BB_NUMBER_THREADS to a limited number of threads:

    • PARALLEL_MAKE: Number of threads used by make
    • BB_NUMBER_THREADS: Number of threads used by bitbake. Setting BB_NUMBER_THREADS also limits the number of download and configuration threads.
    $ PARALLEL_MAKE="-j 4" BB_NUMBER_THREADS="6" bitbake <image>
    warning

    If a build fails due to lack of RAM, some files could be corrupted. Trying to build again may not solve this issue. In this situation:

    1. Delete the corrupted package state
    2. Delete all temporary files
    3. Attempt to build the image again
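As a rough starting point, you can derive conservative values for these variables from your host's resources. The sketch below is an illustration only (not an official Toradex tool): it caps both settings at half the available CPU cores to keep peak RAM usage down.

```python
import os

# Conservative defaults: use half the available cores (at least 1) for
# both make jobs and BitBake tasks to reduce peak memory consumption.
cores = os.cpu_count() or 1
threads = max(1, cores // 2)

# Print the resulting invocation for a torizon-minimal build.
print(f'PARALLEL_MAKE="-j {threads}" BB_NUMBER_THREADS="{threads}" bitbake torizon-minimal')
```

Tune these numbers upward again once you know how much RAM your builds actually need.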

After the build completes, you can flash the generated image to your SoM using Toradex Easy Installer.

Convert the TensorFlow Model to Neutron

The i.MX 95 Neutron NPU requires Machine Learning models to be specifically targeted for its architecture. Follow the steps below to convert a TensorFlow Lite model using the eIQ Neutron SDK:

  1. Create a new directory and extract eIQ Neutron Toolkit:

    $ mkdir ~/eiq
    $ cd ~/eiq
    $ mv ~/Downloads/eiq-neutron-sdk-linux-3.0.1.zip .
    $ unzip eiq-neutron-sdk-linux-3.0.1.zip
  2. Boot up the image built in the Configure and Build the Image section, and transfer the model mobilenet_v1_1.0_224_quant.tflite from /usr/bin/tensorflow-lite-2.19.0/examples to your host machine with the following command:

    $ scp torizon@<device-ip>:/usr/bin/tensorflow-lite-2.19.0/examples/mobilenet_v1_1.0_224_quant.tflite .
  3. Convert the generic TensorFlow Lite model to an i.MX 95-specific model using eIQ Toolkit:

    $ ./bin/neutron-converter --input mobilenet_v1_1.0_224_quant.tflite --output mobilenet_v1_1.0_224_quant_converted.tflite --target imx95
  4. Transfer the converted file, mobilenet_v1_1.0_224_quant_converted.tflite, back to the device:

    $ scp mobilenet_v1_1.0_224_quant_converted.tflite torizon@<device-ip>:~/
  5. On the device, change to the /usr/bin/tensorflow-lite-2.19.0/examples directory and run the benchmark application to confirm that the conversion was successful:

    $ sudo ./benchmark_model --graph=/home/torizon/mobilenet_v1_1.0_224_quant_converted.tflite --num_runs=1000 --external_delegate_path=/usr/lib/libneutron_delegate.so
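Before benchmarking, you can sanity-check that the converted file is still a valid TensorFlow Lite flatbuffer: TFLite models carry the file identifier TFL3 at byte offset 4. A minimal check (illustrative only, runs on host or device) looks like this:

```python
def is_tflite(path: str) -> bool:
    """Return True if the file carries the TFLite flatbuffer identifier."""
    with open(path, "rb") as f:
        header = f.read(8)
    # Bytes 4..8 of a TFLite flatbuffer hold the file identifier b"TFL3".
    return header[4:8] == b"TFL3"

# Example: is_tflite("mobilenet_v1_1.0_224_quant_converted.tflite")
```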

If the model conversion was successful, you should see an output like the following:

INFO: STARTING!                                                                                                                                                                                                    
INFO: Log parameter values verbosely: [0]
INFO: Min num runs: [1000]
INFO: Graph: [/home/torizon/mobilenet_v1_1.0_224_quant_converted.tflite]
INFO: Signature to run: []
INFO: External delegate path: [/usr/lib/libneutron_delegate.so]
INFO: Loaded model /home/torizon/mobilenet_v1_1.0_224_quant_converted.tflite
INFO: EXTERNAL delegate created.
INFO: NeutronDelegate delegate: 1 nodes delegated out of 3 nodes with 1 partitions.

INFO: Neutron delegate version: v1.0.0-f24d08e5, zerocp enabled.
INFO: Explicitly applied EXTERNAL delegate, and the model graph will be partially executed by the delegate w/ 1 delegate kernels.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: The input model file size (MB): 4.33426
INFO: Initialized session in 9.328ms.
INFO: Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
INFO: count=389 first=1225 curr=1282 min=1109 max=1422 avg=1218.38 std=62 p5=1141 median=1195 p95=1327

INFO: Running benchmark for at least 1000 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
INFO: count=1000 first=1224 curr=1325 min=1171 max=1560 avg=1290.35 std=34 p5=1263 median=1281 p95=1338

INFO: Inference timings in us: Init: 9328, First inference: 1225, Warmup (avg): 1218.38, Inference (avg): 1290.35
INFO: Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
INFO: Memory footprint delta from the start of the tool (MB): init=7.17578 overall=7.30078
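If you automate such benchmark runs, the timing line can be extracted from the output. The helper below is an illustrative sketch (not part of the TensorFlow tooling) that pulls the average inference time out of the "Inference timings in us" line:

```python
import re

def avg_inference_us(log: str) -> float:
    """Extract the average inference time in microseconds from
    benchmark_model output."""
    match = re.search(r"Inference \(avg\): ([\d.]+)", log)
    if match is None:
        raise ValueError("no inference timing line found")
    return float(match.group(1))

sample = ("INFO: Inference timings in us: Init: 9328, First inference: 1225, "
          "Warmup (avg): 1218.38, Inference (avg): 1290.35")
print(avg_inference_us(sample))  # 1290.35
```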

Next Steps

After following the steps presented in this article, you should now have a working setup with hardware acceleration for Machine Learning on the Toradex i.MX 95-based modules. For more information on developing and deploying Machine Learning applications, refer to Build Machine Learning Applications with NXP eIQ Software.
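As a starting point for your own applications, the sketch below shows how the converted model could be loaded from Python with the Neutron delegate. This is an unofficial illustration: it assumes tflite_runtime and numpy are present on the device (they are pulled in by the eIQ Machine Learning packages), reuses the model and delegate paths from the benchmark step above, and feeds a dummy frame instead of a real image.

```python
# Illustrative sketch only -- intended to run on the i.MX 95 module.
# Replace the dummy input with real, preprocessed image data.
try:
    import numpy as np
    from tflite_runtime.interpreter import Interpreter, load_delegate
    HAVE_TFLITE = True
except ImportError:
    HAVE_TFLITE = False
    print("tflite_runtime not available; run this script on the device")

if HAVE_TFLITE:
    interpreter = Interpreter(
        model_path="/home/torizon/mobilenet_v1_1.0_224_quant_converted.tflite",
        experimental_delegates=[load_delegate("/usr/lib/libneutron_delegate.so")],
    )
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # mobilenet_v1_1.0_224_quant expects a (1, 224, 224, 3) uint8 tensor.
    dummy = np.zeros(inp["shape"], dtype=np.uint8)
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()
    print("top class index:", int(np.argmax(interpreter.get_tensor(out["index"]))))
```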

Toradex provides additional resources to help you get started with Machine Learning on our System-on-Modules:

Additional Resources
