
AI, Computer Vision and Machine Learning on Toradex i.MX 95-based Modules

Introduction

In this article, you will learn how to set up a build of Torizon OS Minimal with Yocto Walnascar, for enablement of NPU acceleration on i.MX 95-based modules. Note that NPU acceleration support is currently available for Aquila, Verdin, and SMARC iMX95 devices based on B0 silicon, and limited to the Walnascar release.

warning

Torizon OS does not have a regular release schedule for Walnascar. Therefore, expect that the torizon/default.xml manifest will not be updated regularly.

Prerequisites

Download the NXP eIQ Toolkit using the link below. Note that you must be logged in to access the download:

Configure and Build the Image

  1. Initialize and synchronize the walnascar manifest using the following commands:

    $ repo init -u git://git.toradex.com/toradex-manifest.git -b walnascar-8.x.y -m torizon/default.xml
    $ repo sync -j 10
  2. Source the setup-environment file using the command below, replacing <family> with the corresponding family name among aquila, verdin, or toradex-smarc:

    $ MACHINE=<family>-imx95 . setup-environment imx95-build
  3. Add the NXP Machine Learning dependencies from the vendor package group:

    conf/local.conf
    IMAGE_INSTALL:append = " packagegroup-fsl-ml"
  4. Add meta-multimedia to conf/bblayers.conf:

    conf/bblayers.conf
     # These layers hold recipe metadata not found in OE-core, but lack any machine or distro content
    BASELAYERS ?= " \
    ${OEROOT}/layers/meta-openembedded/meta-oe \
    ${OEROOT}/layers/meta-openembedded/meta-networking \
    ${OEROOT}/layers/meta-openembedded/meta-filesystems \
    ${OEROOT}/layers/meta-openembedded/meta-python \
    ${OEROOT}/layers/meta-openembedded/meta-perl \
    + ${OEROOT}/layers/meta-openembedded/meta-multimedia \
    ${OEROOT}/layers/meta-virtualization \
    ${OEROOT}/layers/meta-updater \
    ${OEROOT}/layers/meta-cyclonedx \
    "
  5. Finally, build your custom image based on the Torizon Minimal image, which includes the eIQ Machine Learning software and its dependencies:

    $ bitbake torizon-minimal
    Optimize Yocto Build RAM and CPU Usage

    Your computer may run out of RAM while compiling some packages (such as Qt, in the tdx-reference-multimedia-image).

    To reduce RAM usage, set the environment variables PARALLEL_MAKE and BB_NUMBER_THREADS to a limited number of threads:

    • PARALLEL_MAKE: Number of threads used by make
    • BB_NUMBER_THREADS: Number of threads used by bitbake. Setting BB_NUMBER_THREADS also limits the number of download and configuration threads.
    $ PARALLEL_MAKE="-j 4" BB_NUMBER_THREADS="6" bitbake <image>
    warning

    If a build fails due to lack of RAM, some files could be corrupted. Trying to build again may not solve this issue. In this situation:

    1. Delete the corrupted package state
    2. Delete all temporary files
    3. Attempt to build the image again
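As a rough starting point, you can derive conservative values for these variables from your host's resources. The sketch below is an illustration only (not an official Toradex tool): it caps both settings at half the available CPU cores to keep peak RAM usage down.

```python
import os

# Conservative defaults: use half the available cores (at least 1) for
# both make jobs and BitBake tasks to reduce peak memory consumption.
cores = os.cpu_count() or 1
threads = max(1, cores // 2)

# Print the resulting invocation for a torizon-minimal build.
print(f'PARALLEL_MAKE="-j {threads}" BB_NUMBER_THREADS="{threads}" bitbake torizon-minimal')
```

Tune these numbers upward again once you know how much RAM your builds actually need.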

After the build completes, you can flash the generated image to your SoM using Toradex Easy Installer.

Convert the TensorFlow Model to Neutron

The i.MX 95 Neutron NPU requires Machine Learning models to be specifically targeted for its architecture. Follow the steps below to convert a TensorFlow Lite model using the eIQ Neutron SDK:

  1. Create a new directory and extract eIQ Neutron Toolkit:

    $ mkdir ~/eiq
    $ cd ~/eiq
    $ mv ~/Downloads/eiq-neutron-sdk-linux-3.0.1.zip .
    $ unzip eiq-neutron-sdk-linux-3.0.1.zip
  2. Boot up the image built in the Configure and Build the Image section, and transfer the model mobilenet_v1_1.0_224_quant.tflite from /usr/bin/tensorflow-lite-2.19.0/examples to your host machine with the following command:

    $ scp torizon@<device-ip>:/usr/bin/tensorflow-lite-2.19.0/examples/mobilenet_v1_1.0_224_quant.tflite .
  3. Convert the generic TensorFlow Lite model to an i.MX 95-specific model using eIQ Toolkit:

    $ ./bin/neutron-converter --input mobilenet_v1_1.0_224_quant.tflite --output mobilenet_v1_1.0_224_quant_converted.tflite --target imx95
  4. Transfer the converted file, mobilenet_v1_1.0_224_quant_converted.tflite, back to the device:

    $ scp mobilenet_v1_1.0_224_quant_converted.tflite torizon@<device-ip>:~/
  5. On the device, change to the /usr/bin/tensorflow-lite-2.19.0/examples directory and run the benchmark application to confirm that the conversion was successful:

    $ sudo ./benchmark_model --graph=/home/torizon/mobilenet_v1_1.0_224_quant_converted.tflite --num_runs=1000 --external_delegate_path=/usr/lib/libneutron_delegate.so
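Before benchmarking, you can sanity-check that the converted file is still a valid TensorFlow Lite flatbuffer: TFLite models carry the file identifier TFL3 at byte offset 4. A minimal check (illustrative only, runs on host or device) looks like this:

```python
def is_tflite(path: str) -> bool:
    """Return True if the file carries the TFLite flatbuffer identifier."""
    with open(path, "rb") as f:
        header = f.read(8)
    # Bytes 4..8 of a TFLite flatbuffer hold the file identifier b"TFL3".
    return header[4:8] == b"TFL3"

# Example: is_tflite("mobilenet_v1_1.0_224_quant_converted.tflite")
```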

If the model conversion was successful, you should see an output like the following:

INFO: STARTING!                                                                                                                                                                                                    
INFO: Log parameter values verbosely: [0]
INFO: Min num runs: [1000]
INFO: Graph: [/home/torizon/mobilenet_v1_1.0_224_quant_converted.tflite]
INFO: Signature to run: []
INFO: External delegate path: [/usr/lib/libneutron_delegate.so]
INFO: Loaded model /home/torizon/mobilenet_v1_1.0_224_quant_converted.tflite
INFO: EXTERNAL delegate created.
INFO: NeutronDelegate delegate: 1 nodes delegated out of 3 nodes with 1 partitions.

INFO: Neutron delegate version: v1.0.0-f24d08e5, zerocp enabled.
INFO: Explicitly applied EXTERNAL delegate, and the model graph will be partially executed by the delegate w/ 1 delegate kernels.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: The input model file size (MB): 4.33426
INFO: Initialized session in 9.328ms.
INFO: Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
INFO: count=389 first=1225 curr=1282 min=1109 max=1422 avg=1218.38 std=62 p5=1141 median=1195 p95=1327

INFO: Running benchmark for at least 1000 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
INFO: count=1000 first=1224 curr=1325 min=1171 max=1560 avg=1290.35 std=34 p5=1263 median=1281 p95=1338

INFO: Inference timings in us: Init: 9328, First inference: 1225, Warmup (avg): 1218.38, Inference (avg): 1290.35
INFO: Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
INFO: Memory footprint delta from the start of the tool (MB): init=7.17578 overall=7.30078
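If you automate such benchmark runs, the timing line can be extracted from the output. The helper below is an illustrative sketch (not part of the TensorFlow tooling) that pulls the average inference time out of the "Inference timings in us" line:

```python
import re

def avg_inference_us(log: str) -> float:
    """Extract the average inference time in microseconds from
    benchmark_model output."""
    match = re.search(r"Inference \(avg\): ([\d.]+)", log)
    if match is None:
        raise ValueError("no inference timing line found")
    return float(match.group(1))

sample = ("INFO: Inference timings in us: Init: 9328, First inference: 1225, "
          "Warmup (avg): 1218.38, Inference (avg): 1290.35")
print(avg_inference_us(sample))  # 1290.35
```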

Next Steps

After following the steps presented in this article, you should now have a working setup with hardware acceleration for Machine Learning on the Toradex i.MX 95-based modules. For more information on developing and deploying Machine Learning applications, refer to Build Machine Learning Applications with NXP eIQ Software.
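As a starting point for your own applications, the sketch below shows how the converted model could be loaded from Python with the Neutron delegate. This is an unofficial illustration: it assumes tflite_runtime and numpy are present on the device (they are pulled in by the eIQ Machine Learning packages), reuses the model and delegate paths from the benchmark step above, and feeds a dummy frame instead of a real image.

```python
# Illustrative sketch only -- intended to run on the i.MX 95 module.
# Replace the dummy input with real, preprocessed image data.
try:
    import numpy as np
    from tflite_runtime.interpreter import Interpreter, load_delegate
    HAVE_TFLITE = True
except ImportError:
    HAVE_TFLITE = False
    print("tflite_runtime not available; run this script on the device")

if HAVE_TFLITE:
    interpreter = Interpreter(
        model_path="/home/torizon/mobilenet_v1_1.0_224_quant_converted.tflite",
        experimental_delegates=[load_delegate("/usr/lib/libneutron_delegate.so")],
    )
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # mobilenet_v1_1.0_224_quant expects a (1, 224, 224, 3) uint8 tensor.
    dummy = np.zeros(inp["shape"], dtype=np.uint8)
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()
    print("top class index:", int(np.argmax(interpreter.get_tensor(out["index"]))))
```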

Toradex provides additional resources to help you get started with Machine Learning on our System-on-Modules:

Additional Resources
