Build Machine Learning Applications with NXP eIQ Software
Introduction
In this article, you will learn how to run Machine Learning demos using the NXP eIQ Machine Learning software on Toradex System-on-Modules (SoMs) based on NXP processors. The NXP eIQ software provides a complete Machine Learning development environment, including the eIQ Toolkit workflow tool, inference engines, neural network compilers, and optimized libraries.
NXP eIQ
NXP eIQ software provides a Machine Learning software stack optimized for NXP i.MX SoCs. It enables the deployment and execution of neural network models on embedded devices using hardware accelerators.
On supported platforms, neural network inference can be accelerated on the GPU or NPU through the OpenVX backend. If hardware acceleration is not available, inference can also run on Arm Cortex-A CPU cores, where eIQ supports multi-threaded execution to improve performance.
More details about supported frameworks, hardware acceleration, and deployment workflows can be found in the i.MX Machine Learning User's Guide, available in NXP's Embedded Linux Documentation.
Prerequisites
- Hardware Prerequisites:
  - An NXP-based Toradex SoM with eIQ support
  - A compatible Carrier Board
- Software Prerequisites:
  - Set up the environment for hardware acceleration as described in the NXP-Based Processors Setup for Machine Learning guide
Demonstration
After setting up the environment and building a reference image with the eIQ Machine Learning software integrated, you can run the provided demos to test the NPU, GPU, and CPU inference capabilities of your SoM.
This article guides you through running a sample image classification demo using the MobileNet V1 model on i.MX 8M Plus and i.MX 95 based modules. The demo is included as part of the eIQ software package.
First, change into the demo directory on the target device:
# cd /usr/bin/tensorflow-lite-2.16.2/examples/
The version of TensorFlow Lite included in the reference image may change over time. Make sure to check the version available in your image and adjust the path accordingly.
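Since the versioned path changes between image releases, it can also be located programmatically. A minimal sketch (the helper name and default root are illustrative, not part of the eIQ package):

```python
import glob
import os

def find_examples_dir(root="/usr/bin"):
    """Return the newest tensorflow-lite-*/examples directory under root, or None."""
    matches = sorted(glob.glob(os.path.join(root, "tensorflow-lite-*", "examples")))
    return matches[-1] if matches else None
```

On the target, the same idea works as a shell glob, e.g. `cd /usr/bin/tensorflow-lite-*/examples`.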
The NXP demo uses the input image grace_hopper.bmp to perform image classification with a neural network based on the MobileNet V1 model. Check NXP's i.MX Machine Learning User's Guide for more details.
Run the command below to execute the NXP demonstration on the desired hardware accelerator:
NPU
iMX8M Plus
# USE_GPU_INFERENCE=0 ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt --external_delegate_path=/usr/lib/libvx_delegate.so
iMX95
# ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt --external_delegate_path=/usr/lib/libneutron_delegate.so
GPU
iMX8M Plus
# USE_GPU_INFERENCE=1 ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt --external_delegate_path=/usr/lib/libvx_delegate.so
iMX95
# ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt --use_gpu=true --gpu_backend=cl --gpu_precision_loss_allowed=true --gpu_experimental_enable_quant=true --gpu_inference_for_sustained_speed=false
CPU
iMX8M Plus / iMX95
# ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt
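The same accelerator selection can be expressed in a Python TensorFlow Lite application by passing the delegate library as an external delegate. A hedged sketch: the `pick_delegate` helper and its keys are illustrative; the library paths are the ones used in the commands above.

```python
def pick_delegate(som):
    """Delegate library used by the commands above for each SoM family.

    Returns None for plain CPU execution. The mapping mirrors the
    delegate paths from this article; the dictionary keys are illustrative.
    """
    return {
        "imx8mp": "/usr/lib/libvx_delegate.so",      # OpenVX delegate (GPU/NPU)
        "imx95": "/usr/lib/libneutron_delegate.so",  # Neutron NPU delegate
    }.get(som)

# On the target (requires the tflite_runtime package; not run here):
# from tflite_runtime.interpreter import Interpreter, load_delegate
# delegate_path = pick_delegate("imx8mp")
# delegates = [load_delegate(delegate_path)] if delegate_path else []
# interpreter = Interpreter(model_path="mobilenet_v1_1.0_224_quant.tflite",
#                           experimental_delegates=delegates)
```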
Performance Comparison
The following tables show the inference time and frames per second (FPS) for the MobileNet V1 demo running on different Toradex SoMs with NXP-based processors, using NPU, GPU, and CPU acceleration.
NPU
| SoM | Inference Time | FPS (1/Inference Time) |
|---|---|---|
| Verdin iMX8M Plus | 3.08 ms | 324.67 FPS |
| SMARC iMX8M Plus | 3.16 ms | 316.45 FPS |
| Verdin iMX95 | 0.16 ms | 6172.84 FPS |
| SMARC iMX95 | 0.28 ms | 3571.42 FPS |
| iMX95 Verdin EVK | 1.32 ms | 754.14 FPS |
GPU
| SoM | Inference Time | FPS (1/Inference Time) |
|---|---|---|
| Verdin iMX8M Plus | 170.11 ms | 5.87 FPS |
| SMARC iMX8M Plus | 170.06 ms | 5.88 FPS |
| Verdin iMX95 | 523.76 ms | 1.90 FPS |
| SMARC iMX95 | 511.39 ms | 1.95 FPS |
| iMX95 Verdin EVK | 27.54 ms | 36.31 FPS |
CPU
| SoM | Inference Time | FPS (1/Inference Time) |
|---|---|---|
| Verdin iMX8M Plus | 35.42 ms | 28.23 FPS |
| SMARC iMX8M Plus | 35.99 ms | 27.78 FPS |
| Verdin iMX95 | 13.20 ms | 75.75 FPS |
| SMARC iMX95 | 12.92 ms | 77.40 FPS |
| iMX95 Verdin EVK | 16.51 ms | 60.54 FPS |
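The FPS column is simply the reciprocal of the per-frame inference time, with milliseconds converted to seconds. A one-line helper to reproduce it:

```python
def fps(inference_time_ms):
    """Frames per second as the reciprocal of per-frame inference time (ms)."""
    return 1000.0 / inference_time_ms
```

For example, an inference time of 3.08 ms corresponds to roughly 324.7 FPS. Small discrepancies against the tables come from rounding of the measured times.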
As described in NXP Application Note AN12964, the i.MX 8M Plus SoC requires a warm-up period of approximately 7 seconds before the NPU reaches its expected performance.
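When collecting your own measurements, discard the warm-up iterations so the reported time reflects steady-state performance. A minimal sketch (function name and default iteration counts are illustrative):

```python
import time

def average_inference_ms(run_once, warmup=10, iterations=50):
    """Average per-inference time in ms, excluding warm-up iterations."""
    for _ in range(warmup):           # let NPU graph compilation and caches settle
        run_once()
    start = time.perf_counter()
    for _ in range(iterations):       # timed steady-state iterations
        run_once()
    return (time.perf_counter() - start) / iterations * 1000.0
```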
Additional Resources
- Build TensorFlow Lite Applications With NPU Support on i.MX 8M Plus-Based Modules on Torizon OS
- Run TensorFlow Lite Demo Application with NPU Support on i.MX 8M Plus-based Modules on Torizon OS
- AI, Computer Vision, and Machine Learning on Toradex Computer on Modules
- NPU Usage with Reference Images for Yocto Project
- NXP's i.MX Machine Learning User's Guide