NPU Usage with Reference Images for Yocto Project
Introduction
This section provides examples of using the Neural Processing Unit (NPU) with Toradex BSP Reference Images for the Yocto Project. It includes guides on building Machine Learning software with reference images and integrating hardware acceleration for Machine Learning on specific System-on-Modules (SoMs).
To get started, you should have a basic understanding of Machine Learning concepts and familiarity with the Yocto Project for building custom Linux images.
TensorFlow Lite
Toradex SoMs support TensorFlow Lite for running Machine Learning models on the NPU. From the TensorFlow Lite Documentation:
"TensorFlow Lite is a set of tools that enables on-device Machine Learning by helping developers run their models on mobile, embedded, and IoT devices."
TensorFlow Lite models are created from TensorFlow models using the TensorFlow Lite Converter. Note that the TensorFlow Lite version needs to match the TensorFlow version used to design the model.
Not every TensorFlow model is directly convertible to TensorFlow Lite, because some TensorFlow operators (ops) do not have a TensorFlow Lite equivalent.
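As a minimal sketch of the conversion step described above, the snippet below builds a placeholder Keras model and converts it with the TensorFlow Lite Converter; in practice you would substitute your own trained model, and the conversion may fail if the model uses ops without a TensorFlow Lite equivalent.

```python
import tensorflow as tf

# Placeholder model for illustration only; use your trained model instead.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(2),
])

# Convert the TensorFlow model to the TensorFlow Lite flatbuffer format.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()  # returns the .tflite model as bytes

# Save the converted model for deployment on the target.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```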
Torizon OS
To get started with TensorFlow Lite on Torizon OS, refer to Run the TensorFlow Lite Demo Application With NPU Support on Torizon OS.
NPU Trade-offs
There are several trade-offs to consider when evaluating NPU usage for Machine Learning tasks:
- Performance: The NPU can significantly accelerate Machine Learning inference, but the performance gain depends on the specific model and workload. Some models may not be fully optimized for NPU execution, leading to suboptimal performance.
- Power Consumption: While the NPU can reduce the overall power consumption for Machine Learning tasks, it may still consume more power than a CPU for certain workloads. It's important to evaluate the power consumption of the NPU in relation to the performance benefits it provides.
- Memory Footprint: Using the NPU can reduce memory footprint, but it may require additional optimization and tuning of Machine Learning models.
- Compatibility: Not all Machine Learning models and frameworks are compatible with the NPU. You may need to modify your models or use specific libraries to leverage the NPU's capabilities, which can increase the complexity of your development process.
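The compatibility point above can be illustrated with a delegate: TensorFlow Lite offloads supported ops to the NPU through a delegate library, and falls back to the CPU when the delegate is unavailable. The sketch below assumes the VX delegate path used on NXP-based images (`/usr/lib/libvx_delegate.so`); the path is hardware-specific, and the tiny model is a placeholder so the example is self-contained.

```python
import numpy as np
import tensorflow as tf

# Placeholder model so the example runs end to end; use your own .tflite file.
model = tf.keras.Sequential([tf.keras.layers.Input(shape=(4,)),
                             tf.keras.layers.Dense(2)])
tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()

delegates = []
try:
    # Assumption: VX delegate path on an i.MX 8M Plus image; adjust per SoM.
    delegates = [tf.lite.experimental.load_delegate("/usr/lib/libvx_delegate.so")]
except (OSError, ValueError):
    pass  # Delegate not available (e.g. on a host PC): run on the CPU instead.

interpreter = tf.lite.Interpreter(model_content=tflite_model,
                                  experimental_delegates=delegates)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=np.float32))
interpreter.invoke()
out = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
```

Keeping the CPU fallback explicit makes the same application code usable both on the target with NPU acceleration and on a development host without it.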
AI-Enabled Toradex SoMs
Toradex SoMs feature integrated NPUs that can be used for AI applications. These SoMs provide hardware acceleration for Machine Learning tasks, allowing for faster inference and improved performance. Some of the AI-enabled Toradex SoMs include:
| System on Module | Execution Frameworks | Dedicated Hardware Units | Expected Performance* |
|---|---|---|---|
| Verdin iMX8M Plus, Toradex SMARC iMX8M Plus | TensorFlow Lite | NPU: VeriSilicon, up to 2 TOPS | 1.0x |
| Verdin iMX95, Toradex SMARC iMX95, Aquila iMX95 | TensorFlow Lite | NPU: Neutron, up to 2 TOPS, optimized design | 2.0x ~ 2.5x |
| Aquila AM69/TDA4 | TensorFlow Lite, ONNX Runtime, TVM | 4x NPUs (4x C7x DSPs + 4x MMAv2), up to 32 TOPS total (8 TOPS each) | 5.0x ~ 10.0x |
* The expected performance is relative to the Verdin iMX8M Plus and is based on the NPU performance of each SoM. Actual performance may vary depending on the specific model and workload being executed.
Vendors and Partners
Toradex collaborates with vendors and partners to provide support and tools for developing AI and Machine Learning applications on our SoMs. Refer to the following documentation for more information about developing Machine Learning applications on our SoMs:
Build Machine Learning Applications With NXP eIQ Software
Learn how to run Machine Learning applications using NXP eIQ Software with the Reference Images for Yocto Project.
Build Machine Learning Applications With TI Edge AI Stack
Learn how to build Machine Learning applications using TI Edge AI and TensorFlow Lite with the Reference Images for Yocto Project.