Version: BSP 7.x.y

NPU Usage with Reference Images for Yocto Project

Introduction

This section provides examples of using the Neural Processing Unit (NPU) with Toradex BSP Reference Images for the Yocto Project. It includes guides on building Machine Learning software with reference images and integrating hardware acceleration for Machine Learning on specific System-on-Modules (SoMs).

To get started, you should have a basic understanding of Machine Learning concepts and familiarity with the Yocto Project for building custom Linux images.

TensorFlow Lite

Toradex SoMs support TensorFlow Lite for running Machine Learning models on the NPU. From the TensorFlow Lite Documentation:

TensorFlow Lite is a set of tools that enables on-device Machine Learning by helping developers run their models on mobile, embedded, and IoT devices.

TensorFlow Lite models are created from TensorFlow models using the TensorFlow Lite Converter. Note that the TensorFlow Lite version needs to match the TensorFlow version used to design the model.

info

Not every TensorFlow model is directly convertible to TensorFlow Lite, because some TensorFlow operators (ops) do not have a TensorFlow Lite equivalent.

Torizon OS

To get started with TensorFlow Lite on Torizon OS, refer to Run the TensorFlow Lite Demo Application With NPU Support on Torizon OS.

NPU Trade-offs

There are several trade-offs to consider when evaluating NPU usage for Machine Learning tasks:

  • Performance: The NPU can significantly accelerate Machine Learning inference, but the performance gain depends on the specific model and workload. Some models may not be fully optimized for NPU execution, leading to suboptimal performance.
  • Power Consumption: While the NPU can reduce the overall power consumption for Machine Learning tasks, it may still consume more power than a CPU for certain workloads. It's important to evaluate the power consumption of the NPU in relation to the performance benefits it provides.
  • Memory Footprint: Using the NPU can reduce memory footprint, but it may require additional optimization and tuning of Machine Learning models.
  • Compatibility: Not all Machine Learning models and frameworks are compatible with the NPU. You may need to modify your models or use specific libraries to leverage the NPU's capabilities, which can increase the complexity of your development process.

AI-Enabled Toradex SoMs

Toradex SoMs feature integrated NPUs that can be used for AI applications. These SoMs provide hardware acceleration for Machine Learning tasks, allowing for faster inference and improved performance. Some of the AI-enabled Toradex SoMs include:

| System on Module | Execution Frameworks | Dedicated Hardware Units | Expected Performance* |
| --- | --- | --- | --- |
| Verdin iMX8M Plus, Toradex SMARC iMX8M Plus | TensorFlow Lite | NPU - VeriSilicon | Up to 2 TOPS (1.0x) |
| Verdin iMX95, Toradex SMARC iMX95, Aquila iMX95 | TensorFlow Lite | NPU - Neutron | Up to 2 TOPS, optimized design (2.0x ~ 2.5x) |
| Aquila AM69/TDA4 | TensorFlow Lite, ONNXRuntime, TVM | 4x NPUs (4x C7x DSPs + 4x MMAv2) | Up to 32 TOPS total, 8 TOPS each (5.0x ~ 10.0x) |

* Expected performance is relative to the Verdin iMX8M Plus and is based on the performance of the NPU in each SoM. Actual performance varies depending on the specific model and workload being executed.

Vendors and Partners

Toradex collaborates with vendors and partners to provide support and tools for developing AI and Machine Learning applications on our SoMs. Refer to the following documentation for more information about developing Machine Learning applications on our SoMs:
