Version: BSP 7.x.y

NPU Usage with Reference Images for Yocto Project

Introduction

This section provides examples of using the Neural Processing Unit (NPU) with Toradex BSP Reference Images for the Yocto Project. It includes guides on building Machine Learning software with reference images and integrating hardware acceleration for Machine Learning on specific System-on-Modules (SoMs).

To get started, you should have a basic understanding of Machine Learning concepts and familiarity with the Yocto Project for building custom Linux images.

TensorFlow Lite

Toradex SoMs support TensorFlow Lite for running Machine Learning models on the NPU. From the TensorFlow Lite Documentation:

"TensorFlow Lite is a set of tools that enables on-device Machine Learning by helping developers run their models on mobile, embedded, and IoT devices."

TensorFlow Lite models are created from TensorFlow models using the TensorFlow Lite Converter. The TensorFlow Lite version used on the device must match the TensorFlow version used to create the model.
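The conversion step can be sketched in Python as follows. This is a minimal sketch, not a Toradex-provided script: the helper names `same_release` and `convert_saved_model` and their parameters are hypothetical, and treating versions as matching when their major and minor numbers agree is an assumption made for illustration. Only `tf.lite.TFLiteConverter.from_saved_model` and `convert` come from the TensorFlow API.

```python
def same_release(tf_version: str, tflite_version: str) -> bool:
    """Return True when both version strings share major.minor.

    Assumption: "matching" is interpreted here as the same major.minor
    release; the exact requirement may be stricter for a given BSP.
    """
    return tf_version.split(".")[:2] == tflite_version.split(".")[:2]


def convert_saved_model(saved_model_dir: str, output_path: str) -> None:
    """Convert a TensorFlow SavedModel directory to a .tflite file."""
    # Imported lazily so the version helper above works without TensorFlow.
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    tflite_model = converter.convert()  # serialized model as bytes

    with open(output_path, "wb") as f:
        f.write(tflite_model)
```

On the development host, `convert_saved_model("my_saved_model", "model.tflite")` would write the converted model, where both paths are placeholders. `same_release` can then be used to compare the host TensorFlow version with the TensorFlow Lite version shipped in the image.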

Note: Not every TensorFlow model is directly convertible to TensorFlow Lite, because some TensorFlow operators (ops) do not have a TensorFlow Lite equivalent.

Torizon OS

To get started with TensorFlow Lite on Torizon OS, refer to Run the TensorFlow Lite Demo Application With NPU Support on Torizon OS.

NPU Trade-offs

There are several trade-offs to consider when evaluating NPU usage for Machine Learning tasks:

Performance: The NPU can significantly accelerate Machine Learning inference, but the performance gain depends on the specific model and workload. Some models may not be fully optimized for NPU execution, leading to suboptimal performance.
Power Consumption: While the NPU can reduce the overall power consumption for Machine Learning tasks, it may still consume more power than a CPU for certain workloads. It's important to evaluate the power consumption of the NPU in relation to the performance benefits it provides.
Memory Footprint: Using the NPU can reduce memory footprint, but it may require additional optimization and tuning of Machine Learning models.
Compatibility: Not all Machine Learning models and frameworks are compatible with the NPU. You may need to modify your models or use specific libraries to leverage the NPU's capabilities, which can increase the complexity of your development process.
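One way to evaluate the performance trade-off for a specific model is to time the same inference on the CPU and on the NPU. The following is a minimal sketch under stated assumptions: `median_latency_ms` and `make_tflite_runner` are hypothetical helper names, the model is assumed to have a single input tensor, and the delegate library path must be supplied by you because it depends on the SoM and BSP. The `tflite_runtime` calls used (`Interpreter`, `load_delegate`) are part of the TensorFlow Lite Python API.

```python
import statistics
import time
from typing import Callable, Optional


def median_latency_ms(run_once: Callable[[], object],
                      warmup: int = 5, runs: int = 50) -> float:
    """Call run_once repeatedly and return the median latency in ms."""
    # Warm-up runs are excluded from the measurement; the first
    # inferences on an accelerator can include one-time setup cost.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def make_tflite_runner(model_path: str,
                       delegate_path: Optional[str] = None) -> Callable[[], None]:
    """Build a zero-argument function that runs one inference.

    delegate_path is the shared library of the NPU delegate
    (an assumption: supplied by the caller). None means CPU only.
    """
    # Imported lazily so the timing helper works without these packages.
    import numpy as np
    import tflite_runtime.interpreter as tflite

    delegates = [tflite.load_delegate(delegate_path)] if delegate_path else []
    interpreter = tflite.Interpreter(model_path=model_path,
                                     experimental_delegates=delegates)
    interpreter.allocate_tensors()

    # Assumes a single-input model; feed zeros of the expected shape.
    inp = interpreter.get_input_details()[0]
    dummy = np.zeros(inp["shape"], dtype=inp["dtype"])

    def run_once() -> None:
        interpreter.set_tensor(inp["index"], dummy)
        interpreter.invoke()

    return run_once
```

Comparing `median_latency_ms(make_tflite_runner(model))` with the same call using a delegate path gives a first indication of whether the model benefits from the NPU. The median is used instead of the mean so that occasional scheduling spikes do not dominate the result.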

AI-Enabled Toradex SoMs

Several Toradex SoMs feature integrated NPUs that provide hardware acceleration for Machine Learning tasks, allowing for faster inference in AI applications. The following table lists some of the AI-enabled Toradex SoMs:

System on Module	Execution Frameworks	Dedicated Hardware Units	Expected Performance*
Verdin iMX8M Plus, Toradex SMARC iMX8M Plus	TensorFlow Lite	NPU (VeriSilicon), up to 2 TOPS	1.0x
Verdin iMX95, Toradex SMARC iMX95, Aquila iMX95	TensorFlow Lite	NPU (Neutron), up to 2 TOPS, optimized design	2.0x to 2.5x
Aquila AM69/TDA4	TensorFlow Lite, ONNX Runtime, TVM	4x NPUs (4x C7x DSPs + 4x MMAv2), up to 32 TOPS total (8 TOPS each)	5.0x to 10.0x

* The expected performance is relative to Verdin iMX8M Plus and is based on the performance of the NPU in each SoM. Actual performance may vary depending on the specific model and workload being executed.
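As a rough illustration of how the relative figures can be read, the sketch below scales a latency measured on Verdin iMX8M Plus by the factors from the table. It assumes that the factor acts as a throughput multiplier, so latency is divided by it; the table does not state this explicitly. The function name and the 20 ms baseline are hypothetical, and the result is only an estimate, not a measured value.

```python
# Relative performance ranges (low, high) taken from the table,
# with Verdin iMX8M Plus as the 1.0x baseline.
RELATIVE_PERFORMANCE = {
    "Verdin iMX8M Plus": (1.0, 1.0),
    "Verdin iMX95": (2.0, 2.5),
    "Aquila AM69/TDA4": (5.0, 10.0),
}


def estimated_latency_range_ms(baseline_ms: float, som: str) -> tuple:
    """Estimate (best, worst) latency in ms on `som`.

    Assumption: an N-times performance factor divides latency by N.
    """
    low, high = RELATIVE_PERFORMANCE[som]
    return (baseline_ms / high, baseline_ms / low)


# Hypothetical baseline: 20 ms per inference on Verdin iMX8M Plus.
for name in RELATIVE_PERFORMANCE:
    best, worst = estimated_latency_range_ms(20.0, name)
    print(f"{name}: {best:.1f} ms to {worst:.1f} ms")
```

Such an estimate is only a starting point for SoM selection. As the note states, actual performance depends on the model and workload, so it should be confirmed by measuring on the target hardware.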

Vendors and Partners

Toradex collaborates with vendors and partners to provide support and tools for developing AI and Machine Learning applications on Toradex SoMs. Refer to the following documentation for more information:

Build Machine Learning Applications With NXP eIQ Software

Learn how to run Machine Learning applications using NXP eIQ Software with the Reference Images for Yocto Project.

Build Machine Learning Applications With TI Edge AI Stack

Learn how to build Machine Learning applications using the TI Edge AI Stack and TensorFlow Lite with the Reference Images for Yocto Project.
