Build TensorFlow Lite Applications With NPU Support on SL1680-Based Modules on Torizon OS
Introduction
In this article, you will learn the basic workflow to build containerized TensorFlow Lite applications with Neural Processing Unit (NPU) support on SL1680-based modules running Torizon OS.
Prerequisites
- Hardware Prerequisites:
    - An SL1680-based Toradex Single Board Computer (SBC)
    - A compatible Carrier Board
- Software Prerequisites:
    - A Toradex SBC with Torizon OS installed
    - A configured build environment, as described in the Configure Build Environment for Torizon Containers article
    - A quantized .tflite model supported by the VX delegate. CPU fallback remains available for models that are not compatible with the NPU.
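Quantized models store weights and activations as 8-bit integers, which is what the NPU operates on directly. As background, TensorFlow Lite relates an integer value q to its real value through the affine mapping real = scale * (q - zero_point). The sketch below illustrates that mapping in plain Python; the scale and zero point shown are illustrative values for an input range of [0.0, 1.0] over uint8, not taken from any specific model:

```python
def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map a quantized uint8 value back to its real value."""
    return scale * (q - zero_point)

def quantize(real: float, scale: float, zero_point: int) -> int:
    """Map a real value to its quantized representation, clamped to uint8."""
    q = round(real / scale) + zero_point
    return max(0, min(255, q))

# Illustrative parameters for an input range of [0.0, 1.0] over uint8
scale, zero_point = 1.0 / 255.0, 0
mid_gray = quantize(0.5, scale, zero_point)
print(mid_gray, dequantize(mid_gray, scale, zero_point))
```

Models whose tensors are not quantized this way cannot be offloaded to the NPU, which is why the runtime keeps the CPU fallback available.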
Build TensorFlow Lite Applications
On your host machine, create a Dockerfile to set up the necessary dependencies to run TensorFlow Lite with NPU support. The produced container will include the runtime provided by Synaptics, which is designed to run on any hardware accelerator available on the system.
You do not need to use Toradex's packages to train your models. Toradex recommends using the upstream TensorFlow libraries for training.
ARG IMAGE_ARCH=linux/arm64
FROM --platform=${IMAGE_ARCH} torizon/debian-sl1680:4
# Define build arguments for the application
ARG TF_LITE_MODEL=""
ARG IMAGE_DATASET=""
ARG LABELMAP_FILE=""
ARG INFERENCE_SCRIPT=""
ARG WORKDIR_PATH="/app"
# Environment setup
ENV DEBIAN_FRONTEND=noninteractive \
PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1
WORKDIR ${WORKDIR_PATH}
# Install dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
\
# Python dependencies
python3 \
python3-venv \
python3-pip \
\
# Python libraries
python3-numpy \
python3-pil \
\
# TensorFlow Lite dependencies
libtensorflow-lite2.15.0-1 \
tflite-vx-delegate \
tim-vx \
\
# Synaptics dependencies
synap-runtime \
synap-prebuilts-libs \
synap-vxk \
&& rm -rf /var/lib/apt/lists/*
# Set up a Python virtual environment and install the TensorFlow Lite runtime
RUN python3 -m venv ${WORKDIR_PATH}/.venv --system-site-packages
# Install Python dependencies
RUN . ${WORKDIR_PATH}/.venv/bin/activate && \
pip install --no-cache-dir --upgrade pip && \
pip install --no-cache-dir tflite-runtime
# Copy the application code into the container
COPY ${TF_LITE_MODEL} ${WORKDIR_PATH}/
COPY ${LABELMAP_FILE} ${WORKDIR_PATH}/
COPY ${IMAGE_DATASET} ${WORKDIR_PATH}/
COPY ${INFERENCE_SCRIPT} ${WORKDIR_PATH}/
As an example, we are going to use the NXP-provided image classification demo (label_image) for the remainder of this article. Download the required resources using the commands below:
# Download the input image
$ curl -L https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/lite/examples/label_image/testdata/grace_hopper.bmp \
-o grace_hopper.bmp
# Download and extract the model
$ curl -L https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_1.0_224_quant.tgz \
| tar xz --wildcards --no-anchored '*.tflite'
# Download and extract the labels file
$ curl -L https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_1.0_224_frozen.tgz \
| tar xz --strip-components=1 --wildcards --no-anchored '*/labels.txt'
# Download the inference script
$ curl -L https://raw.githubusercontent.com/nxp-imx/tensorflow-imx/lf-6.12.49_2.2.0/tensorflow/lite/examples/python/label_image.py \
-o label_image.py
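The labels file downloaded above maps each class index in the model's output tensor to a human-readable name. A minimal sketch of the post-processing step that label_image.py performs, shown here with synthetic data standing in for the real 1001-entry labels file and the model's uint8 output tensor:

```python
def top_k_labels(scores, labels, k=3):
    """Return the k highest-scoring (label, score) pairs from a model output."""
    ranked = sorted(enumerate(scores), key=lambda item: item[1], reverse=True)
    return [(labels[i], s) for i, s in ranked[:k]]

# Synthetic example: five fake class scores and labels
labels = ["background", "tench", "goldfish", "shark", "ray"]
scores = [0, 12, 200, 30, 5]

for label, score in top_k_labels(scores, labels, k=2):
    print(f"{label}: {score}")
```

The demo script applies the same idea to the model's real output: find the highest-scoring indices and look up their names in labels.txt.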
Before building the image, make sure the TF_LITE_MODEL, IMAGE_DATASET, LABELMAP_FILE, and INFERENCE_SCRIPT build arguments point to valid file paths, either by editing their default values in the Dockerfile or by passing them with --build-arg at build time.
Finally, build the container image using the command below. Make sure to replace <username> and <image-name> with your Docker Hub username and the name for your image, respectively.
$ docker build -t <username>/<image-name> .
$ docker push <username>/<image-name>
Run the Application
Once you have built and pushed your Docker image to the container registry, you can run the container on your SL1680-based module. Use the following command to start and attach to the container with the necessary permissions and environment variables to access the NPU:
# docker run -it \
-v /dev:/dev \
--device-cgroup-rule "c 10:119 rmw" \
<username>/<image-name>
Finally, execute the TensorFlow Lite demo application inside the container:
## source .venv/bin/activate
## python3 label_image.py --model_file mobilenet_v1_1.0_224_quant.tflite --image grace_hopper.bmp --label_file labels.txt --ext_delegate=/usr/lib/aarch64-linux-gnu/libvx_delegate.so
For comparison, run the following command to execute the application on the CPU, omitting the external delegate:
## python3 label_image.py --model_file mobilenet_v1_1.0_224_quant.tflite --image grace_hopper.bmp --label_file labels.txt
Performance Comparison
The following table provides a performance comparison of the inference times for the CPU and NPU on the Luna SL1680 module when running the TensorFlow Lite demo application:
| SoM | HW accelerator | Inference Time | FPS (1/Inference Time) |
|---|---|---|---|
| Luna SL1680 | None (CPU) | 74.10 ms | 13.49 FPS |
| Luna SL1680 | NPU | 2.10 ms | 476.19 FPS |
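The FPS column follows directly from the inference time (FPS = 1000 / inference time in milliseconds). A quick sanity check of the figures in the table above:

```python
def fps(inference_time_ms: float) -> float:
    """Frames per second achievable at a given per-frame inference time."""
    return 1000.0 / inference_time_ms

# Figures from the table above
cpu_fps = fps(74.10)  # ~13.5 FPS on the CPU
npu_fps = fps(2.10)   # ~476.2 FPS on the NPU
print(f"NPU speedup over CPU: {npu_fps / cpu_fps:.1f}x")
```

In other words, offloading this model to the NPU yields roughly a 35x reduction in inference time over the CPU on the Luna SL1680.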