Version: Torizon OS 7.x.y

Run TensorFlow Lite with the NPU on i.MX 8M Plus Based Modules on Torizon OS

Introduction

This article shows how to run TensorFlow Lite with the NPU (the VeriSilicon/VX accelerator) on i.MX 8M Plus based Toradex System on Modules (SoMs) running Torizon OS. It covers a basic setup for using the NPU inside Torizon containers through libvx_delegate.so, with a CPU fallback.
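In practice, this means loading libvx_delegate.so as an external TensorFlow Lite delegate and falling back to a plain interpreter when it is unavailable. A minimal sketch of that pattern (the model file name is a placeholder; the full script later in this article adds preprocessing and timing):

    import os

    import tflite_runtime.interpreter as tflite

    VX_DELEGATE = "/usr/lib/libvx_delegate.so"  # VX delegate shipped on the BSP

    delegates = []
    if os.path.exists(VX_DELEGATE):
        try:
            delegates.append(tflite.load_delegate(VX_DELEGATE))
        except (ValueError, OSError):
            delegates = []  # delegate failed to load; run on the CPU instead

    # With an empty delegate list, TensorFlow Lite runs on the CPU (XNNPACK)
    interpreter = tflite.Interpreter(
        model_path="model_quant.tflite",  # placeholder model
        experimental_delegates=delegates,
    )
    interpreter.allocate_tensors()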

Requirements

Hardware requirements:

  • A Toradex System on Module based on the i.MX 8M Plus SoC.

Software requirements:

  • Torizon OS installed on the module, with NPU/VX support.
  • A configured build environment, as described in the Configure Build Environment for Torizon Containers article.
  • A quantized .tflite model supported by the VX delegate (CPU fallback remains available).
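Before you start, you can sanity-check that the module exposes the NPU device node that the containers below will access. A small sketch, assuming the Vivante node /dev/galcore (the character device with major number 199 that the --device-cgroup-rule flags in this article refer to):

    import os
    import stat

    # /dev/galcore is the Vivante GPU/NPU device node on i.MX 8M Plus BSPs
    st = os.stat("/dev/galcore")
    assert stat.S_ISCHR(st.st_mode), "/dev/galcore is not a character device"

    # Major 199 matches the device-cgroup rule used by the docker run commands
    print("major:", os.major(st.st_rdev), "minor:", os.minor(st.st_rdev))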

Run TensorFlow Lite on the NPU

From the Torizon example

To run the example using a model and dataset provided by Torizon:

  1. On your target, run the following docker run command:
docker run --rm \
--name "tflite-example" \
-v /dev:/dev \
-v /tmp:/tmp \
--device-cgroup-rule "c 199:0 rmw" \
torizon/tensorflow-lite-imx8:4
  2. After the run, you should expect results similar to the following:
    1. Running on the NPU:
    INFO: Copying output 362, tfl.dequantize
    INFO: Copying output 380, convert_scores1
    /app/images/validation/colibri_wifi4_jpg.rf.5374189653b8a1ed91b6225899687528.jpg: som (0.0299s)
    INFO: Delegate::Invoke node: 0x9959a00
    INFO: Copying input 0: serving_default_input:0
    INFO: Invoking graph
    INFO: Copying output 362, tfl.dequantize
    INFO: Copying output 380, convert_scores1
    /app/images/validation/colibrinowifi5_jpg.rf.2447307ad48bf09ff77c3a1d2db22b36.jpg: som (0.0307s)
    INFO: Delegate::Invoke node: 0x9959a00
    INFO: Copying input 0: serving_default_input:0
    INFO: Invoking graph
    INFO: Copying output 362, tfl.dequantize
    INFO: Copying output 380, convert_scores1
    /app/images/validation/colibrinowifi7_jpg.rf.f92a43a199f8593af6175bfaffd424e8.jpg: som (0.0301s)
    INFO: Delegate::Invoke node: 0x9959a00
    INFO: Copying input 0: serving_default_input:0
    INFO: Invoking graph
    INFO: Copying output 362, tfl.dequantize
    INFO: Copying output 380, convert_scores1
    /app/images/validation/verdin11_jpg.rf.40e36cc5e58bac016920773d25c4402d.jpg: som (0.0296s)
    INFO: Delegate::Invoke node: 0x9959a00
    INFO: Copying input 0: serving_default_input:0
    INFO: Invoking graph
    INFO: Copying output 362, tfl.dequantize
    INFO: Copying output 380, convert_scores1
    /app/images/validation/verdin5_jpg.rf.6c565df4d8282284fff93a7ea1322ed2.jpg: som (0.0300s)
    INFO: Delegate::Invoke node: 0x9959a00
    INFO: Copying input 0: serving_default_input:0
    INFO: Invoking graph
    INFO: Copying output 362, tfl.dequantize
    INFO: Copying output 380, convert_scores1
    /app/images/validation/verdin7_jpg.rf.e1d2e776604747c5ceea49fb1b3aebe4.jpg: som (0.0302s)

    Images processed: 35
    Mean inference time: 0.030091490064348494
    Images/s: 33.23198677970322
    Std deviation: 0.00035271071249791896
    2. Running on the CPU:
    INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
    /app/images/validation/20170202_130713_jpg.rf.02141be2c7a33a178eb732e60d931b08.jpg: som (0.3679s)
    /app/images/validation/20170202_130748_jpg.rf.59ba7f77331c66f87e1f84d681ae24af.jpg: som (0.3665s)
    /app/images/validation/20170427_091556_jpg.rf.04de51bcf68c7dcaf9cef819dfb77f96.jpg: som (0.3668s)
    /app/images/validation/20171215_120124_jpg.rf.42ab3a816917acbbc52f0085baca0ebc.jpg: som (0.3661s)
    /app/images/validation/20180207_110215_jpg.rf.545bf8c1a4f8879453e8d50433a98186.jpg: som (0.3666s)
    /app/images/validation/20200124_092137_jpg.rf.6fd46b3b2342b41af2731e965e096eec.jpg: som (0.3664s)
    /app/images/validation/20211003_124958_jpg.rf.247d395827b04d847f6d1dd72763e72c.jpg: som (0.3666s)
    /app/images/validation/20240208_180942_jpg.rf.50bfcc530043756d1afe42e34fae1899.jpg: som (0.3669s)
    /app/images/validation/20240208_181002_jpg.rf.8daf107e086842d887652ad65e85d845.jpg: som (0.3664s)
    /app/images/validation/20240208_181018_jpg.rf.d81032846504ab42fcd07caea1457f3d.jpg: som (0.3669s)
    /app/images/validation/20240209_074516_jpg.rf.f63a60a8423ea7faa6d4cfb70a2fbcb4.jpg: som (0.3665s)
    /app/images/validation/20240209_074532_jpg.rf.6563ef4a988952c3c6030120a6379827.jpg: som (0.3664s)
    /app/images/validation/20240209_074541_jpg.rf.3125e79e48dc249c228c3dc78b601338.jpg: som (0.3669s)
    /app/images/validation/20240209_074559_jpg.rf.63413dac635d4525c4a568dee4dc5bc0.jpg: som (0.3667s)
    /app/images/validation/20240209_074622_jpg.rf.1367ecac766633d291c1e298297c5fea.jpg: som (0.3668s)
    /app/images/validation/20240209_093936_jpg.rf.2e1e4301a5d3e7df4de4aee6f278c23c.jpg: som (0.3665s)
    /app/images/validation/20240220_101428_jpg.rf.aaba70e29b442df34864333c81f317ce.jpg: som (0.3666s)
    /app/images/validation/20240220_101437_jpg.rf.6cba29955ce6a40f6be9190084bd38cd.jpg: som (0.3665s)
    /app/images/validation/20240228_101455_jpg.rf.7a8999b086694135841ca8bf0ea983e4.jpg: som (0.3656s)
    /app/images/validation/20240229_081302_jpg.rf.c4e7ed154cb6cdf27940a738df3f37d0.jpg: som (0.3663s)
    /app/images/validation/20240229_081333_jpg.rf.3aa011c33beba0cc76d367b9a348d9dd.jpg: som (0.3668s)
    /app/images/validation/20240229_114225_jpg.rf.cbe35cee56aae023d9612166bf21ac0f.jpg: som (0.3667s)
    /app/images/validation/20240229_120443_jpg.rf.d22060b22ae2c200ee607b5f2971a795.jpg: som (0.3668s)
    /app/images/validation/DSC_0055_JPG.rf.846f69306b8ee55d4a0e02c0289b8aff.jpg: som (0.3666s)
    /app/images/validation/colibri_it11_jpg.rf.e87591ae0966de61a44aa129f5fbcf76.jpg: som (0.3668s)
    /app/images/validation/colibri_it13_jpg.rf.a2744fa933e6fc5950216967114a4cd0.jpg: som (0.3666s)
    /app/images/validation/colibri_it2_jpg.rf.6c28a3a84c9bcaef3545adc488866eca.jpg: som (0.3666s)
    /app/images/validation/colibri_it5_jpg.rf.7f4be685d931afed8e84306d3112bef0.jpg: som (0.3666s)
    /app/images/validation/colibri_it7_jpg.rf.51b04122bd907e7eb255ac1e9387b6c4.jpg: som (0.3668s)
    /app/images/validation/colibri_wifi1_jpg.rf.c4d21bffe663b54682c8b2ece7fc337a.jpg: som (0.3668s)
    /app/images/validation/colibri_wifi4_jpg.rf.5374189653b8a1ed91b6225899687528.jpg: som (0.3667s)
    /app/images/validation/colibrinowifi5_jpg.rf.2447307ad48bf09ff77c3a1d2db22b36.jpg: som (0.3663s)
    /app/images/validation/colibrinowifi7_jpg.rf.f92a43a199f8593af6175bfaffd424e8.jpg: som (0.3667s)
    /app/images/validation/verdin11_jpg.rf.40e36cc5e58bac016920773d25c4402d.jpg: som (0.3664s)
    /app/images/validation/verdin5_jpg.rf.6c565df4d8282284fff93a7ea1322ed2.jpg: som (0.3665s)
    /app/images/validation/verdin7_jpg.rf.e1d2e776604747c5ceea49fb1b3aebe4.jpg: som (0.3664s)

    Images processed: 35
    Mean inference time: 0.3665829249790737
    Images/s: 2.7278957416172203
    Std deviation: 0.00025898585318932067

As the results above show, inference on the NPU runs roughly twelve times faster than on the CPU, confirming that the NPU is actually being used:

CPU: Mean inference time: 0.3665829249790737
NPU: Mean inference time: 0.030091490064348494
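The speedup follows directly from the reported mean inference times; as a quick check:

    # Values taken from the runs above
    npu_mean = 0.030091490064348494  # seconds per image on the NPU
    cpu_mean = 0.3665829249790737    # seconds per image on the CPU

    print(f"Speedup: {cpu_mean / npu_mean:.1f}x")          # ~12.2x
    print(f"NPU throughput: {1 / npu_mean:.1f} images/s")  # ~33.2
    print(f"CPU throughput: {1 / cpu_mean:.1f} images/s")  # ~2.7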

From scratch

To run your .tflite model, follow the instructions below:

  1. On your host machine, create a Dockerfile as shown below. The Dockerfile installs the necessary runtime packages from Toradex/NXP's package feeds into your image: the TensorFlow Lite runtime, the VX delegate (tflite-vx-delegate-imx, which provides libvx_delegate.so), its TIM-VX backend (libtim-vx), and the Vivante GPU/NPU userspace drivers.

    tip

    You do not need to use Toradex's packages to train your models. Toradex recommends using the upstream TensorFlow libraries for training.

    Dockerfile
    ARG IMAGE_ARCH=linux/arm64

    FROM --platform=$IMAGE_ARCH torizon/debian-imx8:4

    ARG TF_LITE_MODEL=""
    ARG IMAGE_DATASET=""
    ARG WORKDIR_PATH="/app"
    ARG LABELMAP_FILE=""

    # Variables used by the example.py script; MODEL points at the
    # location the model file is copied to below
    ENV DATA_DIR=${IMAGE_DATASET}
    ENV MODEL=${WORKDIR_PATH}/model/${TF_LITE_MODEL}
    ENV LABELS=${LABELMAP_FILE}
    ENV APP_ROOT=${WORKDIR_PATH}

    ENV DEBIAN_FRONTEND=noninteractive \
        PYTHONDONTWRITEBYTECODE=1 \
        PYTHONUNBUFFERED=1

    WORKDIR ${WORKDIR_PATH}

    # TensorFlow Lite, the VX delegate and its TIM-VX backend, and the
    # Vivante GPU/NPU userspace drivers
    RUN apt-get update && \
        apt-get install -y --no-install-recommends \
            python3 python3-venv python3-pip \
            python3-numpy python3-pil \
            libtim-vx \
            libtensorflow-lite2.16.2 \
            tflite-vx-delegate-imx \
            imx-gpu-viv-wayland \
            imx-gpu-viv-wayland-dev

    RUN python3 -m venv ${WORKDIR_PATH}/.venv --system-site-packages

    RUN . ${WORKDIR_PATH}/.venv/bin/activate && \
        pip3 install --upgrade pip && \
        pip3 install --no-cache-dir tflite-runtime

    COPY ${TF_LITE_MODEL} ${WORKDIR_PATH}/model/
    COPY ${LABELMAP_FILE} ${WORKDIR_PATH}
    COPY ${IMAGE_DATASET} ${WORKDIR_PATH}/${IMAGE_DATASET}/

    COPY example.py ${WORKDIR_PATH}/example.py
  2. In the same directory as your Dockerfile, create an example Python script to run your .tflite model. The example below falls back to the CPU when the VX delegate is unavailable. Note that model quantization is required for acceptable performance in most cases (see the quantization sketch after the script).

    example.py
    #!/usr/bin/env python3
    import glob
    import os
    import time

    import numpy as np
    from PIL import Image

    import tflite_runtime.interpreter as tfl


    def find_vx_delegate():
        """Return the first VX delegate library found, or None for CPU fallback."""
        for p in (
            os.getenv("VX_DELEGATE", "/usr/lib/libvx_delegate.so"),
            "/usr/lib/libvx_delegate.so.2",
            "/usr/lib/libvx_delegate.so.1",
        ):
            if os.path.exists(p):
                return p
        return None


    def preprocess(path, size, dtype):
        """Letterbox the image onto a square canvas and resize to the model input."""
        img = Image.open(path).convert("RGB")
        w, h = img.size
        s = max(w, h)
        canvas = Image.new("RGB", (s, s))
        canvas.paste(img, ((s - w) // 2, (s - h) // 2))
        canvas = canvas.resize((size, size))
        x = np.asarray(canvas, dtype=dtype)
        return np.expand_dims(x, 0)


    def main():
        root = os.getenv("APP_ROOT", "/app")
        data_dir = os.getenv("DATA_DIR", f"{root}/images/validation")
        model = os.getenv("MODEL", f"{root}/ssd_detect_quant_only_som.tflite")
        labels_fn = os.getenv("LABELS", f"{root}/labelmap.txt")
        limit = int(os.getenv("LIMIT", "0"))

        files = sorted(glob.glob(f"{data_dir}/*.jpg"))
        if limit > 0:
            files = files[:limit]
        if not files:
            raise SystemExit(f"no images under {data_dir}")

        labels = [l.strip() for l in open(labels_fn, "r", encoding="utf-8") if l.strip()]

        # Try the VX delegate first; on any failure, run on the CPU instead
        delegates = []
        vx = find_vx_delegate()
        if vx:
            try:
                delegates.append(tfl.load_delegate(vx))
            except Exception:
                delegates = []

        itp = (
            tfl.Interpreter(model_path=model, experimental_delegates=delegates)
            if delegates
            else tfl.Interpreter(model_path=model)
        )
        itp.allocate_tensors()

        in0 = itp.get_input_details()[0]
        out0 = itp.get_output_details()[0]
        size = int(in0["shape"][1])
        dtype = in0["dtype"]

        times = []
        for i, p in enumerate(files):
            itp.set_tensor(in0["index"], preprocess(p, size, dtype))
            t1 = time.time()
            itp.invoke()
            t2 = time.time()
            y = itp.get_tensor(out0["index"])[0]
            top = int(np.argmax(y))
            print(
                f"{p}: {labels[top] if top < len(labels) else top} ({t2-t1:.4f}s)",
                flush=True,
            )
            if i != 0:  # skip the first (warm-up) inference in the statistics
                times.append(t2 - t1)

        if times:
            a = np.array(times, dtype=np.float64)
            m = float(a.mean())
            print("\nImages processed:", len(a))
            print("Mean inference time:", m)
            print("Images/s:", (1.0 / m) if m > 0 else 0.0)
            print("Std deviation:", float(a.std()))


    if __name__ == "__main__":
        main()
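
    If your model is not quantized yet, the sketch below shows minimal full-integer post-training quantization with upstream TensorFlow, run on your host machine. The SavedModel path, input shape, and representative data are placeholders; replace them with your own.

    import numpy as np
    import tensorflow as tf

    def representative_dataset():
        # Replace with ~100 real, preprocessed samples from your training data
        for _ in range(100):
            yield [np.random.rand(1, 300, 300, 3).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Restrict to integer ops so the VX delegate can run the whole graph
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8

    with open("model_quant.tflite", "wb") as f:
        f.write(converter.convert())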

  3. Build and push the image with the commands below. Replace <username>/<image-name> with your Docker Hub username and an image name of your choice, and the build arguments with the paths to your model file, image dataset folder, and label map file.

    $ docker build -t <username>/<image-name> \
    --build-arg TF_LITE_MODEL=<user-model-file> \
    --build-arg IMAGE_DATASET=<user-images-folder> \
    --build-arg LABELMAP_FILE=<user-label-file> .

    $ docker push <username>/<image-name>
  4. Run the container on your target device:

    # docker run -it \
    -v /dev:/dev \
    -e VX_DELEGATE=/usr/lib/libvx_delegate.so \
    --device-cgroup-rule "c 199:0 rmw" \
    <username>/<image-name>
  5. Inside the container on your target, activate the virtual environment and run the example script:

    # . .venv/bin/activate && python3 example.py

    Expected output:

    images/validation/colibri_it7_jpg.rf.51b04122bd907e7eb255ac1e9387b6c4.jpg: 663 (0.0034s)
    INFO: Delegate::Invoke node: 0x35921270
    INFO: Copying input 88: input
    INFO: Invoking graph
    INFO: Copying output 87, MobilenetV1/Predictions/Reshape_1
    images/validation/colibri_wifi4_jpg.rf.5374189653b8a1ed91b6225899687528.jpg: 845 (0.0037s)
    INFO: Delegate::Invoke node: 0x35921270
    INFO: Copying input 88: input
    INFO: Invoking graph
    INFO: Copying output 87, MobilenetV1/Predictions/Reshape_1
    images/validation/colibrinowifi7_jpg.rf.f92a43a199f8593af6175bfaffd424e8.jpg: 482 (0.0033s)
    INFO: Delegate::Invoke node: 0x35921270
    INFO: Copying input 88: input
    INFO: Invoking graph
    INFO: Copying output 87, MobilenetV1/Predictions/Reshape_1
    images/validation/verdin11_jpg.rf.40e36cc5e58bac016920773d25c4402d.jpg: 663 (0.0034s)
    INFO: Delegate::Invoke node: 0x35921270
    INFO: Copying input 88: input
    INFO: Invoking graph
    INFO: Copying output 87, MobilenetV1/Predictions/Reshape_1
    images/validation/verdin5_jpg.rf.6c565df4d8282284fff93a7ea1322ed2.jpg: 663 (0.0034s)

    Images processed: 34
    Mean inference time: 0.0036329311483046588
    Images/s: 275.2598271692155
    Std deviation: 0.0003194358571149672