
TorizonCore Boot Time Optimization

 

Article updated at 23 Aug 2021

Select the version of your OS from the tabs below. If you don't know the version you are using, run the command cat /etc/os-release or cat /etc/issue on the board.



Remember that you can always refer to the Torizon Documentation, where you can find many relevant articles that may help you with application development.

Torizon 5.4.0

Introduction

For an operating system, a shorter boot time is always desirable, so that applications can start running as soon as possible.

The leeway one has for optimization in a general-purpose distribution such as TorizonCore is quite limited compared to a system designed for a narrow set of tasks.

A general-purpose distribution decides and defines many features at the distribution level. This in turn may determine:

  • What device drivers are available
  • What filesystems are supported
  • Which Bootloader/Kernel features are enabled and so on

To support the wide range of products customers build with TorizonCore, Toradex enables all the features that most customers are likely to need. This can increase the size of the resulting boot artifacts (bootloader, kernel, rootfs) compared to a special-purpose Linux system that a customer may deploy in the end.

Toradex performs the distribution-level optimizations, and customers are free to use additional tools, such as the TorizonCore Builder Tool, to further optimize the system for their specific needs.

This article focuses mostly on the distribution level optimizations.

Toradex also provides technical resources to help customers optimize boot time further for their final application(s). Please see the blog post linked at the end of this article.

This article complies with the Typographic Conventions for the Toradex Documentation.

Prerequisites

A Toradex SoM with Torizon installed.

TorizonCore Boot Time Measurements

How to measure TorizonCore boot time

There are many ways and tools to determine the boot performance of a Linux system. As TorizonCore uses systemd as an init system, systemd-analyze can be used to determine and analyze system boot-up performance statistics.

  • systemd-analyze time

This command gives the time spent in the kernel before userspace is reached and the time userspace took to spawn all services.

# systemd-analyze time
Startup finished in 3.629s (kernel) + 6.978s (userspace) = 10.607s 
multi-user.target reached after 6.947s in userspace


  • systemd-analyze blame

If the time measurements given by systemd-analyze time seem too long, the blame option can be used to see where most of the time was spent during userspace initialization.

# systemd-analyze blame | head
3.019s plymouth-quit-wait.service          
2.078s systemd-logind.service              
1.836s dev-disk-by\x2dlabel-otaroot.device 
1.720s docker.service                      
1.318s plymouth-quit.service               
1.115s NetworkManager-wait-online.service  
1.001s udisks2.service                     
 920ms mwifiex-delay.service               
 816ms systemd-networkd.service            
 765ms systemd-resolved.service            


  • systemd-analyze critical-chain [UNIT]

The critical-chain option prints a tree of the time-critical chain of units. By default, it reports this for the default target, which in TorizonCore is multi-user.target. The default boot target can be retrieved with systemctl get-default.

To print critical-chain for the default target:

# systemd-analyze critical-chain 
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

multi-user.target @6.947s
└─getty.target @6.947s
  └─serial-getty@ttymxc0.service @6.945s
    └─systemd-user-sessions.service @3.881s +26ms
      └─network.target @3.858s
        └─wpa_supplicant.service @5.466s +73ms
          └─basic.target @2.046s
            └─sockets.target @2.046s
              └─sshd.socket @2.020s +24ms
                └─sysinit.target @2.005s
                  └─systemd-timesyncd.service @1.285s +718ms
                    └─systemd-tmpfiles-setup.service @1.247s +31ms
                      └─systemd-journal-flush.service @1.225s +17ms
                        └─systemd-journald.service @581ms +639ms
                          └─systemd-journald.socket @550ms
                            └─-.mount @525ms
                              └─system.slice @525ms
                                └─-.slice @525ms

To print the critical chain for a particular unit, say docker.service:

# systemd-analyze critical-chain docker.service           
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

docker.service +1.720s
└─network-online.target @3.858s
  └─network.target @3.858s
    └─wpa_supplicant.service @5.466s +73ms
      └─basic.target @2.046s
        └─sockets.target @2.046s
          └─sshd.socket @2.020s +24ms
            └─sysinit.target @2.005s
              └─systemd-timesyncd.service @1.285s +718ms
                └─systemd-tmpfiles-setup.service @1.247s +31ms
                  └─systemd-journal-flush.service @1.225s +17ms
                    └─systemd-journald.service @581ms +639ms
                      └─systemd-journald.socket @550ms
                        └─-.mount @525ms
                          └─system.slice @525ms
                            └─-.slice @525ms

There are some important points to consider when interpreting the measurements reported by these commands; please see man systemd-analyze and take those considerations into account. Also note that systemd-analyze does not report the time spent in the bootloader, so the times it reports do not start from the moment power is applied to the device.

Current measurements as of TorizonCore 5.4.0-devel-20210824+build.387

All measurements are in seconds, as reported by systemd-analyze time. The nightly image is loaded onto the module and systemd-analyze time is run to take the measurements for the first boot. The board is then rebooted and systemd-analyze time is run again for the second boot. In practice, there is a variance of around 3 seconds in the overall boot time between successive boots.
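To collect such numbers over several reboots without copying them by hand, the one-line summary printed by systemd-analyze time can be parsed with awk. The sketch below assumes the output format shown earlier and demonstrates the extraction on a captured example line; on the target you would capture the line with line=$(systemd-analyze time | head -n1).

```shell
#!/bin/sh
# Extract the kernel, userspace and overall times (in seconds) from the
# summary line printed by `systemd-analyze time`.
line="Startup finished in 3.629s (kernel) + 6.978s (userspace) = 10.607s"
summary=$(echo "$line" | awk '{
    gsub(/s/, "");   # strip the trailing "s" from the time values
    printf "kernel=%s userspace=%s overall=%s", $4, $7, $10
}')
echo "$summary"
```

Appending each such line to a CSV file over repeated reboots gives a simple measurement log.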

Without pre-provisioned containers

| Module              | Kernel (1st boot) | Userspace (1st boot) | Overall (1st boot) | Kernel (2nd boot) | Userspace (2nd boot) | Overall (2nd boot) |
|---------------------|-------------------|----------------------|--------------------|-------------------|----------------------|--------------------|
| Colibri iMX6        | 4.715             | 18.834               | 23.550             | 5.107             | 16.842               | 21.950             |
| Colibri iMX6ULL     | 8.377             | 46.892               | 55.269             | 8.593             | 46.914               | 55.508             |
| Colibri iMX7 (eMMC) | 4.240             | 20.600               | 24.841             | 4.163             | 18.646               | 22.810             |
| Colibri iMX7 (NAND) | 4.693             | 22.923               | 27.617             | 4.458             | 22.906               | 27.365             |
| Colibri iMX8X       | 3.606             | 14.871               | 18.477             | 3.611             | 12.888               | 16.499             |
| Apalis iMX6         | 6.238             | 20.591               | 26.830             | 6.428             | 18.622               | 25.050             |
| Apalis iMX8         | 4.211             | 8.753                | 12.965             | 4.224             | 6.749                | 10.974             |
| Apalis iMX8X        | 4.078             | 12.922               | 17.001             | 4.060             | 10.909               | 14.970             |
| Verdin iMX8MM       | 2.629             | 9.044                | 11.673             | 2.608             | 6.982                | 9.591              |
| Verdin iMX8MP       | 3.353             | 12.420               | 15.774             | 3.292             | 10.760               | 14.052             |

With pre-provisioned containers

| Module              | Kernel (1st boot) | Userspace (1st boot) | Overall (1st boot) | Kernel (2nd boot) | Userspace (2nd boot) | Overall (2nd boot) |
|---------------------|-------------------|----------------------|--------------------|-------------------|----------------------|--------------------|
| Colibri iMX6        | 4.639             | 18.842               | 23.481             | 5.135             | 18.801               | 23.936             |
| Colibri iMX7 (eMMC) | 4.188             | 20.568               | 24.757             | 4.170             | 18.642               | 22.813             |
| Colibri iMX8X       | 3.651             | 14.882               | 18.533             | 3.552             | 12.884               | 16.437             |
| Apalis iMX6         | 5.796             | 20.562               | 26.358             | 6.556             | 16.595               | 23.152             |
| Apalis iMX8         | 4.177             | 8.901                | 13.079             | 4.303             | 7.006                | 11.310             |
| Apalis iMX8X        | 4.048             | 12.948               | 16.996             | 4.069             | 12.916               | 16.986             |
| Verdin iMX8MM       | 2.611             | 9.070                | 11.682             | 2.640             | 7.014                | 9.654              |
| Verdin iMX8MP       | 3.273             | 13.308               | 16.582             | 3.336             | 10.852               | 14.188             |

Blacklisting WiFi drivers to optimize boot time

Some Toradex SoMs use a Marvell chipset for WiFi. The mwifiex drivers for these devices are enabled as loadable kernel modules by default and are loaded at boot time. It was observed that the boot time could vary depending on when the WiFi drivers were loaded during the boot process: loading them early made the boot longer, and vice versa. To keep the boot time consistently short, TorizonCore blacklists these drivers by default and loads them at a later stage of boot.

The drivers (modules) are blacklisted by creating a configuration file in /etc/modprobe.d/ (see man modprobe.conf); they are then loaded later in the boot by a systemd service. The dependencies of this service determine when the drivers are loaded.

See this commit as an example of how this is done in TorizonCore. The same approach can be applied to any other WiFi driver that exhibits the same issue. Unneeded modules can also be blacklisted to optimize the boot time further. The modules currently loaded into the system can be listed with lsmod.
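The mechanism can be sketched as the two files below. The blacklist entries follow the mwifiex case described above; the contents of the delayed-load service are illustrative (see the linked commit for the exact files TorizonCore ships), and the sketch writes everything into a temporary demo directory instead of the real /etc so it can be inspected without touching a system.

```shell
#!/bin/sh
# Illustrative sketch only: files go to a temporary demo directory,
# not the real /etc, so nothing on the system changes.
demo=$(mktemp -d)

# 1. Blacklist the modules so they are not loaded during early boot
#    (on the target this file would live in /etc/modprobe.d/).
cat > "$demo/blacklist-mwifiex.conf" <<'EOF'
blacklist mwifiex
blacklist mwifiex_sdio
EOF

# 2. A oneshot service that loads the drivers late in boot; ordering it
#    After=multi-user.target pushes the module load past the boot
#    critical path (on the target: /etc/systemd/system/).
cat > "$demo/mwifiex-delay.service" <<'EOF'
[Unit]
Description=Load mwifiex WiFi drivers late in boot
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/sbin/modprobe mwifiex_sdio

[Install]
WantedBy=multi-user.target
EOF

ls "$demo"
```

Adjusting the After= line of the service (for example, ordering it after your application's unit) controls exactly when the drivers come up.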

Running container applications earlier in the boot process

One of the main challenges in running the container applications earlier in the boot process is to initialize the container engine/daemon earlier.

The container daemon in TorizonCore is dockerd, which is launched by the systemd service docker.service.

Let us take a look at how we can optimize the boot time for a containerized application that does not depend on the network. We will use a sample Qt application for demonstration purposes.

If you are running a Torizon image with evaluation containers, disable the docker-compose service first:

# sudo systemctl disable docker-compose

Next, launch the Qt5 OpenGL cube example using docker run --restart unless-stopped so that it is automatically run at boot. For example, on an iMX8-based module:

# docker run -d --restart unless-stopped --net=host --name=qt5-wayland-examples \
    -e QT_QPA_PLATFORM=eglfs \
    --cap-add CAP_SYS_TTY_CONFIG -v /dev:/dev -v /tmp:/tmp -v /run/udev/:/run/udev/ \
    --device-cgroup-rule='c 4:* rmw'  --device-cgroup-rule='c 13:* rmw' \
    --device-cgroup-rule='c 199:* rmw' --device-cgroup-rule='c 226:* rmw' \
    torizon/qt5-wayland-examples-vivante:$CT_TAG_QT5_WAYLAND_EXAMPLES_VIVANTE \
    /usr/bin/kms-setup.sh /usr/lib/aarch64-linux-gnu/qt5/examples/opengl/cube/cube

You will see the rendered cube on the connected display. Reboot the board and the application will load automatically. Use journalctl to find how long it took to launch the application:

# journalctl | head -n1
-- Logs begin at Tue 2021-08-31 08:31:54 UTC, end at Tue 2021-08-31 08:32:07 UTC. --
# journalctl | grep --color  "Now running.*cube" | tail -n1
Aug 31 08:32:05 verdin-imx8mm-06827785 9aa3037ebe11[688]: Now running "/usr/lib/aarch64-linux-gnu/qt5/examples/opengl/cube/cube"

The log above shows that it took 11s (08:32:05 - 08:31:54) to launch the application.
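The subtraction can also be done on the command line instead of by hand. The sketch below assumes GNU date is available (busybox date parses -d differently) and uses the two timestamps from the example log above:

```shell
#!/bin/sh
# Compute the seconds between the journal start and the application
# launch, using the timestamps from the example log above.
start="08:31:54"
launch="08:32:05"
delta=$(( $(date -d "$launch" +%s) - $(date -d "$start" +%s) ))
echo "Application launched ${delta}s after logging started"
```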

Also take a systemd-analyze time reading for reference:

# systemd-analyze time
Startup finished in 3.518s (kernel) + 11.009s (userspace) = 14.528s 
multi-user.target reached after 10.981s in userspace

Now let us look at the dependencies of docker.service:

# systemctl cat docker.service | grep "^After\|^Wants\|^Requires"
After=network-online.target docker.socket firewalld.service usermount.service
Wants=network-online.target
Requires=docker.socket usermount.service

Say our application does not require the network, firewall, and usermount services. We remove them from the After= and Requires= lines so that the final dependencies of docker.service look like this:

# systemctl cat docker.service | grep "^After\|^Wants\|^Requires"
After=docker.socket
Wants=network-online.target
Requires=docker.socket

Since we changed a systemd unit file, run the following to reload the systemd units:

# sudo systemctl daemon-reload

and reboot the target.
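One way to make that edit is sketched below. On the target, a unit file placed in /etc/systemd/system takes precedence over the copy shipped in /usr/lib/systemd/system, so you would copy docker.service there and trim it. The sed expressions assume the exact After=/Requires= lines shown above, and the demo operates on a local copy in a temporary directory so it can be run safely anywhere:

```shell
#!/bin/sh
# Sketch: trim the ordering dependencies of docker.service on a local copy.
# On the target you would first copy the shipped unit so the edit takes
# precedence and survives:
#   cp /usr/lib/systemd/system/docker.service /etc/systemd/system/
demo=$(mktemp -d)
cat > "$demo/docker.service" <<'EOF'
[Unit]
After=network-online.target docker.socket firewalld.service usermount.service
Wants=network-online.target
Requires=docker.socket usermount.service
EOF

# Keep only docker.socket in the After= and Requires= lines.
sed -i \
    -e 's/^After=.*/After=docker.socket/' \
    -e 's/^Requires=.*/Requires=docker.socket/' \
    "$demo/docker.service"

grep "^After\|^Wants\|^Requires" "$demo/docker.service"
```

Note that removing usermount.service from Requires= means docker will no longer be stopped or started along with it; only drop dependencies you are sure your containers do not need.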

Now journalctl shows 9s (08:45:45 - 08:45:36), an improvement of two seconds over the previous measurement.

# journalctl | head -n1
-- Logs begin at Tue 2021-08-31 08:45:36 UTC, end at Tue 2021-08-31 08:46:03 UTC. --
# journalctl | grep --color  "Now running.*cube" | tail -n1
Aug 31 08:45:45 verdin-imx8mm-06827785 9aa3037ebe11[574]: Now running "/usr/lib/aarch64-linux-gnu/qt5/examples/opengl/cube/cube"

systemd-analyze time confirms this:

# systemd-analyze time
Startup finished in 3.571s (kernel) + 9.017s (userspace) = 12.588s 
multi-user.target reached after 8.989s in userspace