Thermal Management

Introduction

Thermal management is the area concerned with making sure that the system and its components operate within defined temperature ranges, in order to guarantee the reliable operation of the whole system. It involves concepts such as power consumption, heat dissipation and system temperature, which are related to the following topics:

  • Hardware: heat generation is directly proportional to the power consumption of the hardware. In addition, each component has its own operating temperature range and limits. Usually, the main concerns are the System on Chip (SoC) heat dissipation and the System on Module (SoM) operating temperature range. An optimized carrier board design may help reduce the power consumption of the system, which may or may not result in a significant heat reduction.
  • Software: the power consumption - and thus the heat generation - of the system and its components is affected by the software load put on the hardware components, such as CPU cores and GPU, and the use of peripherals. It is directly related to the use-case application but also affected by the BSP version and its fine tuning.
  • Enclosure: the mechanical enclosure of the system, for instance a box, must be taken into account when it comes to thermal management. A low-power SoM may fit in a constrained enclosure, whereas a high-performance system may require a well-designed airflow or ventilation scheme for proper operation.
  • Environment: heat flux depends on the temperature difference between the hardware (usually the SoC) and the environment. Knowing the ambient temperature range in which the system will operate is important for designing a thermal management solution that both satisfies the system requirements and uses the simplest cooling mechanism possible.
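
A common first-order way to quantify this is the steady-state thermal resistance model below. It assumes a single lumped heat path from the junction to the ambient air through a heatsink. The thermal resistances θ (in °C/W) are placeholders to be taken from datasheets or measurements, not values given in this article.

```latex
T_J = T_A + P_D \, \theta_{JA}, \qquad \theta_{JA} = \theta_{JC} + \theta_{CS} + \theta_{SA}
```

In this model:

  • T_J is the junction temperature.
  • T_A is the ambient temperature.
  • P_D is the dissipated power.
  • θ_JC, θ_CS and θ_SA are the junction-to-case, case-to-heatsink and heatsink-to-ambient resistances.

For a given power, every additional degree of ambient temperature raises T_J by one degree. This is why the environment temperature range matters.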

This article provides an overview of thermal management solutions for Toradex SoMs. It covers the hardware and software/BSP specifics as well as cooling solutions.

Hardware

This section goes through the hardware specifics related to thermal management.

Note: Additional information can be found in the Thermal Specification section of the respective Toradex SoM datasheet. The datasheets are available on the respective product pages, under the Datasheets tab.

SoC (CPU) Limits

The following table provides the maximum junction temperature for each SoC. Note that this is the maximum temperature at the semiconductor level, measured by a sensor inside the SoC die. The junction temperature is higher than the case temperature and is monitored by the operating system and/or additional hardware mechanisms. If the thermal throttling mechanisms fail to keep the SoC below its maximum junction temperature, a forced system shutdown is issued to prevent permanent damage.

Browse the drop-down table below for information about maximum junction temperature sorted by SoC:

Maximum Junction Temperature by SoC

SoM Limits

The operating temperature range of a SoM is limited by its components, and the most temperature-critical component is not necessarily the SoC. The dropdown tables below list the operating temperature range of each Toradex SoM:

Apalis Family

Maximum Operating Temperature by Apalis SoM

Colibri Family

Maximum Operating Temperature by Colibri SoM

Software and BSP

This section goes through the software and BSP specifics related to thermal management.

Dynamic Voltage and Frequency Scaling (DVFS) and Thermal Throttling

Dynamic Voltage and Frequency Scaling is a mechanism in which the operating system optimizes power consumption by adjusting the CPU clocks and voltage based on demand. A side-effect of power consumption optimization is that less heat is generated for system workloads that don't make full use of the CPU.
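
A first-order way to see why DVFS saves power is the common approximation for CMOS dynamic power, P ≈ C·V²·f. Scaling the frequency alone reduces power linearly, while lowering the voltage as well reduces it quadratically in V. The sketch below only evaluates that ratio under those assumptions. It ignores static (leakage) power, and the operating points in the usage note are hypothetical rather than values for any Toradex SoC.

```c
/* First-order CMOS dynamic power model: P_dyn ~ C * V^2 * f.
 * Static (leakage) power is ignored, so real savings will differ. */

/* Dynamic power at operating point 2 relative to operating point 1.
 * The switched capacitance C cancels out, so only V and f are needed.
 * Returns -1.0 if the reference operating point is invalid. */
double dvfs_power_ratio(double v1, double f1, double v2, double f2)
{
    if (v1 <= 0.0 || f1 <= 0.0)
        return -1.0;
    double v_scale = v2 / v1;
    return v_scale * v_scale * (f2 / f1);
}

/* Ratio obtained by scaling the frequency alone, with the voltage unchanged. */
double freq_only_power_ratio(double f1, double f2)
{
    return dvfs_power_ratio(1.0, f1, 1.0, f2);
}
```

For example, going from a hypothetical 1.2 V / 1000 MHz operating point to 0.9 V / 500 MHz gives a ratio of about 0.28. Halving only the frequency gives 0.5.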

Thermal Throttling is a mechanism implemented in the operating system to preserve the integrity of the processor. Independently of DVFS, it forces a reduction of the system clock when the SoC reaches certain temperatures.

The following dropdown table lists the availability of DVFS and thermal throttling for Toradex SoMs and operating systems:

Note: DVFS is disabled by default on WinCE. Please see DVFS on Windows Embedded Compact for further information.

DVFS and thermal throttling availability

Linux

DVFS can be disabled and the CPU frequency manually set. See the CPU Frequency (Linux) article.
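
On Linux, a common way to disable DVFS and fix the frequency is the userspace cpufreq governor. You select it through scaling_governor and write the target frequency in kHz to scaling_setspeed, as described in the kernel's cpufreq documentation. The sketch below assumes three things: the kernel was built with that governor, these sysfs attributes exist for the chosen CPU, and it runs as root. The CPU Frequency (Linux) article remains the reference for Toradex modules.

```c
#include <stdio.h>

/* Write one value to a cpufreq sysfs attribute of the given CPU.
 * Returns 0 on success, -1 on any error (missing file, no permission,
 * or the kernel rejecting the value when the buffer is flushed). */
int cpufreq_write(int cpu, const char *attr, const char *value)
{
    char path[128];
    int n = snprintf(path, sizeof path,
                     "/sys/devices/system/cpu/cpu%d/cpufreq/%s", cpu, attr);
    if (cpu < 0 || n < 0 || (size_t)n >= sizeof path)
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Disable DVFS for the CPU's policy by selecting the "userspace" governor,
 * then request a fixed frequency in kHz. Requires root and a kernel built
 * with the userspace governor; the driver may round to a supported OPP. */
int cpufreq_pin(int cpu, unsigned long khz)
{
    char buf[32];
    if (cpufreq_write(cpu, "scaling_governor", "userspace") != 0)
        return -1;
    snprintf(buf, sizeof buf, "%lu", khz);
    return cpufreq_write(cpu, "scaling_setspeed", buf);
}
```

For example, cpufreq_pin(0, 792000) would request 792 MHz for the policy that CPU 0 belongs to. That frequency is only illustrative. Where the driver exposes scaling_available_frequencies, it lists the valid values.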

Temperature can be monitored from userspace. How to read it and which sensors are available is module-dependent. See the Temperature Sensor (Linux) article and Apalis/Colibri T30 Temperature Monitoring for additional information.
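
The sketch below assumes the module exposes the SoC sensor through the generic Linux thermal sysfs interface. In that case the temperature can be read from /sys/class/thermal/thermal_zoneN/temp, which reports millidegrees Celsius. Verify the zone index for each module. Sensors exposed through other interfaces (see the T30 article above) are not covered.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a thermal sysfs reading such as "48500\n".
 * The generic thermal interface reports millidegrees Celsius. */
int parse_millicelsius(const char *text, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(text, &end, 10);
    if (errno != 0 || end == text)
        return -1;              /* not a number, or out of range */
    while (*end == '\n' || *end == ' ')
        end++;
    if (*end != '\0')
        return -1;              /* unexpected trailing characters */
    *out = v;
    return 0;
}

/* Read /sys/class/thermal/thermal_zone<zone>/temp and convert to degC.
 * Which zone belongs to the SoC is module-dependent: check the zone's
 * "type" file before relying on a given index. */
int read_zone_celsius(int zone, double *celsius)
{
    char path[64], buf[32];
    long mc;
    if (zone < 0)
        return -1;
    snprintf(path, sizeof path, "/sys/class/thermal/thermal_zone%d/temp", zone);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    if (!line || parse_millicelsius(buf, &mc) != 0)
        return -1;
    *celsius = mc / 1000.0;
    return 0;
}
```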

Thermal throttling is performed by the Linux kernel and can be accessed via the generic thermal sysfs interface. The section below explains how to set temperature trip points in Linux:

How to Set Temperature Trip Points

iMX SoCs

There are two temperature trip points used on iMX SoCs:

  • passive: the temperature at which Linux starts to throttle the CPU.
  • critical: the temperature at which Linux shuts itself down in order to protect the CPU.

Toradex uses the maximum junction temperature (T_junction_max) stated in the datasheet as the critical trip point, and a temperature 10°C lower as the passive trip point.
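
The sketch below makes this policy concrete. It derives both trip points from T_junction_max and classifies a temperature reading against them, using millidegrees Celsius like the kernel driver. It is a simplified model, not kernel code: once the passive trip point is crossed, the kernel's thermal governor performs the actual throttling progressively.

```c
/* Trip-point policy described above, in millidegrees Celsius:
 * critical = T_junction_max, passive = T_junction_max - 10 degC. */
enum thermal_action { ACTION_NONE, ACTION_THROTTLE, ACTION_SHUTDOWN };

struct trip_points {
    long passive_mc;
    long critical_mc;
};

struct trip_points trips_from_tjmax(long tjmax_mc)
{
    struct trip_points t;
    t.critical_mc = tjmax_mc;
    t.passive_mc = tjmax_mc - 10 * 1000;
    return t;
}

/* Simplified classification only; the real governor steps the CPU
 * frequency down gradually while the passive trip is exceeded. */
enum thermal_action classify_temp(const struct trip_points *t, long temp_mc)
{
    if (temp_mc >= t->critical_mc)
        return ACTION_SHUTDOWN;
    if (temp_mc >= t->passive_mc)
        return ACTION_THROTTLE;
    return ACTION_NONE;
}
```

Take a hypothetical T_junction_max of 105°C (105000). The passive trip point is then 95000. A reading of 96000 falls in the throttling region, and 105000 reaches the critical trip point.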

The following patch can be used as a guideline on how to change these trip points for iMX-based SoCs:

diff --git a/drivers/thermal/imx_thermal.c b/drivers/thermal/imx_thermal.c
index 28072a7..591d6be 100644
--- a/drivers/thermal/imx_thermal.c
+++ b/drivers/thermal/imx_thermal.c
@@ -656,10 +656,10 @@ static int imx_get_sensor_data(struct platform_device *pdev)
    }
 
    /*
-    * Set the critical trip point at 5C under max
+    * Set the critical trip point at max
     * Set the passive trip point at 10C under max (can change via sysfs)
     */
-  data->temp_critical = data->temp_max + (1000 * 10);
+  data->temp_critical = data->temp_max;
    data->temp_passive = data->temp_max - (1000 * 10);
 
    return 0;

In this driver, data->temp_max holds the T_junction_max value read from the SoC fuses.

data->temp_passive and data->temp_critical hold the passive and critical trip points described above. Set them to the desired temperatures in millidegrees Celsius (for example, 95000 for 95°C).
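
The driver comment notes that the passive trip point can also be changed via sysfs at runtime. The sketch below does that under three assumptions, all of which should be checked on the target:

  • thermal_zone0 is the i.MX SoC zone.
  • Trip point 0 is the passive one.
  • The running kernel allows writing trip_point_0_temp.

It needs root privileges, and the driver may reject values outside the range it accepts.

```c
#include <stdio.h>
#include <string.h>

/* Build /sys/class/thermal/thermal_zone<zone>/trip_point_<trip>_<attr>. */
int trip_attr_path(char *buf, size_t len, int zone, int trip, const char *attr)
{
    if (zone < 0 || trip < 0)
        return -1;
    int n = snprintf(buf, len,
                     "/sys/class/thermal/thermal_zone%d/trip_point_%d_%s",
                     zone, trip, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Set a trip temperature in millidegrees Celsius, but only if the trip
 * point has the expected type (e.g. "passive"). Requires root, and the
 * write fails if the kernel or driver does not allow changing this trip. */
int set_trip_temp(int zone, int trip, const char *expected_type, long millicelsius)
{
    char path[128], type[32];
    FILE *f;

    if (trip_attr_path(path, sizeof path, zone, trip, "type") != 0)
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(type, sizeof type, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    type[strcspn(type, "\n")] = '\0';
    if (strcmp(type, expected_type) != 0)
        return -1;              /* refuse to modify an unexpected trip point */

    if (trip_attr_path(path, sizeof path, zone, trip, "temp") != 0)
        return -1;
    f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%ld\n", millicelsius) > 0;
    if (fclose(f) != 0)         /* sysfs reports rejected values on flush */
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, set_trip_temp(0, 0, "passive", 85000) would request a passive trip point of 85°C. Unlike the patch above, a change made through sysfs does not persist across reboots.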

WinCE

DVFS and temperature throttling settings can be customized. See the Resource Manager Registry Settings and Apalis/Colibri iMX6 DVFS on Windows Embedded Compact articles. Toradex also provides a software tool named Toradex Task Manager, which serves as an alternative to editing these settings directly and as a means to monitor system frequency and temperature. Note that DVFS is supported from WinCE Image 1.3b4 onwards.

For additional temperature monitoring information, see Apalis/Colibri T30 Temperature Monitoring and SoC Temperature Readout (WinCE).

CPU Hotplug

If both the SoC and the operating system support it, CPU cores can be enabled and disabled dynamically. Disabling unused cores saves power and therefore reduces heat generation.

The following dropdown table lists the availability of CPU hotplug for each Toradex SoM and operating system:

CPU Hotplug Support

Linux

See the article CPU (Linux) for detailed information on supported modules.
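
On kernels with CPU hotplug support, a core is typically taken offline or brought back by writing 0 or 1 to /sys/devices/system/cpu/cpuN/online. The sketch below assumes that interface and root privileges. Which cores can be unplugged depends on the SoC and kernel (on many ARM systems cpu0 cannot be taken offline), so check the CPU (Linux) article for your module.

```c
#include <stdio.h>

/* Build /sys/devices/system/cpu/cpu<N>/online for a given CPU index. */
int cpu_online_path(char *buf, size_t len, int cpu)
{
    if (cpu < 0)
        return -1;
    int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/online", cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Take a CPU core offline (online = 0) or bring it back (online = 1).
 * Requires root and a kernel with CPU hotplug support; cores without an
 * "online" file cannot be unplugged. */
int set_cpu_online(int cpu, int online)
{
    char path[64];
    if (cpu_online_path(path, sizeof path, cpu) != 0)
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(online ? "1" : "0", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```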

WinCE

See the article Resource Manager Registry Settings for detailed information on supported modules.

Tips

This section provides tips on saving power, which may help reduce heat generation, and on other software aspects that may affect thermal management.

  • If peak performance is required only for short periods, heat dissipation is usually not a concern thanks to the advanced power management features described above.
  • A cooling solution may improve sustained system performance by delaying or preventing thermal throttling.
  • Cooling solutions can be passive or active.
  • When full CPU / Graphics performance is required for a longer period, it is highly recommended to test the system thermal behavior in the given condition.
  • Always refer to the Thermal Specification section in the respective module datasheet.
  • Thermal throttling configuration, also referred to as temperature trip points, can be adjusted in the BSP.
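
On Linux, one simple way to test thermal behavior under sustained load is to sample the SoC temperature periodically while the workload runs. You can then compare the peak with the passive trip point. The sketch below assumes the generic thermal sysfs interface and a known zone index, and leaves the sample count and interval to the caller.

```c
#include <stdio.h>
#include <unistd.h>

/* Summary of a series of temperature samples in millidegrees Celsius. */
struct temp_stats {
    long min_mc;
    long max_mc;
    double mean_mc;
};

/* Compute min/max/mean; returns -1 for an empty series. */
int summarize_temps(const long *samples, size_t n, struct temp_stats *s)
{
    if (n == 0)
        return -1;
    long long sum = 0;
    s->min_mc = s->max_mc = samples[0];
    for (size_t i = 0; i < n; i++) {
        if (samples[i] < s->min_mc) s->min_mc = samples[i];
        if (samples[i] > s->max_mc) s->max_mc = samples[i];
        sum += samples[i];
    }
    s->mean_mc = (double)sum / (double)n;
    return 0;
}

/* Read thermal_zone<zone>/temp up to n times, interval_s seconds apart.
 * Returns the number of samples actually collected. */
size_t sample_zone(int zone, long *samples, size_t n, unsigned interval_s)
{
    char path[64];
    size_t got = 0;
    if (zone < 0)
        return 0;
    snprintf(path, sizeof path, "/sys/class/thermal/thermal_zone%d/temp", zone);
    for (size_t i = 0; i < n; i++) {
        FILE *f = fopen(path, "r");
        if (!f)
            break;
        long v;
        int ok = fscanf(f, "%ld", &v) == 1;
        fclose(f);
        if (!ok)
            break;
        samples[got++] = v;
        if (i + 1 < n)
            sleep(interval_s);
    }
    return got;
}
```

For example, 600 samples taken one second apart cover a 10-minute run. If max_mc comes close to the passive trip point, the system is operating near its throttling threshold under that workload.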

Note: We recommend measuring the power consumption of the system before and after making changes. This helps in getting a better understanding of the power management of the system.

Linux

  • Disable unused Display Interfaces

  • Use a Lower Frequency

    • See the CPU Frequency (Linux) article to change the CPU frequency to test system performance and power consumption.
  • Avoid Toggling Pins

    • Make sure none of the pins are toggling unnecessarily, and that all input pins are in a defined state. The GPIO Tool is helpful for testing. On device tree enabled modules, Device Tree Customization helps with tweaking the SoC pin configuration.
  • Use Low Power Modes

    • Enter Suspend mode during idle time or even consider switching off the module completely. See the Suspend/Resume (Linux) article for reference.
  • Check CPU Load

    • Linux has many tools to monitor CPU load, such as top and htop. If the load is unexpectedly high, check the application software. Simple modifications may help lower the CPU load, for example using interrupts instead of polling, or sleeping instead of busy-waiting.
  • Disable unused Drivers

    • Measure the power consumption before and after this step, since in some cases disabling drivers may have a negative impact on power consumption. Disabling drivers may require recompiling the Linux kernel and modules. See the article Build U-Boot and Linux Kernel from Source Code for reference.
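
Tools such as top derive the CPU load they report from the cumulative time counters in /proc/stat. You may want to log CPU load from your own application, for example alongside power measurements. A minimal sketch is to read the aggregate cpu line twice and compare the busy and total deltas. Counting iowait as idle time is a choice made in this sketch, not a requirement.

```c
#include <stdio.h>

struct cpu_times {
    unsigned long long idle;
    unsigned long long total;
};

/* Parse the aggregate "cpu" line of /proc/stat. Fields used:
 * user nice system idle iowait irq softirq steal (missing ones count as 0).
 * guest/guest_nice are skipped because user/nice already include them. */
int parse_cpu_line(const char *line, struct cpu_times *t)
{
    unsigned long long v[8] = {0};
    int n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]);
    if (n < 4)
        return -1;
    t->idle = v[3] + v[4];          /* idle + iowait */
    t->total = 0;
    for (int i = 0; i < 8; i++)
        t->total += v[i];
    return 0;
}

/* Busy percentage between two samples taken some time apart. */
double cpu_load_percent(const struct cpu_times *a, const struct cpu_times *b)
{
    unsigned long long dt = b->total - a->total;
    unsigned long long di = b->idle - a->idle;
    return dt ? 100.0 * (double)(dt - di) / (double)dt : 0.0;
}

/* Read the first line of /proc/stat, which holds the aggregate counters. */
int read_cpu_times(struct cpu_times *t)
{
    char line[256];
    FILE *f = fopen("/proc/stat", "r");
    if (!f)
        return -1;
    int rc = fgets(line, sizeof line, f) ? parse_cpu_line(line, t) : -1;
    fclose(f);
    return rc;
}
```

To use it, call read_cpu_times, wait (for example with sleep(1)), call it again, and pass both samples to cpu_load_percent.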

WinCE

  • Disable unused Display Interfaces

    • The Tegra modules have three display interfaces. Make sure you enable only the display interface in use and disable the unused ones. For example, if only the LCD interface is used, apply the following Tegra-specific registry settings, as mentioned in this article:
      BootupStyle = 1
      HDMIHotplugBehavior = 2
  • Use a Lower Frequency

    • One can use the Toradex Task Manager to change the CPU frequency to test system performance and power consumption.
  • Avoid Toggling Pins

    • Make sure none of the pins are unnecessarily toggling. Also, make sure all input pins are on a defined voltage level. The GPIO Config tool will be helpful in testing.
  • Use Low Power Modes

    • Enter Suspend mode during idle time or even switch off the module completely. Toradex WinCE images offer fast boot (in some cases the boot time is less than 0.5 seconds), which makes switching off a practical option.
  • Check CPU Load

    • Use Toradex Task Manager to check the CPU load. If the load is unexpectedly high, check the application software. Simple modifications may help lower the CPU load, for example using interrupts instead of polling, or sleeping instead of busy-waiting.
  • Disable unused Drivers

    • Measure the power consumption before and after this step, since in some cases disabling drivers may have a negative impact on power consumption. To learn more about how to disable drivers, please refer to this article.

Cooling Solutions

Cooling solutions usually target the SoC and can be either passive or active. Passive cooling relies on natural convection to transport heat from the surface to the air. This can mean simply exposing the SoC to the environment, or attaching a heatsink. The efficiency of natural convection depends on the housing and the environment. A passive solution has no moving parts and produces no noise. If passive cooling is not sufficient, the most common active cooling solution for embedded systems is a DC fan mounted on top of the heatsink. A fan increases cooling efficiency considerably.

If the hardware is installed in an enclosure, the enclosure is often designed for optimized airflow, thermally coupled to the SoC, or both. In any case, the temperature inside the enclosure must stay within the SoM operating temperature range.
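
Whether passive cooling is sufficient can be estimated with the series thermal resistance model commonly used for heatsink selection. The sketch below solves it for the largest heatsink-to-ambient resistance θ_SA that keeps the junction at or below a target temperature. One option is to use the passive trip point as the target (T_junction_max minus 10°C on the Toradex iMX BSPs described above), which avoids throttling. The θ values in the example are hypothetical. Real values come from the SoC, thermal interface material and heatsink datasheets, and heatsink ratings differ between natural and forced convection.

```c
/* Steady-state series thermal model:
 *   T_J = T_A + P * (theta_JC + theta_CS + theta_SA)
 * Thermal resistances in degC/W, temperatures in degC, power in W. */

/* Largest heatsink-to-ambient resistance that keeps T_J <= tj_target.
 * Returns -1 for non-positive power. A negative *theta_sa_max means the
 * target cannot be met at this power even with an ideal heatsink. */
int max_heatsink_resistance(double tj_target, double ta, double power,
                            double theta_jc, double theta_cs,
                            double *theta_sa_max)
{
    if (power <= 0.0)
        return -1;
    *theta_sa_max = (tj_target - ta) / power - theta_jc - theta_cs;
    return 0;
}

/* Junction temperature predicted for a given heatsink resistance. */
double predicted_tj(double ta, double power, double theta_jc,
                    double theta_cs, double theta_sa)
{
    return ta + power * (theta_jc + theta_cs + theta_sa);
}
```

Consider a hypothetical case: T_A = 45°C, 5 W of dissipation, θ_JC = 1.0°C/W and θ_CS = 0.5°C/W. Reaching a 95°C target then requires a heatsink with θ_SA of at most 8.5°C/W. A fan helps by lowering the effective θ_SA of the same heatsink.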

Colibri

The Colibri family of SoMs does not have a cooling solution officially provided by Toradex. Nevertheless, we have tested a few off-the-shelf heatsink solutions available in the market with Colibri T20 and T30 modules. For more details, please refer to the following test reports:

Apalis

The Apalis family has a robust, rigid mounting mechanism to support thermal solutions. It is ready-to-use on Toradex carrier boards and, if you plan to design your own carrier board, the thermal solution implementation guidelines are available in the Apalis Carrier Board Design Guide.

The optimized Apalis Heatsink is available for each version of the Toradex Apalis module. The following table shows the compatibility of the available Apalis heatsinks:

Apalis Heatsink Type    Compatible Modules
Type 1                  Apalis iMX6Q IT, Apalis iMX6D IT
Type 2                  Apalis T30
Type 3                  Apalis iMX6Q, Apalis iMX6D, Apalis TK1
Type 4                  Apalis iMX8QM

The Apalis heatsink has four holes intended for mounting a fan on top of it. Specifics are available in the Apalis Heatsink Fan article. In addition, a 3D CAD model of the heatsink is provided on the 3D CAD models page.


  • Apalis Heatsink

    Apalis Carrier Boards Heatsink

Legacy Information

Colibri PXAxxx

Colibri PXAxxx modules run at a fixed frequency. Toradex provides ways to manually change the system frequency through software configuration in order to tweak or optimize system performance. In most use cases, a cooling solution should not be required. The maximum temperature is limited by the case temperature of the PXA processor, which must not exceed 85°C. For more details, please refer to the respective Colibri module datasheet and Marvell’s EMTS.


  1. No WinCE available

  2. Single core processor