Thermal Analysis

Thermal modeling is a key part of thermal design. These days, one can model almost anything with the analysis and modeling tools available in the market. When used correctly, these state-of-the-art analysis tools can give you accurate results quickly and cost-effectively. Gone are the days when thermal engineers labored for weeks, or even months, designing thermal experiments to get limited information on a given layout. Now, one can build a model with a few clicks and see results almost instantly. Many of these thermal analysis tools have made great progress over the years in robustness, accuracy, and ease of use. You don’t have to be a CFD (Computational Fluid Dynamics) specialist to use them.

However, regardless of how sophisticated these analysis tools have become, it is important that we use them properly for the scenario at hand. Otherwise, as is often termed in the industry, it will be “garbage in, garbage out.” Since thermal design is controlled by many variables, there are many things that can go wrong (or be specified incorrectly) as well. When that happens with even a couple of inputs, the results can be totally invalid, even when everything else is done correctly. Therefore, it is important that thermal engineers follow some key steps meticulously, so their thermal modeling exercise is successful.

Thermal analysis requires a great deal of organization. The end result is the outcome of several activities combined. To begin, the thermal model must have a physical model, built within the tool itself or imported from a CAD system. The physical model must closely represent the real system we are trying to model. The thermal model must also have numerous properties and boundary conditions assigned to it. These properties and boundary conditions define the nature of the system and its connection with its environment. Then the entire system model must be discretized and analyzed to achieve a stable and accurate solution. Based on the first solution, we refine the model and re-run it for another solution. The process continues until the design is complete. All of this requires a step-by-step approach, so the whole process is done correctly.

Beyond the above broad outline, the thermal modeling exercise involves several more steps. At the minimum, they include the following: basic parameter settings, domain sizing, boundary conditions, physical model details, thermal properties, power loading, meshing, and post-processing.

Basic parameters of a thermal analysis model are things like ambient temperature, the variables to be solved (temperature and/or flow), gravity effects, radiation parameters, whether the flow is laminar or turbulent, whether the analysis is steady or transient, transient settings, etc. These parameters define the nature of the thermal system we would like to model. In a conduction-only model, we turn off flow-related specifications. In a highly convective environment with fans and the like, the effects of natural convection (gravity) and radiation may be neglected (and hence turned off). If modeling a transient case, we need to specify additional parameters, such as time steps, start time, and end time.
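As an illustration, the basic parameters above can be collected in one place before a run. The sketch below is hypothetical Python, not tied to any particular tool; all names and values are made up for illustration.

```python
# A hypothetical set of basic solver parameters, gathered in one structure.
# The key names and values here are illustrative assumptions only.
basic_params = {
    "ambient_temperature_C": 25.0,  # reference ambient
    "solve_flow": True,             # False for a conduction-only model
    "gravity_m_s2": 9.81,           # enables natural convection effects
    "radiation": True,              # often turned off in fan-cooled systems
    "flow_regime": "turbulent",     # "laminar" or "turbulent"
    "transient": False,             # steady-state by default
}

# A transient run needs extra timing inputs, as noted above.
if basic_params["transient"]:
    basic_params.update({
        "start_time_s": 0.0,
        "end_time_s": 120.0,
        "time_step_s": 0.5,
    })
```

Keeping these settings in one reviewable place makes it easier to check that the model is well-defined before launching a long run.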

As part of basic parameters specification, we may also specify how the governing equations are to be solved, such as discretization schemes and relaxation factors. Relaxation factors control the speed at which the solution is achieved. They are necessary because the governing thermo-fluid equations are highly non-linear and cannot be solved in a few steps. Although the default settings are OK in most cases, we may need to adjust the relaxation factors for more complicated cases, so our solution does not diverge.
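The idea behind under-relaxation can be shown with a toy fixed-point iteration. The sketch below is illustrative Python, not any tool's actual solver: a full step on the non-linear equation x = 10/x just bounces between two values, while a relaxation factor of 0.5 damps the update and converges.

```python
def relaxed_fixed_point(g, x0, alpha, n_iter=50):
    """Under-relaxed fixed-point iteration: x <- x + alpha * (g(x) - x).
    alpha = 1 is a full step; alpha < 1 damps the update."""
    x = x0
    for _ in range(n_iter):
        x = x + alpha * (g(x) - x)
    return x

# Toy non-linear equation x = 10 / x (root: sqrt(10) ~ 3.1623).
g = lambda x: 10.0 / x

# With alpha = 1 the iterates oscillate between 3 and 10/3 forever;
# with alpha = 0.5 the damped iteration converges rapidly.
x = relaxed_fixed_point(g, x0=3.0, alpha=0.5)
```

Real CFD solvers apply the same principle per equation (momentum, energy, etc.); lowering a relaxation factor trades iteration count for stability.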

All of the forms where the basic parameters are specified must be checked properly to make sure that our model or problem is well-defined. Otherwise, we will not get accurate results from our thermal modeling exercise.

Thermal systems can be broadly classified into two groups: passively-cooled systems and actively-cooled systems.

In passively-cooled systems, there is no air or fluid mover, such as a fan or blower. The system is cooled by air or fluid movement that arises from temperature differences with the ambient. This is called natural convection cooling. Here, the amount of heat loss to the ambient is proportional to the product of the exposed surface area of the system and its temperature difference with the ambient fluid. In addition to natural convection cooling, passively-cooled systems also lose heat by radiation. Here, heat transfer takes place due to temperature differences between the surfaces of the system and the surroundings (walls, air, etc.). In radiation, as in natural convection, the extent of heat transfer depends on the exposed surface area of the system and its temperature difference with the ambient. In passively-cooled systems, typically, radiation can account for 30-50 percent of the total heat loss, depending on how high the system temperature is relative to the surroundings.
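The balance just described can be sketched with the standard convection and radiation relations. The numbers below (surface area, convection coefficient, emissivity) are illustrative assumptions, not values from the text; a convection coefficient near 8 W/(m^2 K) is a common natural-convection ballpark.

```python
SIGMA = 5.67e-8  # Stefan-Boltzmann constant, W/(m^2 K^4)

def passive_heat_loss(area_m2, t_surf_c, t_amb_c, h_w_m2k=8.0, emissivity=0.9):
    """Rough natural-convection plus radiation loss from an exposed surface.
    Illustrative only: h and emissivity are assumed, typical values."""
    q_conv = h_w_m2k * area_m2 * (t_surf_c - t_amb_c)
    ts, ta = t_surf_c + 273.15, t_amb_c + 273.15  # radiation needs Kelvin
    q_rad = emissivity * SIGMA * area_m2 * (ts**4 - ta**4)
    return q_conv, q_rad

# Example: a 0.05 m^2 enclosure surface at 60 C in a 25 C room.
q_conv, q_rad = passive_heat_loss(0.05, 60.0, 25.0)
rad_fraction = q_rad / (q_conv + q_rad)
```

For this example the radiation share comes out around 45 percent, consistent with the 30-50 percent range cited above.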

In actively-cooled systems, there is a fluid mover inside or around the system. Fans, blowers, and pumps are all examples of fluid movers. In such systems, the effects of radiation and natural convection are generally small and may be neglected. In modeling actively-cooled systems, it is important that the flow be modeled accurately. We must know whether the flow is predominantly laminar or turbulent, as these two flows are modeled somewhat differently. It is also important that we model physical components in sufficient detail, so flow obstructions and turns are captured accurately, especially in the vicinity of the air or fluid mover.
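A quick way to judge whether a flow is predominantly laminar or turbulent is the Reynolds number. The sketch below uses the common internal-flow rule of thumb of Re around 2300; the velocities and channel sizes are made-up examples, and the real transition point depends on geometry and inlet disturbances.

```python
def reynolds_number(velocity_m_s, length_m, nu_m2_s=1.6e-5):
    """Re = V * L / nu; nu ~ 1.6e-5 m^2/s for air near room temperature."""
    return velocity_m_s * length_m / nu_m2_s

def flow_regime(re, transition=2300.0):
    """Crude classification using the usual internal-flow threshold."""
    return "laminar" if re < transition else "turbulent"

# Hypothetical examples: 2 m/s through a 10 mm card-to-card gap,
# versus 5 m/s in a 0.1 m duct near a blower.
re_gap = reynolds_number(2.0, 0.010)   # Re = 1250
re_duct = reynolds_number(5.0, 0.100)  # Re = 31250
```

This kind of back-of-the-envelope check helps pick the right flow model before committing to a long run.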

A thermal model must have a domain within which the analysis is conducted. The domain should include the key elements of the system that need to be modeled, including the device itself. The domain also determines how the system being modeled interacts with the environment. Therefore, it is critical that we use the right domain size and shape for our model. In general, domain boundaries are chosen so that either a given variable has a fixed value at the boundary, or the spatial change of a variable is close to zero at the boundary (adiabatic or symmetry boundary conditions). Therefore, when we establish a domain, we must ensure that such assumptions are not violated, at least not significantly.

In domain sizing, we must also strike a balance between domain size and model accuracy. Often, a larger domain means a bigger mesh count and longer run times. This can be a problem when you would like to know quickly whether you are on the right track, especially in the early stages of thermal modeling. Unnecessarily large mesh sizes are also a problem in transient simulations, where the wait times can be much longer. The seasoned thermal engineer will know the appropriate domain size from experience, typically from prior models. Otherwise, one may conduct a sensitivity analysis with 2-3 variations of domain size to determine the effect of domain size on key variables.

By model details, we mean the physical details of the system being analyzed. If we are modeling a laptop computer, for example, the model will have the key components such as the enclosure, display, PCB with components, hard drive and/or SSD, wireless components, power supply, interface materials, any cooling solutions, etc. Today’s thermal analysis tools can import the physical model in its entirety. However, it is important that we exclude minor details from the model so that the mesh size is not too large. In many cases, small pins, protrusions, and curves do not matter much in thermal modeling, especially when they are far away from the main heat sources inside the system under consideration and/or when they are away from the main airflow paths.

When building a model, special attention must be given to areas in and around major heat sources, such as PCBs and IC packages. The model must have the right details in these areas to capture the expected large gradients of the key variables. The larger the gradients, the finer the mesh should be. The physical system may be modeled suspended in air or sitting on (or next to) some surface with the appropriate gap or boundary conditions. Whatever components we include, our assumptions must be realistic and the boundary conditions consistent with the real situation we are trying to analyze.

Thermal properties include variables like thermal conductivity, specific heat and density. In steady-state analysis, thermal conductivity is the main variable to consider. In transient analysis, density and specific heat will also be important, in addition to thermal conductivity. All components in the thermal model must be assigned the right thermal properties. These properties govern how heat is transferred from one component to the other, and ultimately to the environment. Poor choice of thermal properties will, therefore, lead to poor results as well.
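For illustration, a small property table might look like the following. The figures are typical textbook room-temperature values; real designs should use datasheet numbers. The combination k/(rho*cp), the thermal diffusivity, is what ties the steady and transient roles of these properties together.

```python
# Approximate room-temperature properties of common electronics materials.
# Typical textbook values for illustration; use vendor data for real designs.
materials = {
    "copper":   {"k": 385.0, "rho": 8960.0, "cp": 385.0},
    "aluminum": {"k": 205.0, "rho": 2700.0, "cp": 900.0},
    "fr4":      {"k": 0.3,   "rho": 1850.0, "cp": 1100.0},
}

def diffusivity(m):
    """Thermal diffusivity alpha = k / (rho * cp), m^2/s: the property
    combination that governs how fast a material responds in transients."""
    return m["k"] / (m["rho"] * m["cp"])
```

Copper's diffusivity is roughly three orders of magnitude higher than FR-4's, which is why copper spreads heat quickly while the dielectric lags in a transient.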

In some cases, a thermal property of a component or space may vary spatially. It may also depend on another variable, such as temperature. In those cases, we may define the appropriate profile of that property and assign it to the components in question. In some cases, such as in PCB modeling, we may need to break the component itself into layers or parts, so we can specify more exact properties for each part or layer.
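PCB layer modeling often reduces to effective conductivities: layers conduct in parallel for in-plane heat flow and in series for through-plane heat flow. A sketch with a hypothetical 4-layer stack-up (thicknesses and conductivities are assumed, typical values):

```python
# Hypothetical 4-layer PCB stack-up: (thickness in m, conductivity in W/m-K).
layers = [
    (35e-6, 385.0),   # top copper (1 oz)
    (765e-6, 0.3),    # FR-4 dielectric
    (765e-6, 0.3),    # FR-4 dielectric
    (35e-6, 385.0),   # bottom copper (1 oz)
]

total_t = sum(t for t, _ in layers)

# In-plane: layers act in parallel -> thickness-weighted average of k.
k_in_plane = sum(t * k for t, k in layers) / total_t

# Through-plane: layers act in series -> harmonic (resistance-like) average.
k_through = total_t / sum(t / k for t, k in layers)
```

For this stack-up the in-plane value comes out near 17 W/m-K while the through-plane value stays near 0.3 W/m-K, which is why lumping a PCB as a single isotropic block can badly misrepresent it.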

Electronic devices become hot because heat is generated within some components. This heat comes from the power each component draws from the power supply. Almost all of the power consumed by a given component within an electronic system is converted into heat. Therefore, it is very important that we know the exact power load on every component in our system. Our solution is only as good as the accuracy of these inputs. In case of uncertainty, it is customary to err on the side of more conservative inputs. At times, we may also establish plots or contours for a range of power loads to see scenarios for different loads.

In addition to the overall power loads, it is also important to know the exact location where each power load is being dissipated. In IC Packages, for example, the power consumed by the die is not distributed throughout the die. Rather, much of the power comes from a few regions within the die. In general, the power is dissipated on one side or surface of the die. And within that surface, there are areas where power dissipation is high, such as logic areas, and there are areas where the power density is low. So, it is important that we specify the exact power load based on location. In today’s thermal analysis tools, these are relatively easy things to specify, provided one has the right data.
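A die power map can be as simple as a grid of per-region power values over the active surface. The sketch below is a hypothetical 4 x 4 map where two logic regions carry most of the power; the layout and numbers are made up for illustration.

```python
# Hypothetical per-region power map (W) on a 4 x 4 grid over the active
# die surface. Two high-power logic regions dominate; the rest is low.
power_map_w = [
    [0.05, 0.05, 0.60, 0.60],
    [0.05, 0.05, 0.60, 0.60],
    [0.10, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.10, 0.10],
]

# Sanity check: the map should sum to the component's total power budget.
total_w = sum(sum(row) for row in power_map_w)
```

Checking that the map sums to the component's rated power is a cheap way to catch data-entry errors before they distort the hotspot prediction.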

Thermal analysis is conducted by discretizing the entire model domain into small volumes called mesh elements or cells. Essentially, we break up the entire model domain into thousands of small volumetric cells. Within each cell, typically, we assume an average value for each variable, such as temperature. The variables are assumed to vary between neighboring cells according to certain assumed functions, governed by partial differential equations. The partial differential equations are, in turn, influenced by the thermal properties we discussed above, such as thermal conductivity, density, specific heat, etc.
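The cell-averaging idea can be seen in a minimal 1-D example: a rod divided into nodes, each carrying one temperature value, iterated until steady state. This is a bare-bones sketch of the concept, not how a commercial solver is organized.

```python
# Minimal 1-D discretization sketch: a rod split into n nodes, each
# holding one temperature value, with fixed temperatures at both ends.
n = 11
T = [0.0] * n
T[0], T[-1] = 100.0, 20.0  # fixed boundary temperatures (C)

# Gauss-Seidel sweeps: for steady conduction with uniform conductivity,
# each interior node relaxes toward the average of its two neighbors.
for _ in range(2000):
    for i in range(1, n - 1):
        T[i] = 0.5 * (T[i - 1] + T[i + 1])

# The converged profile is linear between the two ends: T[i] = 100 - 8 * i
```

Real tools solve the same kind of balance in three dimensions, with flow coupling and millions of cells, but the per-cell bookkeeping is the same idea.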

In thermal modeling, when it comes to meshing, one thing is critical: mesh refinement. In general, a model will have areas where the mesh is fine, and areas where the mesh is coarse. We need a fine mesh in areas where the changes in variables (gradients) are high, and a coarse mesh in areas where the variations are low. This is because, using large cells, we cannot capture rapidly changing variables in a given space or time. In general, we should have much finer mesh close to solid objects or surfaces, as these areas are likely to have high gradients of variables.

Mesh refinement is easily handled in modern computational tools. It is usually a matter of just specifying some values and growth metrics in a few forms. We may also embed one mesh cluster within another to take care of components with highly disparate sizes, e.g. IC packages on a PCB. The mesh lines do not have to conform at the boundary, since almost all modern analysis tools have non-conformal meshing capability. In a non-conformal mesh, one cell can interface with two or more cells in the same direction. The values at such cells are determined by appropriate interpolation of their neighboring cell values.
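The interpolation across a non-conformal interface can be sketched as an area-weighted average; this is a simple illustrative scheme, and production solvers use more elaborate flux-conserving interpolation.

```python
def interface_value(values, areas):
    """Area-weighted average of the fine-cell values seen by one coarse
    face across a non-conformal interface. Illustrative scheme only."""
    return sum(v * a for v, a in zip(values, areas)) / sum(areas)

# One coarse cell facing two equal-area fine cells at 50 and 70 degrees:
t_face = interface_value([50.0, 70.0], [1.0, 1.0])  # -> 60.0
```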

When the meshing is done, it is always a good idea to examine the mesh on planes and surfaces, so the mesh looks consistent with our expectations. Any areas where the mesh needs improvement must be addressed promptly, including areas where the mesh is too coarse or too fine, cells are distorted, elements have a bad aspect ratio (very long on one side and short on the other), etc.

Thermal engineers use various mesh refinement levels in their analysis. For quick runs, to get ballpark estimates or to know the general trends quickly, one may use coarse meshes. For final results, on the other hand, we may use fine meshes. Run times are directly proportional to the number of mesh elements in a model. Thus, whereas smaller models may take minutes to run, models with tens of millions of mesh elements may take days or even weeks to finish on a modern server.

The last step in thermal modeling is what is called post-processing. When a model finishes its run, it is time to check the solution. With today’s state-of-the-art thermal analysis tools, there are numerous ways to display results. One can display temperature or any other variable contours on a point, plane, or surface. We may also build derivatives or functions of variables and display them on points, planes, and surfaces. There is no limit to how much we can slice and dice the thermal solution. In transient simulations, we can also make animations to show how the thermal profile of a system develops over time, which is especially useful to explain things to the uninitiated, including upper management and general customers.

While examining thermal modeling results, in general, it is a good practice to view the results critically. It is very easy to get carried away with pretty pictures at times and take their accuracy for granted. This can be a big mistake, especially in the early stages of thermal modeling. Therefore, we must check our solution for consistency and against expectations to make sure that our solution is not erroneous.

In thermal modeling, one run is rarely sufficient to get the end results we want. Typically, we go through a series of runs for a given model. In subsequent runs, we adjust basic parameters, mesh refinement level, domain sizes and run times, until we are satisfied with the results.

There are several thermal analysis software packages in the market today. Some are more comprehensive than others. However, the main general-purpose thermal analysis tools widely used in the electronics industry today are Icepak, from Ansys Inc., and FloTHERM, from Mentor. Either of these two packages can be used to do pretty much any flow and thermal analysis task you may have. Both have come a long way from what they used to be several years ago to what they are today in accuracy, ease of use, and automation.

In addition to the above two general-purpose thermal analysis tools, there are also other tools especially geared to solve certain aspects of thermal modeling, such as pure conduction or phase change, or to serve niche industries, such as data centers and power plants. Almost all CAD solid modeling packages have some thermal analysis capability built into them.