Combining GaN and Liquid Cooling for Greater Energy Efficiency in AI Data Centers

Authors: Paul Wiener, Vice President Strategic Marketing, and Julian Styles, Director Application Business Development, both at Infineon Technologies

Date: 04/30/2024

Realize a solution synergy that naturally reduces power losses and increases ROI


Figure 1: Projected electricity consumption of data centers (2015 – 2030) [6]

The data center industry is at a crossroads, facing unprecedented transformation driven by the surge in generative AI and other emerging technologies. This surge has massively increased the power consumption of servers, putting a strain on data centers around the world. According to several models, data centers already account for about 2 percent [1] of global energy demand, a share that could rise to as much as 7 percent by 2030. To keep pace, data centers need to rely on innovations at every level, including cooling solutions, wide-bandgap (WBG) semiconductors such as gallium nitride (GaN), and efficient voltage regulators.

In a moment that silenced a room of over 100 attendees at a recent PowerAmerica conference, Greg Ratcliff, Chief Innovation Officer of Vertiv, delivered a critical revelation. "We’ve been talking about the future of wide-bandgap semiconductors," he said, "but we haven't mentioned that the hardware and power supplies within data centers are going to become liquid-cooled. With this change, the operating temperature of the power electronics will be much lower, and at that point, GaN power semiconductors become much more important to our future."

What do we do about the increasing energy consumption and heat challenges?

Modern GPUs, essential for accelerating AI and ML computational processes, are power hungry and generate a lot of heat. Higher computational loads also lead to higher power density requirements, heating the data centers up even more.

Training an AI model like ChatGPT alone takes about 10 GWh [2], roughly the energy a thousand US households consume in a year. Yet this upfront cost is only a fraction of the model's total energy consumption. An average ChatGPT query consumes 50 to 100 times more energy than a comparable Google search. According to Sajjad Moazeni, assistant professor of electrical and computer engineering at the University of Washington, hundreds of millions of daily queries can add up to 1 GWh of energy every day. That is comparable to the daily consumption of around 33,000 US households, underscoring the intensifying demands on data center resources.
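The two household comparisons above can be cross-checked with simple arithmetic. The short Python sketch below uses only the figures quoted in this article and derives the annual per-household consumption each comparison implies. It is a consistency check under those figures, not an independent estimate.

```python
# Cross-check of the household comparisons quoted in the text.
# Inputs are the article's figures; the per-household values are derived, not sourced.

TRAINING_ENERGY_KWH = 10e6       # ~10 GWh to train a ChatGPT-class model [2]
TRAINING_HOUSEHOLDS = 1_000      # "a thousand US households in a year"

DAILY_QUERY_ENERGY_KWH = 1e6     # up to ~1 GWh per day for queries [2]
DAILY_HOUSEHOLDS = 33_000        # "around 33,000 US households"


def implied_annual_household_kwh(energy_kwh: float, households: int, days: float = 365.0) -> float:
    """Annual consumption per household implied by spreading energy_kwh
    across `households` over a period of `days` days."""
    return energy_kwh / households * (365.0 / days)


training = implied_annual_household_kwh(TRAINING_ENERGY_KWH, TRAINING_HOUSEHOLDS)
inference = implied_annual_household_kwh(DAILY_QUERY_ENERGY_KWH, DAILY_HOUSEHOLDS, days=1.0)

print(f"training comparison implies {training:,.0f} kWh per household per year")
print(f"daily-query comparison implies {inference:,.0f} kWh per household per year")
```

Both comparisons work out to roughly 10 to 11 MWh per household per year, so they are at least consistent with each other.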

As a result, data center rack power requirements, which previously ranged between 6 and 15 kW, now average between 20 and 40 kW [3], with some racks climbing towards 200 kW and more. The biggest data centers draw more than 100 MW [4] of power and release enough waste heat to warm entire cities; in fact, some of them already do.

While data center power demands and densities have spiked, their power usage effectiveness (PUE) has not kept pace: a 2023 report [5] finds that global PUE levels have remained fairly constant since 2018. If this trend continues, global data center energy needs could reach an alarming 2,000 TWh by 2030.
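PUE is the ratio of total facility energy to the energy delivered to IT equipment, so a value of 1.0 would mean zero overhead. The sketch below assumes a hypothetical, fixed PUE of 1.5 purely for illustration and shows why a flat PUE is worrying: every additional unit of IT load drags a proportional amount of cooling and power-distribution overhead along with it.

```python
def pue(total_facility_energy: float, it_equipment_energy: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_energy <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_energy / it_equipment_energy


def facility_energy(it_equipment_energy: float, pue_value: float) -> float:
    """Total facility energy implied by a given IT energy and PUE (same units as the input)."""
    return it_equipment_energy * pue_value


ASSUMED_PUE = 1.5  # hypothetical value, held constant to mimic a flat PUE trend

for it_gwh in (100.0, 200.0, 400.0):
    total_gwh = facility_energy(it_gwh, ASSUMED_PUE)
    print(f"IT {it_gwh:.0f} GWh -> facility {total_gwh:.0f} GWh "
          f"(overhead {total_gwh - it_gwh:.0f} GWh, PUE {pue(total_gwh, it_gwh):.2f})")
```

With PUE unchanged, doubling the IT load doubles the overhead as well; only a lower PUE breaks that proportionality.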

In addition, overheating can cause power outages and equipment failure, while the energy spent on cooling contributes to climate change. Cooling is therefore a necessity, and it can account for about 40 percent of a data center’s power consumption. Traditional air-cooling methods struggle to manage the heat generated by these high-density racks. Air cooling also relies on a great deal of supporting equipment, including chillers, air pumps, cables, and humidity-control, filtration, and backup systems. This is pushing operators to switch to more efficient cooling technologies, and one of the best options is liquid cooling.
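The 40 percent cooling share also puts a floor under PUE. If cooling alone consumes a fraction c of total facility power, IT equipment can receive at most 1 - c of it, so PUE >= 1 / (1 - c). The function below is a minimal sketch of that bound; it deliberately ignores other overheads such as power-distribution losses and lighting, which would push the real PUE higher.

```python
def min_pue_from_cooling_share(cooling_share: float) -> float:
    """Lower bound on PUE when cooling alone takes `cooling_share` of total facility power.

    IT equipment can use at most the remaining (1 - cooling_share) of the total,
    so PUE = total / IT >= 1 / (1 - cooling_share). Other overheads are ignored.
    """
    if not 0.0 <= cooling_share < 1.0:
        raise ValueError("cooling_share must be in [0, 1)")
    return 1.0 / (1.0 - cooling_share)


for share in (0.40, 0.30, 0.20):
    print(f"cooling share {share:.0%}: PUE >= {min_pue_from_cooling_share(share):.2f}")
```

At a 40 percent cooling share the bound is about 1.67, which shows how directly a more efficient cooling method feeds into PUE.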

Liquid cooling

Liquid transfers heat up to a thousand times better than air [7], so it captures server-produced heat far more efficiently. This significantly lowers both the energy consumption and the operating temperature of data centers, with tremendous benefits.
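How large the advantage looks depends on which property is compared. The sketch below uses approximate, rounded room-temperature property values for water and air (treat them as illustrative assumptions) to contrast thermal conductivity with volumetric heat capacity, that is, how much heat a given volume of coolant carries per kelvin of temperature rise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fluid:
    name: str
    conductivity_w_per_m_k: float     # thermal conductivity k
    density_kg_per_m3: float          # density rho
    specific_heat_j_per_kg_k: float   # specific heat c_p

    @property
    def volumetric_heat_capacity(self) -> float:
        """Heat stored per cubic metre per kelvin, rho * c_p, in J/(m^3*K)."""
        return self.density_kg_per_m3 * self.specific_heat_j_per_kg_k


# Approximate values near 25 degC (rounded; for illustration only).
WATER = Fluid("water", 0.60, 997.0, 4180.0)
AIR = Fluid("air", 0.026, 1.18, 1005.0)

conductivity_ratio = WATER.conductivity_w_per_m_k / AIR.conductivity_w_per_m_k
capacity_ratio = WATER.volumetric_heat_capacity / AIR.volumetric_heat_capacity

print(f"thermal conductivity ratio: {conductivity_ratio:.0f}x")
print(f"volumetric heat capacity ratio: {capacity_ratio:.0f}x")
```

With these values, water conducts heat only about 23 times better than air, but a given volume of water carries roughly 3,500 times more heat per kelvin, so "up to a thousand times better" is best read as a summary of water's heat-transport advantage.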

A 2022 study [8] found that even a partial (75 percent) transition to liquid cooling can reduce facility power consumption by up to 27 percent.

Liquid-cooling infrastructure is also comparatively simpler: it does not need chillers or floor-based coolant distribution units, and it does not require as much power backup as air cooling. Perhaps surprisingly, liquid cooling also ends up using less water than air cooling. Adopting liquid-cooling technologies is therefore the most reasonable way to meet growing heat-management and energy-efficiency requirements.

Cold plate liquid cooling is a method of cooling electronic components by transferring heat directly to a liquid coolant circulating through a cold plate attached to the heat-generating device. An efficient way to implement localized cooling, cold plate liquid cooling is expected to gain momentum and grow rapidly over the next 10 years, at a projected compound annual growth rate (CAGR) of 16 percent. Because components operate at lower temperatures, this technology lets data center integrators and server suppliers increase performance while reducing cooling power requirements.
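To make the cold-plate idea concrete, the steady-state energy balance Q = m_dot * c_p * dT gives the coolant flow needed to carry away a given heat load. The sketch below assumes a 40 kW rack (the top of the average range cited earlier), a coolant temperature rise of 10 K, that the coolant captures all of the heat (real cold plates typically capture only part of it), and rounded room-temperature properties for water and air; all of these are illustrative assumptions.

```python
def coolant_flow_m3_per_s(heat_w: float, delta_t_k: float,
                          density: float, specific_heat: float) -> float:
    """Volumetric flow needed to carry heat_w with a coolant temperature rise of delta_t_k.

    Uses the steady-state energy balance Q = m_dot * c_p * dT, then converts
    mass flow (kg/s) to volumetric flow (m^3/s) with the coolant density.
    """
    mass_flow_kg_per_s = heat_w / (specific_heat * delta_t_k)
    return mass_flow_kg_per_s / density


HEAT_LOAD_W = 40_000.0  # assumed 40 kW rack heat load
DELTA_T_K = 10.0        # assumed coolant temperature rise (hypothetical)

water = coolant_flow_m3_per_s(HEAT_LOAD_W, DELTA_T_K, density=997.0, specific_heat=4180.0)
air = coolant_flow_m3_per_s(HEAT_LOAD_W, DELTA_T_K, density=1.18, specific_heat=1005.0)

print(f"water: {water * 1000 * 60:.0f} L/min")
print(f"air:   {air:.2f} m^3/s")
```

Under these assumptions, roughly 58 L/min of water does the same job as about 3.4 m³/s of air, which is why a compact cold-plate loop can replace large volumes of moving air.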

But what happens after those 10 years? Should data centers redo everything all over again in an attempt to provide more cooling? What if there were a way to approach the problem from the other direction and reduce heat generation in the first place? To keep up with these demands in the long run, data centers need to address the core of the issue: the power supply units and devices that generate the heat.

The power supply question

To understand the other side of the issue, it is important to consider how power travels to its destination within servers, whether it be the CPU, GPU, or any other critical component. Many servers still rely on AC-DC converters that:

  • Are prone to higher switching losses at higher frequencies
  • Generate heat in high-density environments
  • Offer limited power density

Due to these constraints, traditional converters operate at 90 percent efficiency or less. While this might seem commendable, it means that 10 percent or more of the energy entering them is lost as heat within the data center. The repercussions include higher costs, higher CO2 emissions, and additional waste heat.
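The heat a converter dissipates follows directly from its efficiency: with output power P and efficiency eta, the input is P / eta and the loss is P / eta - P. The Python sketch below applies this to a fixed output; the 3 kW rating and the three efficiency points are chosen only for illustration.

```python
def psu_loss_w(output_w: float, efficiency: float) -> float:
    """Power dissipated as heat in a converter delivering output_w at the given efficiency."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    input_w = output_w / efficiency
    return input_w - output_w


OUTPUT_W = 3_000.0  # illustrative 3 kW server PSU

for eta in (0.90, 0.94, 0.96):
    loss = psu_loss_w(OUTPUT_W, eta)
    print(f"eta = {eta:.0%}: {loss:6.1f} W of heat ({loss / OUTPUT_W:.1%} of delivered power)")
```

At 90 percent efficiency, the roughly 333 W of heat equals 10 percent of the input power but about 11 percent of the power actually delivered, and every watt of it must then be removed by the cooling system.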

Furthermore, with such a massive increase in power consumption, increasing power density is no longer optional. As data centers aim to increase rack density to accommodate higher energy demands, the efficient use of physical space becomes a crucial factor. Because of their limited power density and higher power losses, existing Si-based converters take up too much space and operate at efficiencies that generate too much heat.

GaN enables smaller, highly efficient, and more economical solutions

WBG semiconductors such as GaN, thanks to their wider bandgap, are an optimal choice for the high-voltage, high-density, and high-frequency requirements of current data centers. GaN, in particular, offers unparalleled efficiency at very high switching frequencies. Because each switching transition dissipates substantially less energy, GaN converters can switch at higher frequencies without prohibitive switching losses, and the higher frequency in turn allows smaller magnetics and capacitors, enabling more compact converters with improved circuit efficiency.
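Switching losses grow in proportion to switching frequency, so the benefit of a fast device is that it loses less energy per transition, which makes higher frequencies affordable. The sketch below uses a common first-order hard-switching estimate (V-I overlap energy plus output-capacitance energy, multiplied by frequency). Devices A and B are hypothetical, with parameters chosen only to illustrate the trend; they are not datasheet values for any Si or GaN part.

```python
def switching_loss_w(v_bus: float, i_load: float, t_transition_s: float,
                     c_oss_f: float, f_sw_hz: float) -> float:
    """First-order hard-switching loss estimate for one transistor.

    Energy per cycle = V-I overlap during the rise and fall transitions
    (0.5 * V * I * (t_r + t_f), with t_transition_s = t_r + t_f)
    plus the output-capacitance energy (0.5 * C_oss * V^2).
    The loss is that energy multiplied by the switching frequency.
    """
    e_overlap_j = 0.5 * v_bus * i_load * t_transition_s
    e_coss_j = 0.5 * c_oss_f * v_bus ** 2
    return (e_overlap_j + e_coss_j) * f_sw_hz


# Hypothetical devices: A has slower edges and more output capacitance than B.
DEVICES = {
    "A": {"t_transition_s": 40e-9, "c_oss_f": 100e-12},
    "B": {"t_transition_s": 10e-9, "c_oss_f": 50e-12},
}
V_BUS, I_LOAD = 400.0, 10.0  # illustrative operating point

for name, params in DEVICES.items():
    for f_sw in (100e3, 500e3):
        p = switching_loss_w(V_BUS, I_LOAD, f_sw_hz=f_sw, **params)
        print(f"device {name} at {f_sw / 1e3:.0f} kHz: {p:.1f} W")
```

With these made-up numbers, device B at 500 kHz dissipates about as much as device A at roughly 136 kHz, so the faster device can run at several times the frequency, with correspondingly smaller passive components, for a similar loss budget.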

We are already seeing GaN make an impact in data center markets, where server makers can increase their storage and data processing capacities by using the extra space freed up by GaN power supplies. Figure 2 demonstrates the impact GaN has on the size and efficiency of a server PSU: a 3 kW PSU achieves a 2.2-fold increase in power density, delivering more power in a smaller form factor, and reaches a titanium efficiency level, reducing power losses by 33 percent.
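A 33 percent cut in losses corresponds to only a couple of percentage points of efficiency, which is why such gains matter. The sketch below works backwards from an assumed GaN operating efficiency of 96 percent (an illustrative value, not a figure read from Figure 2) to the baseline efficiency that a 33 percent loss reduction would imply at the same output power.

```python
def baseline_efficiency(new_efficiency: float, loss_reduction: float) -> float:
    """Efficiency of the older design, given the newer design's efficiency and the
    fractional reduction in losses it achieved at the same output power."""
    new_loss_per_watt = 1.0 / new_efficiency - 1.0        # loss per watt delivered
    old_loss_per_watt = new_loss_per_watt / (1.0 - loss_reduction)
    return 1.0 / (1.0 + old_loss_per_watt)


ASSUMED_GAN_EFFICIENCY = 0.96  # hypothetical operating point, for illustration only

eta_old = baseline_efficiency(ASSUMED_GAN_EFFICIENCY, loss_reduction=0.33)
print(f"{eta_old:.1%} -> {ASSUMED_GAN_EFFICIENCY:.1%}")
```

Under this assumption, the baseline would be about 94.1 percent, so a gain of under two percentage points already removes a third of the heat.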


Figure 2: GaN-based PSU compared to Si-based PSU

 

Infineon’s GaN-based CoolGaN™ solutions [9] offer over 99 percent efficiency in PFC topologies, compared with the roughly 90 percent overall efficiency of traditional Si-based converters.

As the industry pushes to higher power to meet the needs of AI computing, changes in topology are required. Figure 3 illustrates an interleaved PFC, a 3-phase resonant LLC, and a synchronous rectifier (SR), with integrated GaN-plus-driver solutions on the PFC fast leg of the supply. Infineon CoolMOS™ devices are used on the PFC slow leg and Infineon OptiMOS™ devices on the SR, resulting in a high-performance, high-efficiency, and cost-effective solution for powering AI data center servers. Infineon’s portfolio also includes silicon carbide (SiC) options, giving power designers a full range of wide-bandgap design choices.
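Because the PFC and DC-DC stages in such a supply are connected in series, their efficiencies multiply, so a PFC stage above 99 percent still leaves room for losses downstream. The sketch below assumes hypothetical stage efficiencies (99 percent for the PFC, 97.5 percent for the LLC stage with synchronous rectification) to show how the overall figure is obtained.

```python
from math import prod


def chain_efficiency(stage_efficiencies) -> float:
    """Overall efficiency of converter stages in series: the product of the stage efficiencies."""
    return prod(stage_efficiencies)


# Hypothetical stage efficiencies for a PFC -> LLC (with synchronous rectification) PSU.
stages = {"PFC": 0.99, "LLC + SR": 0.975}

total = chain_efficiency(stages.values())
print(f"overall: {total:.1%}")
```

The result, about 96.5 percent, is lower than either stage on its own, which is why every stage in the chain is a candidate for wide-bandgap devices.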


Figure 3: Block diagram of AI PSU for power levels of 10 kW and higher

 

 


Figure 4: Global data center energy saving potential with CoolGaN™ devices

 

Because even a one percent increase in efficiency significantly reduces a data center’s energy consumption, the savings at scale are massive. GaN-based systems can also double the power density of racks, saving a great deal on expensive real estate. As a result, for every 10 racks in a data center, switching to GaN can increase profit by 3 million USD and decrease carbon emissions by 100 tons every year [10].
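To put a one-point efficiency gain in perspective, the sketch below estimates the yearly energy saved when PSUs feeding a constant load improve from 94 to 95 percent. The 10 MW IT load, the two efficiency values, and the PUE of 1.5 used to approximate facility overhead are all hypothetical inputs chosen for illustration.

```python
HOURS_PER_YEAR = 8760


def annual_savings_mwh(it_load_kw: float, eta_before: float, eta_after: float,
                       pue: float = 1.0) -> float:
    """Yearly energy saved when PSU efficiency rises from eta_before to eta_after.

    The PSUs deliver it_load_kw continuously; the saving is the drop in PSU input power.
    Multiplying by PUE is a rough way to include the cooling and other facility
    overhead that would otherwise be spent on that extra heat.
    """
    saved_kw = it_load_kw / eta_before - it_load_kw / eta_after
    return saved_kw * pue * HOURS_PER_YEAR / 1000.0


savings = annual_savings_mwh(10_000, 0.94, 0.95)                   # hypothetical 10 MW load
with_overhead = annual_savings_mwh(10_000, 0.94, 0.95, pue=1.5)

print(f"{savings:,.0f} MWh/yr at the PSUs, about {with_overhead:,.0f} MWh/yr including overhead")
```

In this example, one percentage point saves roughly 1 GWh per year at the PSUs alone, before any cooling overhead is counted.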

The ultimate winning combination: liquid cooling + GaN

The key to understanding the value of this combination is the temperature dependence of GaN devices’ on-resistance. As the junction temperature (Tj) of a power device increases, its RDS(on) increases with it. On-resistance translates directly into conduction losses (the square of the RMS current times RDS(on)), which make up a sizable proportion of the losses in a typical AC-DC converter.
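The link between junction temperature and conduction loss can be sketched with a linearized on-resistance model, R(Tj) = R25 * (1 + alpha * (Tj - 25)), and the loss I_rms^2 * R(Tj). The 50 mOhm device, the 10 A RMS current, and the coefficient alpha (chosen so that RDS(on) doubles between 25 °C and 150 °C) are hypothetical; real devices follow the typically non-linear curve in their datasheets.

```python
def rds_on_ohm(r25_ohm: float, tj_c: float, alpha_per_k: float = 0.008) -> float:
    """Linearized on-resistance at junction temperature tj_c (degC).

    alpha_per_k is a hypothetical coefficient chosen so that R doubles between
    25 and 150 degC; it does not describe any specific device.
    """
    return r25_ohm * (1.0 + alpha_per_k * (tj_c - 25.0))


def conduction_loss_w(i_rms_a: float, r_ohm: float) -> float:
    """Conduction loss I_rms^2 * R."""
    return i_rms_a ** 2 * r_ohm


R25, I_RMS = 0.050, 10.0  # hypothetical 50 mOhm device carrying 10 A rms

for tj in (25, 60, 100, 150):
    loss = conduction_loss_w(I_RMS, rds_on_ohm(R25, tj))
    print(f"Tj = {tj:3d} degC: {loss:.1f} W")
```

In this model, keeping the junction at 60 °C instead of 150 °C cuts conduction loss from 10 W to 6.4 W for the same current.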

This brings us back to the point Greg Ratcliff made at PowerAmerica. In air-cooled data centers, electronics run hot, and the adoption of liquid cooling to bring down the operating temperatures of servers and power supply units is imminent.

As illustrated in Figure 5, at lower operating temperatures the RDS(on) of all power semiconductor technologies is similar, so switching losses become the dominant factor in PSU efficiency and power density. At the transistor level, GaN, with its lower switching losses, is the most efficient of all semiconductor technologies at lower operating temperatures.
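The argument can be illustrated by adding a temperature-independent switching loss to a linearized conduction loss for two hypothetical devices with equal on-resistance at 25 °C. Device B switches with lower loss, but its RDS(on) is assumed to rise faster with temperature. All parameter values are invented for illustration, and treating switching loss as constant over temperature is a simplification.

```python
def total_loss_w(tj_c: float, r25_ohm: float, alpha_per_k: float,
                 p_sw_w: float, i_rms_a: float = 10.0) -> float:
    """Conduction loss with a linearized R_DS(on) plus a constant switching loss."""
    r_ohm = r25_ohm * (1.0 + alpha_per_k * (tj_c - 25.0))
    return i_rms_a ** 2 * r_ohm + p_sw_w


# Two hypothetical devices with equal on-resistance at 25 degC.
A = dict(r25_ohm=0.050, alpha_per_k=0.004, p_sw_w=8.0)   # higher switching loss
B = dict(r25_ohm=0.050, alpha_per_k=0.010, p_sw_w=3.0)   # lower switching loss, steeper R(T)

for tj in (50, 100, 150):
    a, b = total_loss_w(tj, **A), total_loss_w(tj, **B)
    print(f"Tj = {tj} degC: A {a:.2f} W, B {b:.2f} W, B saves {a - b:.2f} W")
```

With these numbers, the lower-switching-loss device saves 4.25 W at 50 °C but only 1.25 W at 150 °C: the cooler the junction, the more the comparison is decided by switching losses.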

Thus, combining liquid cooling with GaN not only reduces power consumption significantly but also enables power racks to accommodate higher power densities without the risk of overheating.

 


Figure 5: Increasing GaN performance with decreasing operating temperature

 

Conclusion

On the one hand, traditional converters have hit a plateau in the voltage, power density, and switching frequency they can handle, all of which AI data centers need to push further to meet contemporary needs. On the other hand, operators need to lower power consumption by limiting heat generation and improving power efficiency.

As tech giants race to deploy AI technology, they are creating a surge in energy demand. AI power supplies are growing from 3 kW, 5 kW, and 10 kW up to 30 kW and beyond. This demand will compound the power challenges in data centers, leading operators to explore every option for improving efficiency, density, and environmental impact. The solution lies in GaN-based designs.

Combining cold-plate liquid cooling with GaN, which has a clear advantage at lower junction temperatures, presents a massive opportunity for data centers to maximize efficiency, address rising power demands, and overcome the challenges posed by the increasing heat generation of servers.


References:

[1] The green potential of data centers, Infineon Technologies AG

[2] Q&A: UW researcher discusses just how much energy ChatGPT uses, University of Washington, July 27, 2023, https://www.washington.edu/news/2023/07/27/how-much-energy-does-chatgpt-use/

[3] Addressing the Data Center Power Challenge, EEPower, Sep 17, 2023, https://eepower.com/industry-articles/addressing-the-data-center-power-challenge/

[4] Your data could warm you up this winter, here’s how, World Economic Forum, Aug 8, 2022, https://www.weforum.org/agenda/2022/08/sustainable-data-centre-heating/

[5] New Research Reveals Persistent Array of Data Center Industry Challenges, Data Center Frontier, July 2023, https://www.datacenterfrontier.com/colocation/article/33008641/new-research-reveals-persistent-array-of-data-center-industry-challenges

[6] Data centres and data transmission networks, IEA, 2023, https://www.iea.org/energy-system/buildings/data-centres-and-data-transmission-networks, plus Infineon assumptions and calculations

[7] Liquid cooling vs. air cooling in the data center, TechTarget, May 03, 2022, https://www.techtarget.com/searchdatacenter/feature/Liquid-cooling-vs-air-cooling-in-the-data-center

[8] Power Usage Effectiveness Analysis of a High-Density Air-Liquid Hybrid Cooled Data Center, ASME 2022 International Technical Conference and Exhibition on Packaging and Integration of Electronic and Photonic Microsystems, Oct 25–27, 2022, https://asmedigitalcollection.asme.org/InterPACK/proceedings-abstract/InterPACK2022/86557/V001T01A014/1153400

[9] GaN transistors (GaN HEMTs), Infineon Technologies AG

[10] GaN: Solving the Dual Challenge of Sustainability and Profitability in the Data Center, Power Systems Design, https://www.powersystemsdesign.com/articles/gan-solving-the-dual-challenge-of-sustainability-and-profitability-in-the-data-center/22/19561
