Danny Clavette, Director, Systems Applications, Infineon Technologies
Humans are smart, achieving wisdom, experience and intelligence through years of learning and data accumulation. Computers seem smart due to their ability to retain information but until recently they lacked the capability to autonomously learn in order to perform tasks or make decisions. While a human brain consumes 20 to 30 W of power, the latest learning systems are consuming power at levels that would support a small town as they learn to become artificially intelligent. The requirements for powering this new generation of supercomputer have changed dramatically.
The high power consumption of today's AI is driving changes in processor types and the associated computing architecture in an effort to reduce power needs. Traditional Central Processing Units (CPUs) are designed to be very flexible and able to support a wide variety of programs, yet AI learning repeats relatively mundane tasks many times over.
Most AI functions can be performed by Graphics Processing Units (GPUs) that are designed to repeatedly perform complex math very efficiently. These GPUs can be paralleled to further increase computing power. Modern GPUs process data considerably faster while using the same power as a comparable CPU. NVIDIA dominated the early AI market; its DGX-1 GPU supercomputer contains eight Tesla P100 GPUs, each capable of 21.2 TeraFLOPS, and requires 3200 W of total system power. Parallel DGX-1s will form an effective neural network.
Moving beyond GPUs, Tensor Processing Units (TPUs) are ASICs that have been developed specifically for AI learning. Derived from GPU concepts, they use reduced floating-point precision and omit rasterization and texture mapping, which further improves computational efficiency.
The ability to sense is central to the learning process. Low-power sensors, connected to central AI servers via high-speed wireless connections, are the eyes, ears and hands of the neural network. By 2020, it is predicted that there will be over 50 billion network-connected sensors.
AI Brings Challenges for Power System Designers
To approach the processing power of a human brain, various researchers have suggested that an AI system must perform somewhere in the region of 40 thousand trillion operations per second (or 40 PetaFLOPS). A typical server farm with this level of AI computing power would require approximately 1800 NVIDIA DGX-1s, consuming nearly 6 MW of power. The human brain requires only 20 W of power to perform the same tasks.
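The power figure above follows directly from numbers already quoted in this article. A quick check, using only the 3200 W per eight-GPU system, the approximate system count, and the 20 W brain figure from this paragraph (nothing here is measured data):

```python
# Back-of-the-envelope check of the server-farm power estimate.
# All inputs are figures quoted in the article text.
SYSTEM_POWER_W = 3200   # total power of one eight-GPU system
SYSTEM_COUNT = 1800     # approximate number of systems in the farm
BRAIN_POWER_W = 20      # human brain power used for comparison

farm_power_w = SYSTEM_POWER_W * SYSTEM_COUNT
farm_power_mw = farm_power_w / 1e6
ratio_to_brain = farm_power_w / BRAIN_POWER_W

print(f"Farm power: {farm_power_mw:.2f} MW")
print(f"Ratio to a human brain: {ratio_to_brain:,.0f}x")
```

The result is 5.76 MW, which is the "nearly 6 MW" quoted above, and roughly 288,000 times the power budget of a human brain.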
Delivering and managing megawatts of power is challenging enough. With energy costs rising, efficiency is paramount. Additionally, every watt dissipated as heat adds to the air-conditioning load in the datacenter, further increasing setup and operational costs.
Figure 2. Modern datacenters require advanced power solutions
As datacenters can contain thousands of processing units, size is important. Yet smaller sizes rapidly increase power density requirements and reduce the surface area available for dissipating heat, making thermal management one of the most significant challenges in designing power for this new generation of AI supercomputers.
Computing systems are not static loads. During learning they run at full power, but at other times the power requirement drops in line with processor activity. Modern power standards require efficiency to remain high across the whole load range, so today's multi-phase power solutions often include a provision for dynamically controlling the number of phases in use.
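As a rough illustration of dynamic phase control, the sketch below picks the number of active phases from the load current. The function name, the 40 A per-phase target and the phase limits are hypothetical values chosen for illustration; they are not taken from any datasheet, and a real controller would also apply hysteresis and react to load transients.

```python
import math

def phases_for_load(load_current_a: float,
                    max_phases: int = 10,
                    target_amps_per_phase: float = 40.0,
                    min_phases: int = 1) -> int:
    """Return how many phases to enable for a given load current.

    Illustrative only: target_amps_per_phase is an assumed current at
    which a single phase is taken to run near its efficiency peak.
    """
    if load_current_a < 0:
        raise ValueError("load current must be non-negative")
    # Enough phases that no phase exceeds the target current...
    needed = math.ceil(load_current_a / target_amps_per_phase)
    # ...clamped to what the hardware actually provides.
    return max(min_phases, min(max_phases, needed))

for amps in (5, 60, 200, 600):
    print(f"{amps:>4} A -> {phases_for_load(amps)} phase(s)")
```

At light load only one phase stays active, which avoids the switching losses of idle phases; as the load rises, phases are added until the assumed maximum is reached.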
Digital vs Analog Control
Clearly, power solution sophistication will increase and, as a result, Infineon has introduced products with advanced digital control techniques, replacing the legacy analog-based solutions.
Digital control increases overall system flexibility and adaptability when designing high-end power solutions. A digital approach allows controllers to be customized without costly and time-consuming silicon spins and simplifies designing and building the scalable power solutions required for AI. Even with all of the included functionality and precision delivery of power, digital solutions are now price competitive with the analog solutions they will replace.
Figure 3. Infineon has a broad and integrated DC-DC power portfolio suitable for AI
Multi-Rail & Multiphase Digital Controllers
Central to Infineon's offering for AI-enabled servers is a complete product family of multi-rail / multiphase digital controllers. These advanced controllers comply with Intel and AMD requirements and support PMBus with AVS (Adaptive Voltage Scaling) for voltage set-point control and system telemetry.
Infineon solutions can be programmed to provide one, two, or three fully digitally controlled voltage rails with multiple phases. A range of doubling ICs and drivers can further increase phase count, if required.
Programmable autonomous phase addition or shedding ensures high efficiency across a wide load range. The digitally programmable load line eliminates external load line setting components. PID loop compensation and digital temperature compensation can also be programmed.
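A load line makes the regulated output voltage droop in proportion to the load current, V_out = V_set - I_load x R_LL. With a digital controller the slope R_LL is a programmed value rather than a resistor network. The minimal sketch below shows the relationship; the 1.0 V set-point and 0.5 milliohm slope are hypothetical numbers chosen for illustration, not values from any specific controller or processor specification.

```python
def load_line_voltage(v_set: float, i_load_a: float, r_ll_ohm: float) -> float:
    """Target output voltage for a programmed load line (droop)."""
    return v_set - i_load_a * r_ll_ohm

V_SET = 1.0      # hypothetical voltage set-point, in volts
R_LL = 0.5e-3    # hypothetical load-line slope, in ohms (0.5 milliohm)

for i_load in (0, 100, 200):
    v_out = load_line_voltage(V_SET, i_load, R_LL)
    print(f"{i_load:>3} A -> {v_out * 1000:.1f} mV")
```

With these assumed values the target drops by 50 mV for every 100 A of load, which leaves voltage headroom for the overshoot that occurs when the load is suddenly released.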
Digital control enables non-linear control algorithms and provides excellent transient response with reduced output capacitance. Most Infineon controllers also support programmable cycle-by-cycle per phase current limit for superior dynamic current limiting.
These devices are easily configurable using Infineon's optimized Graphical User Interface (GUI) tools. Configuration settings can be stored in the controller’s on-chip NVRAM.
Infineon controllers have built-in fault detection and protection, including IUVP, IOVP, CFP, OUVP and OOVP (Input Undervoltage Protection, Input Overvoltage Protection, Catastrophic Fault Protection, Output Undervoltage Protection and Output Overvoltage Protection). Over Current Protection (OCP) is provided on an instantaneous basis, on an averaged basis for total current, per channel, and pulse-to-pulse. There are also several Over Temperature Protection (OTP) thresholds for thermal protection.
Infineon’s latest multiphase digital controller, the IR35219, is the only 10-phase controller on the market that supports up to 600 A load current. It is a dual-loop controller with flexible phase configuration and is available in a small 48-pin 6 x 6 mm package. Besides having the standard Infineon digital controller features, the IR35219 has built-in phase fault detection and protection features that allow a multiphase VR to continue operation even with a failed phase. This means it can be used in redundant VR designs for mission-critical applications.
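A quick calculation shows what a failed phase means for the remaining ones. The sketch takes the 600 A and 10-phase figures from the text and assumes the load is shared equally between active phases, which is an idealization; the function name is illustrative.

```python
def per_phase_current(total_current_a: float, active_phases: int) -> float:
    """Current per phase, assuming ideal equal sharing between phases."""
    if active_phases <= 0:
        raise ValueError("at least one active phase is required")
    return total_current_a / active_phases

TOTAL_CURRENT_A = 600.0   # maximum load current quoted in the text
PHASES = 10               # phase count quoted in the text

healthy = per_phase_current(TOTAL_CURRENT_A, PHASES)
one_failed = per_phase_current(TOTAL_CURRENT_A, PHASES - 1)

print(f"All phases healthy: {healthy:.1f} A per phase")
print(f"One phase failed:   {one_failed:.1f} A per phase")
print(f"Extra load on survivors: {(one_failed / healthy - 1) * 100:.1f} %")
```

Under these assumptions each surviving phase carries about 11 % more current after a single failure, so a redundant design needs at least that much current and thermal headroom per phase at full load.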
OptiMOS Power Stages
To deliver the power density required by AI-enabled servers, Infineon has developed high-efficiency, high-power-density power stages. The power stages contain a low-quiescent-current synchronous buck gate-driver IC, high-side and low-side MOSFETs and a Schottky diode in the same package to further improve efficiency. The package is optimized for PCB layout, heat transfer, driver/MOSFET control timing and minimal switch node ringing when layout guidelines are followed. The paired gate driver and MOSFET combination enables higher efficiency at the lower output voltages required by cutting-edge CPU, GPU and DDR memory designs.
The TDA21472 70A power stage’s internal MOSFET current sense algorithm with temperature compensation achieves superior current sense accuracy versus best-in-class controller based inductor DCR sense methods. Protection includes cycle-by-cycle OCP with programmable threshold, VCC/VDRV UVLO protection, phase fault detection, IC temperature reporting and thermal shutdown.
The power stages also feature a deep-sleep power saving mode, which greatly reduces the power consumption when the multiphase system is disabled.
Operation at switching frequencies of up to 1.5 MHz enables high-performance transient response, allowing miniaturization of output inductors, as well as input and output capacitors, while maintaining industry-leading efficiency.
When combined with Infineon’s digital controllers, the TDA21472 power stage supports the Body-Braking feature through PWM tri-state. This quickly disables both internal MOSFETs in order to enhance transient performance or provide a high-impedance output, which in turn allows the output capacitance to be reduced. The power stage is optimized for processor core and memory power delivery in server applications.
Figure 4. Typical size and layout for a 10-phase VR solution with a TDA21472 power stage