
Power-hungry AI chips face a reckoning, as chipmakers promise ‘efficiency’

Nvidia’s newest megachip, Blackwell, is by all accounts a modern-day miracle. It has 200 billion transistors and promises enough processing power to handle the largest AI models when thousands of these GPUs are ganged together in a mega data center.

But Blackwell and other powerful accelerator chips coming to market are making people nervous, especially data center operators, electric utilities and even regulators around the globe. One version of a single Blackwell chip for data center use draws 1,200 watts of electricity, an insane amount of power compared to just a few years ago. Largely as a result of accelerator chip growth, some data centers are building their own power plants to handle the load, while regulators in Amsterdam and other cities in Europe are telling data centers they cannot expand due to limited electric supply.

It’s not just Nvidia’s GPUs that are gargantuan. Blackwell is part of a trend ranging across all chip design firms. Even hyperscalers and carmakers like Tesla are designing their own custom chips, often pushing the laws of physics to increase energy efficiency with 3D designs and chiplets. Tesla’s Dojo chip has 25 chiplets. These chip design approaches are helping increase power efficiency, but data centers meanwhile are still growing to support AI, including GenAI. Currently, 1.5% to 2% of the world’s electricity is used by data centers, and the vast majority of that energy is used by chips and the circuit boards that support them. Growth in data center energy consumption follows a hockey-stick curve.

“The trend is not sustainable”

“The chip industry has been on a trend that’s not sustainable,” said long-time chip industry insider Henri Richard, president of Rapidus in the Americas. The company is erecting a 2nm process node chip fab in northern Japan with billions in support from the Japanese government.

“Years ago, we were saying you can’t go up to 150 watts, and now we’re at 1,200 watts! Something needs to change. If you think about taking that growth curve and projecting into the future, we just can’t have 3 kilowatt chips,” Richard said in an interview with Fierce Electronics from his US office in Santa Clara, Calif.

Shrinking chip process nodes from 10nm to 5nm to 2nm is part of the solution, he said. With Moore’s Law’s decreasing benefits, however, “there’s a need to architect the systems and chips in a different way that deals with the concentration of power and deals with the amount of cooling you can do,” he added. “Even immersion cooling makes it hard to feed the chips with electricity. Chiplets will be one way to balance between the front end and back end.”


In a blog that woke up some elements of the AI-fixated world, Arm CEO Rene Haas wrote recently about future AI workloads becoming larger, pressing the need for more compute and more power. “Finding ways to reduce the power requirements for these large data centers is paramount to achieving the societal breakthroughs and realizing the AI promise,” he said. “In other words, no electricity, no AI.”  


What data centers face with power-consuming chips

In a data center with thousands of Blackwell chips and other processors, the electricity load becomes enormous, sending engineers scurrying to find available power in locales where there isn’t enough juice readily available, even with the help of renewables from solar, wind, hydroelectric or geothermal. Once there is enough power pumped to developable land in an area like Loudoun County, Va., west of Washington, D.C., the anxiety is compounded over what happens inside dozens of hot server racks. Engineers are proposing new ways to keep the circuit boards and chips cool enough to keep from catching fire or melting down, causing a catastrophe for vital data, expensive equipment and corporate bottom lines.

An entire industry has emerged to cool data centers to guard against the heat generated by servers and their power-hungry chips. Liquid cooling of server racks has become an art form; one of the latest approaches is immersion of entire data centers, prompting the delicate proposition of how a data center connects electricity underwater with humans around.  Meanwhile, hyperscalers are planning ways to build small nuclear reactors or other power generators near their data center hubs to ensure a reliable and plentiful energy supply.

Investors are going bonkers for more power for data centers: OpenAI CEO Sam Altman just invested $20 million in Exowatt, an energy startup focusing on AI data centers. Keeping chips cool enough to operate optimally also may require air-cooling technology that gulps down more power, amplifying the problem. Even so, as a rule of thumb, half the electricity needed by a data center goes to light up the processors, from GPUs to CPUs to NPUs and whatever becomes the next chip TLA (three-letter acronym). Related circuits and boards raise the energy draw.

Nvidia’s Jensen Huang defines the long view for AI accelerators

Nvidia CEO Jensen Huang and many other semiconductor leaders justify, perhaps rightly so, the power mongering of modern accelerator chips like Blackwell by weighing it against the enormous compute power that AI and GenAI deliver and the impact such technologies will have on future generations of companies and customers: new pharmaceuticals, climate analysis, autonomous vehicles, robots and more. He and his engineering teams speak often about the laws of physics and recognize which metals, other materials and chip architectures can distribute the heat generated by electricity traversing a server rack and then acres of server racks.

Modern chip designs from Nvidia, Intel, AMD, Qualcomm, cloud providers and a growing army of smaller design firms are adding enormous density to circuit boards so that servers and server racks take up less floor space while cranking out many times more teraflops per server than just a year ago. The performance-per-watt metric is usually expressed as TFLOPS per watt to make it easy to compare systems and chips from different vendors.
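To make the TFLOPS-per-watt comparison concrete, here is a minimal Python sketch. The chip names and their throughput and power figures are hypothetical placeholders, not vendor specifications, and the class and function names are invented for illustration. A fair comparison would also need both chips measured at the same numerical precision and under comparable workloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """Hypothetical accelerator described by peak throughput and power draw."""
    name: str
    peak_tflops: float   # peak throughput in teraflops (placeholder value)
    power_watts: float   # board or module power in watts (placeholder value)

    @property
    def tflops_per_watt(self) -> float:
        # Performance per watt: throughput divided by power draw.
        return self.peak_tflops / self.power_watts


def rank_by_efficiency(chips):
    """Return the chips sorted from most to least efficient."""
    return sorted(chips, key=lambda c: c.tflops_per_watt, reverse=True)


# Placeholder figures chosen only to illustrate the arithmetic.
chips = [
    Accelerator("chip_a", peak_tflops=1000.0, power_watts=700.0),
    Accelerator("chip_b", peak_tflops=2500.0, power_watts=1200.0),
]

for chip in rank_by_efficiency(chips):
    print(f"{chip.name}: {chip.tflops_per_watt:.2f} TFLOPS/W")
```

In this made-up example the chip that draws more power still ranks first, because its throughput rises faster than its wattage. That is the sense in which a denser, hungrier chip can be the more efficient one.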

Huang’s CadenceLIVE discourse on longitudinality

Huang talked about this density and its related power draw at CadenceLIVE Silicon Valley in April, speaking in the abstract about how this computing density is justified by the advantages of AI across an entire population of users.  “Remember, you design a chip once, but you ship it a trillion times,” he said in a fireside chat. “You design a data center once, but you save 6% power…that is enjoyed by a billion people.” Huang was, of course, speaking about the entire ecosystem, far beyond the wattage of a single Blackwell or other GPU used in a broader category of accelerated computing. He took a few sentences to make his point, but it is worth a read:

“The power usage of accelerated computing is incredibly high because the computers are incredibly dense,” Huang said. “Whatever optimization we can do for power utilization translates directly into more performance, measured as more productivity, generating revenue or directly into savings. For the same amount of performance you could get something smaller. Power management in accelerated computing directly translates into all the things you care about.

“Accelerated computing took tens of thousands of general purpose servers and consumed 10x, 20x more cost and 20x, 30x more energy and reduced it into something that is incredibly dense. So the density of accelerated computing is the reason why people will think it’s power hungry and costs a lot of money. But if you look at it from an ISO [that is, equal amount] of work done or throughput, in fact you save an enormous amount of money. That’s the reason why it is essential as CPU scaling has slowed that we have to move towards accelerated computing because you’re not going to continue to scale out that traditional way anyways. Accelerated computing is essential.”

Later in the same conversation with Cadence CEO Anirudh Devgan, Huang added: “AI actually helps people save energy…How would we have found 6% more savings [in one example from Cadence] or 10x more savings that wasn’t possible without AI? So you invest in the training of the model once and then millions of engineers can benefit from it and billions of people across decades will get to enjoy the savings.

“That’s the way to think about cost and investments, not just on an instance-by-instance basis but, in healthcare speak, longitudinally. You have to … look at money savings, energy savings longitudinally, across the entire span of not just the products you are building but the way you are designing the products, the products you build and the impact of the products being felt. When you look at it longitudinally like that, AI is going to be utterly transformative in helping us with climate change, using less power, being more energy efficient and so on.”

Voices outside of Nvidia

Luminaries in chip design and production other than Huang have also weighed in recently. TSMC CEO C.C. Wei put it this way in the company’s latest earnings call: “Almost all the AI innovators are working with TSMC to address the insatiable AI-related demand for energy-efficient computing power.” The key word: “insatiable.”

Cadence CEO Devgan noted in his onstage conversation with Huang that AI models can have 1 trillion parameters, which compares to 100 trillion synapses, or connections, in the human brain. He projected that it is only a matter of time before somebody builds an AI model that is very big, on the order of the human brain. Doing so will require “a huge amount of software compute, the whole data search infrastructure and the whole energy infrastructure,” he said.

Cadence makes and supports a number of tools for improving the energy efficiency of accelerator designs (which Nvidia used to develop Blackwell) and has developed a digital twin system to help data centers design their operations more efficiently.

AMD, meanwhile, has set a goal of delivering a 30x increase in the energy efficiency of its products by 2025, measured against a 2020 baseline for accelerated compute nodes. Last year’s introduction of the MI300X accelerator put the company even closer to that goal. A blog posted last year by AMD’s Sam Naffziger, senior vice president and product technology architect, describes the progress.
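For a sense of how steep that target is, the short Python sketch below converts a 30x gain over the five years from 2020 to 2025 into an equivalent yearly improvement. It assumes smooth compounding, which is a simplification for illustration and not something AMD has claimed; the function name is hypothetical.

```python
def implied_annual_gain(total_gain: float, years: int) -> float:
    """Yearly multiplier that compounds to total_gain over the given years."""
    if years <= 0:
        raise ValueError("years must be positive")
    return total_gain ** (1 / years)


# 30x goal, 2020 baseline to 2025 target, as stated in the article.
annual = implied_annual_gain(30, 2025 - 2020)
print(f"Implied gain per year: {annual:.2f}x")
```

Under that assumption, efficiency would have to nearly double every year to reach the goal.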

Naffziger warned that the industry can’t rely solely on smaller transistors, and needs a holistic design perspective that includes packaging, architecture, memory, software and more. 

Intel’s neuromorphic push

Intel has also made an aggressive push into energy efficiency, most recently announcing it has built the world’s largest neuromorphic system to enable sustainable AI. Code-named Hala Point, it uses Intel’s Loihi 2 processor and can support up to 20 quadrillion operations per second, rivaling GPUs and CPUs. So far, its application is clearly for research.


Intel’s description of Hala Point claims the entire system consumes a maximum of 2,600 watts of power, little more than double that of Nvidia’s Blackwell: “Hala Point packages 1,152 Loihi 2 processors produced on Intel 4 process node in a six-rack-unit data center chassis the size of a microwave oven. The system supports up to 1.15 billion neurons and 128 billion synapses distributed over 140,544 neuromorphic processing cores, consuming a maximum of 2,600 watts of power. It also includes over 2,300 embedded x86 processors for ancillary computations.”

Jennifer Huffstetler, chief product sustainability officer at Intel, told Fierce Electronics via email, “Intel is looking at future computing technologies as a solution for AI workloads, namely neuromorphic, that promise to deliver greater computing performance at much lower power consumption. Computing demands are only increasing, especially with new AI workloads. To deliver on the performance desired, the power consumption of GPUs and CPUs is also increasing.”

Intel already has a three-pronged approach to greater efficiency that includes optimization of AI models, software and hardware. With hardware, Intel innovations saved 1,000 terawatt-hours from 2010 to 2020, Huffstetler estimated. Gaudi accelerators provide about a doubling in energy efficiency, while Xeon Scalable processors provide a 2.2x increase in energy efficiency. (Xeons are designed for data center, edge and workstation workloads.) The upcoming Gaudi 3 accelerators deliver 50% better inference and 40% better inference power efficiency on average, she claimed. Intel is also in the liquid cooling business, which can provide a 30% improvement in energy savings over air cooling inside the data center.

Yes, greater 'efficiency,' but….

Despite all the efforts of major chip designers, the power dilemma is still real. Yes, a data center might have fewer racks with the latest accelerators, resulting in lower power draw, but growth in AI means companies will only seek to expand compute capabilities—more servers, more racks, more energy suck. “Newer chips have more performance per watt, yes, but the AI models are also growing, so it’s not clear that the overall requirement for power is going down all that much,” said Jack Gold, founding analyst at J. Gold Associates.

While Blackwell in the GB200 form factor with liquid-cooled racks sucks down 1,200 watts per chip, Gold noted that a typical AI chip uses about half that, 650 watts of power. He tallied up the energy draw this way: Add in memory, interconnect and a CPU controller, and that figure can jump to 1 kilowatt for each module. In the recent example of Meta, which at one point deployed 10,000 modules (with many more to come), that amount alone would require 10 megawatts of power. A city the size of Cleveland with 3 million people uses about 5,000 megawatts, so in essence a single data center of that Meta size would take 0.2% of the city’s power. A typical power plant might generate about 500 megawatts.
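Gold’s tally can be reproduced as a back-of-the-envelope calculation. The Python sketch below uses only the figures quoted above; the variable and function names are illustrative, and it leaves out cooling and other facility overhead, which would raise the real draw.

```python
# Figures quoted in the article (Jack Gold's estimates).
MODULE_POWER_KW = 1.0       # AI chip plus memory, interconnect and CPU controller
MODULES_DEPLOYED = 10_000   # Meta deployment cited as an example
CITY_DEMAND_MW = 5_000      # city-scale demand cited for comparison
PLANT_OUTPUT_MW = 500       # output of a typical power plant, as cited


def fleet_power_mw(modules: int, kw_per_module: float) -> float:
    """Total draw of the accelerator modules, converted from kW to MW."""
    return modules * kw_per_module / 1_000


def share_percent(part_mw: float, whole_mw: float) -> float:
    """Express one load as a percentage of another."""
    return 100 * part_mw / whole_mw


fleet_mw = fleet_power_mw(MODULES_DEPLOYED, MODULE_POWER_KW)
city_share = share_percent(fleet_mw, CITY_DEMAND_MW)
plant_share = share_percent(fleet_mw, PLANT_OUTPUT_MW)

print(f"Fleet draw: {fleet_mw:.1f} MW")
print(f"Share of city demand: {city_share:.1f}%")
print(f"Share of one plant's output: {plant_share:.1f}%")
```

On these numbers, 10,000 modules draw about 10 megawatts, which is roughly 0.2% of a 5,000-megawatt city load and about 2% of a 500-megawatt plant’s output.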

“The bottom line is that AI data centers are indeed [facing problems] in trying to find areas where there is enough power and power that is low cost enough to provide for their needed consumption,” Gold said. The cost of power is the single biggest expense in a data center after the capital cost for equipment.

Bob O’Donnell, founding analyst at Technalysis, said he somewhat understands Huang’s “longitudinal” argument in favor of power consumption for AI chips laid out at the Cadence event. “Accelerator chips do take more power, but in the long run have more positive benefits for the environment, pharma and other areas because of all you learn,” he told Fierce. “They are extraordinarily dense, but compared to other options they are more power efficient.”

“The summary is that power for AI chips is getting a huge amount of focus and attention by a lot of different players. It’s not going to be solved or go away with an enormous demand for more power. But the capabilities of GenAI are so great that people feel a need to pursue it.”