AI

Musk could spend $9B on Blackwell GPUs, as GenAI gold rush pushes chips, data centers skyward

GenAI has created a race to the top, not only in terms of more compute-capable GPUs from Nvidia and others, but also in terms of the monstrous electricity appetite of such accelerators in data centers. And then there is the plain old upfront price tag that heavyweight customers must assess when purchasing GPUs from chip designers.

For example, Elon Musk just suggested on X that he would “probably” buy about 300,000 Nvidia Blackwell GPUs for use at xAI by next summer. That would amount to a $9 billion price tag at the low end of expectations that Blackwell GPUs will sell for $30,000 apiece.

“There is a huge race going on to find and fund the acquisition of ever more powerful AI chips,” remarked Jack Gold, founder and analyst at J. Gold Associates. “People often forget we are still pretty early on in the AI lifecycle. Right now it’s primarily a game for those who can afford to pay for the processing costs.”

GPU news from Nvidia and AMD at Computex

To complicate matters, vendors at Computex on Sunday announced next-gen GPUs with greater compute capability and power efficiency. Nvidia CEO Jensen Huang introduced the Rubin platform to succeed Blackwell, itself announced just three months earlier at GTC. Rubin will include new GPUs and a new Arm-based CPU, Vera, alongside advanced networking.

Also at Computex, AMD CEO Lisa Su laid out a roadmap for the company’s AMD Instinct accelerators, starting with a new Instinct MI325X accelerator in the fourth quarter of 2024. In 2025, an Instinct MI350 series is expected, based on the AMD CDNA 4 architecture.

The announcements by both companies focus on an annual cadence of products supporting AI training and inference (in different ways, obviously).

While the promised performance gains are staggering and energy consumption is still somewhat unclear (electricity demands from data centers have threatened available power generation capacity in many parts of the world), there is already plenty of speculation over the enormous price tags of the latest accelerators.

In a show of investment prowess, Musk wrote on X on Sunday that his xAI investment next summer would “probably” be about 300,000 B200s with CX8 networking. At the low end of Huang’s previously mentioned Blackwell price of $30,000 per GPU, that would put the total cost to xAI at $9 billion.
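
For readers keeping score, the arithmetic behind that $9 billion figure is simple multiplication; here is a minimal sketch using only the GPU count and unit price cited above (neither is a confirmed deal term):

    # Back-of-the-envelope cost estimate using the figures cited in this article
    gpu_count = 300_000   # B200s Musk said xAI would "probably" buy
    unit_price = 30_000   # low-end Blackwell price per Huang, in USD
    total = gpu_count * unit_price
    print(f"${total / 1e9:.0f} billion")  # $9 billion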


While Musk didn’t exactly “pledge” to spend around $9 billion on B200s for xAI, as WCCFtech put it, the website gets credit for reporting the Musk tweet early. The website noted that the $9 billion figure would be half of xAI’s $18 billion valuation in a new funding round.

In April, Technology Brother on X posted a chart depicting Meta’s plan to use 350,000 Nvidia H100s for data center needs, far ahead of other HPC and cloud users mentioned in the chart.  Musk quickly asserted the chart was not accurate and should have put Tesla and X/xAI as second and third behind Meta.

On Sunday, Musk also revealed that xAI will have 100,000 Nvidia H100s in a liquid-cooled training cluster to “be online in a few months.”

Whether Musk spends $9 billion on Blackwells or talks Nvidia down to $5 billion or $1 billion, there is little question Nvidia and Huang need many partners to further the goal to “Accelerate Everything,” a theme Huang invoked at Computex. Nvidia is clearly on track to find partners, whether investors or customers, that can use Nvidia GPUs to counter efforts by major cloud providers to build their own custom GPUs. (OpenAI is serious about building its own chips, as Dylan Patel at SemiAnalysis has reported.) Perhaps Nvidia would be willing to cut its prices to work with a well-financed relative newcomer like xAI and Musk.

Nvidia and xAI have not commented (aside from tweets), but the reality of Nvidia’s need for customers and investors is understood by Gartner analysts and others. In one example, CoreWeave has grown its data center footprint with $12 billion in backing and has Nvidia as an investor. “Nvidia is likely using CoreWeave as a hedge against the hyperscalers, who are all large Nvidia customers but also potential competitors in that all three have some version of their own AI silicon,” AvidThink analyst Roy Chua told Fierce Network.

RELATED: CoreWeave stokes GPU fire with $8.6B war chest

Chua told Fierce Electronics on Monday that he’s been following the friendly exchanges in the press and on social media between Huang and Musk. “Jensen’s a smart leader and very adept at maintaining relationships and drumming up support for more GPU purchases,” Chua added. A deal between Nvidia and xAI for Blackwell GPUs would be “mutually beneficial: Musk gets allocation of in-demand B200s and Huang gets billions of dollars in revenue along with additional bragging rights for Nvidia,” Chua said.

“Musk needs more computing ASAP for xAI since OpenAI/Microsoft and others are ahead in terms of access to GPU computing resources…Musk’s xAI will need plenty of Nvidia GPUs for the foreseeable future,” Chua added.

Power and TCO for GPUs are on everybody’s minds

Beyond the astronomical cost of everything for GenAI, and AI more generally, Huang has asserted that “accelerated computing is sustainable computing” and noted at Computex that GPUs and CPUs together can deliver up to a 100x speedup for AI and other tasks while increasing power consumption by only 3x. That works out to roughly 33x more performance per watt over CPUs.
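
The performance-per-watt figure follows directly from those two claims; a quick sketch of the division, using only the numbers cited above:

    # Performance per watt implied by Huang's Computex figures
    speedup = 100      # claimed speedup over CPU-only computing
    power_factor = 3   # claimed increase in power consumption
    perf_per_watt = speedup / power_factor
    print(f"~{perf_per_watt:.0f}x performance per watt")  # ~33x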

In case it wasn’t clear that Huang is still a salesman acutely aware of TCO for data center users everywhere, he added, “The more you buy, the more you save.”

Even Musk has demonstrated an understanding of the power dilemma, which becomes less dramatic as new-generation accelerators emerge with greater power efficiency. He remarked in his Sunday tweet, a reaction to an online poll asking who would likely build the first 1 million H100 GPU data center: “Given the pace of technology improvement, it’s not worth sinking 1GW of power into H100s.” (H100s use up to 700 watts apiece, while Blackwells use up to 1,200 watts apiece.)
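
Where does the 1GW figure come from? A rough sketch using the per-GPU wattage above; the facility overhead multiplier is an assumption for illustration, not a figure from Musk or Nvidia:

    # Rough power estimate for a hypothetical 1-million-H100 cluster
    gpus = 1_000_000
    watts_per_h100 = 700                  # up to 700 W apiece, per the article
    gpu_mw = gpus * watts_per_h100 / 1e6  # 700 MW for the GPUs alone
    facility_mw = gpu_mw * 1.4            # assume ~1.4 PUE for cooling/overhead
    print(f"GPUs: {gpu_mw:.0f} MW; facility: ~{facility_mw:.0f} MW")  # ~1 GW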

Nvidia’s view is that GenAI will go far beyond chatbots, and in many ways it already has. At GTC, Huang talked about using GenAI to direct robots of all types, including true self-driving vehicles, which offers insight into why Meta or xAI (and many others) would care about investing multiple billions in next-gen accelerators that can take chatbots into realms just emerging.

What’s the future of AI and data centers?

“People often forget we are still pretty early on in the AI lifecycle,” Gold said. “We need to see more efficient chips but also more efficient algorithms/models, an issue that doesn’t get enough focus. If you’re going to use AI to tell a car or robot what to do, and scale to thousands and millions of them, you can’t do that simply by building out massive data centers with 30,000 general purpose GPUs at 1,000 watts each. It will take a full suite of products at various price and performance points and locations.”
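
Even the single-site scale Gold describes implies serious power draw; a quick sketch of the arithmetic, using only the numbers from his quote:

    # Power draw of the data center Gold describes
    gpus = 30_000
    watts_each = 1_000
    mw = gpus * watts_each / 1e6
    print(f"{mw:.0f} MW just for the GPUs")  # 30 MW, before cooling and overhead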

Gold expects a three-to-five-year progression where AI workloads become more distributed and capabilities improve along with power/performance efficiency. “And where the number of massive data centers grows, but slowly,” he added.