
21 | Sam Altman: Compute as the Future Currency. But at what cost?💧⚡

Understanding the basics and economics of Compute as a Currency.


Read time: 8 Minutes


Compute as the Future Currency: OpenAI and Microsoft's Massive Investments

Sam Altman on the Lex Fridman podcast

According to Sam Altman, the CEO of OpenAI, and others in the technology industry, computing power is poised to become the most valuable commodity in the future, surpassing traditional currencies like dollars or even Bitcoin.

As artificial intelligence (AI) becomes increasingly prevalent and integral to powering various aspects of the economy, the demand for compute resources is expected to skyrocket.

This realization has led to a compute arms race among tech giants, with OpenAI and Microsoft at the forefront of massive investments in this domain.

The Commoditization of AI Models and the Importance of Compute

While AI models like large language models (LLMs) are becoming commoditized rapidly*, with open-source alternatives matching or surpassing the quality of proprietary models like GPT-4, the true value lies in the computing power required to train and run these models.

*As these models are being open-sourced or replicated by various organizations and research groups, they are transforming from proprietary, highly guarded intellectual property into more commonly available and standardized offerings, akin to a commodity.

Altman recognizes that without owning the chips and infrastructure powering these models, profit margins will continue to shrink over time.

According to Altman, there are four main categories of winners in the AI space:

  1. Chips: The actual GPUs or specialized silicon powering AI models, where Nvidia has emerged as a clear leader.

  2. Infrastructure: Tools for LLM monitoring, content delivery networks (CDNs), data visualization, and agent frameworks.

  3. Models: The AI models themselves, such as ChatGPT, Claude, and Llama, which are becoming increasingly commoditized.

  4. Applications: AI-powered applications built from the ground up or existing applications integrated with AI capabilities.

Project Stargate: OpenAI and Microsoft's $100 Billion Supercomputer

In a major push to secure their position in the AI compute race, Microsoft and OpenAI have announced Project Stargate, a planned $100 billion data center project that will house an AI supercomputer set to launch in 2028. While the term "supercomputer" might be a misnomer, as it's essentially a massive data center, the scale of the investment is staggering, even for tech giants like Microsoft.

The project is part of a five-phase plan by OpenAI and Microsoft to roll out a new generation of computing centers for AI. Stargate represents the fifth and final phase, with Microsoft working on a smaller fourth-phase supercomputer for OpenAI, scheduled for launch around 2026.

The Insatiable Demand for Electricity and Compute

As Altman and Elon Musk have pointed out, the demand for computing power is rapidly outpacing the available electricity supply and infrastructure. Musk predicts that the next shortage after chips and transformers will be electricity itself, as data centers struggle to find enough power to run their AI computing resources.

To address this challenge, the industry is exploring innovative solutions, such as co-locating data centers with nuclear power plants or other large-scale energy sources. This modular and secure approach could help ensure a reliable and abundant supply of electricity to fuel the ever-increasing demand for computing power.

Attracting Top AI Talent with Compute Resources

Beyond the immediate benefits of increased computing power, companies like Meta, Google, and others are investing heavily in acquiring state-of-the-art GPUs and custom chips to gain a competitive edge in attracting top AI researchers and developers.

These highly skilled professionals are naturally drawn to organizations with the most powerful computing resources, as it enables them to push the boundaries of their research and development.

Source: Business Insider

As the AI revolution continues to gather momentum, compute power is emerging as the most critical resource driving innovation and progress in this field.

Tech giants like OpenAI and Microsoft are leading the charge with massive investments in data centers, specialized chips, and infrastructure to secure their position in the computing arms race.

However, this race also raises concerns about the concentration of power and potential negative impacts on the broader economy. As the demand for computing resources continues to soar, innovative solutions for energy generation and distribution will be crucial to sustaining this rapidly evolving landscape.

The Hidden Cost of Training Large Language Models

A few weeks ago we witnessed NVIDIA GTC, where NVIDIA's CEO presented a new family of chips that offers more compute for training and inference (running an LLM) of large language models while being less power-hungry, even if that comes at a cost in numerical precision (FP4).

However, as businesses increasingly adopt AI technologies, it's crucial to understand the distinct requirements of AI training workloads and the basics of the data center business and its costs.

Let's start with the ugly truth:

Paper: Making AI Less “Thirsty”: Uncovering and Addressing the Secret Water Footprint of AI Models

Without integrated and inclusive approaches to addressing the global water challenge, nearly half of the world’s population will endure severe water stress by 2030, and roughly one in every four children worldwide will be living in areas subject to extremely high water stress by 2040.

Freshwater scarcity is a growing global issue, with two-thirds of the world's population already affected by severe water shortages.

By 2030, it is estimated that nearly half of the world's population could face severe water stress. Data centers, which house AI models like GPT-3 and GPT-4, are major consumers of freshwater, in addition to being energy-intensive. In 2021, Google's U.S. data centers alone used 12.7 billion liters of freshwater for cooling, primarily potable water.

This amount of water could have been used to produce millions of cars or electric vehicles. The combined water footprint of U.S. data centers in 2014 was estimated at 626 billion liters.

Despite its significant impact, the water footprint of AI models remains largely unknown to both the AI community and the public. As freshwater scarcity worsens, it’s crucial to address this issue, reflected in commitments to be “Water Positive by 2030” by companies like Google, Microsoft, Meta, and Amazon.

DID YOU KNOW? 🤔

ChatGPT needs to “drink” a 500ml bottle of water for a simple conversation of roughly 20-50 questions and answers, depending on when and where ChatGPT is deployed. While a 500ml bottle of water might not seem too much, the total combined water footprint for inference is still extremely large, considering ChatGPT’s billions of users. All these numbers are likely to increase by multiple times for the newly-launched GPT-4 that has a significantly larger model size. But, up to this point, there has been little public data available to form a reasonable estimate of the water footprint for GPT-4. ³
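
To put that bottle in per-query terms, here is a minimal back-of-the-envelope sketch. The 500 ml per 20-50 question conversation comes from the paper cited above; the daily query volume is a purely hypothetical assumption used only to illustrate scale.

```python
# Rough water footprint per query, based on the paper's 500 ml per 20-50 Q&A conversation.
bottle_ml = 500
qa_low, qa_high = 20, 50

ml_per_query_high = bottle_ml / qa_low    # ~25 ml per query (short conversations)
ml_per_query_low = bottle_ml / qa_high    # ~10 ml per query (long conversations)

# Hypothetical scale assumption (NOT from the paper): 100 million queries per day.
daily_queries = 100_000_000
daily_liters_low_end = daily_queries * ml_per_query_low / 1000   # ~1,000,000 liters/day

print(f"Per query: ~{ml_per_query_low:.0f}-{ml_per_query_high:.0f} ml of water")
print(f"At {daily_queries:,} queries/day: ~{daily_liters_low_end:,.0f} liters/day (low-end estimate)")
```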

Understanding How an LLM Is Trained and Its Cost

To simplify, consider FLOPs (floating-point operations) as a measure of total compute, and FLOPS (operations per second) as a measure of compute throughput.

Training an LLM (Large Language Model):

  • Training a 175 billion parameter LLM typically requires over 1TB of memory to store the model parameters and intermediate states.

  • The training process for a 175 billion parameter model on a dataset of 300 billion tokens is estimated to require around 3.15 x 10^23 floating-point operations (FLOPs).

  • To provide the required ~1.2 x 10^17 FLOPS of sustained compute during the one-month training period, an estimated 6,000 NVIDIA H100 GPUs (67 TeraFLOPS each) would be needed, accounting for the typical ~30% GPU utilization in training workloads (see the sketch after this list).

  • Previous LLMs, like GPT-3, were trained using around 10,000 older NVIDIA V100 GPUs.
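
To see where these figures come from, here is a minimal back-of-the-envelope sketch in Python. It uses the common approximation that training compute is roughly 6 x parameters x tokens; the 67 TFLOPS per H100 and ~30% utilization figures are the ones quoted above, not measured values.

```python
# Back-of-the-envelope training compute for a 175B-parameter model on 300B tokens.
params = 175e9
tokens = 300e9

# Common approximation: total training FLOPs ~= 6 * parameters * tokens.
total_flops = 6 * params * tokens                          # ~3.15e23 FLOPs

# Sustained throughput needed to finish in one month.
seconds_per_month = 30 * 24 * 3600
required_flops_per_s = total_flops / seconds_per_month     # ~1.2e17 FLOPS

# GPUs needed, assuming 67 TFLOPS per H100 and ~30% utilization.
h100_peak_flops = 67e12
utilization = 0.30
gpus_needed = required_flops_per_s / (h100_peak_flops * utilization)

print(f"Total training compute: {total_flops:.2e} FLOPs")
print(f"Sustained throughput:   {required_flops_per_s:.2e} FLOPS")
print(f"H100 GPUs needed:       ~{gpus_needed:,.0f}")    # ~6,000
```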

Inference of an LLM:
(Inference is the act of running a trained LLM to generate outputs)

  • For inference, the GPU memory requirements are lower than for training, as the model parameters need to be loaded but not the intermediate training states.

  • However, the large size of LLMs, with billions of parameters, still poses a challenge for fitting the entire model into a single GPU's memory.

  • To address this, inference is often performed on a cluster of GPUs, where the model parameters are distributed across multiple GPUs and the results are aggregated.

  • The use of techniques like key-value caching and quantization can also help reduce the GPU memory requirements for inference.

In summary, training a large 175 billion parameter LLM requires a massive cluster of around 6,000 high-end GPUs, while inference can run on a smaller cluster by leveraging various optimization techniques, as the sketch below illustrates.
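
To make the inference memory point concrete, here is a rough sketch (my own assumptions, not figures from the article) of how much GPU memory just the weights of a 175 billion parameter model occupy at different precisions, and how many 80 GB GPUs that implies. KV-cache and activation memory are ignored, so real deployments need headroom beyond this.

```python
# Rough GPU memory needed just to hold the weights of a 175B-parameter model.
import math

params = 175e9
gpu_memory_gb = 80  # e.g., an 80 GB A100/H100

bytes_per_param = {
    "FP32": 4,
    "FP16/BF16": 2,
    "INT8 (quantized)": 1,
    "INT4 (quantized)": 0.5,
}

for precision, nbytes in bytes_per_param.items():
    weights_gb = params * nbytes / 1e9                    # weights only; no KV cache or activations
    gpus_needed = math.ceil(weights_gb / gpu_memory_gb)
    print(f"{precision:>18}: ~{weights_gb:,.0f} GB of weights -> at least {gpus_needed} x 80 GB GPUs")
```

This is why quantization matters so much for serving: halving the bytes per parameter roughly halves the number of GPUs a copy of the model needs.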

Datacenter Math

AI workloads require massive computing power provided by specialized hardware accelerators like NVIDIA's DGX servers with powerful GPUs. However, these systems consume a lot of electricity, which translates to high operating costs for data centers.

Let's look at the power requirements for a hypothetical data center with 2,560 NVIDIA DGX servers, each containing 8 H100 GPUs:

  1. Each DGX server consumes around 10,200 watts of power when running AI workloads at peak utilization.

  2. Additionally, there are power needs for storage, networking, management servers, etc., bringing the total critical IT power per DGX server to around 11,100 watts or 1,389 watts per GPU.

  3. With 2,560 DGX servers, the total critical IT power required is 28.4 megawatts (MW).

    1. For context, a typical nuclear power plant produces around 1,000 MW.

  4. However, the servers don't run at 100% capacity all the time. Assuming an 80% utilization rate, the actual critical IT power consumed is 22.8 MW.

    1. The 22.8 MW of actual critical IT power consumed is enough to power around 19,000 typical U.S. households simultaneously.

  5. Data centers also require power for cooling, lighting, and other facilities. Factoring in a power usage effectiveness (PUE) ratio of 1.25, the total data center power consumed is 28.4 MW.

  6. At an average industrial electricity rate of $0.083 per kilowatt-hour, the annual electricity cost for this data center would be around $20.7 million (the arithmetic is sketched below). And even that is not enough power to train a frontier-scale LLM.
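
For readers who want to check the numbers, here is the same arithmetic as a short Python sketch; the per-server power, utilization, PUE, and electricity rate are exactly the figures given in the list above (small rounding differences aside).

```python
# Reproduce the hypothetical data center math above.
servers = 2_560
critical_it_w_per_server = 11_100   # DGX server plus its share of storage/networking/management
utilization = 0.80                   # assumed average utilization
pue = 1.25                           # power usage effectiveness (cooling, lighting, etc.)
price_per_kwh = 0.083                # average industrial electricity rate, USD

critical_it_mw = servers * critical_it_w_per_server / 1e6   # ~28.4 MW peak critical IT power
consumed_it_mw = critical_it_mw * utilization               # ~22.7 MW actually consumed
total_facility_mw = consumed_it_mw * pue                     # ~28.4 MW for the whole facility

annual_kwh = total_facility_mw * 1_000 * 24 * 365
annual_cost_usd = annual_kwh * price_per_kwh                 # ~$20.7M per year

print(f"Critical IT power (peak): {critical_it_mw:.1f} MW")
print(f"Critical IT power (used): {consumed_it_mw:.1f} MW")
print(f"Total facility power:     {total_facility_mw:.1f} MW")
print(f"Annual electricity cost:  ${annual_cost_usd/1e6:.1f}M")
```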

So what are the hyperscalers doing to address this issue?
=> See below

The Fight for Current Datacenter Capacity

The current state-of-the-art large language model, GPT-4, reportedly used around 8,000 GPUs over 90 days, drawing roughly 15 MW, to complete its pre-training.

Source: NVIDIA GTC 2024

AI demand is projected to double the total datacenter critical IT power demand from 49 GW in 2023 to 96 GW by 2026, with 90% of the growth coming from AI-related demand.

This demand is primarily driven by the aggressive plans of major AI clouds to roll out accelerator chips. For example:

  • OpenAI plans to deploy hundreds of thousands of GPUs in their largest multi-site training cluster, requiring hundreds of megawatts of critical IT power.

  • Meta plans to have an installed base of 650,000 H100 equivalents by the end of the year, while GPU cloud provider CoreWeave plans to invest $1.6B in a Plano, Texas facility, implying plans to provision up to 50 MW of critical IT power and install 30,000-40,000 GPUs in that facility alone.

From a supply perspective, sell-side consensus estimates of 3 million GPUs shipped by Nvidia in calendar year 2024 would correspond to over 4,200 MW of datacenter needs, nearly 10% of current global datacenter capacity, just from one year's GPU shipments (a quick sanity check follows below).
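
As that sanity check, the sketch below reuses the ~1,389 W of critical IT power per GPU from the data center math earlier; the 3 million GPU shipment figure is the sell-side estimate cited above, not my own.

```python
# Rough critical IT power implied by one year's estimated Nvidia GPU shipments.
gpus_shipped = 3_000_000
watts_per_gpu = 1_389        # critical IT power per GPU, from the DGX math above

total_mw = gpus_shipped * watts_per_gpu / 1e6
print(f"~{total_mw:,.0f} MW of critical IT power")   # ~4,170 MW, in line with the >4,200 MW estimate
```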

The top global hyperscalers are rapidly ramping up datacenter construction and colocation leasing to meet this demand.

For example, AWS bought a 1,000 MW nuclear-powered datacenter campus for $650M, providing a valuable pipeline of datacenter capacity.

AI demand is projected to significantly increase datacenter critical IT power demand in the coming years, driven by the aggressive plans of major AI clouds to roll out accelerator chips.

To meet this demand, top global hyperscalers are rapidly ramping up datacenter construction and colocation leasing. This presents both opportunities and challenges for businesses in the datacenter and AI industries.

The Environmental Cost of AI

Sasha Luccioni, an AI researcher with over a decade of experience, argues that we should shift our focus to addressing the current impacts of AI and developing tools to ensure future generations of AI models are trustworthy, sustainable, and ethical.

In her paper, she examines the power usage of training the BLOOM model at the Jean Zay computer cluster at IDRIS, a part of France’s CNRS.

Additionally, the paper provides empirical observations of the relationship between an AI chip's TDP and total cluster power usage, including storage, networking, and other IT equipment, all the way through to the actual power drawn from the grid.

Training GPT-3 in Microsoft's state-of-the-art U.S. data centers can directly consume 700,000 liters of clean freshwater (enough to produce 370 BMW cars or 320 Tesla electric vehicles).

  1. It estimates BLOOM's final training emitted around 24.7 tonnes of CO2eq considering just the dynamic power consumption, and 50.5 tonnes when accounting for all processes from equipment manufacturing to operational energy usage.

  2. It analyzes the energy requirements and carbon emissions from deploying BLOOM for real-time inference via an API endpoint.

  3. It compares BLOOM's emissions to other large language models like GPT-3, OPT-175B, and Gopher.

  4. It estimates the total emissions from all experiments run as part of the BigScience workshop at around 124 tonnes of CO2eq.

The trend of "bigger is better" in AI has led to a 2,000-fold increase in the size of large language models over the past five years, exacerbating their environmental impact.

So far these estimates mostly cover GPT-3, which is quite small compared to GPT-4. And GPT-5 is around the corner.

Did you find our content useful?


Thank you, see you next week!
