Microsoft’s Fairwater Datacenters: From Power to Tokens
Microsoft’s next wave of AI datacenters is not about who has the most GPUs. It is about who can energize 300 megawatts on time, keep liquid moving without drama, move tokens across an AI backbone without wasting them, and swap silicon under the hood without customers caring. This is Fairwater in practice. Here is what Microsoft is building, why it looks like this, and how it will succeed or fail on physics, not slides.
What Microsoft is actually building
Fairwater is a two part pattern. One building focuses on CPU and storage at roughly 48 MW, handling orchestration, caching, metadata, and spill. Next to it sits a dense two story GPU hall at around 300 MW where the accelerators live. The floor area runs toward 800,000 square feet per GPU hall. SemiAnalysis estimates a single 300 MW building can host on the order of 150,000 GB200 class GPUs with the network, power, and liquid service to match. They also describe a path to individual buildings at more than 600 MW and campus totals above 2 GW of IT load once fully built. In their words, Microsoft is moving “from energy to tokens,” a framing that puts the cost of power and the cost of idle directly into the cost of output.
“From energy to tokens.” – SemiAnalysis, Microsoft’s AI Strategy Deconstructed
Fairwater is paired with an “AI WAN” that SemiAnalysis pegs at hundreds of terabits per second today with engineering paths to the single digit petabit per second range. That is not a vanity number. It is the evacuation lane when you need to drain a building, refill in a neighbor or a nearby region, and continue training without corrupting checkpoints.
The bottleneck moved: power, shells, and interconnects
Chip scarcity is no longer the top risk. The top risks are electricity and shells. Satya Nadella has said plainly in public sessions that you can end up with GPU inventory you cannot plug in because the power and buildings are late. That is the constraint now, and it translates to a simple operational truth: the teams that can land high capacity interconnects, transformers, switchgear, substations, and chilled water plants on schedule will beat the teams that only buy GPUs.
Why the design is split into a CPU/storage hall and a GPU hall
Separating the orchestration and storage plane from the accelerator plane reduces blast radius, improves power quality for control systems, and simplifies liquid distribution. The CPU hall provides caching and control near line rate without living in the thermal and acoustic profile of the GPU hall. The GPU hall is allowed to be brutally dense, with service corridors and manifolds built for liquid at scale.
Cooling and liquid at scale
A 300 MW hall forces liquid somewhere in the chain. The questions are where and how much. SemiAnalysis points to builds with large outdoor air cooled chiller fields to hit time to service. That is fast to deploy and easy to maintain, but the longer term trajectory points toward a higher share of liquid cooling inside the hall. Expect cold plate by default, with immersion used tactically where serviceability and cleanliness can be guaranteed. The tell will be the ratio of roof or yard chiller capacity to inside plate capacity. The faster Microsoft pivots to liquid heavy inside the hall, the more headroom it will have for GB200 follow-ons and for non-Nvidia engines that have different heat flux profiles.
Thermal stability and soak behavior
Short benchmarks are meaningless at this density. A real workload must run long enough to saturate manifolds, stabilize pump curves, and expose any control oscillation. Stable thermal maps over an hour tell you a hall is tuned. Sawtooth thermal plots mean chronic overcorrection and wasted power. Long run telemetry will show whether a site is healthy; operators that publish nothing here are asking you to trust marketing.
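As a rough illustration, here is how that sawtooth could be flagged from coolant telemetry. This is a minimal sketch, assuming one supply temperature sample per minute over a soak; the window size and thresholds are placeholders, not anyone's published tuning.

```python
# Sketch: flag control oscillation ("sawtooth") in coolant supply temperature over
# a one hour soak. Sample rate, window, and thresholds are illustrative assumptions,
# not Fairwater telemetry.
from statistics import mean

def soak_is_stable(temps_c: list[float], window: int = 10,
                   max_swing_c: float = 1.5, max_drift_c: float = 0.5) -> bool:
    """temps_c: one sample per minute, at least an hour of data."""
    if len(temps_c) < 60:
        raise ValueError("need a full hour of soak data")
    swings = []
    for i in range(0, len(temps_c) - window, window):
        chunk = temps_c[i:i + window]
        swings.append(max(chunk) - min(chunk))   # peak-to-peak inside each window
    drift = abs(mean(temps_c[-window:]) - mean(temps_c[:window]))
    # Stable: small oscillation within windows and little net drift over the soak.
    return max(swings) <= max_swing_c and drift <= max_drift_c
```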
The AI WAN is the evacuation lane
Training at this scale cannot assume a single building is always healthy. A GPU hall at 300 MW is a failure domain unless you partition power, liquid, and networking ruthlessly. The backbone has to support fast evacuation of active jobs and state. SemiAnalysis’ 300 Tb per second number with a path to 10 Pb per second is the level where campus to campus training remains viable without ridiculous stalls. That is why Fairwater looks like a network product as much as a computing site.
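The arithmetic is simple and worth doing. Below is a back-of-envelope sketch using the 300 Tb per second figure above; the 2 PB of live state and the usable bandwidth fraction are assumptions for illustration.

```python
# Back-of-envelope: how long does it take to drain a hall's live training state over
# the AI WAN? State size and usable fraction are illustrative assumptions; 300 Tb/s
# is the backbone figure cited above.
def evacuation_time_s(state_bytes: float, backbone_bps: float,
                      usable_fraction: float = 0.6) -> float:
    """usable_fraction discounts protocol overhead and competing traffic."""
    return state_bytes * 8 / (backbone_bps * usable_fraction)

# Example: 2 PB of optimizer state and checkpoints, 300 Tb/s backbone.
state = 2e15                      # bytes
backbone = 300e12                 # bits per second
print(f"drain time ~ {evacuation_time_s(state, backbone):.0f} s")
# Roughly 89 seconds at 60 percent usable bandwidth. The point is that this has to
# come in under the dominant checkpoint interval, so evacuation is a copy, not a rebuild.
```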
Placement, caching, and latency budgets
Not all AI traffic is the same. Training tolerates distance if the pipeline is wide and the cache is sane. Interactive inference does not. The CPU hall will carry hot caches and feature stores for common models. The GPU hall runs the heavy inference or training. The scheduler should prevent silly cross building chatter. When it fails, you will feel it as tail latency.
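A toy version of that routing rule, with made up latency budgets and hop costs, looks like this.

```python
# Toy placement rule: keep interactive inference off cross building paths when the
# latency budget cannot absorb the extra hop. Hop costs and budgets are assumptions.
CROSS_BUILDING_RTT_MS = 0.5   # illustrative campus hop
CROSS_REGION_RTT_MS = 15.0    # illustrative WAN hop

def allowed_scopes(traffic_class: str, budget_ms: float) -> list[str]:
    scopes = ["same_hall"]
    if budget_ms >= CROSS_BUILDING_RTT_MS * 2:            # round trip
        scopes.append("same_campus")
    if traffic_class == "training" and budget_ms >= CROSS_REGION_RTT_MS * 2:
        scopes.append("cross_region")                     # wide pipelines tolerate distance
    return scopes

print(allowed_scopes("interactive", budget_ms=5))    # stays on campus
print(allowed_scopes("training", budget_ms=1000))    # can span regions
```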
Scheduling is the real product
At campus scale, a datacenter is a scheduler with buildings attached. A competent scheduler must maximize useful accelerator hours per installed megawatt. It should co-locate model shards and datasets to reduce east west traffic, batch intelligently for throughput without starving latency sensitive jobs, and pre-position checkpoints so evacuation is a copy, not a rebuild. It should also manage silicon heterogeneity without pain for customers.
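A minimal sketch of that objective, with invented weights and fields, just to show the shape of the decision.

```python
# Sketch of the placement objective described above: prefer placements that keep
# accelerators lit, keep shards co-located, and have a checkpoint already staged
# nearby. Weights and field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Candidate:
    hall: str
    free_gpu_hours: float        # accelerator hours available this window
    cross_building_gb: float     # estimated east west traffic if placed here
    checkpoint_staged: bool      # is the latest checkpoint already local?

def score(c: Candidate, w_traffic: float = 0.01, ckpt_bonus: float = 50.0) -> float:
    s = c.free_gpu_hours - w_traffic * c.cross_building_gb
    if c.checkpoint_staged:
        s += ckpt_bonus          # evacuation becomes a copy, not a rebuild
    return s

def place(candidates: list[Candidate]) -> Candidate:
    return max(candidates, key=score)
```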
Silicon heterogeneity and optionality
Microsoft has three levers. Nvidia, its own Maia, and access to OpenAI’s accelerator IP. SemiAnalysis suggests the OpenAI accelerator trajectory looks strong relative to Microsoft’s first Maia generations. The right strategy is to hide the engine behind Azure SKUs. If the API and SLAs are identical, Microsoft can route jobs to whatever silicon hits the best cost per token per hour and keep user experience stable. The only way this works is mature kernels, mature compilers, and a real catalog of quantized and distilled models with known telemetry. If those pieces lag, swapping engines becomes support load, not margin relief.
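The routing logic itself is not the hard part. Here is a sketch of the per-request decision, with invented engine names and numbers; the hard part is the kernel and compiler maturity that makes the throughput and latency columns trustworthy.

```python
# Sketch: hide the engine behind one SKU and route to whichever silicon clears the
# SLA at the lowest cost per million tokens. Engines, throughputs, and costs are
# invented for illustration.
from dataclasses import dataclass

@dataclass
class Engine:
    name: str
    tokens_per_s: float      # sustained throughput for this model
    cost_per_hour: float     # fully loaded dollars per accelerator hour
    p99_latency_ms: float

    def cost_per_mtok(self) -> float:
        return self.cost_per_hour / (self.tokens_per_s * 3600) * 1e6

def route(engines: list[Engine], sla_p99_ms: float) -> Engine:
    eligible = [e for e in engines if e.p99_latency_ms <= sla_p99_ms]
    if not eligible:
        raise RuntimeError("no engine meets the SLA; fail over, do not degrade silently")
    return min(eligible, key=Engine.cost_per_mtok)
```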
From megawatts to tokens: the economics
Cost per token is the only metric that matters once capacity clears. Everything else is a proxy. The variables are energy price, power factor penalties, cooling overhead, accelerator utilization, and rework. A site can lose a cent or two per kWh to a bad PPA or curtailment window and blow its token economics. Idle power is the other poison. If a hall cannot hit a high percentage of lit and useful accelerator hours per installed MW, cost per token drifts up regardless of chip name.
A simple cost intuition
- Energy price: every $0.01 per kWh shows up in cost per million tokens. The exact slope depends on the model mix and batch sizes, but the direction is fixed.
- Idle power: a 10 to 15 percent idle tax is common when schedulers are timid or when the storage plane cannot feed fast enough. That tax is intolerable at 300 MW.
- HBM and packaging: if supply of high stack HBM or CoWoS assembly stutters, bins degrade and energy per token rises because you cannot hit planned clocks.
- Cooling overhead: outdoor air cooled fields are fast to build but have worse seasonal efficiency than liquid heavy designs. Expect a migration once shells and teams are stable.
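A rough energy-only cost model ties those variables together. All inputs below are illustrative assumptions; real accounting adds capex, networking, and staff, but the slope on the energy price term is the point.

```python
# Rough energy-only cost model for the list above. Inputs are illustrative assumptions.
def energy_cost_per_mtok(price_kwh: float, gpu_watts: float, cooling_overhead: float,
                         utilization: float, tokens_per_gpu_s: float) -> float:
    """Dollars of electricity per million tokens served."""
    watts_total = gpu_watts * (1 + cooling_overhead)       # plate power plus cooling share
    kwh_per_s = watts_total / 1000 / 3600
    useful_tokens_per_s = tokens_per_gpu_s * utilization    # idle hours produce nothing
    return price_kwh * kwh_per_s / useful_tokens_per_s * 1e6

base = energy_cost_per_mtok(0.06, 1200, 0.25, 0.85, 400)
bump = energy_cost_per_mtok(0.07, 1200, 0.25, 0.85, 400)
print(f"{base:.3f} -> {bump:.3f} $/Mtok for a $0.01 per kWh move")
# The same function shows the idle tax: drop utilization from 0.85 to 0.70 and
# cost per million tokens rises by more than 20 percent with nothing else changed.
```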
Why permitting and public acceptance now matter as much as fiber
At this scale, a datacenter is a civic conversation. Water rights, noise, traffic, and grid stress attract attention. Sites in colder regions with strong transmission corridors and friendly planning bodies will win timing. Microsoft will lease from third parties where that gets energization sooner and self build where long term control and custom plant justify the project effort. If you see Microsoft pivot to off grid or direct wire for a site, it is an indicator that local grid stability or pricing is not aligned with the campus plan.
Reliability at 300 MW is a different beast
A single hall can run well into six figures of accelerator count. Reliability has to be granular. That means tight power segmentation with independent UPS islands, liquid segments that can be isolated without draining whole manifolds, and fire and smoke compartments that do not take whole rows offline. The network must support cold and warm migration across the pair and across the campus without rehydrating the world from scratch. The more Microsoft can move toward boring reliability, the better its economics will look. Uptime remains the best marketing.
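One concrete expression of that segmentation discipline is a placement check: no job should have all of its replicas behind one UPS island or on one liquid loop. A sketch with hypothetical segment identifiers follows.

```python
# Sketch: validate that a job's replicas do not share a single UPS island or liquid
# segment, so one fault degrades the job instead of killing it. Segment IDs are
# hypothetical.
from collections import Counter

def placement_ok(replica_segments: list[tuple[str, str]], max_share: int = 1) -> bool:
    """replica_segments: (ups_island, liquid_segment) for each replica."""
    ups_counts = Counter(ups for ups, _ in replica_segments)
    liquid_counts = Counter(loop for _, loop in replica_segments)
    return (max(ups_counts.values()) <= max_share
            and max(liquid_counts.values()) <= max_share)

print(placement_ok([("ups-a", "loop-1"), ("ups-b", "loop-2"), ("ups-c", "loop-3")]))  # True
print(placement_ok([("ups-a", "loop-1"), ("ups-a", "loop-2")]))                       # False
```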
What success looks like in numbers
- Useful accelerator hours per installed MW above 85 percent, sustained over months.
- Evacuation time under planned fault lower than the average checkpoint interval for the dominant training runs.
- Energy per token and cost per million tokens trending down with each scheduler and kernel release, not just each chip refresh.
- Latency distribution for interactive inference that holds flat even under evacuation. P99 is the customer experience. Average means very little.
- Transparent packaging and HBM health across cohorts. If rework rates spike, the campus will feel it in output and in liquid plant load.
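Two of those numbers are easy to compute from raw telemetry, sketched below with assumed field names and an assumed definition of useful hours; installed capacity is proxied here by installed accelerator hours.

```python
# Sketch: two of the success metrics above, computed from telemetry. Field names and
# the definition of "useful" are illustrative assumptions.
def useful_hours_ratio(busy_gpu_hours: float, installed_gpu_hours: float) -> float:
    """Fraction of installed accelerator hours doing useful work (target: above 0.85)."""
    return busy_gpu_hours / installed_gpu_hours

def p99(latencies_ms: list[float]) -> float:
    """Customer experience lives in the tail, not the mean."""
    ordered = sorted(latencies_ms)
    idx = max(0, int(0.99 * len(ordered)) - 1)
    return ordered[idx]

print(useful_hours_ratio(busy_gpu_hours=1.06e8, installed_gpu_hours=1.2e8))  # ~0.88
print(p99([12.0] * 98 + [40.0, 95.0]))  # 40.0: a couple of slow requests already move the tail
```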
Risks Microsoft has not eliminated
- Grid volatility: curtailment and price spikes can erase margins. On site storage and flexible loads help, but PPAs decide the floor.
- Supply chain brittleness: advanced packaging and HBM are still chokepoints. Any slip shows up as yield pain and token cost.
- Software maturity: orchestration, compilers, and kernels must keep up with hardware diversity. Otherwise silicon optionality becomes instability.
- Cooling transitions: moving from air heavy to liquid heavy without downtime and without leaks in a live 300 MW hall is surgical work.
- Public acceptance: the next sites are won or lost in town halls. Transparency on water, noise, and jobs matters.
My read on the pause and restart
SemiAnalysis tracked a large pause in 2024 in which Microsoft stopped new pre-leases and slowed self-build, losing sites to competitors. It then reversed course, refreshed the OpenAI deal, and moved to lease and build again at pace. That looks like a risk reset. The core of the plan stayed the same; the order of operations changed. Build where you can energize first. Lease where you can get there faster. Keep silicon optionality. Wrap it all in a network that turns buildings into a single logical pool.
Bottom line
Fairwater is a grown up answer to AI at national scale. It treats power as the scarce resource, liquid as a first class citizen, and the network as the safety net. The silicon badge matters far less than the ability to keep accelerators lit and useful per installed megawatt. If Microsoft can hold that line while it transitions cooling and matures scheduling across heterogeneous silicon, cost per token will fall and SLAs will stay honest. If not, the most expensive problem will be inventory that cannot be plugged in and buildings that do not deliver steady output.
Sources
- SemiAnalysis — Microsoft’s AI Strategy Deconstructed: From Energy to Tokens. Select quotes used with attribution.
- Microsoft public commentary from CEO Satya Nadella on power and shells as the binding constraint, drawn from public sessions consistent with the framing above.