Architecture

The Compute Moat

By Mathijs Boezer

Anthropic disclosed a $30 billion revenue run rate this week.

The same week, they paid for 220,000 of someone else's GPUs.

Code with Claude on May 6 shipped no new model. The headline announcement was a deal with SpaceX for all of Colossus 1: a 300 MW NVIDIA cluster in Memphis, built by xAI and reassigned through SpaceX's merger with xAI three months earlier. The capacity comes online within the month. It backs doubled Claude Code rate limits and an order-of-magnitude jump in API Tier 1 limits.

Anthropic already had roughly $380 billion in pre-existing compute commitments across AWS Trainium, Google TPUs, Microsoft Azure, and Fluidstack. They still needed an idle NVIDIA cluster on a one-month timeline. From their loudest competitor.

One deal is anecdote. What makes it signal is that the world's third-best-funded lab ran short despite $380 billion in commitments. The frontier moved. Not to a new model. To a new constraint.

The Inversion

For most of the past three years, the AI competitive question was capability. Whose model scored higher. Whose context window was longer. Whose reasoning held up at low effort. Capability was the moat.

The moat moved.

Sundar Pichai, on the Q1 2026 Alphabet earnings call: "Obviously, we are compute constrained in the near-term... cloud revenue would have been higher if we were able to meet that demand." Google Cloud grew 63 percent to over $20 billion, on a $460 billion backlog. The admission is striking because Google designs its own silicon. Captive supply did not save them.

Dario Amodei at Code with Claude: "We tried to plan very well for a world of 10x growth per year. And yet we saw 80x. And so that is the reason we have had difficulties with compute."

Microsoft disclosed an $80 billion backlog of Azure orders it cannot fill in the near term, attributed in their Q1 commentary to data center capacity, power, and grid lead times.

Jensen Huang at GTC 2026 projected $1 trillion in compute infrastructure spending through 2027, double his prediction from the prior year, and added: "In fact, we are going to be short."

The capability gap closed faster than the capacity gap. Anthropic willingly partnering with xAI's parent company is the cleanest proof of that. The GPUs in Memphis work the same for Claude as they did for Grok. Compute is fungible. Capability isn't quite, but it is closer than the press releases imply.

The Numbers

Combined hyperscaler capex in 2026 is somewhere between $660 and $690 billion. Roughly 75 percent of that is estimated to be AI infrastructure specifically.

NVIDIA's top four customers now account for 61 percent of data center revenue. A year ago that figure was 36 percent. Hyperscaler concentration is increasing, not decreasing.

The training-versus-inference split inverted. In 2023, inference was about a third of compute spend. By 2026, it is two thirds. Some forecasts push toward 70 to 90 percent as agentic workflows scale. An agent that calls itself or its tools fifty times per request consumes fifty times the inference a single chat would.

Token pricing has been falling for three years. That ended at the frontier. GPT-5.5, released April 2026, costs $5 per million input tokens and $30 per million output. GPT-5 cost $1.25 and $10. A new tier opened above it: GPT-5.5 Pro at $30 and $180. Some of this is tier inflation, but the per-token sticker for the newest frontier tier has stopped falling, which is the change.

The 90 percent inference price drop story still holds at the commodity end. At the frontier ceiling, prices turned upward in 2026.

The Deeper Constraint

Chips are not the bottleneck anymore. Power is.

The median wait time for a new US power project that reaches commercial operation is five years. At PJM, the largest US grid operator, interconnection queue times have gone from under two years in 2008 to over eight years in 2025. Per Berkeley Lab, over 2 terawatts of projects are sitting in queues, and only 13 percent of interconnect requests from 2000 to 2019 ever reached commercial operation.

The SpaceX deal makes sense in that light. AWS Trainium 3 and Google's next TPU buildouts are multi-year. Anthropic needed capacity within the month. The only large NVIDIA cluster sitting at scale and not fully booked was the one in Memphis, on a grid connection xAI had bridged with on-site gas turbines (a setup that drew local environmental opposition). That is what was available.

The cynical reading is that Colossus 1 was underutilized post-merger capacity and Anthropic was the only buyer with the urgency to pay. That reading has merit. It also reinforces the thesis: when an idle 300 MW cluster appears, the frontier lab pounces on it.

Hyperscalers are responding by becoming utilities. Microsoft signed a 20-year, 835 MW PPA to restart Three Mile Island. Amazon committed to 1,920 MW from Talen's Susquehanna nuclear plant through 2042. Meta signed a 1,121 MW, 20-year deal with Constellation's Clinton plant. Google has agreements with Kairos Power for small modular reactors, with the first SMR online by 2030 and a 500 MW fleet by 2035, plus a 615 MW Duane Arnold restart with NextEra.

This is moat construction. The GPU buy is rented capacity. The 20-year PPA is owned position. Anthropic has not announced a long-duration power deal. It has hired an energy team, and that hiring suggests one is coming.

The IEA projects data center electricity demand will roughly double by 2030, from 485 TWh to around 950 TWh. McKinsey's forecast for AI workloads alone is 156 GW of capacity by 2030. The US is currently building toward roughly 95 GW. The math does not balance.

The Counter-Argument

Capacity will not be the binding constraint forever.

Hardware efficiency keeps compounding. NVIDIA says Blackwell Ultra delivers up to 50 times better performance per megawatt than Hopper. Rubin, shipping in volume in late 2026, promises up to 10 times higher inference throughput per megawatt than Blackwell on MoE workloads. Stacked, the two generations deliver on the order of 100 times more inference for the same wall power.

Architecture is compounding too. The active-to-total parameter ratio in mixture-of-experts models keeps falling. DeepSeek-V3 activates 5.5 percent of its parameters (37B of 671B). Qwen3 Next activates 3.75 percent (3B of 80B). The same capability runs on less compute every quarter.

The strongest argument against this entire piece is DeepSeek. DeepSeek trained V3's final pretraining run for around $5.6 million in GPU compute, roughly an order of magnitude below comparable frontier spend. Total program cost, including R&D, prior experiments, and infrastructure, was substantially higher, so the $5.6 million figure needs that caveat. But if frontier-class capability is reproducible for under $10 million in pretraining compute, the capacity arms race is a bet on scaling laws that are already breaking.

The bet behind this article is that those laws hold for another 24 months. Capex commitments and PPA durations are 15 to 20 years. The next 24 months still shape who has what during the relevant window.

If DeepSeek-style efficiency compounds faster than capex depreciates, the central claim has a 24-month half-life rather than a structural one. That is the honest framing.

For Builders

Three things change in the AI architectures I review when capacity, not capability, is the binding constraint.

First: cost-per-inference becomes a gating metric, not a P1 optimization. A feature whose unit economics depend on Opus or GPT-5.5 Pro pricing has to clear a real ROI threshold against the same feature on a floor-class model before it ships. Most teams I work with ship the latter.
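
To put numbers on that gate, here is a back-of-envelope sketch using the frontier prices quoted above. The token counts and the floor-tier prices are illustrative assumptions, not anyone's published rates.

```python
# Cost-per-inference as a gating metric: a back-of-envelope sketch.
# Frontier prices are the GPT-5.5 Pro figures quoted earlier; token
# counts and floor-tier prices are assumptions for illustration.
def cost_per_request(in_tokens: int, out_tokens: int,
                     in_price: float, out_price: float) -> float:
    """Prices are dollars per million tokens."""
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

frontier = cost_per_request(8_000, 2_000, in_price=30.0, out_price=180.0)
floor = cost_per_request(8_000, 2_000, in_price=0.25, out_price=1.25)

print(f"frontier: ${frontier:.4f} per request")  # $0.6000
print(f"floor:    ${floor:.4f} per request")     # $0.0045
print(f"gap:      {frontier / floor:.0f}x")      # ~133x to justify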

Second: reserved-throughput contracts move from procurement to architecture. Provisioned throughput on Bedrock, dedicated capacity on Azure, and committed-use TPUs on Google Cloud are no longer cost-optimization levers. They are availability levers. Pay-per-token pricing assumes there is a token to buy.
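
One way that shows up in code is reserved-first routing: burn the pre-paid pool before spilling to pay-per-token. A minimal sketch, where the pool names, the capacity figure, and the headroom check are assumptions, not any provider's real API:

```python
# Reserved-first routing: treat provisioned throughput as the primary
# pool and on-demand pay-per-token as spillover. Names and sizes assumed.
from dataclasses import dataclass

@dataclass
class Pool:
    name: str
    in_flight: int = 0
    capacity: int = 0  # max concurrent requests; 0 means uncapped on-demand

    def has_headroom(self) -> bool:
        return self.capacity == 0 or self.in_flight < self.capacity

POOLS = [
    Pool("provisioned-throughput", capacity=64),  # reserved: pre-paid, finite
    Pool("on-demand"),                            # spillover: metered, uncapped
]

def pick_pool() -> Pool:
    # Reserved capacity is already paid for, so it is always tried first;
    # spill to on-demand only when the reserved pool is saturated.
    for pool in POOLS:
        if pool.has_headroom():
            return pool
    raise RuntimeError("no pool has headroom")
```

The capacity decision now lives in the request path, not in the contract.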

Third: multi-vendor is no longer a strategy choice. It is a reliability requirement. Datadog's 2026 State of AI Engineering report says 69 percent of organizations run three or more models. About 5 percent of AI requests fail in production. Roughly 60 percent of those failures are capacity limits, not bugs.

The architectural patterns follow. Gateway in front of every model call (LiteLLM, Portkey, OpenRouter, or a thin internal one). Multi-provider failover wired in from day one, which is only real if your prompts port between providers. Exponential backoff with jitter. Circuit breakers when a provider degrades. Sync path reserved for interactive work, batch and async for everything else. Cost-per-inference as a P0 metric, monitored continuously.
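
Here is a minimal sketch of how the failover, backoff, and breaker pieces compose. The provider names and the call_provider stub are assumptions; in practice that call is a LiteLLM or native SDK invocation, and the breaker state would live in the gateway rather than module globals.

```python
# Multi-provider failover with exponential backoff, full jitter, and a
# simple consecutive-failure circuit breaker. Provider names and the
# call_provider stub are assumptions, not a specific SDK's API.
import random
import time

PROVIDERS = ["primary", "secondary", "tertiary"]  # preference order
FAILURE_THRESHOLD = 3    # consecutive failures that trip the breaker
COOLDOWN_SECONDS = 30.0  # how long a tripped provider sits out
MAX_ATTEMPTS = 4         # per-provider retry budget

_failures = {p: 0 for p in PROVIDERS}
_open_until = {p: 0.0 for p in PROVIDERS}

def call_provider(provider: str, prompt: str) -> str:
    """Stand-in for a real completion call; raise on 429s and 5xx."""
    raise NotImplementedError

def complete(prompt: str) -> str:
    for provider in PROVIDERS:
        if time.monotonic() < _open_until[provider]:
            continue  # breaker open: skip without spending latency on it
        for attempt in range(MAX_ATTEMPTS):
            try:
                result = call_provider(provider, prompt)
                _failures[provider] = 0  # success closes the breaker
                return result
            except Exception:
                _failures[provider] += 1
                if _failures[provider] >= FAILURE_THRESHOLD:
                    _open_until[provider] = time.monotonic() + COOLDOWN_SECONDS
                    break  # trip the breaker, fail over to the next provider
                # Exponential backoff with full jitter: sleep in [0, 2^attempt)
                time.sleep(random.uniform(0, 2 ** attempt))
    raise RuntimeError("all providers exhausted or circuit-open")
```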

Closing

The frontier in AI is no longer a model number.

Anthropic just paid for a competitor's data center because their own multi-gigawatt buildout cannot be accelerated. Microsoft is carrying $80 billion in unfilled orders not because the chips do not exist, but because the substations and transformers do not.

A smarter model is a month away. A megawatt is a decade away.

That is the race for the next 24 months.