The Compute Moat
Anthropic just paid for 220,000 of a competitor's GPUs because their own buildout can't be accelerated. Capacity, not capability, is the 2026 AI frontier.
Notes on AI architecture, agent systems, latency, and building production AI products.
RAG was robust because the retrieval decision was removed from the model. Memory, Skills, and agentic search hand it back, and demand more in return.
Putting safety rules in the system prompt fails in production. The pattern that works: a written constitution, a separate judge, and a retry loop.
A better model can make your system worse. Production prompts fossilize around old failure modes. A more literal model follows them too literally.
Every AI system has two performance curves. The peak is the demo. The floor is production. Most teams build for the peak and then spend six months firefighting. The alternative is simple, unglamorous, and what actually ships.
Building AI systems got dramatically easier. Evaluating them didn't. That asymmetry, and Goodhart's Law, explain why most agent deployments still fail.
Most browser agent benchmarks measure whether a task completes. The more useful question is whether it completes faster than a human would — and what it takes to get there.
Amazon's AI coding incidents weren't caused by bad tools. They were caused by development processes that weren't redesigned to match the new risk profile.
Voice AI sits at the extreme end of the latency spectrum. The pipeline, the UX, and the failure modes are fundamentally different from chat.
Most agent systems feel slow because they are architected to be slow. The model is rarely the bottleneck anymore, and measuring everything else is where the real gains are.
The gap between a compelling demo and a reliable production system is where most AI projects die. The problems are predictable, and the fixes are systematic.
Most teams asking for an agent really need a workflow. Autonomy has a cost, and the default should be simpler than you think.
Traditional testing gives you pass or fail. AI systems live in a probabilistic space where correctness is a spectrum and ground truth itself is unreliable.
RAG was a breakthrough, but its limitations are clear. When your knowledge base is complex, dynamic, or multi-modal, you need architectures that go further.