Insights

Notes on AI architecture, agent systems, latency, and building production AI products.

Architecture May 12, 2026

The Compute Moat

Anthropic just paid for 220,000 of a competitor's GPUs because its own buildout can't be accelerated. Capacity, not capability, is the 2026 AI frontier.

Architecture May 5, 2026

Three New Knowledge Primitives

RAG was robust because the retrieval decision was removed from the model. Memory, Skills, and agentic search hand it back, and demand more in return.

Architecture Apr 30, 2026

Every Agent Needs a Judge

Putting safety rules in the system prompt fails in production. The pattern that works: a written constitution, a separate judge, and a retry loop.

Architecture Apr 22, 2026

Your Prompts Are Not Portable

A better model can make your system worse. Production prompts fossilize around old failure modes. A more literal model follows them too literally.

Architecture Apr 14, 2026

Build for the Floor, Not the Peak

Every AI system has two performance curves. The peak is the demo. The floor is production. Most teams build for the peak and then spend six months firefighting. The alternative is simple, unglamorous, and what actually ships.

Evaluation Mar 31, 2026

The Evaluation Gap

Building AI systems got dramatically easier. Evaluating them didn't. That asymmetry, and Goodhart's Law, explain why most agent deployments still fail.

Performance Mar 24, 2026

Browser Agents at Human Speed

Most browser agent benchmarks measure whether a task completes. The more useful question is whether it completes faster than a human would, and what it takes to get there.

Engineering Mar 17, 2026

The New Failure Modes in AI-Assisted Development

Amazon's AI coding incidents weren't caused by bad tools. They were caused by development processes that weren't redesigned to match the new risk profile.

Voice AI Feb 27, 2026

Why Voice AI Is Harder Than Chat AI

Voice AI sits at the extreme end of the latency spectrum. The pipeline, the UX, and the failure modes are fundamentally different from chat.

Performance Feb 10, 2026

Reducing Latency in LLM Systems: How We Got an Agent Pipeline Under 500ms

Most agent systems feel slow because they are architected to be slow. The model is rarely the bottleneck anymore, and measuring everything else is where the real gains are.

Production AI Jan 24, 2026

Why AI Demos Fail in Production

The gap between a compelling demo and a reliable production system is where most AI projects die. The problems are predictable, and the fixes are systematic.

Architecture Jan 10, 2026

The Difference Between AI Workflows and AI Agents

Most teams asking for an agent really need a workflow. Autonomy has a cost, and the default should be simpler than you think.

Testing Dec 19, 2025

Testing AI Systems Is Harder Than Testing Software

Traditional testing gives you pass or fail. AI systems live in a probabilistic space where correctness is a spectrum and ground truth itself is unreliable.

Architecture Dec 5, 2025

Beyond RAG: Architectures for Real Knowledge Systems

RAG was a breakthrough, but its limitations are clear. When your knowledge base is complex, dynamic, or multi-modal, you need architectures that go further.