Insights

Notes on AI architecture, agent systems, latency, and building production AI products.

Architecture

Open Weights, Closed Door

I have a 5090 and can't run the best open model. The ones smart enough for serious work need a datacenter, and renting one rarely beats the API.

Architecture

One Model, Three Front Doors

Yesterday a product I work on went down in the Anthropic outage. The cheapest redundancy is the same model on a second host, and it costs almost nothing.

Architecture

A Bigger Window Is Not a Bigger Memory

Z.ai shipped a million-token context window this week with zero benchmarks. A window is a claim, not a capability. How to measure the one that matters.

Architecture

The Gap in Your Pocket

On Chatbot Arena your phone's AI looks like it kept up with the frontier. On a real reasoning test, the gap is widening. Here is why, and what it means.

Architecture

Fewer Tools, Better Agents

A Vercel team deleted most of its agent's tools, and the agent got more reliable on the same model. The lever is how many tools the model sees when it picks.

Architecture

Most AI Pricing Is Broken

Cursor's pricing change cost it more trust than money. Most SaaS pricing assumptions break under AI's variable cost-of-goods. A taxonomy of what's working.

Architecture

Most Agent Failures Are Boring

An AI agent ran a real Stockholm cafeteria. It failed in mundane, recognizable ways. Each maps to architectural patterns we already know how to build.

Architecture

The Compute Moat

Anthropic just paid for 220,000 of a competitor's GPUs because its own buildout can't be accelerated. Capacity, not capability, is the 2026 AI frontier.

Architecture

Three New Knowledge Primitives

RAG was robust because the retrieval decision was removed from the model. Memory, Skills, and agentic search hand it back, and demand more in return.

Architecture

Every Agent Needs a Judge

Putting safety rules in the system prompt fails in production. The pattern that works: a written constitution, a separate judge, and a retry loop.

Architecture

Your Prompts Are Not Portable

A better model can make your system worse. Production prompts fossilize around old failure modes. A more literal model follows them too literally.

Architecture

Build for the Floor, Not the Peak

Every AI system has two performance curves. The peak is the demo. The floor is production. Most teams build for the peak and then spend six months firefighting. The alternative is simple, unglamorous, and what actually ships.

Evaluation

The Evaluation Gap

Building AI systems got dramatically easier. Evaluating them didn't. That asymmetry, and Goodhart's Law, explain why most agent deployments still fail.

Performance

Browser Agents at Human Speed

Most browser agent benchmarks measure whether a task completes. The more useful question is whether it completes faster than a human would, and what it takes to get there.

Engineering

The New Failure Modes in AI-Assisted Development

Amazon's AI coding incidents weren't caused by bad tools. They were caused by development processes that weren't redesigned to match the new risk profile.

Voice AI

Why Voice AI Is Harder Than Chat AI

Voice AI sits at the extreme end of the latency spectrum. The pipeline, the UX, and the failure modes are fundamentally different from chat.

Performance

Reducing Latency in LLM Systems: How We Got an Agent Pipeline Under 500ms

Most agent systems feel slow because they are architected to be slow. The model is rarely the bottleneck anymore, and measuring everything else is where the real gains are.

Production AI

Why AI Demos Fail in Production

The gap between a compelling demo and a reliable production system is where most AI projects die. The problems are predictable, and the fixes are systematic.

Architecture

The Difference Between AI Workflows and AI Agents

Most teams asking for an agent really need a workflow. Autonomy has a cost, and the default should be simpler than you think.

Testing

Testing AI Systems Is Harder Than Testing Software

Traditional testing gives you pass or fail. AI systems live in a probabilistic space where correctness is a spectrum and ground truth itself is unreliable.

Architecture

Beyond RAG: Architectures for Real Knowledge Systems

RAG was a breakthrough, but its limitations are clear. When your knowledge base is complex, dynamic, or multi-modal, you need architectures that go further.