Yesterday afternoon a product I work on went down during a demo. It went down for a boring reason: Anthropic's API was erroring, Opus 4.8 was named on the status page, and I had wired the product to exactly one provider with no way around it.
The outage was real and broad. On June 23, Anthropic's first-party stack, claude.ai and the API and the Console and Claude Code, logged repeated incidents through the day. The afternoon one was the big one: elevated errors across multiple models, about 85 minutes at its worst. One of roughly ten disruptions in twelve days. Anthropic has not published a root cause, so I will not guess at one. What I know is narrower and more useful: the thing the product depended on returned errors for an hour, and I had built no way around it.
A model call is infrastructure now
When your product's core function is a model call, that call is infrastructure. Same as your database. Same as your payment processor. You would not run a single database with no replica and no failover and call it production. Plenty of teams run their model layer exactly that way, on one provider, because for two years it mostly worked.
The trap is that it mostly works. I have argued before that multi-provider is a reliability requirement, not a strategy choice. Yesterday I paid for not taking my own advice.
The question to put to your team is one sentence. Does anything we ship depend on a single model provider with no fallback? If the answer is yes, the rest of this is the fix.
The cheapest redundancy is the same model on a second host
Here is the part most failover discussions get wrong. They reach for a different model. If Claude is down, fall back to GPT or Gemini. That works. But it is the expensive kind of redundancy, because a different model is a different system. Different failure modes, different formatting, different refusals. You have to re-run your evals against the fallback to know it still does the job, and most teams never do, so the fallback is a guess.
There is a cheaper kind. The same Claude model you are already using runs in more than one place. Opus 4.8 runs on Anthropic's own API, inside Amazon's cloud through Bedrock, and inside Google's cloud through Vertex AI. Same weights. You may already pay one of those companies for something else.
The fallback is not a different system you have to re-qualify. It is the identical model behind a different front door, and its behavior is close enough that your prompts and outputs port without rework. That is the one case where failover is nearly free. Smoke-test the fallback path once, because version pinning and defaults can drift between platforms, and then you are done.
Failover is only real if your prompts port. Same model on a different host is the case where they trivially do.
The stacks are separate
The reason this helps is that the serving stacks do not share a building. Per Anthropic's own documentation, a Bedrock request goes to the AWS inference endpoint and a Vertex request goes to a Google Cloud endpoint. Neither touches api.anthropic.com. A failure in Anthropic's first-party serving does not propagate to the Bedrock or Vertex endpoints. Yesterday's incident was scoped to Anthropic's own surfaces. Bedrock and Vertex were not part of it.
The honest version is less clean than the marketing one. I cannot show you a green Bedrock status page stamped at the exact minute the product was erroring, because nobody publishes that. What I can say is that the serving and control planes are independent by design, and this incident lived entirely on one of them.
What the insurance costs
Almost nothing. That is the surprising part.
On latency, the gap is small and runs both ways. For the heaviest reasoning models, Anthropic's direct API leads: Opus 4.8 reaches first token a few seconds sooner than the same model on Bedrock. For the smaller models the difference is inside the noise, and on some of them Vertex or Bedrock is faster. So you keep Anthropic direct as your primary for the speed and fail over when it errors. On the happy path you give up nothing. On the failover path you may eat a few seconds of first-token latency on the heavy models, which is a fine trade against returning an error.
On uptime, the trade runs the other way. In your favor. A second host on a written guarantee turns most of a working day of downtime into about two hours a quarter. Concretely: AWS Bedrock promises 99.9 percent uptime in writing and pays you service credits when it misses. Anthropic's standard API makes no per-request promise, and its measured 90-day uptime at the time of the outage was around 99.1 to 99.4 percent.
The shape of it
Primary plus failover, behind a gateway.
Route to Anthropic direct first. When a call fails, retry the same request against Bedrock or Vertex with the model id swapped. Fail over on the errors another host can actually fix: 5xx, 429, a 529 "Overloaded," and timeouts. Not on a 4xx, which will just fail the same way twice. Cancel the in-flight call before you retry, so you do not fire a tool call twice or pay for two completions.
A 529 is Anthropic telling you it is out of capacity, and Datadog's 2026 survey puts roughly 60 percent of production AI failures at capacity, not bugs. That is the class a second host absorbs cleanly. A market-wide surge for the model is the harder case, and I get to it below.
You do not have to build the router. AWS ships a reference implementation that does Anthropic-primary with Bedrock fallback, retries, and backoff. A gateway like LiteLLM or Portkey gives you provider fallback lists and health-aware routing in front of every call. The cost is pay-per-use: you only spend on the second provider when the first one fails. If you already run on AWS or Google Cloud, the second host is a config change, not a new vendor relationship.
The one piece of real friction is that it is the same model in a different envelope. The model id differs per platform, Bedrock wants its own version header, and the auth is AWS signing on one side and Google credentials on the other. A naive proxy that forwards Anthropic's headers straight to Bedrock will break the request. It is an afternoon of plumbing, not a project.
The honest ceiling
Same-model failover is the cheapest layer, not the whole building. It does not save you from everything.
It does not save you from a bug in the model layer that ships to all three hosts at once. It does not save you from a market-wide demand surge that strains every provider serving Claude at the same time. And it does not save you from a cloud region going down: if your Bedrock fallback lives only in us-east-1, the next big AWS regional outage takes your fallback with your primary.
It also does not save you from your own untested config. A fallback you never exercise is not redundancy, it is a guess: a stale model id, an expired credential, a region where you never enabled the model, any of which you find during the outage instead of before it. And a fallback whose quota you never load-tested can brown out the moment you pour your full traffic onto it. So send a little traffic down the fallback path continuously, and find the breakage on a Tuesday, not during the next incident.
Real redundancy is cross-region and, past a point, cross-cloud. This is the first layer, not the last.
I did not have it yesterday. I will by next week. If your product makes a model call on its critical path and that call has exactly one address, you are one status-page incident away from the afternoon I just had.
A model call is infrastructure. The cheapest redundancy you can buy is the identical model on a second host.