Why a Multi-Model Strategy Is Now Mission Critical

15 Jun

A couple of months ago, in The Open Model Moment, we made the case that concentration risk is the most underappreciated exposure in enterprise AI. The argument was a simple one. If your business depends on one model, one API, one provider's roadmap, then a decision you have no say in can undermine your operating moat overnight. At the time, that was a thesis. As of last Friday, it is a case study.

What Happened to Fable 5

On 9 June, Anthropic released Claude Fable 5, the first publicly available model in its new Mythos-class tier, and made it free to Pro, Max and Enterprise users. On the evening of 12 June, three days later, the company disabled it. Not through a deprecation cycle with notice periods and migration guides, but immediately, for every customer in every country.

The reason was not commercial. The US government issued an export control directive, citing national security, that restricted access to Fable 5 and the more capable Mythos 5 to US persons only. Because a provider cannot establish a user's nationality in real time, the only way Anthropic could comply was to withdraw both models from its entire customer base. Every other Claude model stayed available. The two strongest ones did not. Anthropic has said publicly that it disagrees with the order, regards it as a misunderstanding, and is working to restore access. All of that is fair, and none of it helped the organisations that had those models in production on Thursday. As an aside, I gave Fable 5 the task of reskinning the UI of our AI launchpad, Pathway. Needless to say, the result was delicious, so I hope it is made available again soon.

Concentration Risk Has a New Dimension

We have tended to frame concentration risk in commercial terms: a price rise, a deprecated model, a provider closing off the integration your product was built around. That version of the risk is real, and we have written about it before. What the Fable 5 episode exposes is a second dimension that is considerably harder to design around.

When a model's availability is governed by the export control regime of the country it was trained in, your AI capability stops being a vendor decision and becomes a function of another government's policy. For a UK enterprise (or any enterprise operating outside the US), that should land fairly heavily. What could be the most capable tool in your stack can be taken off the table not because your provider changed its commercial mind, but because a government you do not vote for decided you now sit on the wrong side of a line.

Which Makes the Timing of Lumen Sovereign Look Rather Prescient

In the same week, a coalition of UK organisations, among them Babcock International, BT, Lloyds Banking Group, LSEG, NatWest, PwC, Thales UK and Telefónica Tech, signed up with the British firm Cosine to design Lumen Sovereign, a frontier model trained entirely in the UK, on the Isambard-AI supercomputer, under the government's £500m Sovereign AI programme. It is built to run inside a customer's own infrastructure with no external data transfer, and is targeted for deployment readiness by the end of the year.

Cosine's framing is direct. Enterprises are recognising the risk of being wholly dependent on foreign providers, and that dependency brings security, cost and continuity exposure together. You need not assume that Lumen Sovereign will match the frontier labs on raw capability to understand why eight institutions of that scale put their names to it. They are not buying capability. They are buying control, and specifically the assurance that the systems their critical workflows rely on cannot be withdrawn by a directive issued in another capital.

The same reasoning is driving sovereign model investment across Europe, the Gulf and Asia. For a long time, "use the best model" effectively meant "use the best American model". That assumption is now being unpicked, and the more thoughtful enterprises are getting ahead of it rather than waiting to be caught out.

What a Sensible Stack Looks Like Now

The conclusion is the one we keep returning to, with the emphasis shifted. A multi-model strategy is no longer only an engineering discipline. It is a resilience question, and resilience has acquired a geographic component.

A robust stack should be spread across more than one provider, which most teams now accept, but also across more than one jurisdiction, which far fewer have considered. Frontier models from the US labs for the work where they genuinely earn their place. Open weight models, hosted on infrastructure you control, that cannot be switched off remotely by anyone. And, increasingly, sovereign or regional models for the workloads where continuity and data residency are not negotiable.
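As a rough illustration of that check, here is a minimal sketch that counts the providers and jurisdictions in a model inventory and flags a stack concentrated in a single one. The ModelEntry fields, the model names and the jurisdiction labels are all hypothetical, chosen for the example rather than taken from any real inventory.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    name: str           # hypothetical label for the model
    provider: str       # who operates the model
    jurisdiction: str   # whose legal regime can restrict access to it
    self_hosted: bool   # True if it runs on infrastructure you control


def concentration_report(stack):
    """Summarise how widely a stack is spread across providers and jurisdictions."""
    providers = Counter(m.provider for m in stack)
    jurisdictions = Counter(m.jurisdiction for m in stack)
    return {
        "providers": len(providers),
        "jurisdictions": len(jurisdictions),
        "self_hosted": sum(m.self_hosted for m in stack),
        "single_provider": len(providers) < 2,
        "single_jurisdiction": len(jurisdictions) < 2,
    }


# Illustrative inventory only: two hosted frontier models and one local model.
STACK = [
    ModelEntry("frontier-a", "us-lab-1", "US", False),
    ModelEntry("frontier-b", "us-lab-2", "US", False),
    ModelEntry("open-weights-local", "in-house", "UK", True),
]

report = concentration_report(STACK)
print(report)
```

Dropping the self-hosted entry from the list leaves two providers but one jurisdiction, which is exactly the gap that a provider count on its own would miss.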

A distributed stack is only as good as your confidence that you can actually move between the parts of it. Having three models available is not the same as knowing that the workflow running on one of them will hold up on another. That confidence does not arrive for free. It comes from a deliberate testing and validation approach, a way of proving that your use cases, your skills and your agentic workflows perform to an acceptable standard on each model you might fall back on, before you are forced to.

This matters more with every month that passes, because AI is no longer sitting alongside the business process as a convenient assistant. It is becoming part of the process itself, embedded in how the work actually gets done. Once a model is doing real work inside a regulated workflow, "good enough" stops being a matter of opinion and becomes something you have to define and measure. That means acceptance thresholds for accuracy, latency and cost, and regression testing so that a model update, or a forced switch to an alternative, does not degrade an output that someone downstream is relying on. The point is to be able to show, rather than assert, that a replacement performs.
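To make "define and measure" concrete, here is a minimal sketch of an acceptance gate, assuming you already have per-request results from running a fixed evaluation set against a candidate model. The threshold values, the RunResult fields and the nearest-rank p95 calculation are assumptions for illustration, not a prescribed standard.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Thresholds:
    min_accuracy: float        # fraction of evaluation cases judged correct
    max_p95_latency_s: float   # 95th percentile latency, in seconds
    max_cost_per_call: float   # mean cost per request, in your billing unit


@dataclass(frozen=True)
class RunResult:
    correct: bool
    latency_s: float
    cost: float


def p95(values):
    """Nearest-rank 95th percentile, using integer arithmetic for the rank."""
    ordered = sorted(values)
    rank = max(1, (95 * len(ordered) + 99) // 100)  # ceil(0.95 * n)
    return ordered[rank - 1]


def failed_criteria(results, thresholds):
    """Return the names of the criteria a candidate fails (empty list = pass)."""
    failures = []
    if mean(1.0 if r.correct else 0.0 for r in results) < thresholds.min_accuracy:
        failures.append("accuracy")
    if p95([r.latency_s for r in results]) > thresholds.max_p95_latency_s:
        failures.append("latency")
    if mean(r.cost for r in results) > thresholds.max_cost_per_call:
        failures.append("cost")
    return failures


# Illustrative numbers only.
LIMITS = Thresholds(min_accuracy=0.9, max_p95_latency_s=2.0, max_cost_per_call=0.02)
primary = [RunResult(i != 0, 5.0 if i == 0 else 1.0, 0.01) for i in range(20)]
fallback = [RunResult(i >= 4, 1.0, 0.01) for i in range(20)]

print(failed_criteria(primary, LIMITS))
print(failed_criteria(fallback, LIMITS))
```

Running the same gate after every model update, and against every model you might fall back on, is what turns it into a regression test rather than a one-off benchmark.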

Treated this way, portability becomes something you have tested for rather than something you are hoping for. When a provider raises its prices, retires a version, or has a model withdrawn from under it overnight, the organisation that has been continuously validating its workflows across more than one model can move with evidence behind the decision. The one that has not is left discovering the gaps in production, at the worst possible moment.

The objective is not to pick the right horse. It is to make sure that no single ruling, letter or policy change can reduce your capability to nothing, and to know, in advance and with evidence, that the alternatives you are counting on will do the job.

The Strongest Argument Yet for Open Weights

This is also the clearest case we have seen for treating open weight models as a serious architectural choice rather than a cost-saving compromise. A model you run on your own infrastructure has no kill switch for a third party to reach. It may not be the most capable option on the market, but a capable-enough model that is still running tends to beat a superior one that has been withdrawn. For a large share of enterprise workloads, particularly domain-specific reasoning over private data, that is a trade worth making, and the gap between open and frontier models continues to close with each release.

It is a large part of why we are partnering with NVIDIA. Open weight models running on your own inference give you a stack you own end to end, rather than one you rent on terms someone else can change. The pattern that works is rarely a single model doing everything. It is a mix: smaller specialist models, fine-tuned on your own data, carrying the high-volume and narrowly defined tasks, with larger models held back for the work that genuinely needs them. Each model is matched to the use case rather than chosen by default, and as AT&T has already demonstrated at scale, both the economics and the latency move in your favour once you stop routing every request through a frontier API.

Most importantly, this lets you draw a clear line around the work that matters most. The most sensitive and mission-critical workloads, where an outage or an unplanned data transfer is simply not acceptable, can run on infrastructure that is wholly under your control, with no dependence on a provider's roadmap or a ruling made in another country. Less sensitive work can still call out to a frontier model when that is the right tool for the job. The point is that you decide where each workload runs, rather than having that decision made for you.
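One way to express that line in practice is a small routing policy. The sketch below assumes a preference-ordered list of targets and a set of names that are currently reachable. The target names and the single critical flag are hypothetical simplifications, since a real policy would likely carry more than two sensitivity levels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    name: str          # hypothetical label for a deployment
    self_hosted: bool  # True if it runs on infrastructure you control


# Preference order for non-critical work: frontier first, local as fallback.
TARGETS = [
    Target("frontier-api", self_hosted=False),
    Target("local-open-weights", self_hosted=True),
]


def route(critical, reachable, targets=TARGETS):
    """Pick the first reachable target that the workload is allowed to use.

    Critical workloads are restricted to self-hosted targets. Everything
    else takes the first reachable target in preference order.
    """
    for target in targets:
        if target.name not in reachable:
            continue
        if critical and not target.self_hosted:
            continue
        return target.name
    raise LookupError("no permitted target is reachable")


everything = {"frontier-api", "local-open-weights"}
print(route(critical=False, reachable=everything))
print(route(critical=True, reachable=everything))
# If the hosted model is withdrawn, non-critical work falls back to local.
print(route(critical=False, reachable={"local-open-weights"}))
```

The design choice worth noticing is that the restriction lives in the policy rather than in each calling application, so a withdrawn model changes the reachable set and nothing else.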

Closing Thoughts

None of this makes Anthropic the villain. The company is complying with a legal order it openly disagrees with and is working to reverse the position. The frontier labs are not the problem in this story, and that is rather the point. Even a provider acting entirely in good faith cannot promise that the model you depend on today will be reachable tomorrow, because the decision does not always sit with them.

The argument from the Open Model Moment piece has not changed. It has simply moved from a commercial register into a geopolitical one. The sensible response is the same as it has always been, only more pressing. Do not build the business on a single point of failure you do not control. Distribute across providers and across jurisdictions, and own your inference wherever it genuinely matters.

For a while, "switched off overnight" was a useful hypothetical for making the point on a slide. It is worth noticing that it has stopped being hypothetical.
