The Open Model Moment: Why Every Enterprise Needs a Multi-Model Strategy

For a few years now, we have been making the case that the SaaS model as we know it is on borrowed time. Not dying, but evolving. The era of one-size-fits-all platforms is giving way to something far more interesting: hyper-tailored, purpose-built applications that fit the organisation, not the other way around.

That shift is no longer theoretical. It is happening in production, across industries, right now.

And it has a companion trend that is arguably even more consequential: the move towards open source models, both large and small, running on your own inference infrastructure. Whether that inference is distributed across cloud, on-premise or on-device, the trend is gaining significant traction.

Three developments in the last few weeks crystallise the point perfectly.

Three Moves, Three Very Different Bets

Google open-sourced Gemma 4. Their most powerful open model to date, released with permissive licensing and designed to run efficiently across a range of hardware. It is a serious signal: Google is betting that ecosystem adoption, not API lock-in, is the winning play.

NVIDIA launched Nemo Claw. An enterprise-ready MVP built on top of Open Claw, and fully open-sourced. NVIDIA is essentially handing enterprises the scaffolding to build their own agentic coding environments, backed by the hardware ecosystem they already dominate.

Anthropic ruled out Pro and Max subscription access for third-party harnesses like Open Claw. Users can still access Claude via the API, but the decision to restrict how their models are consumed through external orchestration layers is a deliberate one. For some businesses building on top of those integrations, that single decision could have been genuinely disruptive.

And therein lies the lesson.

The Case for a Multi-Model Strategy Has Never Been Stronger

When a single business model decision from a frontier provider can undermine your operating moat overnight, concentration risk is a real concern.

Google, NVIDIA and Anthropic are all making rational moves within their respective strategies, but if your business is wholly dependent on one model, one API, one provider's roadmap, you are exposed in ways that most leadership teams have not fully internalised.

A multi-model strategy is therefore not a nice-to-have; it is architecture.

AT&T Already Proved the Point

If any of this still feels abstract, look at what AT&T has done. Their Chief Data Officer, Andy Markus, recently shared that by shifting workloads from large language models to fine-tuned open source small language models, AT&T has cut its AI operating costs by up to 90%.

The numbers are staggering. AT&T processes around 8 billion tokens a day. At that scale, routing everything through frontier LLMs is not just expensive, it is unsustainable. So they rebuilt their orchestration layer: a multi-agent stack where large models act as supervisors, directing smaller, purpose-built worker agents that handle the bulk of the processing. The result was more than a threefold increase in throughput, from around 8 billion to 27 billion tokens a day, with dramatically lower latency.
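To make the pattern concrete, here is a minimal sketch of that supervisor-and-worker routing idea. It is not AT&T's actual stack: the endpoint, model names and task categories are illustrative, and it assumes locally served models exposed through an OpenAI-compatible API, as servers like vLLM or Ollama provide.

```python
from openai import OpenAI

# Assumption: models are served locally behind an OpenAI-compatible API
# (e.g. vLLM or Ollama). Endpoint and model names below are illustrative.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

SUPERVISOR = "big-supervisor-model"           # large model: plans and routes
WORKERS = {                                   # small fine-tuned specialists
    "contract_clause_extraction": "contracts-7b-lora",
    "network_root_cause_analysis": "network-rca-4b-lora",
    "fraud_detection": "fraud-7b-lora",
}

def route(task: str) -> str:
    """The supervisor model picks which specialist should handle the task."""
    reply = client.chat.completions.create(
        model=SUPERVISOR,
        messages=[{
            "role": "user",
            "content": (
                f"Choose exactly one worker from {list(WORKERS)} for the task below. "
                f"Reply with the worker name only.\n\nTask: {task}"
            ),
        }],
    )
    choice = reply.choices[0].message.content.strip()
    return WORKERS.get(choice, WORKERS["contract_clause_extraction"])

def run(task: str) -> str:
    """The bulk of the processing is done by the small worker, not the supervisor."""
    worker = route(task)
    reply = client.chat.completions.create(
        model=worker,
        messages=[{"role": "user", "content": task}],
    )
    return reply.choices[0].message.content

print(run("Extract the termination clause from this supplier contract: ..."))
```

The design point is that the expensive model only decides; the cheap, specialised models do the volume work.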

The models themselves are not huge. AT&T fine-tunes open source models in the 4 to 7 billion parameter range on tightly scoped internal data: contracts, network logs, policies, call transcripts. For tasks like root-cause network analysis, fraud detection and contract clause extraction, these small models match or exceed the accuracy of general-purpose frontier models, at a fraction of the cost.
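For a sense of how lightweight that specialisation can be, here is a hedged sketch of a LoRA fine-tune on a small open model using Hugging Face's trl and peft libraries. The model name, data file and hyperparameters are placeholders, not AT&T's setup, and exact arguments vary between trl versions.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Assumption: a JSONL file where each line is {"text": "<prompt plus ideal answer>"}
# built from internal documents such as contracts or call transcripts.
dataset = load_dataset("json", data_files="clause_extraction_examples.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",     # any open model in the 4-7B range
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
    args=SFTConfig(output_dir="clause-extractor-lora", num_train_epochs=1),
)
trainer.train()
trainer.save_model("clause-extractor-lora")
```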

Markus put it plainly: the future of agentic AI is many, many small language models. Not one giant model trying to do everything, but a fleet of specialists, each trained on the organisation's own data, each doing one thing extremely well.

This is running in production across 100,000 employees and is a blueprint that any enterprise can follow.

Can Open Models Replace Tools Like Claude Code?

Let us take the question head-on. Does something like Nemo Claw, paired with models like Gemma 4, give organisations enough firepower to reduce their dependency on proprietary developer tools?

The honest answer: maybe, maybe not.

Claude Code is brilliant at what it does. But "brilliant" and "irreplaceable" are not the same thing. The gap between frontier proprietary models and open alternatives is narrowing with every release cycle. For many enterprise use cases, particularly those involving domain-specific reasoning over private data, an open model fine-tuned on your own corpus may already outperform a general-purpose frontier model operating through a generic interface. AT&T is living proof of exactly that.

The question is not whether open models beat the frontier in the abstract; it is whether they are good enough for your specific task, running on your infrastructure, governed by your policies.

For a growing number of organisations, the answer is yes.

The Smartest Knowledge Base Might Be the Simplest One

Credit here to our own Diogo Carrapato Sousa for flagging this internally: Andrej Karpathy recently shared a technique he uses to build local knowledge bases from nothing more than markdown files, exposed to an LLM running locally.

On the surface, it sounds scrappy, almost too simple, especially for enterprise architectures where shared knowledge sources, insight and action are key to delivering collective business outcomes.

But think about it in the context of how large organisations actually operate. Layers of bureaucracy. Information that gets lost in translation, contorted, misunderstood. Knowledge buried in wikis that nobody reads, scattered across tools that do not talk to each other. It is remarkably similar to what happens when an LLM hits its context window limit: information overload leads to degraded output.

Karpathy's approach inverts that. Small, bounded, frequently refreshed knowledge, fed back to the same location, consistently. No complex RAG pipeline. No vector database. Just clean, structured context that the model can actually use.
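A rough sketch of what that looks like in practice, assuming a folder of markdown notes and a locally served model behind an OpenAI-compatible endpoint (as Ollama or llama.cpp's server expose); the endpoint, model name and folder are placeholders.

```python
from pathlib import Path
from openai import OpenAI

# Assumption: a local OpenAI-compatible server (e.g. Ollama or llama.cpp).
# Endpoint, model name and notes folder are placeholders.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")
MODEL = "local-open-model"

def load_notes(folder: str = "notes") -> str:
    """Concatenate every markdown file into one small, bounded context block."""
    files = sorted(Path(folder).glob("*.md"))
    return "\n\n".join(f"# {f.name}\n{f.read_text()}" for f in files)

def ask(question: str) -> str:
    """Answer a question using only the markdown notes as context: no RAG, no vector DB."""
    reply = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system",
             "content": "Answer using only these notes:\n\n" + load_notes()},
            {"role": "user", "content": question},
        ],
    )
    return reply.choices[0].message.content

print(ask("What did we agree in last week's architecture review?"))
```

The whole pipeline is the filesystem plus one prompt, which is exactly the point: the knowledge stays small enough to fit inside the model's attention.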

For enterprises drowning in their own documentation, there is a profound insight here: sometimes the most effective architecture is the one that respects the limits of attention, both human and machine.

Are all of these insights early indicators of where enterprise AI is heading? Local, open, tailored AI assistants running tasks securely on your own data… Let's see.

Closing Thoughts

Perhaps we are heading towards a world where the smartest thing your business can do is stop paying someone else's margin to think for you.

Open models. Local inference. Your own data, where the real value sits. Your own knowledge base, running on your own terms.

It is not complicated. It just requires your organisation to actually build something. To invest in the capability, not just the subscription.

The tools are there. The models are there. The only thing standing between most businesses and this future is the willingness to give it a crack.
