The CapEx Versus OpEx Reckoning for Enterprise AI

29 Jun

At an executive partner dinner a little while ago, the conversation drifted, as these conversations invariably now do, towards the question of what everyone is really spending on artificial intelligence. One of the firms around the table offered a figure that gave the rest of us pause. They had decided to give every member of staff an allowance of £1500 a month to spend on AI tooling of their own choosing, with no central procurement function involved, no approved vendor list, and no constraint beyond the size of the monthly allowance itself. Their people were free to select whichever models and applications they preferred and simply to expense the cost.

Viewed as an exercise in discovery, there is a good deal to admire in the approach. A business learns very quickly which tools its people gravitate towards, where the genuine value is beginning to emerge, and which established workflows start to change shape once capable assistance is placed within easy reach. Viewed as a cost position, however, it is a variable line of expenditure that carries no ceiling and very little governance, multiplied across the headcount and compounding with every month that passes. An allowance of £1500 for each person each month amounts to £18,000 a year per head, and across a workforce of a thousand people that represents something in the order of £18 million of annual AI consumption, allocated according to individual preference rather than to any considered view of return. What gave the table pause, I suspect, was the recognition that most of us had lived through a version of this before, albeit under a different name.
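For anyone who wishes to rerun that arithmetic against their own headcount, a minimal sketch follows. The function name is hypothetical, and the result is an upper bound that assumes every person spends the whole allowance every month.

```python
def annual_allowance_exposure(monthly_allowance_gbp: int, headcount: int) -> int:
    """Upper bound on yearly spend if everyone uses the full allowance."""
    return monthly_allowance_gbp * 12 * headcount

# Figures from the text: £1500 a month, first for one person, then for 1,000.
per_head = annual_allowance_exposure(1500, 1)
workforce = annual_allowance_exposure(1500, 1000)

print(f"Per head: £{per_head:,} a year")              # Per head: £18,000 a year
print(f"Workforce of 1,000: £{workforce:,} a year")   # Workforce of 1,000: £18,000,000 a year
```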

The cloud parallel is exact, and instructive

When cloud computing first arrived in earnest, the central promise was liberation from capital expenditure. Organisations were encouraged to stop buying data centres, to stop attempting to forecast their capacity requirements three years in advance, and to stop sinking capital into hardware that began to depreciate the moment it was installed. In its place came the consumption model, under which a business paid only for what it used, scaled up as demand required, and switched off whatever it no longer needed. At the time this felt like unambiguous progress, and in a great many respects it genuinely was.

The difficulty, as most finance functions came in time to discover, arrived with the invoices. Pay as you go turned out in practice to mean paying for whatever anyone happened to provision, consumption proved a good deal harder to predict than the early business cases had allowed for, no single party felt any clear ownership of the meter, and the elegant variable-cost narrative had quietly become an ungoverned one. It was precisely out of this experience that the discipline now known as FinOps emerged, bringing engineering, finance and the business together to govern usage-based spending in a coordinated rather than a piecemeal fashion. Reserved instances and committed-use agreements followed soon afterwards, once organisations had appreciated that, for workloads which were steady and predictable, it was simply more economical to commit to capacity in advance than to rent it by the hour. The mature position, when it eventually settled, was never an ideological preference for operating expenditure over capital expenditure, or indeed the reverse, but rather a matter of matching the commercial model to the underlying shape of the workload.

Artificial intelligence has now reached the same inflection point, and the firm distributing £1500 a head sits very near the beginning of that same curve, still within the liberating phase that tends to precede the moment at which the invoices begin to concentrate minds. The question facing any regulated business is therefore whether it intends to work its way through the lengthy and expensive education that cloud once imposed on everyone, or whether it might instead choose to apply the lessons it has already paid so handsomely to learn.

Cloud consumption did, to a certain extent, carry its own control mechanism, in that it was exposed only to a small population of engineers who held the keys and the skills to build cloud-hosted infrastructure. In the realm of AI that protective barrier does not exist, since anyone can build skills, workflows and connectors with what seems like a few short prompts. Controls, education and awareness are therefore pivotal if business users are to protect their organisations from spurious workflows that add no value and yet are executed day after day. It also raises the question of what happens when the same type of workflow is run every day by many different users.

The question, then, is how to protect against this while still liberating users with the single most profound technology innovation of our lifetime.

Why agentic workloads make forecasting so much harder

There is a particular reason that spending on AI proves harder to forecast than spending on conventional software, and it is worth setting out with some care rather than simply asserting.

Model-as-a-Service is, in essence, priced by the token. In the relatively straightforward world of a conversational assistant this remains broadly predictable, since a user poses a question, the model returns an answer, and the average exchange can be estimated with reasonable confidence. Agentic workloads dismantle that assumption rather thoroughly. An agent does not content itself with a single call to a model; it formulates a plan, invokes a tool, interprets the result it receives, reasons about what it has found, calls a further tool, checks its own working, and, in the multi-agent designs that are becoming steadily more common, it performs all of this across several cooperating agents at once. A single instruction issued by a single user can therefore fan out into dozens, or even hundreds, of model calls. Consumption, put another way, ceases to track the number of people using the system and begins instead to track the autonomy and the complexity of the work that has been delegated to it.
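The fan-out described above can be put into rough numbers. The sketch below is illustrative only: the class, the field names, the price per million tokens and the call counts are all assumptions chosen to show the shape of the effect, not figures from any provider.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    """Hypothetical description of one kind of request."""
    requests_per_month: int       # instructions issued by users
    model_calls_per_request: int  # 1 for simple chat, far more for agents
    tokens_per_call: int          # average prompt plus completion tokens

def monthly_cost(w: Workload, price_per_million_tokens: float) -> float:
    """Token spend for a month: requests x calls per request x tokens per call."""
    tokens = w.requests_per_month * w.model_calls_per_request * w.tokens_per_call
    return tokens / 1_000_000 * price_per_million_tokens

PRICE = 10.0  # assumed blended price per million tokens, purely illustrative

# Same users, same number of instructions; only the fan-out differs.
chat = Workload(requests_per_month=1_000, model_calls_per_request=1, tokens_per_call=2_000)
agent = Workload(requests_per_month=1_000, model_calls_per_request=50, tokens_per_call=2_000)

chat_cost = monthly_cost(chat, PRICE)
agent_cost = monthly_cost(agent, PRICE)
print(chat_cost, agent_cost, agent_cost / chat_cost)
```

With the headcount and the number of instructions held constant, the bill moves in direct proportion to the number of model calls each instruction triggers, which is the sense in which consumption tracks delegated work rather than users.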

This is the dimension that the £1500 allowance has yet to feel in full.

As soon as these tools graduate from helping to draft an email to completing a multi-step process on the user's behalf, per-head consumption stops behaving like a tidy monthly figure and begins to scale in proportion to the quantity of genuine work being handed across.

Under an agentic strategy, in other words, operating expenditure rises in step with ambition, and ambition is, of course, precisely the quality that every leadership team is presently being urged to cultivate. The cost dynamic can consequently be stated with some confidence: renting capability through per-token Model-as-a-Service offers a low cost of entry alongside spending that is difficult to predict and that grows as agentic adoption deepens, whereas owning capability through dedicated or sovereign infrastructure inverts that profile, asking for a higher and largely fixed commitment at the outset in return for a variable cost that is brought firmly under control.

Faced with the question of how to protect the organisation without taking the tools away from its people, the temptation is to reach instinctively for the old answer and simply reintroduce the gate. Yet the control gate cannot easily be rebuilt, and nor should it be, because the very quality that makes this technology valuable is the same quality that makes it difficult to govern: a curious person in finance or in legal, with no engineering background whatsoever, can now assemble in an afternoon something that would once have demanded a project, a budget and a queue. To close that door again would be to forfeit the prize. The task is therefore not to restore the bottleneck but to replace the protection that the bottleneck used, quite incidentally, to provide, and to do so deliberately this time rather than by happy accident.

That replacement rests, in our experience, on three things working in concert.

The first is visibility, for the unremarkable reason that an organisation cannot govern what it cannot see; if workflows are being created and run across the business every day, the business needs some means of observing what is being built, by whom, at what cost, and against which data, in much the way that cloud estates eventually acquired the instrumentation to watch their own consumption.

The second is reuse, which is the proper answer to the duplication of near-identical workflows across many users, for where a good workflow has been built once it ought to be promoted into a governed, supported and version-controlled catalogue, so that the hundredth person with the same need reaches for the established version rather than quietly assembling a hundredth variant of their own. Built once and shared, a workflow compounds in value; built a hundred times in isolation, it does little more than multiply the cost and the risk while fragmenting the very knowledge that ought to have accrued to the institution.

The third is education, since the most durable control of all is a workforce that understands what good looks like, knows which data must never cross which boundary, and can tell the difference between a workflow worth preserving and one that should not have survived its first run.

Taken together, these amount to a single principle, and it is the principle on which the whole question ultimately turns: the safe path must also be the easy path. Where the governed option is the more convenient one, where the catalogue is quicker to search than a fresh workflow is to build, and where the guardrails are experienced as assistance rather than obstruction, the user is liberated and the organisation is protected in the same motion. That is the balance worth designing towards, and it is properly a balance rather than a compromise, since either extreme, unchecked sprawl on the one hand or a reimposed bottleneck on the other, surrenders something the business cannot really afford to give up.
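The visibility and reuse points can be made a little more concrete. The sketch below assumes a hypothetical usage log in which every run records the user, a workflow identifier and a cost; the records, names and threshold are invented for illustration. It aggregates spend by workflow and flags any workflow run by several different people as a candidate for promotion into the shared catalogue.

```python
from collections import defaultdict

# Hypothetical usage records: (user, workflow identifier, cost in GBP).
runs = [
    ("alice", "summarise-contract", 4),
    ("bob", "summarise-contract", 5),
    ("carol", "summarise-contract", 3),
    ("alice", "reconcile-ledger", 12),
]

def summarise(records):
    """Return {workflow: (total cost, number of distinct users)}."""
    cost = defaultdict(int)
    users = defaultdict(set)
    for user, workflow, gbp in records:
        cost[workflow] += gbp
        users[workflow].add(user)
    return {w: (cost[w], len(users[w])) for w in cost}

def reuse_candidates(summary, min_users=3):
    """Workflows run by at least min_users people (an assumed threshold)."""
    return sorted(w for w, (_, n) in summary.items() if n >= min_users)

summary = summarise(runs)
candidates = reuse_candidates(summary)
print(summary)
print(candidates)
```

In practice the difficult part is deciding when two workflows count as the same; the sketch sidesteps that by assuming a shared identifier already exists.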

The case for owning, and the catch that accompanies it

It is worth being plain about what owning actually means here, since the word can slide around. To own, in this context, is to hold the compute itself: to run models on GPUs the firm has bought outright, or on dedicated and reserved capacity it has committed to, rather than to rent model access by the token from a provider who owns the hardware and meters your use of it. The choice, put simply, is between paying for the machinery or paying for the mileage.

The argument for owning the machinery is at its most persuasive in exactly those circumstances where regulated firms tend to feel the greatest pressure. It offers predictable cost at scale, since once consumption becomes high and sustained the per-token meter has a way of growing more expensive than reserved or dedicated capacity would have been. It affords genuine control over where a given workload actually runs, which matters a great deal when the underlying data is sensitive and the regulator expects to see isolation and auditability demonstrated rather than merely described. And it provides a measure of independence from the pricing and policy decisions of any single provider, which is no small consideration for an institution whose continuity obligations are themselves a regulatory matter.
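The point about sustained consumption can be expressed as a simple break-even calculation. The sketch below rests on loud assumptions: a single fixed monthly cost for dedicated capacity, a flat per-token price, an optional marginal cost for owned capacity, and no capacity ceiling. Every figure is invented.

```python
def breakeven_tokens_per_month(
    fixed_monthly_cost: float,
    price_per_million_tokens: float,
    owned_cost_per_million_tokens: float = 0.0,
) -> float:
    """Monthly token volume above which dedicated capacity is the cheaper option."""
    saving_per_million = price_per_million_tokens - owned_cost_per_million_tokens
    if saving_per_million <= 0:
        raise ValueError("owned capacity never breaks even at these unit costs")
    return fixed_monthly_cost / saving_per_million * 1_000_000

# Assumed: £50,000 a month for dedicated capacity, £10 per million tokens
# rented, £2 per million tokens marginal cost (power, operations) when owned.
breakeven = breakeven_tokens_per_month(50_000.0, 10.0, 2.0)
print(f"{breakeven:,.0f} tokens a month")  # 6,250,000,000 tokens a month
```

Below that volume the meter is cheaper; above it, and only if the volume is sustained, the committed capacity pays for itself.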

The catch that accompanies all of this is real, and it mirrors the cloud lesson almost precisely. To buy that hardware is to acquire a depreciating asset, and in the particular case of AI that depreciation is unusually severe, because the frontier is advancing at such a remarkable pace. The model around which a firm optimises its infrastructure today may well be surpassed, and rendered cheaper to operate, within the year. A business that commits too early, or too generously, can find itself the owner of capacity that the wider market has only just rendered uneconomic. Renting, by contrast, keeps an organisation on the frontier and leaves that particular risk with somebody else, whereas owning caps the variable cost but transfers the burden of obsolescence onto the balance sheet. There is, in short, no course that is free of cost or of risk, only the rather more useful question of which costs and which risks a given firm is best placed to carry.

Indeed, this is not as easy as buying tin and dropping the infrastructure into your data centre (if you even still own one!). It requires a deep understanding of capacity planning, infrastructure tuning, AI harness and model testing, and ongoing maintenance. It is not for the faint-hearted, and it will require significant investment in engineering skills to ensure the investments are embedded, maintained and provided to the business in a seamless fashion.

A portfolio decision rather than a matter of principle

It follows from all of this that the right answer is not a doctrinal commitment to capital or to operating expenditure, but rather a portfolio, governed in much the way that mature cloud estates have come to be governed. A good many organisations have, in fact, already adjusted their commercial posture to accommodate precisely this kind of metered, consumption-based usage, and the adjustment they made there transfers across rather neatly.

Leaders in the energy sector will recognise the underlying shape of the problem more or less immediately, because the meter beneath all of this is not, in the final analysis, an abstract token at all; it is power. Inference is fundamentally a matter of power draw, the compute available to a firm is increasingly bounded by the energy available to run it, and those who are thinking most clearly about the cost of AI have begun to model it as they would model any power-based usage, namely as a metered resource comprising a base load to which it is sensible to commit and a peak into which it is sensible to flex. FinOps for AI is, at root, the very same adjustment that energy-intensive operators made some time ago: one commits to the predictable base, rents the unpredictable peak, and governs the whole from the centre rather than surrendering it to the sum of individual expense claims.
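The base-and-peak idea lends itself to a short worked example. The sketch below assumes a demand history in arbitrary capacity units, a committed rate that is paid every period whether the capacity is used or not, and a higher on-demand rate for anything above the commitment. The rates and the demand series are invented for illustration.

```python
def total_cost(demand, commitment, committed_rate, on_demand_rate):
    """Cost of committing to a fixed level and renting whatever exceeds it."""
    committed = commitment * committed_rate * len(demand)
    overflow = sum(max(0, d - commitment) for d in demand) * on_demand_rate
    return committed + overflow

def best_commitment(demand, committed_rate, on_demand_rate):
    """Cheapest commitment level.

    Cost is piecewise linear in the commitment, with kinks only at observed
    demand values, so it is enough to test those values and zero.
    """
    candidates = sorted(set(demand) | {0})
    return min(
        candidates,
        key=lambda c: total_cost(demand, c, committed_rate, on_demand_rate),
    )

# Six periods of demand with one sharp peak; committed units cost 6, rented 10.
demand = [100, 100, 110, 120, 300, 100]
base = best_commitment(demand, committed_rate=6, on_demand_rate=10)
print(base, total_cost(demand, base, 6, 10))  # 100 5900
```

Here the cheapest position is to commit to the steady base of 100 units and rent the excess, which costs 5900 against 8300 for renting everything and 10800 for committing to the peak.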

What to own, and what to rent

Reduced to a working rule, the decision turns on a small number of questions, each of which a leadership team can reasonably put to any given workload.

The first concerns volume and predictability. Where a workload is high in volume and stable in its demands, running day in and day out, the case for committed or dedicated capacity is a strong one, much as it was in the cloud era, because the economics cross over in favour of ownership once consumption is large enough and steady enough to amortise the initial outlay.

The second concerns sensitivity and regulatory exposure. Where the data or the use case demands isolation, auditability and demonstrable control, that requirement can justify dedicated infrastructure in its own right, quite independently of where the cost arithmetic happens to land.

The third concerns maturity. Where a workload is spiky, experimental, or simply early in its life, and its eventual volume is not yet understood, it belongs on per-token pricing, since paying a premium per unit is the correct price for declining to commit capital before the pattern of demand has had the chance to reveal itself; the fifteen-hundred-pound experiment properly belongs in this category, though it would be the better for being ring-fenced and measured rather than left open-ended.

The fourth concerns the pace of change. Where remaining on the latest and most economical frontier model matters more than capping the unit cost, renting preserves the freedom to move on without stranding an asset one has paid for.
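The four questions can be written down as a rough triage function. This is a sketch rather than a rule: the field names are hypothetical, and the order in which the questions are tested is an assumption. Only the precedence given to isolation follows from the text, which treats it as sufficient in its own right.

```python
from dataclasses import dataclass

@dataclass
class WorkloadProfile:
    """Hypothetical yes/no answers to the four questions."""
    high_stable_volume: bool     # steady, day-in day-out demand
    needs_isolation: bool        # sensitive data or regulatory exposure
    early_or_spiky: bool         # experimental, volume not yet understood
    frontier_matters_most: bool  # staying on the latest model beats unit cost

def recommend(p: WorkloadProfile) -> str:
    """Return "own" or "rent"; the precedence order is an assumption."""
    if p.needs_isolation:
        return "own"   # justified independently of the cost arithmetic
    if p.early_or_spiky or p.frontier_matters_most:
        return "rent"  # do not commit capital before demand is understood
    if p.high_stable_volume:
        return "own"   # large, steady volume amortises the outlay
    return "rent"      # default to the meter until a case for owning emerges

core = recommend(WorkloadProfile(True, False, False, False))
pilot = recommend(WorkloadProfile(False, False, True, False))
sensitive = recommend(WorkloadProfile(False, True, True, False))
print(core, pilot, sensitive)  # own rent own
```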

Most regulated firms, having worked through these questions honestly, will arrive at a deliberate division of the estate: a committed base of owned or reserved capacity serving the steady, sensitive and high-volume core of their activity, with rented elasticity layered above it to absorb the experimental and the peak. The error worth guarding against is not the choice of one side over the other, since thoughtful firms will quite properly weight that balance differently according to their circumstances. The error is to drift into a position through simple inattention, which is, when one looks closely, exactly what an ungoverned per-head allowance quietly arranges on a firm's behalf.

Modelling the true cost before the invoice does it for you

None of this can sensibly be settled by instinct, and it is here that a great many organisations find themselves short. The per-token price, or the sticker price of a GPU, represents only the visible tip of the cost. Beneath the surface sits the remainder of the iceberg: the platform and orchestration layer, the people required to operate it, the governance and audit overhead that regulation quite properly demands, the work of migration and integration, and the depreciation schedule attaching to anything owned outright. A total-cost comparison that weighs nothing more than token cost against hardware cost will furnish the wrong answer, and will tend to do so with a good deal of confidence.
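A toy version of such a comparison shows how the hidden layers can reverse the answer. Every figure below is invented, the layer names simply follow the list above, and hardware is assumed to be written off in equal instalments over three years; it is a sketch of the method, not of any client's model.

```python
from dataclasses import dataclass

@dataclass
class CostLayers:
    """Hypothetical annual costs in GBP, plus one-off migration."""
    compute: int            # token spend (rent) or hardware depreciation (own)
    platform: int           # platform and orchestration layer
    people: int             # staff required to operate it
    governance: int         # governance and audit overhead
    migration_one_off: int  # migration and integration work

def three_year_tco(c: CostLayers) -> int:
    """Three years of recurring cost plus the one-off migration."""
    recurring = c.compute + c.platform + c.people + c.governance
    return 3 * recurring + c.migration_one_off

rent = CostLayers(compute=2_000_000, platform=300_000, people=400_000,
                  governance=200_000, migration_one_off=100_000)
own = CostLayers(compute=1_000_000, platform=500_000, people=1_200_000,
                 governance=300_000, migration_one_off=500_000)

# Compute alone favours owning; the full picture, on these figures, does not.
print(3 * rent.compute, 3 * own.compute)           # 6000000 3000000
print(three_year_tco(rent), three_year_tco(own))   # 8800000 9500000
```

Different assumptions would tip the balance the other way, which is rather the point: the answer depends on layers that a comparison of token price against hardware price never sees.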

In our own work for a capital markets client, the value lay not in arriving at a single headline number but in constructing a total-cost-of-ownership model that made the full three-year picture explicit across each of those layers, accompanied by a dealsheet discipline that allowed the leadership team to see for themselves how the balance between owning and renting moved the economics as volume and the mix of workloads changed.

Once the crossover points become visible in that way, the decision ceases to be a matter of conviction and becomes what it ought to have been all along, namely a series of considered, workload-by-workload commercial judgements, revisited as the frontier advances and as the firm's own volumes grow into something it can plan around.

The reckoning

The firm handing out £1500 a head is not, on any reasonable reading, wrong to be experimenting; it is wrong only if it comes to mistake the experiment for a strategy, and allows an ungoverned variable cost to compound until the finance function is obliged to call a halt, which is more or less precisely the path that cloud followed in the years before the discipline of FinOps existed to temper it.

That lesson, at least, has already been paid for once. The regulated firms that emerge in the strongest position over the course of 2026 will, in all likelihood, be those that manage to apply it without having to pay for it a second time: treating AI consumption as the metered, power-based resource it genuinely is, deciding with some deliberation what is best owned and what is best rented, modelling the full three-year cost before entering into any commitment, and governing the meter from the centre rather than one expense claim at a time.

The technology may be new and profound, but the discipline it asks of us is not.

Synopsis: As enterprises scale AI, per-token spending on agentic workloads is becoming as unpredictable and ungoverned as early cloud bills were before FinOps. This piece sets out the CapEx versus OpEx choice for enterprise AI: when to own dedicated or sovereign compute, when to rent model access by the token, and how to model the true three-year cost before committing.

Tags: AI Strategy, Risk Management, AI Governance, AI Agents, Infrastructure