Patient Comet · Infrastructure

Own Your AI

A Chinese lab built a world-class AI model for around six million dollars, a fraction of the hundred-million-plus its Western rivals cost, and gave it away free. For teams with serious volume, running a model that good on hardware they own now costs less than the cloud subscriptions they already pay for. Most organisations are still paying the old price for something they could now own outright.

Nadim A. Massih · 30 April 2026 · 9 min read
[Illustration: Own Your AI: Why Renting Intelligence Is About to Look Foolish]

The AI Subscription Is Breaking

For the past five years, shipping an AI feature meant one thing.

You built a product. You had an idea for an AI feature (a smart search, an automated summary, a document assistant). You signed up for an API (a connection to someone else’s AI server), and it worked. Impressively, immediately, well.

Your customers loved it.

What they did not see was what happened on every query. Their data (their documents, their messages, their records) left your product, travelled to a server owned by Microsoft, Google, or Anthropic, got processed by a model you did not own, and came back as an answer. You paid for every token (the small chunk of text an AI bills you for), every query, every interaction. The cost scaled with your users. The faster you grew, the larger the bill.

For most teams, this was the only option. The models worth using required infrastructure so large and expensive that building or owning them was simply not realistic. You connected to the cloud because the cloud was where the intelligence lived.

The intelligence was rented. Your product was the front end. The AI lived somewhere else.

That assumption just broke. And it broke on three separate occasions, between January 2025 and April 2026.

Three Events That Changed the Math

Each one dismantled a different part of the old model.

January 2025: DeepSeek releases R1. DeepSeek (a Chinese AI research lab) released a reasoning model (an AI system built to think through complex problems) and published the full method openly. When the training cost emerged in a peer-reviewed Nature paper, it landed like a small bomb: $294,000 (Nature, 2025). That figure covers only the final training step that taught the model to reason. The full training stack (base model included) came to around $6 million: still a fraction of the $100 million or more that comparable Western models cost to build. A model that matched the best in the world, given away free.

The markets understood immediately. The release wiped roughly $589 billion off Nvidia (the company that makes the specialist chips AI runs on) in a single day: the largest single-day market cap loss for any company in US stock market history (CNBC, 2025). Investors were not frightened of one lab. They were frightened of the assumption under their entire position: that only the very largest companies could build intelligence worth using.

April 2026: Google releases Gemma 4. Google’s Gemma 4, in its 31-billion-parameter dense version, beats models more than ten times its size, including rivals above 400 billion parameters. And it runs on a single GPU you can own (Google, 2026). A model you could host yourself, on one card you own, winning blind taste-tests against rivals that need a computing cluster.

Ongoing 2025: Apple ships AI on the device. Apple builds a roughly three-billion-parameter model directly into its devices, available to every developer, with on-device inference (the processing of each AI request) that costs nothing and runs locally, meaning the data never leaves the phone (Apple, 2025). No contract. No server. No log sitting somewhere waiting for a legal request.

Three events. One implication. The ingredient that made powerful AI expensive and remote has become something a product team can own, fine-tune, and ship. The question is no longer whether you can build AI into your product without the cloud. The question is whether you are going to.

Training a world-class AI model: what it cost
[Chart: AI training cost collapse, 2020–2025. Cost axis from $100M+ down to $294K, roughly 10x cheaper per year; marked point: DeepSeek-R1, $294,000, published in Nature, September 2025. Source: Nature / CNN, 2025; a16z, 2025]
The cost of training a frontier-class AI model has fallen from over $100 million in 2020 to around $6 million all-in for DeepSeek-R1 in 2025, with the final reasoning phase alone costing $294,000. Inference cost is falling at roughly the same rate. (Nature/CNN 2025; a16z, 2025)

What Builders Can Now Do

Those three events share a cause: open models. Open models (AI models whose design and weights are published freely) change the product equation entirely.

A software team can now take one of these models, fine-tune it (adapt its behaviour by training it further on their specific domain and data), and ship it as a permanent, built-in part of their product. The model travels with the software. When a customer buys the product, they get the AI too.

Not a subscription to the AI. The AI itself.

The product model is changing
[Diagram: Product model shift, cloud API vs built-in AI]
Before: Customer · Product · Cloud API ↗ AI Model
• Data leaves on every query
• Cost scales per token
• Customer rents the intelligence
• Third-party sub baked into pricing
• Off-limits in regulated markets
After: Customer · Product + AI built in
✓ Data never leaves the device
✓ No per-query cost at scale
✓ Customer owns it outright
✓ Cleaner, simpler pricing model
✓ Wins regulated markets by design
Source: Patient Comet analysis
The old model required every AI interaction to leave the product and reach a third-party server. The new model puts the intelligence inside the product. Same feature, different architecture, and a fundamentally different business.

Think carefully about what this removes.

No per-query token costs at scale. Once the model is built in, the marginal cost of an AI interaction drops to near zero: the customer’s own hardware does the work. No external API dependency. The product works offline, in environments where data cannot leave the building: hospitals, law firms, government offices, banks. And no third-party subscription invisibly embedded in your pricing.

The customer owns what they paid for. Completely.

What happens to your margins as you scale
[Charts: Margin comparison, cloud API vs built-in AI model at scale. Both panels plot margin (low to high) against user growth.]
Cloud API model: revenue and API costs both rise. API costs rise with every new user. Margin shrinks at scale.
Built-in model: revenue rises while model cost is fixed. More users = better margins. Margin grows at scale.
Source: Patient Comet analysis
Under a cloud API model, AI costs scale directly with your user base, compressing margins as you grow. With a built-in model, AI cost is largely fixed. Growth stops working against you.
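The margin argument above can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the analysis behind the chart: the `Scenario` fields and every number in the usage example are hypothetical placeholders, and costs are assumed to be strictly linear per query (cloud) or flat per month (built in).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical inputs; none of these figures come from the article."""
    revenue_per_user: float    # monthly revenue per user
    queries_per_user: int      # monthly AI queries per user
    api_cost_per_query: float  # cloud API price per query
    fixed_model_cost: float    # monthly cost of running a model you own


def cloud_ai_cost(users: int, s: Scenario) -> float:
    """Monthly AI bill when every query is billed by a cloud API."""
    return users * s.queries_per_user * s.api_cost_per_query


def cloud_margin(users: int, s: Scenario) -> float:
    """Gross margin (0 to 1) under per-query cloud pricing."""
    revenue = users * s.revenue_per_user
    return (revenue - cloud_ai_cost(users, s)) / revenue


def built_in_margin(users: int, s: Scenario) -> float:
    """Gross margin (0 to 1) when the AI cost is a fixed monthly amount."""
    revenue = users * s.revenue_per_user
    return (revenue - s.fixed_model_cost) / revenue


if __name__ == "__main__":
    s = Scenario(revenue_per_user=10.0, queries_per_user=200,
                 api_cost_per_query=0.01, fixed_model_cost=5_000.0)
    for users in (1_000, 10_000, 100_000):
        print(users, round(cloud_ai_cost(users, s)),
              round(cloud_margin(users, s), 3),
              round(built_in_margin(users, s), 3))
```

In this deliberately simple linear model the cloud bill grows with every user while the cloud margin percentage stays flat; it only compresses if usage per user or the per-query price rises. The built-in margin climbs as the fixed cost is spread over more revenue, which is the effect the caption describes.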

Now think about what it creates.

A product with a fine-tuned model built in is structurally harder to replicate than one that connects to a shared API. A competitor cannot switch to a better API endpoint and close the gap overnight. The model (trained on your domain knowledge, shaped by your users’ actual needs, integrated into your product’s logic) becomes part of what you ship, and part of what makes it yours.

The pricing model changes too. SaaS products with AI features charge recurring subscriptions partly because they pass through API costs. When the model is built in, that cost disappears. You could sell the software once. Or with a simpler subscription. The AI is included, like a camera in a phone, not a streaming service you keep paying for.

One honest note: fine-tuning a model for production is a genuine engineering effort. Tools like Hugging Face and Unsloth (developer tools for fine-tuning open AI models) have made it achievable without a research lab, but it requires a competent ML engineer, proper evaluation, and a realistic timeline. It is not a weekend project. It is, however, now within reach for any well-resourced product team, something that was not true two years ago.

Apple understood this at the operating system level. The model built into every Apple device is not an add-on you pay extra for. It is the product. Every software builder now has the same option at the application level.

That option opens markets that the subscription model could not reach at all.

What This Unlocks for Regulated Industries

There is a version of the builder opportunity that is not just about economics. It is about which markets you can serve at all.

For a significant and fast-growing portion of the software market, cloud AI is not a choice. It is off the table.

Under GDPR (the EU data protection regulation) and HIPAA (the US healthcare privacy law), a medical device company cannot sell a diagnostic tool that sends patient data to a US cloud server. A legal technology firm cannot win enterprise contracts in regulated jurisdictions if its AI feature sends every query to OpenAI. A government software supplier cannot pass a security review if the intelligence in its product lives in a data centre it does not control.

For these markets, the product that wins is the one where the AI runs locally, the data never moves, and the intelligence ships with the software.

This is not a compliance headache. It is a competitive opening, and it just got considerably larger.

In June 2025, the legal counsel of Microsoft France was asked under oath at a French Senate hearing whether he could guarantee that data stored in France by Microsoft would never be passed to US authorities without French approval. His answer was short.

“Non, je ne peux pas le garantir.” No. I cannot guarantee that (The Register, 2025).

The US CLOUD Act (2018) gives American authorities the right to demand data from any US-headquartered company, regardless of where that data physically sits. An EU data region gives you lower latency and a reassuring label. It does not give you jurisdiction. A Microsoft executive said so, on the record, to a parliament.

The legislative response has followed. On 3 June 2026, the European Commission proposed restricting Microsoft Azure, Amazon Web Services (AWS), and Google Cloud from processing public-sector financial, judicial, and healthcare data across all 27 EU member states (European Commission, 2026). Private companies remain free to choose any platform, but the public-sector scope covers the exact categories where the most valuable enterprise software operates. Those three providers control roughly 70 per cent of Europe’s cloud market.

A product maker who ships with a local, fine-tuned model is not just removing an API dependency. They are entering markets that their cloud-dependent competitors structurally cannot. That is a durable advantage, because the legislative trend is accelerating, not reversing.

Intelligence stopped being scarce. The pricing has not noticed.

What runs where: the four tiers of AI in 2026
| Tier | Model | Runs on | Approx. cost | Data in-house? | Best for |
| --- | --- | --- | --- | --- | --- |
| On-device | Apple (~3B params) | Your device | Free | Yes | Mobile apps, sensitive consumer data |
| Self-hosted open | Gemma 4 / Llama 4 (31-70B) | One GPU you own | £15-40K hardware | Yes | Most business tasks, document processing |
| Mid-tier cloud | GPT-4 class APIs | Cloud (shared) | Per-token | No | General reasoning, low-volume tasks |
| Frontier closed | o3, Gemini Ultra | Cloud (proprietary) | Premium per-token | No | Hardest agentic work, frontier reasoning |

Source: Google DeepMind, 2026 · Apple, 2025 · a16z, 2025. “Data in-house” means the workload data never leaves your infrastructure.

When Cloud Still Wins

Local models do not win everything. The honest version of this decision has four camps.

The owner says
“Sovereignty stopped being optional. Every cloud call is a copy of the crown jewels leaving the building. Parity has arrived for most of what we do: the disciplined move is to stop renting our own confidentiality back.”

They are right about the risk, right about parity for most everyday tasks, and right that the default needs to be challenged. The Microsoft Senate testimony is not an abstract legal warning. It is a documented fact about the present.

The renter says
“The gap that matters has not closed. The frontier still leads on the hardest work.”

On deep multimodal reasoning and complex multi-step tasks (the hardest agentic work, where the AI must plan and act autonomously), closed frontier models still lead. What you rent from a cloud provider includes reliability guarantees, enterprise support, and someone else’s engineering team on call at three in the morning. Below serious volume, a cloud API almost always wins on price.

The router says
“Hybrid is the only honest answer, but be clear-eyed about what it costs.”

Self-hosting is not a binary switch. A single server capable of running a production-grade open model costs between £15,000 and £40,000, and IDC research suggests hidden costs add another 40 to 60 per cent on top (IDC, 2025). Below roughly £2,000 to £3,000 per month in API costs, the cloud almost always wins. Above roughly 100 million queries per month, self-hosting saves millions annually (Silverthread Labs, 2026).

The compliance-mandated mover says
“Our regulator has already decided. Our job is execution.”

For organisations in regulated European sectors (and for the product makers who serve them), the debate is close to resolved by law. If the European Commission’s Tech Sovereignty Package passes as proposed, the routing decision for public-sector financial, judicial, and healthcare data will have been made by legislation. Move deliberately. Move early.

When self-hosting beats the cloud on cost
[Chart: Break-even, cloud API vs self-hosted AI. Monthly cost (£0 to £120K) against monthly queries (0 to 200M); cloud wins below the break-even at roughly 50–100M queries per month, self-hosting wins above it. Source: Silverthread Labs, 2026; IDC, 2025]
Below roughly £2,000-3,000 per month in API spend, the cloud almost always wins on price. Above around 100 million queries per month, self-hosting can save millions annually. (Silverthread Labs, 2026; IDC, 2025)
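To make the break-even reasoning concrete, here is a minimal Python sketch that turns the article's server figures (£15,000 to £40,000, plus 40 to 60 per cent in hidden costs) into a monthly number you can compare with an API invoice. Two things are assumptions, not facts from the sources: the hidden costs are treated as a one-off uplift on the hardware price, and the total is spread over a hypothetical 36-month amortisation period. Staffing and energy are not modelled separately.

```python
def self_hosted_monthly_cost(
    hardware_gbp: float,
    hidden_cost_fraction: float,
    amortisation_months: int = 36,  # assumption: three-year write-off
) -> float:
    """Spread hardware plus hidden costs evenly over the amortisation period."""
    if amortisation_months <= 0:
        raise ValueError("amortisation_months must be positive")
    total = hardware_gbp * (1 + hidden_cost_fraction)
    return total / amortisation_months


def cheaper_option(
    monthly_api_spend_gbp: float,
    hardware_gbp: float,
    hidden_cost_fraction: float,
    amortisation_months: int = 36,
) -> str:
    """Return "cloud" or "self-hosted", whichever costs less per month."""
    owned = self_hosted_monthly_cost(
        hardware_gbp, hidden_cost_fraction, amortisation_months
    )
    return "cloud" if monthly_api_spend_gbp <= owned else "self-hosted"


if __name__ == "__main__":
    # Low and high ends of the ranges quoted in the article.
    for hardware, hidden in [(15_000, 0.4), (40_000, 0.6)]:
        monthly = self_hosted_monthly_cost(hardware, hidden)
        print(f"£{hardware:,} server, {hidden:.0%} hidden costs: "
              f"£{monthly:,.0f} per month")
```

Swap in your own invoice total, hardware quote, and write-off period. The point of the sketch is the shape of the comparison, not the specific threshold it produces.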

Where I stand

The router wins the argument, but only when the routing is designed rather than defaulted.

The owner is right that the old assumption has expired. The renter is right that the frontier gap is real on the hardest work. Both observations are correct and neither is a complete policy on its own. The mistake is letting either one become the answer for everything.

The organisations (and the product teams) that come out ahead will be the ones that make a genuine per-workload decision: sensitivity, volume, capability required. Write it down. Apply it consistently. Do not revisit it every time a new model is announced. That one-page document is worth more than almost any model selection you make this year.

Four moves do most of the work once you decide to act on this.

Which workload goes where: a routing framework
[Diagram: AI workload routing decision framework. Three questions: Sensitive data or customer records (does this workload touch proprietary information)? High volume and predictable? Frontier reasoning required? Four outcomes: build it in locally (fine-tune + ship); cloud API (manage the exposure); local open model (no fine-tune needed); cloud frontier (hardest tasks only). Each outcome is marked as either staying local or going to the cloud.]
A simple routing framework. Sensitive, high-volume workloads go local first. Occasional, complex reasoning stays cloud. The framework is the same whether you are consuming AI or building it into a product.
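The routing framework can be written down as a small function, which is one way to keep the per-workload decision consistent. This is a sketch, not the author's implementation: the names `Workload`, `Route`, and `route` are hypothetical, and the branch order (sensitivity first, then volume for sensitive work, then frontier need for everything else) is one reading of the diagram's three questions and four outcomes.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    BUILD_IN_LOCALLY = "build it in locally (fine-tune + ship)"
    CLOUD_API = "cloud API (manage the exposure)"
    LOCAL_OPEN_MODEL = "local open model (no fine-tune needed)"
    CLOUD_FRONTIER = "cloud frontier (hardest tasks only)"


@dataclass(frozen=True)
class Workload:
    name: str
    sensitive: bool                 # touches customer records or proprietary data
    high_volume_predictable: bool   # frequent, with predictable inputs
    needs_frontier_reasoning: bool  # hardest agentic or multi-step work


def route(w: Workload) -> Route:
    """Apply the three questions in order (assumed branch mapping)."""
    if w.sensitive:
        if w.high_volume_predictable:
            return Route.BUILD_IN_LOCALLY
        return Route.CLOUD_API
    if w.needs_frontier_reasoning:
        return Route.CLOUD_FRONTIER
    return Route.LOCAL_OPEN_MODEL


if __name__ == "__main__":
    workloads = [
        Workload("contract clause search", True, True, False),
        Workload("quarterly board memo draft", True, False, False),
        Workload("public FAQ answers", False, True, False),
        Workload("multi-step research agent", False, False, True),
    ]
    for w in workloads:
        print(f"{w.name}: {route(w).value}")
```

Keeping the rules in code (or in a one-page document that mirrors it) means a new model announcement changes an input, not the policy.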

Four Moves for Builders

1. Fine-tune on your domain and ship the model with your product

The generic open model is the starting point, not the destination. Fine-tune it on your specific domain (legal clauses, medical terminology, financial documents, customer support patterns) and it becomes a meaningfully better product for your users, at no additional per-query cost. Budget for it as a proper engineering project: a competent ML engineer, several weeks of work, and a rigorous evaluation process. The payoff compounds as your user base grows.

Product engineering

2. Build retrieval into the product: the model reads, not copies

Retrieval means the model queries your customer’s documents at the moment they ask a question, rather than those documents being stored or copied anywhere. The customer’s data stays on their infrastructure. The model reads it in place, returns an answer, and nothing leaves. This architecture is what makes your product viable in legal, medical, and financial markets, and worth building correctly from the start.

Data architecture
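The retrieval pattern in move 2 can be illustrated with a minimal, dependency-free Python sketch. It assumes the customer's documents are already readable on their own infrastructure (here, a plain dictionary) and uses naive term overlap for ranking. A production system would more likely use embeddings and a local index; the function names are hypothetical. What matters is the shape: passages are read at question time, placed in the prompt for a local model, and nothing is copied or stored elsewhere.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, passage: str) -> int:
    """Count occurrences in the passage of each distinct query term."""
    counts = Counter(tokenize(passage))
    return sum(counts[term] for term in set(tokenize(query)))


def retrieve(query: str, documents: dict[str, str], k: int = 2) -> list[str]:
    """Return up to k document names with a non-zero score, best first."""
    scored = [(overlap_score(query, text), name) for name, text in documents.items()]
    relevant = [(s, name) for s, name in scored if s > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in relevant[:k]]


def build_prompt(query: str, documents: dict[str, str], k: int = 2) -> str:
    """Assemble a prompt from the retrieved passages; nothing is persisted."""
    names = retrieve(query, documents, k)
    context = "\n\n".join(f"[{name}]\n{documents[name]}" for name in names)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = {
        "lease.txt": "The tenant must give 60 days notice before termination.",
        "nda.txt": "Confidential information must not be disclosed.",
    }
    print(build_prompt("What is the termination notice period?", docs))
```

The prompt returned by `build_prompt` would then go to the model running on the same infrastructure, so the documents never cross a network boundary.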

3. Know which markets need local: go there first

European regulated sectors are the clearest immediate opportunity: financial services, healthcare, government, legal. These markets are where cloud AI is increasingly constrained by law, and where a locally running, data-sovereign product wins on architecture before the sales conversation even starts. The EU Tech Sovereignty Package and the CLOUD Act exposure of US cloud providers are moving this market in your direction. Position deliberately.

Go-to-market

4. Ship with a model you can upgrade, not one you are married to

The open-model release cadence is fast: Gemma 4 succeeded Gemma 3 in months; Llama 4 succeeded Llama 3. Fine-tune in a way that keeps you portable: build your prompting and retrieval layer so the underlying model can be swapped when a better one arrives. Teams that fine-tune so deeply they cannot switch will spend 2027 maintaining a model that has already been superseded. Stay portable.

Engineering strategy
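One way to stay portable, as move 4 advises, is to have product features depend on a narrow interface rather than on a specific model. The Python sketch below uses `typing.Protocol` from the standard library; `TextModel`, the two stand-in model classes, and `Summariser` are hypothetical names for illustration. A real adapter would wrap whichever local inference runtime you use.

```python
from typing import Protocol


class TextModel(Protocol):
    """The only thing a product feature is allowed to know about a model."""

    def generate(self, prompt: str) -> str: ...


class StandInModelV1:
    """Placeholder for a first fine-tuned model (no real inference here)."""

    def generate(self, prompt: str) -> str:
        return f"v1: {prompt}"


class StandInModelV2:
    """Placeholder for a successor model swapped in later."""

    def generate(self, prompt: str) -> str:
        return f"v2: {prompt}"


class Summariser:
    """Product feature that owns its prompt template but not the model."""

    def __init__(self, model: TextModel) -> None:
        self.model = model

    def summarise(self, document: str) -> str:
        prompt = f"Summarise in one sentence:\n{document}"
        return self.model.generate(prompt)


if __name__ == "__main__":
    text = "The tenant must give 60 days notice."
    print(Summariser(StandInModelV1()).summarise(text))
    # Upgrading the model is a one-line change at the composition point.
    print(Summariser(StandInModelV2()).summarise(text))
```

The prompting and retrieval layer stays yours; the model behind `generate` becomes a replaceable part, provided your evaluation suite is rerun on each swap.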

The Take

The Era of Renting Intelligence Is Ending

In 2025, a lab trained a world-class model for around six million dollars, a fraction of what its rivals spent, then gave it away. The same year, a Microsoft executive told a national parliament he could not guarantee the protection of data stored in his company’s European data centres. The European Commission responded by proposing to restrict three of the world’s largest cloud providers from the most valuable categories of enterprise data. These are not predictions. They are the current situation.

The shift underneath these facts is the one most product teams have not yet acted on. AI is transitioning from a service you subscribe to, to a feature you ship. That transition does not happen overnight, and it does not apply to every use case: the cloud still wins on the hardest frontier work, and still wins below serious volume. But for most of what software products actually do, the transition is already technically possible.

The builders who move first will find three things waiting for them: lower costs at scale, access to regulated markets that their cloud-dependent competitors cannot enter, and a product that is structurally harder to replicate because the intelligence is theirs.

The subscription model worked when intelligence was scarce. It is not scarce anymore.

Where to start
  1. Identify one AI feature you currently pay per-token for. Pick one that runs frequently on predictable inputs and handles sensitive data. That is your first candidate for bringing in-house.
  2. Estimate what it costs you today. Pull three months of API invoices, attribute the cost to that feature, then project it as your user base doubles. That number is what changes with a built-in model.
  3. Talk to one ML engineer this week. Ask: how long would it take to fine-tune an open model on our domain for this specific use case? Get a real estimate. Most teams are surprised by how achievable it has become.
  4. Map your regulated-market opportunity. If you sell to healthcare, legal, financial, or government customers in Europe, find out specifically whether your current cloud AI architecture creates compliance exposure for them. Start that conversation before your competitors do.
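Step 2 is simple arithmetic, sketched here in Python under one loud assumption: that the feature's API cost scales linearly with the number of users. The function name, the invoice figures, and the 40 per cent attribution share are all hypothetical.

```python
def project_feature_cost(
    monthly_invoices_gbp: list[float],
    feature_share: float,
    user_growth_factor: float = 2.0,
) -> tuple[float, float]:
    """Return (current, projected) monthly API cost for one feature.

    current   = average invoice * share of spend attributed to the feature
    projected = current * user growth factor (assumes linear scaling)
    """
    if not monthly_invoices_gbp:
        raise ValueError("need at least one invoice")
    if not 0.0 <= feature_share <= 1.0:
        raise ValueError("feature_share must be between 0 and 1")
    average = sum(monthly_invoices_gbp) / len(monthly_invoices_gbp)
    current = average * feature_share
    return current, current * user_growth_factor


if __name__ == "__main__":
    # Three months of invoices, with 40% of spend attributed to the feature.
    current, projected = project_feature_cost([4_200, 4_500, 4_800], 0.4)
    print(f"Today: £{current:,.0f}/month; after doubling: £{projected:,.0f}/month")
```

The projected figure is the one to set against the fixed cost of a built-in model.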

What kind of AI would you ship inside your product if tokens cost nothing and the model was yours?

Written by Nadim A. Massih · Creative Product Strategist · Tech, Creative & AI

Common questions

Questions, answered first

Can a fine-tuned open model really match a frontier cloud model for my use case?

For domain-focused tasks (document processing, structured data extraction, customer support in a defined context), a well-fine-tuned open model frequently outperforms a generic frontier model. On open-ended complex reasoning and long agentic tasks, frontier closed models still lead. The only way to know for your specific workload is to run the benchmark. Do that before committing either way.
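"Run the benchmark" can start as something as small as the Python harness below. It is a sketch under assumptions: each model is represented as a plain callable from prompt to answer, scoring is case-insensitive exact match (real evaluations usually need task-specific scoring), and the stand-in models and examples are invented for illustration.

```python
from typing import Callable

Model = Callable[[str], str]          # prompt in, answer out
Example = tuple[str, str]             # (prompt, expected answer)


def accuracy(model: Model, examples: list[Example]) -> float:
    """Fraction of examples answered with a case-insensitive exact match."""
    if not examples:
        raise ValueError("need at least one example")
    hits = sum(
        model(prompt).strip().lower() == expected.strip().lower()
        for prompt, expected in examples
    )
    return hits / len(examples)


def compare(candidates: dict[str, Model], examples: list[Example]) -> dict[str, float]:
    """Score every candidate model on the same held-out examples."""
    return {name: accuracy(model, examples) for name, model in candidates.items()}


if __name__ == "__main__":
    examples = [
        ("Which clause covers termination?", "Clause 12"),
        ("Which clause covers liability?", "Clause 9"),
    ]
    # Stand-ins: replace with calls to your fine-tuned model and your cloud API.
    answers = {"Which clause covers termination?": "clause 12",
               "Which clause covers liability?": "clause 9"}
    local_model = lambda prompt: answers.get(prompt, "")
    cloud_model = lambda prompt: "Clause 12"
    print(compare({"local": local_model, "cloud": cloud_model}, examples))
```

Use examples drawn from your own workload and held out from any fine-tuning data, otherwise the comparison flatters the local model.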

How much does it cost to fine-tune and host an open model?

Hardware for a production-grade open model server: £15,000 to £40,000. Engineering for a proper fine-tuning project: four to eight weeks for a small ML team. Ongoing hosting and maintenance: estimate 40-60% of hardware cost annually (IDC, 2025). Below roughly £2-3K per month in current API spend, cloud almost always wins on total cost. Above that, run the numbers for your situation.

Does an EU cloud data region protect our customers from US legal demands?

Not reliably. The US CLOUD Act allows American authorities to demand data from US-headquartered providers regardless of where the data sits. In June 2025, Microsoft France confirmed this under oath at a French Senate hearing. Genuine sovereignty requires a locally operated provider, or data that never leaves the customer’s infrastructure in the first place.

What is fine-tuning and do we actually need it?

Fine-tuning means continuing a model’s training on your specific data so it becomes better at your particular tasks. You do not always need it. For many use cases, a well-designed retrieval architecture works better and is cheaper to maintain. Fine-tuning makes most sense when you need the model to consistently follow domain-specific patterns or terminology. Start with retrieval. Fine-tune when retrieval is not enough.

What exactly is the CLOUD Act?

The Clarifying Lawful Overseas Use of Data Act is a 2018 US law that gives American authorities the right to demand data from US-headquartered technology companies, regardless of where that data physically sits. It applies to Microsoft, Google, Amazon, and every other major US cloud provider, including when operating in Europe.

Can a model I run myself really compete with the big cloud ones?

For most real-world business tasks, yes, meaningfully so. A 31-billion-parameter open model on a single GPU now beats much larger cloud-only rivals on human-preference testing (Google, 2026). At the genuine frontier (complex reasoning, long agentic tasks), closed models still lead. Run the benchmark on your specific use case. That number, not the benchmark chart, is the one that matters.

Receipts

Sources & references

Nature / CNN, 2025

DeepSeek-R1 peer-reviewed on the cover of Nature; final RL reasoning phase cost $294,000; full training stack (including V3 base model) approximately $6 million, a fraction of the $100M+ required for comparable Western models. Became the most-downloaded open model in the world.

CNBC, 2025

Nvidia lost roughly $589 billion in market value in a single day after the first R1 release; the largest single-day market-cap loss for any company in US stock market history.

Google DeepMind, 2026

Gemma 4 released April 2026 under Apache 2.0 licence; beats much larger models including Llama 405B on human-preference testing; runs on a single GPU.

Apple, 2025

On-device model (~3B parameters) with free local inference; data stays on the device; available to all developers.

a16z, 2025

Inference cost falling approximately 10x per year; open-model enterprise adoption concentrated at larger, regulated firms driven by on-premise and compliance requirements.

The Register / French Senate, 2025

Microsoft France confirmed under oath at a French Senate hearing (June 2025) that it cannot guarantee data sovereignty for data stored in France against US authority demands.

CNBC / TechRadar, 2026

EU Tech Sovereignty Package formally proposed 3 June 2026; proposes restricting Microsoft Azure, AWS, and Google Cloud from processing public-sector financial, judicial, and healthcare data across all 27 EU member states.

IDC, 2025

Hidden costs of on-premise AI infrastructure represent 40-60% of total cost of ownership beyond hardware purchase.

Silverthread Labs, 2026

Self-hosting break-even: below ~£2-3K/month in API spend, cloud wins; above ~100M queries/month, savings of £5M-£50M annually.
