Five things every neocloud and MSP needs to turn AI factories into a business that scales

Global Head of Partner Programs, Distribution & Channel Operations

Why building AI factories as a service requires more than GPUs, and what to look for in a control plane.

If you're running a neocloud, an MSP, or a sovereign cloud build-out, demand is the easy part of your job — every enterprise on the planet seems to be asking about AI infrastructure right now. What's harder (and what's likely to separate the providers still here in 2028 from the rest) is how you turn a pile of accelerated hardware into a repeatable, governed, profitable service.

The headline figures are undeniably attractive. Gartner expects worldwide AI spending to reach $2.5 trillion in 2026, up 44% year over year, with sovereign cloud IaaS alone forecast at $80 billion — growing 35.6% overall, and as much as 83% in Europe, 89% across the Middle East and Africa, and 87% in mature Asia Pacific. Forrester pegs neoclouds and specialized GPU and sovereign infrastructure at $20 billion in revenue, while Synergy Research has neocloud revenue alone topping $23 billion in 2025, and McKinsey now counts more than 100 neoclouds globally, with 10 to 15 operating at meaningful scale.

Awkwardly, the same McKinsey work puts GPU rental gross margins at 14-16% after labor, power, and depreciation, which is rather worse than most non-tech retail. Selling raw GPU capacity in a bare-metal model leaves you with the economics of a high-capex hosting business, which probably isn’t what your CFO was hoping for.

Better margins come from selling something more than GPU capacity: governed, multi-tenant AI services that customers can use straight away, with isolation, choice, and Day-2 operations included in the price. From our perspective, there are five capabilities that top-performing providers need to deliver.

1. Secure multi-tenancy that customers will trust

Multi-tenancy is, in effect, your commercial model. You're selling slices of expensive hardware to customers whose requirements often conflict and whose jurisdictions sometimes don't get along. A tenant whose data ends up exposed to another tenant ends your business; a tenant who needs three weeks of manual networking changes to onboard erases your margins.

Namespaces and RBAC are a reasonable starting point, although they don't address the harder questions on their own. You're going to want workload and network isolation enforced as close to the silicon as you can manage (on the GPU side, this is the territory NVIDIA's MIG addresses, as we've covered before), and quotas that hold up under contention rather than turning into theoretical limits at the worst possible moment. As Aviz Networks' Thomas Scheibe put it at GTC this year, providers want "to consume GPU infrastructure like a product, without reinventing orchestration and operations for every cluster," which only really works when the isolation underneath is dependable.

2. Open choice, because no two customers want the same stack

You're unlikely to win a sovereign or regulated customer by presenting them with a single opinionated stack and asking them to either accept it or look elsewhere. One tenant will bring their own storage vendor; another will insist on a specific observability platform; the third might want ClearML, while the next refuses to move off Run:ai. Some need air-gapped, some need confidential computing on BlueField DPUs, some want NIM and NeMo end to end, and a few would rather not use NVIDIA hardware at all.

A control plane that can't accommodate these kinds of opinions without re-architecting per customer turns every onboarding into a bespoke engineering project, which is a huge headwind if you’re trying to build a profitable service business. Open choice in this sense means choice of compute hardware, Kubernetes distribution, AI framework, storage, networking, security tooling, and deployment model, all sitting on top of consistent governance that you, the provider, define.

3. A service catalog of repeatable blueprints

One of the bigger differences between profitable AI service providers and unprofitable ones, in our experience, is whether they treat onboarding as catalog selection or as bespoke engineering. Custom-building each customer environment turns the operation into a consulting firm that happens to carry a lot of capex, whereas blueprint-driven onboarding lets you design once, validate the design, version it as it evolves, and deliver it to many tenants without redoing the work each time.

The catalog needs to extend beyond infrastructure into the AI stack itself: approved profiles for inference platforms, RAG environments, MLOps tooling, GPU-backed workspaces, and model-as-a-service offerings. Customers select from what you've published and onboard themselves, while you stay in control of what's in the catalog and how it evolves.
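To make "design once, version it, deliver it many times" a little more concrete, here is a minimal Python sketch of a versioned catalog. Everything in it is hypothetical and for illustration only: the `Blueprint` and `Catalog` classes, the `rag-environment` name, and the component labels are assumptions, not any product's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Optional

Version = tuple[int, int, int]


@dataclass(frozen=True)
class Blueprint:
    """An immutable, versioned service definition (hypothetical schema)."""
    name: str
    version: Version
    components: tuple[str, ...]


@dataclass
class Catalog:
    """Provider-controlled registry; tenants can only select what is published."""
    _published: dict[str, dict[Version, Blueprint]] = field(default_factory=dict)

    def publish(self, blueprint: Blueprint) -> None:
        versions = self._published.setdefault(blueprint.name, {})
        if blueprint.version in versions:
            # Published versions are never overwritten, so existing tenants stay reproducible.
            raise ValueError(f"{blueprint.name} {blueprint.version} already published")
        versions[blueprint.version] = blueprint

    def resolve(self, name: str, version: Optional[Version] = None) -> Blueprint:
        versions = self._published[name]  # KeyError if the blueprint is not in the catalog
        return versions[version if version is not None else max(versions)]

    def onboard(self, tenant: str, name: str, version: Optional[Version] = None) -> dict:
        """Return the environment spec a tenant receives for a catalog selection."""
        blueprint = self.resolve(name, version)
        return {
            "tenant": tenant,
            "blueprint": blueprint.name,
            "version": blueprint.version,
            "components": list(blueprint.components),
        }


catalog = Catalog()
catalog.publish(Blueprint("rag-environment", (1, 0, 0), ("vector-db", "inference-server")))
catalog.publish(Blueprint("rag-environment", (1, 1, 0), ("vector-db", "inference-server", "guardrails")))

print(catalog.onboard("tenant-a", "rag-environment"))             # picks the latest version
print(catalog.onboard("tenant-b", "rag-environment", (1, 0, 0)))  # pins an older version
```

The point of the sketch is the division of labor: the provider is the only party that publishes, and a tenant's onboarding reduces to a lookup rather than an engineering project.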

4. GPU utilization and metering you can run a P&L on

As we've explored before, GPU utilization across the industry is dismal: Wesco's analysis of more than 4,000 Kubernetes clusters put the average at 13%, with memory usage rarely above 20%, and even OpenAI estimates its own infrastructure at around 33% — and they're as sophisticated as operators get. Combine those figures with rental margins in the mid-teens, and a fair amount of expensive hardware ends up earning less than it costs to own.
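A rough way to see why this matters is to work out the break-even point. The Python sketch below uses hypothetical price and cost figures (chosen only so that a fully sold GPU earns a 15% margin, in line with the mid-teens figure above), and it simplifies by treating utilization as the fraction of hours actually billed. Only the 13% and 33% utilization figures come from the text.

```python
def breakeven_utilization(hourly_cost: float, hourly_price: float) -> float:
    """Fraction of hours a GPU must be billed for revenue to cover its cost."""
    if hourly_price <= 0:
        raise ValueError("hourly_price must be positive")
    return hourly_cost / hourly_price


def hourly_margin(utilization: float, hourly_cost: float, hourly_price: float) -> float:
    """Average profit (or loss) per wall-clock hour at a given billed utilization."""
    return utilization * hourly_price - hourly_cost


# Hypothetical figures: all-in cost of ownership and list price per GPU-hour.
COST = 1.70
PRICE = 2.00

print(f"break-even utilization: {breakeven_utilization(COST, PRICE):.0%}")
for u in (0.13, 0.33, 0.90):
    print(f"utilization {u:.0%}: {hourly_margin(u, COST, PRICE):+.2f} per GPU-hour")
```

With a margin this thin at full occupancy, the break-even point sits well above the utilization levels quoted above, which is why idle capacity is so expensive in a rental model.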

What you need, then, is visibility into who's using what, where capacity is constrained, and where it's stranded; quotas and fair-share scheduling that hold under contention; and metering data that supports whatever billing models you choose to offer. You also need the operational latitude to oversubscribe carefully or partition deliberately, with enough data to tell which approach is working in any given pool.
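As one illustration of what "fair-share scheduling that holds under contention" can mean, here is a minimal Python sketch of max-min fairness, a common textbook policy. It is an assumption chosen for illustration, not a description of how any particular scheduler works; the function name and tenant labels are hypothetical, and it idealizes GPUs as divisible.

```python
def max_min_fair_share(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Split `capacity` GPUs across tenants using max-min fairness.

    Tenants asking for less than an equal share get everything they asked for;
    what they leave unused is redistributed among tenants still wanting more.
    """
    allocation = {tenant: 0.0 for tenant in demands}
    unsatisfied = {t: d for t, d in demands.items() if d > 0}
    remaining = capacity

    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        # Tenants whose demand fits inside the current equal share.
        satisfied = {t: d for t, d in unsatisfied.items() if d <= share}
        if not satisfied:
            # Everyone left wants more than the equal share: split evenly and stop.
            for t in unsatisfied:
                allocation[t] += share
            remaining = 0.0
            break
        for t, d in satisfied.items():
            allocation[t] += d
            remaining -= d
            del unsatisfied[t]
    return allocation


# Three tenants contend for an 8-GPU pool.
print(max_min_fair_share(8, {"tenant-a": 2, "tenant-b": 4, "tenant-c": 10}))
```

Under this policy a small tenant is never starved by a large one, and no capacity is left stranded while any tenant still has unmet demand.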

5. Sovereignty and deployment flexibility from the start

For customers who care about sovereignty — which nowadays is most of them — the management plane has to live inside the boundary they've defined. A SaaS connector reaching in from outside doesn't really qualify; the control plane itself needs to sit within the operator's four walls, with upgrades and audits running on their own schedule, air-gapped where required and using FIPS 140-3 cryptography for regulated workloads.

Sovereignty has shifted from being a public-sector niche to a mainstream commercial requirement. Go to any trade show, from KubeCon to GTC, and you’ll find it cropping up in almost every talk and vendor booth.

Gartner forecasts that 20% of existing cloud workloads will move from global hyperscalers to local providers, and digital sovereignty has become a top-line agenda item for European leaders from Ursula von der Leyen to Mario Draghi, who has called for a European strategic cloud "that gives us data sovereignty in critical domains." By 2027, Gartner predicts, 35% of countries will be locked into region-specific AI platforms.

Built to tick every box

These five capabilities make up your operating reality, and it’s a tough one. They’re also what we built PaletteAI for. We took it to general availability at NVIDIA GTC in March 2026, and NVIDIA has validated it against both the Enterprise AI Factory design and the AI Factory for Government reference design, which means the underlying infrastructure validation work isn't something you need to redo yourself.

For multi-tenancy, PaletteAI provides tenant-aware operations to platform teams: tenant-scoped settings, project boundaries, RBAC that integrates with the identity systems customers already use, and quotas that hold across shared GPU pools. Our partnership with Netris extends that into hardware-enforced network isolation on BlueField DPUs, so concurrent tenants on the same bare-metal GPU hosts get genuine separation rather than a strongly worded namespace, while our partnership with Aviz Networks brings the same orchestration to NVIDIA Spectrum-X Ethernet fabrics.

Open choice lives in PaletteAI Studio, where platform teams assemble approved profiles from across the ecosystem — NVIDIA NIM, Triton, and NeMo, alongside ClearML, Run:ai, Kubeflow, MLflow, and Hugging Face — and where the supported hardware extends to AMD GPUs, ARM-based systems, Google TPUs, and AWS Graviton in addition to NVIDIA. Whatever a particular customer chooses, the platform's job is to make sure the result is governed and repeatable.

The repeatable blueprint side comes through profile bundles, version tracking, and project cloning, which together let you publish a service catalog once and deliver it many times over, with new tenants onboarding in minutes and self-serving within the guardrails you've defined.

For GPU economics, fleet overviews, metering views, utilization dashboards, and Alertmanager-driven alerting give you the operational data to run a P&L against, and when those are combined with MIG partitioning, dynamic resource allocation, and namespace-level quotas, we've seen customers push utilization on shared pools up by as much as 70%.

On sovereignty, finally, PaletteAI is self-hosted by default, including air-gapped, and PaletteAI VerteX adds FIPS 140-3 validation and hardened controls for the public sector and regulated industries. The certifications customers ask about — SOC 2 Type 2, ISO 27001, FedRAMP, UK Cyber Essentials Plus — are all in place. We’re already widely used in classified programs.

Oh, and one more little thing that shows we have your business squarely in mind: we support white-labeling of the PaletteAI interface, so it’s your brand that shows up in front of your customers, not ours.

Where to start

If you're building your AI factory from scratch, a workable sequence is to define the tenant model first (how many customers, what isolation guarantees, what sovereignty boundaries), then design the service catalog you want to sell, and only then build the operating model that lets you deliver it without bespoke engineering work for every customer. The control plane is what makes the rest of it possible.

We've worked with MSPs, neoclouds, sovereign cloud operators, and strategic partners on exactly this set of problems, so if you'd like to see how PaletteAI fits a build-out like yours, you can book a demo with our team, or read more on our AI-as-a-Service, sovereign AI, and AI factories pages.

May 20, 2026