Published

June 2, 2026

Why PaletteAI doesn't have to ship every AI tool (and how AI skills let us add new ones in a day)

If you're building an AI platform, you've probably had this conversation more than once.

A customer (internal or external) asks whether your platform supports their tool of choice. ClearML or Kubeflow for the MLOps stack. NVIDIA NIM or vLLM for inference serving. LiteLLM fronting a fleet of model endpoints. Run:ai for GPU scheduling. Or something newer that only had a stable release six weeks ago. Whatever it is, the answer is usually "not yet, but we can get there." The question is how long “there” takes.

For platform teams, the gap between "we support a tool" and "we support every tool a user might want" is absolutely brutal, and it can seem impossible to bridge. The AI tooling ecosystem doesn't slow down for anyone. Hugging Face crossed two million hosted models in 2025. The CNCF AI & ML landscape now tracks hundreds of projects, with new entries arriving every month.

And the pace itself is accelerating: Hugging Face’s first million models took over 1,000 days to land; the second million arrived in around 335. Platform teams trying to keep up are running integration backlogs that won't clear in a single planning cycle, let alone a sprint, and customers don't have the patience to wait for the next vendor roadmap quarter. There's no realistic universe where a managed platform ships first-class support for every framework, agent, vector store and inference runtime. So either you accept being behind, or you build an integration model that lets your team and your customers fill the gap themselves.

That second path is what PaletteAI does. We’ve gone deep enough into it that we even wrote a guide for the AI assistants helping us along the way, and we’re going to put it in public so anyone can use it. Here’s how it works.

The bundle model

PaletteAI's unit of integration is a ProfileBundle. A bundle is a self-contained Kubernetes deployment recipe for a single tool or stack. It includes the Helm chart, the configuration, the infrastructure dependencies, the variables surfaced to end users, and the documentation. Adding a new tool, upgrading an existing one, or forking its behavior for a particular customer all flow through the same artifact, with the same review process and the same set of conventions.

At the heart of each bundle is a WorkloadProfile: the declarative graph that PaletteAI uses to actually deploy the tool. The workload profile (WLP) lists the components that make up the deployment in the order they need to come up, with each component referencing the next through dependencies and outputs. That's where the integration logic sits. Which Helm chart to pull, which namespace to land it in, which secrets need to exist before the chart runs, which routes to expose to the user, and what to surface back to PaletteAI once everything is healthy. The bundle is the package. The WorkloadProfile is the engine inside it.
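The ordering idea is easy to sketch in a few lines of Python. This illustrates dependency-ordered components in general and is not PaletteAI's implementation: the component names and the `depends_on` mapping are hypothetical, and the standard-library `graphlib` module stands in for whatever resolver the platform actually uses.

```python
from graphlib import TopologicalSorter

# Hypothetical components. Each key maps to the set of components it depends on.
depends_on = {
    "namespace": set(),
    "helm-repository": set(),
    "registry-secret": {"namespace"},
    "vllm-stack": {"namespace", "helm-repository", "registry-secret"},
    "http-route": {"vllm-stack"},
}

def deployment_order(graph):
    """Return the components in an order where every dependency comes up first."""
    return list(TopologicalSorter(graph).static_order())

order = deployment_order(depends_on)
```

Components with no dependency between them can come out in either order, so the only guarantee worth relying on is that a component never appears before something it depends on.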

To make that concrete, here's a stripped-down fragment of our vLLM Production Stack bundle. The whole vLLM stack (engine, router, optional chat UI, optional tracing, optional KV cache offload) is wrapped behind a single `type: vllmstack` CUE component. As a user, you select the features you want with boolean variables, and the component handles all the conditional resource rendering underneath:

- name: vllm-prod-stack
  type: vllmstack
  properties:
    targetNamespace: '{{ spoke ".def.vllm-model-namespace.namespaceName" }}'
    chart:
      spec:
        chart: vllm-stack
        version: '0.1.11'
        sourceRef:
          kind: HelmRepository
          name: '{{ spoke ".def.vllm-repository.repoName" }}'
    # Optional capability toggles, all default to false
    enableToolCalling: '{{ default false .var.toolEnabled }}'
    enableLMCacheCPUOffload: '{{ default false .var.lmcacheCpuOffloadEnabled }}'
    otelTracesEnabled: '{{ default false .var.otelTracesEnabled }}'
    enableOpenWebUI: '{{ default false .var.openWebUIEnabled }}'
    values:
      servingEngineSpec:
        modelSpec:
          - modelURL: '{{ .var.modelRepo }}/{{ .var.modelName }}'
            requestGPU: '{{ default 1 .var.modelGpu }}'

That single component replaced what used to be a stack of separate HelmReleases, HTTPRoutes and ServiceMonitor wrappers, each one fired conditionally based on which toggles you set. Each chart upgrade across the 0.1.8 → 0.1.11 arc has been a matter of bumping versions inside the component definition, while the WLP shape above stays stable for everyone who's already deployed it. The variable surface you see doesn't grow as the underlying stack does, and the conditional logic that would otherwise be impossible to express in PaletteAI's templating layer lives in CUE where it belongs.

Each bundle ships two variants. A connected variant for clusters that can reach the public internet (and consequently our registry), and an airgap variant that pulls every chart and image from an internal OCI registry. Both variants share the same WorkloadProfile structure, the same user-facing variables (apart from the registry credentials only the airgap variant needs), and the same documentation surface.
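To give a feel for what the airgap variant changes, here is a small Python sketch of rewriting an image reference so it points at an internal registry. It is an assumption-laden illustration, not the bundle tooling itself: the function name, the example registry hosts and the image names are all hypothetical. The one real convention it leans on is the usual container-reference rule that a leading path segment containing a dot or a colon (or the literal `localhost`) is a registry host.

```python
def to_airgap_ref(image: str, internal_registry: str) -> str:
    """Point an image reference at an internal registry, keeping its path and tag."""
    first, sep, rest = image.partition("/")
    # Treat the leading segment as a registry host only if it looks like one.
    has_registry = bool(sep) and ("." in first or ":" in first or first == "localhost")
    path = rest if has_registry else image
    return f"{internal_registry.rstrip('/')}/{path}"

# Hypothetical references, used purely for illustration.
mirrored = to_airgap_ref("registry.example.com/acme/engine:1.0", "oci.internal:5000/mirror")
```

Everything after the registry host is preserved, which is what lets the connected and airgap variants keep the same WorkloadProfile structure and differ only in where things are pulled from.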

Once the bundle gets validated, packaged, and published, it shows up as a deployable application in PaletteAI for any cluster on a compatible infrastructure profile. Users pick it, fill in the variables, and PaletteAI handles the actual deployment, the lifecycle, the upgrades and the airgap mirroring.

Crucially, all of this lives in a Git repository. Adding a tool to PaletteAI isn't a vendor request. It's a pull request.

A week in the trenches

Take a recent week. We shipped six changes to the bundles repository, each one opened as its own pull request and each delivering a measurable piece of value for customers like you.

We bumped the vLLM Production Stack from 0.1.8 to 0.1.11, with the latest release bringing vLLM engine v0.20.2 along with a newer router and an updated Open WebUI native integration. The CUE component underneath the bundle moved with them, because that's what wires vLLM's optional toggles (tool calling, KV cache offload, distributed tracing, an integrated chat UI) into the deployment. With this PR we’ve now delivered a production inference stack customers can stand up by filling in a handful of variables (exactly five!). Yay.

We upgraded ClearML Enterprise to its newer chart for a customer who needed to use the latest and greatest. Sounds like a one-line change. It really wasn't. The newer chart ships its routing layer natively, which lets us remove a whole component from the WorkloadProfile, but only if we restructure how the CA certificate is wired in. We did the chart bump and a structural refactor in the same change, renamed all the user-facing variables to drop a redundant prefix that was making them feel verbose in the UI, and bumped the underlying Kubernetes baseline.

We built a new bundle for ClearML Enterprise Agent. This one is a bit more involved than you might expect, because the enterprise agent uses a completely different Helm chart from the open-source ClearML agent we already shipped. The two charts have different secret layouts, different namespaces, and different conventions for how they consume credentials. The PR adds two folders, one chart, and an entirely new bundle in the PaletteAI catalog.

We renamed our OSS ClearML agent bundle for naming parity with the new enterprise variant, and refactored its internals onto the same modern component shape in the same change. We did the same refactor on the ClearML OSS server bundle.

And then, on the back of all that work, we wrote an AI Skill capturing everything we’d learned in the process — patterns, naming conventions, the lot — so the next bundle goes faster than the last one, no matter who’s writing it.

So in a week we shipped two chart upgrades, a new tool integration, two refactors, and a meta-improvement to our authoring workflow. That's the pace the bundle model enables when you have the right patterns established.

Pattern-matching in the real world

That pace works because the work is repeatable. Each bundle hits roughly the same set of decisions, and after you've shipped a handful of them, you start seeing the shape. A few of the lessons we learned:

Templating has limits, and CUE is the escape hatch. PaletteAI's WorkloadProfile templating layer supports a small, deliberate set of operations. When a bundle needs conditional logic, that logic doesn't go in the WorkloadProfile. It goes into a CUE component that wraps the deployment and renders different resources depending on user-supplied flags. Our vLLM bundle uses this pattern to support tool calling, KV cache offload, distributed tracing and an integrated chat UI as independent toggles. Knowing where the line sits between "use templating" and "write a CUE component" saves hours per bundle.
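The shape of that conditional rendering can be sketched in Python. This shows the pattern only, not our CUE component: the toggle names match the fragment earlier in this post, but the resource names they map to are invented for illustration and the real component renders full Kubernetes manifests rather than labels.

```python
# Hypothetical mapping from toggle name to the extra resources it switches on.
# Toggle names come from the vLLM fragment; the resource names are made up.
OPTIONAL_RESOURCES = {
    "enableToolCalling": ["ConfigMap/tool-chat-template"],
    "enableLMCacheCPUOffload": ["ConfigMap/lmcache-config"],
    "otelTracesEnabled": ["Deployment/otel-collector"],
    "enableOpenWebUI": ["HelmRelease/open-webui", "HTTPRoute/open-webui"],
}

def render_resources(flags):
    """Return the resources to render: the base release plus whatever is toggled on."""
    resources = ["HelmRelease/vllm-stack"]  # always rendered
    for toggle, extras in OPTIONAL_RESOURCES.items():
        if flags.get(toggle, False):  # unset toggles behave as false
            resources.extend(extras)
    return resources

chat_only = render_resources({"enableOpenWebUI": True})
```

The point of the pattern is that the branching lives in one place, so the WorkloadProfile only ever passes flags through.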

OCI source health checks aren't all created equal. Flux, the deployment engine underneath, treats OCI Helm repositories and OCI artifact repositories differently. One actively pulls and reports healthy on its own. The other is essentially a credentials holder, and never reports healthy until something downstream consumes it. PaletteAI's component health gate waits for the source to report healthy before letting the deployment proceed, which means one of these two shapes needs an opt-out annotation and the other doesn't. We learned this the hard way (don’t ask), then we applied the fix across the bundles and made it the first rule in the new Skill.
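Reduced to its logic, the gate behaves roughly like the Python sketch below. It is a simplified model under stated assumptions rather than PaletteAI or Flux code: the `Source` class and the `skip_health_gate` field are hypothetical stand-ins for a source object and the opt-out annotation, whose real name we are not reproducing here.

```python
from dataclasses import dataclass

@dataclass
class Source:
    name: str
    reports_healthy: bool           # what the source's status currently says
    skip_health_gate: bool = False  # hypothetical stand-in for the opt-out annotation

def gate_allows(source: Source) -> bool:
    """Let the deployment proceed if the source is healthy or has opted out of the gate."""
    return source.skip_health_gate or source.reports_healthy

# A source that pulls on its own eventually reports healthy and passes the gate.
pulling = Source("pulling-source", reports_healthy=True)
# A credentials-only source never reports healthy, so without the opt-out it blocks.
blocked = Source("credentials-only", reports_healthy=False)
opted_out = Source("credentials-only", reports_healthy=False, skip_health_gate=True)
```

Without the opt-out, the credentials-only shape deadlocks: the gate waits for a healthy report that only arrives once something downstream consumes the source, and nothing downstream runs until the gate opens.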

The folder model preserves history. When you upgrade a chart, you don't replace the old folder. You add a new one. Customers running the previous version need their bundle to keep resolving. The result is a bundle catalog where every shipped version stays addressable, and upgrades become customer-triggered rather than vendor-driven.
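A minimal Python sketch of that resolution behaviour follows. It assumes, purely for illustration, that each folder is named after the version it ships; the function names are hypothetical and the real catalog layout may differ.

```python
from typing import Optional

def version_key(version: str) -> tuple:
    """Compare versions numerically so that 0.1.11 sorts after 0.1.8."""
    return tuple(int(part) for part in version.split("."))

def resolve(folders: list, pinned: Optional[str] = None) -> str:
    """Return the pinned version if one is given, otherwise the newest folder."""
    if pinned is not None:
        if pinned not in folders:
            raise KeyError(f"version {pinned} is not in the catalog")
        return pinned
    return max(folders, key=version_key)

folders = ["0.1.8"]
folders.append("0.1.11")  # an upgrade adds a folder and leaves the old one in place
```

New deployments pick up the newest folder, while anyone pinned to the earlier version keeps resolving until they choose to move. The numeric key matters: a plain string comparison would rank 0.1.8 above 0.1.11.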

Variable naming is UI design. The variables a bundle exposes show up in the PaletteAI interface. They’re the surface someone deploys against. Names, descriptions and required-versus-optional flags all matter, and we spent meaningful time on the variable surface of every bundle this week because it affects how quickly you can deploy.

None of these are obvious from a first read of the existing bundles. They're the kind of thing you learn by getting it wrong, which is a problem when you have lots of contributors who each need to learn the ropes. So we did something about it.

Inside our Skill

We probably don’t need to tell you about Skills… they aren’t new by AI market standards. Anthropic’s open Skills spec is well-established: structured, progressively-disclosed guidance that sits in the repository and loads automatically when an AI assistant detects a relevant task.

After shipping all those bundles, we decided this was the perfect candidate for a Skill. It sits in our public bundles repository alongside the bundles themselves, organized as an entry-point file plus ten reference files (about 1,500 lines in total) in a progressive-disclosure shape. The entry point is short, it lists a pre-PR checklist, and it routes the assistant to whichever reference matches the work in progress. The references are:

A WorkloadProfile patterns reference for component structure, deployment priorities, and how to wire namespaces and source outputs through the deployment graph
A templating reference documenting the templating layer's limits and the three patterns for handling conditional rendering when those limits get in the way
An OCI health-check reference that distinguishes the two source shapes, quotes the upstream documentation verbatim, and spells out exactly when the opt-out annotation is required
A versioning reference covering the folder model for chart upgrades, the version-bump discipline for bundle internals, and the coupling between CUE component versions and bundle versions
A variables reference for naming conventions, required-versus-optional decisions, and the standard variable patterns that appear across bundles
A README structure reference documenting the sections and tables every bundle's README ships with, including the current infrastructure component versions
An airgap reference covering the OCI repository shape, the image registry override, the mirror manifest format, and the `paletteai mirror` command
A validation reference listing the validators that run in CI and the workarounds for running them locally
A CUE components reference covering generic CUE authoring patterns, with PaletteAI's vLLM component as the worked example
A PR conventions reference for commit message style, PR scope, and the body shape we use across the repository

Writing this up as a Skill rather than a wiki page means a coding assistant loads it automatically when the work is relevant — Claude Code, Codex, Gemini CLI, whichever you’ve got open. The next person who sits down to add a new tool to PaletteAI doesn’t have to learn the OCI annotation rule by getting it wrong. They open their assistant, describe what they want to build, and the assistant already knows where the traps are. We’re already seeing three great outcomes from codifying our tacit knowledge:

What one engineer can ship goes up. The same engineer who’d have taken two weeks on their first bundle — pinging Slack, bouncing through review cycles — can now produce a working draft in an afternoon. The Skill carries the institutional knowledge that used to live in three or four senior engineers’ heads. New starters ramp faster, senior engineers stop fielding the same questions, and the team ships more without adding bodies.

The Skill is public — take it. If you’re running PaletteAI and you want to extend it with a tool we don’t ship out of the box, the same guidance our internal team uses is sitting right there in our public bundles repository. Open it, copy the conventions, send a PR upstream when you’ve got something working. You don’t have to wait on us — and we’d rather you didn’t.

Each round teaches the next. Each bundle we ship teaches us something new about the model. Each new pattern goes back into the Skill, and the next bundle is faster again. Over time, the gap between "a customer asks for X" and "X is in the catalog" keeps shrinking. We’re hoping this is how our platform stays in step with the ecosystem instead of perpetually playing catch-up.

Platforms have to operate differently at AI pace

Two things are true at once. AI platform vendors (like us) can’t ship native support for every tool in the ecosystem fast enough, and platform consumers (like you) can’t tolerate waiting six months for the integration you need today. “Trying harder” as a vendor doesn’t solve that. What works is an integration model shaped for the pace at which the ecosystem actually moves.

PaletteAI's bundle model bridges the gap. A bundle takes a day or two to write, validates in CI, and works for tools the platform has never seen before. The PRs we shipped last week covered inference serving, MLOps, agent orchestration and a private enterprise chart from a third-party vendor. The same approach handles whatever ships next month, whether that's the next Run:ai equivalent, the next LiteLLM gateway, or a new project nobody's heard of yet.

The Skill is the second-order effect. Once the model is repeatable, the repetition can be codified, and the codified version becomes something a coding assistant can apply directly — which multiplies what your team can ship without growing headcount. You get the integration you need sooner, our engineers spend less time context-switching, and the platform stays in step with the ecosystem rather than chasing it.

If you're running your own AI platform — or a platform-engineering function inside a larger company — the real test for your integration model is whether it can keep pace with what your users are asking for. Hand-building every integration isn’t realistic for any vendor. PaletteAI is built around the assumption that the right answer is to make it cheap for everyone else to extend us.

What should we support next?

If there’s a tool, framework or runtime you’d like to see in PaletteAI — an MLOps platform, an inference runtime, an agent orchestrator, something newer than any of those — tell us. Better still, open a PR. The Skill is sitting in our public bundles repository, ready for anyone who wants to try authoring one.

New to PaletteAI? Start with the product overview and the documentation. For background on how the team thinks about AI tooling and developer workflows, our earlier post on the monorepo decision is worth a read.
