September 14, 2023

Cluster API and Kubernetes cluster management

Yitaek Hwang
Guest contributor

What is Cluster API?

Cluster API (CAPI) is a Kubernetes sub-project within the CNCF’s Cluster Lifecycle Special Interest Group. It focuses on using declarative APIs for Kubernetes cluster management. In essence, Cluster API utilizes the operator pattern prevalent in Kubernetes design to simplify provisioning, upgrading, and operating Kubernetes clusters themselves. (Learn more about this meta-pattern in our CNCF webinar ‘Kubernetes all the things!’)

The Kubernetes Cluster API project has been around for several years. It has a rapid release schedule thanks to the active SIG Cluster Lifecycle team, and a healthy 3k GitHub stars. 

At Spectro Cloud, we placed an early bet on CAPI. We’ve published several blogs over the years (and a book, actually). 

But still, many of the folks we meet at events aren’t familiar with Cluster API or what it can do. So it’s time for a refresher.

With Cluster API, infrastructure components that were traditionally managed outside of Kubernetes (for example, virtual machines or networking components) can now be configured and managed just like any other application workload running inside Kubernetes. 

The implications of Cluster API are massive. For one, it applies the best principles that made Kubernetes successful (i.e., declarative management APIs, consistent workflows, automation) to cluster management. 

Cluster API helps reduce human error by replacing manual cluster-management workflows with scalable, repeatable processes.

Also, there’s tremendous benefit to centralizing the approach to managing both infrastructure and containerized applications, rather than using multiple tools. 

How does Cluster API work?

Today, there are many different ways to bootstrap Kubernetes clusters. We have:

  • Cloud-specific providers such as eksctl
  • Kubernetes-specific providers like kubeadm or kubespray
  • Infrastructure-as-Code (IaC) solutions like Terraform and Pulumi

Each of these tools comes with its own degree of complexity, and all are opinionated to some extent.

Cluster API aims to standardize this process by extending principles familiar to Kubernetes users, namely declarative APIs and the operator pattern.

Operators in Kubernetes extend the idea of the control loop to continuously monitor the current state and reconcile with the desired state. 

Operators define custom resources, which extend the Kubernetes API with new object types, and run controllers that watch those objects and reconcile them.

CAPI Cluster - Operator Pattern

With respect to Cluster API, this means that a custom controller watches the desired Kubernetes cluster state and then automates lifecycle activities such as provisioning, upgrading, and managing clusters.
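To make the reconcile idea concrete, here is a minimal sketch in Python. It is not Cluster API code: the `ClusterState` type and the `reconcile` function are hypothetical stand-ins that only illustrate comparing desired and current state and acting on the difference.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Toy stand-in for a cluster: a Kubernetes version and a worker count."""
    version: str
    workers: int


def reconcile(desired: ClusterState, current: ClusterState) -> list[str]:
    """Run one reconcile pass: fix any drift and return the actions taken."""
    actions = []
    if current.version != desired.version:
        actions.append(f"upgrade {current.version} -> {desired.version}")
        current.version = desired.version
    if current.workers < desired.workers:
        actions.append(f"provision {desired.workers - current.workers} worker(s)")
    elif current.workers > desired.workers:
        actions.append(f"delete {current.workers - desired.workers} worker(s)")
    current.workers = desired.workers
    return actions


# Hypothetical desired and observed states.
desired = ClusterState(version="v1.27.3", workers=3)
current = ClusterState(version="v1.26.6", workers=1)

first_pass = reconcile(desired, current)
second_pass = reconcile(desired, current)  # no drift left, so no actions

print(first_pass)
print(second_pass)
```

A real controller runs this kind of pass repeatedly, triggered by changes to the objects it watches, which is why each pass is written to be safe to repeat.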

CAPI abstracts away the implementation-specific details of various Kubernetes tools mentioned above. This makes Cluster API a flexible and efficient tool for managing Kubernetes clusters whether that entails deployments, upgrades, or scaling operations.  

Components of Cluster API

Underneath the hood, Cluster API consists of the following components and custom resources.

First, we have the management cluster where the various CAPI providers (e.g., infrastructure, bootstrap, and control plane) and resources are stored. 

Cluster API has an extensive list of supported providers, ranging from cloud providers like AWS, Azure, and GCP to bare metal providers like VMware and MAAS. The flexible nature of the API has allowed the community to create many new providers. In fact, Spectro Cloud created one for Canonical MAAS.

Deploying Cluster API involves two Kubernetes clusters: one is a temporary cluster, called the bootstrap cluster, which you use to create a second cluster that becomes the permanent Cluster API management cluster.
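One common way to carry out this two-cluster flow is with the `kind` and `clusterctl` command-line tools, although the article does not prescribe either. The sketch below only assembles the commands as argument lists and prints them; nothing is executed. The cluster name, kubeconfig path, provider, and Kubernetes version are all hypothetical placeholders.

```python
import shlex

# Hypothetical placeholders for illustration only.
MGMT_NAME = "mgmt"
MGMT_KUBECONFIG = "mgmt.kubeconfig"
PROVIDER = "gcp"
K8S_VERSION = "v1.27.3"


def bootstrap_and_pivot_commands(name, provider, target_kubeconfig, version):
    """Return the CLI steps as argument lists; this function runs nothing."""
    return [
        # 1. Create a temporary bootstrap cluster (kind is one common choice).
        ["kind", "create", "cluster", "--name", "bootstrap"],
        # 2. Install the Cluster API providers into the bootstrap cluster.
        ["clusterctl", "init", "--infrastructure", provider],
        # 3. Render manifests for the future management cluster
        #    (the resulting YAML would then be applied with kubectl).
        ["clusterctl", "generate", "cluster", name, "--kubernetes-version", version],
        # 4. Fetch the kubeconfig of the newly created cluster.
        ["clusterctl", "get", "kubeconfig", name],
        # 5. Install the providers into the new cluster as well.
        ["clusterctl", "init", "--kubeconfig", target_kubeconfig,
         "--infrastructure", provider],
        # 6. Move the Cluster API resources from bootstrap to the new cluster.
        ["clusterctl", "move", "--to-kubeconfig", target_kubeconfig],
        # 7. The bootstrap cluster is no longer needed.
        ["kind", "delete", "cluster", "--name", "bootstrap"],
    ]


steps = bootstrap_and_pivot_commands(MGMT_NAME, PROVIDER, MGMT_KUBECONFIG, K8S_VERSION)
for step in steps:
    print(shlex.join(step))
```

After the move step, the second cluster holds the Cluster API resources and acts as the permanent management cluster.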

Then, we have the actual workload cluster, which is simply a Kubernetes cluster managed by the components declared in the management cluster. 

Diving into the providers, we have:

  • Infrastructure: responsible for provisioning and managing infrastructure components required by Kubernetes such as VMs, load balancers, and VPCs. 
  • Bootstrap: responsible for bootstrapping Kubernetes components such as creating cluster certificates, installing control plane components, and joining worker nodes to the control plane. 
  • Control plane: responsible for Kubernetes API components such as kube-apiserver, kube-controller-manager, and kube-scheduler. By default, kubeadm is used for the control plane, but other variants like Talos can be configured as well.

Finally, to support these providers, we have the following custom resources:

  • Machine: spec for the infrastructure backing a Kubernetes node (e.g., a VM)
  • MachineSet: spec for maintaining a stable set of Machines (similar to ReplicaSet for pods) 
  • MachineDeployment: spec for updating Machines and MachineSets (similar to Deployment)
  • MachineHealthCheck: spec for defining the healthiness of the nodes
  • BootstrapData: spec for machine-specific initialization data (mostly used for initializing cloud VMs)
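These resources are normally written as YAML manifests. As a rough sketch of how they reference each other, the Python below builds trimmed-down equivalents as dictionaries and prints them as JSON. The resource names, the Kubernetes version, and the `v1beta1` API versions are assumptions for illustration; the manifests are incomplete and would need provider-specific objects (for example a `GCPCluster` and a `GCPMachineTemplate`) to be usable.

```python
import json

CLUSTER = "demo-cluster"   # hypothetical cluster name
K8S_VERSION = "v1.27.3"    # hypothetical Kubernetes version


def ref(group, kind, name):
    """Build an object reference to a provider-specific resource."""
    return {"apiVersion": f"{group}/v1beta1", "kind": kind, "name": name}


# Top-level Cluster: ties a control plane provider to an infrastructure provider.
cluster = {
    "apiVersion": "cluster.x-k8s.io/v1beta1",
    "kind": "Cluster",
    "metadata": {"name": CLUSTER},
    "spec": {
        "controlPlaneRef": ref("controlplane.cluster.x-k8s.io",
                               "KubeadmControlPlane", f"{CLUSTER}-control-plane"),
        "infrastructureRef": ref("infrastructure.cluster.x-k8s.io",
                                 "GCPCluster", CLUSTER),
    },
}

# MachineDeployment: a replicated set of worker Machines, much like a Deployment.
machine_deployment = {
    "apiVersion": "cluster.x-k8s.io/v1beta1",
    "kind": "MachineDeployment",
    "metadata": {"name": f"{CLUSTER}-md-0"},
    "spec": {
        "clusterName": CLUSTER,
        "replicas": 3,
        "selector": {"matchLabels": {}},  # label selector left empty in this sketch
        "template": {
            "spec": {
                "clusterName": CLUSTER,
                "version": K8S_VERSION,
                "bootstrap": {"configRef": ref("bootstrap.cluster.x-k8s.io",
                                               "KubeadmConfigTemplate", f"{CLUSTER}-md-0")},
                "infrastructureRef": ref("infrastructure.cluster.x-k8s.io",
                                         "GCPMachineTemplate", f"{CLUSTER}-md-0"),
            }
        },
    },
}

# MachineHealthCheck: marks workers unhealthy if the node is not Ready for 5 minutes.
machine_health_check = {
    "apiVersion": "cluster.x-k8s.io/v1beta1",
    "kind": "MachineHealthCheck",
    "metadata": {"name": f"{CLUSTER}-mhc"},
    "spec": {
        "clusterName": CLUSTER,
        "maxUnhealthy": "40%",
        "selector": {"matchLabels": {"cluster.x-k8s.io/deployment-name": f"{CLUSTER}-md-0"}},
        "unhealthyConditions": [
            {"type": "Ready", "status": "Unknown", "timeout": "300s"},
            {"type": "Ready", "status": "False", "timeout": "300s"},
        ],
    },
}

manifests = [cluster, machine_deployment, machine_health_check]
print(json.dumps(manifests, indent=2))
```

Scaling the worker pool then becomes a one-field change: editing `replicas` on the MachineDeployment and letting the controllers reconcile the difference.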
Example Cluster with GCP

Benefits of Cluster API 

Cluster API has many parallels to core Kubernetes resources, by design. 

These similarities serve well for core consumers of Cluster API, namely managed Kubernetes or Kubernetes-as-a-Service providers. Cluster API simplifies the process of managing Kubernetes cluster lifecycles, particularly across multiple different environments. 

This is obviously beneficial for those dealing with multiple variants, but also useful for those only using one. This is because now there is a standardized way to manage both the infrastructure component and the applications that run on Kubernetes.

Obviously there are other solutions for creating and managing Kubernetes clusters, including Terraform, ArgoCD and Helm charts to bootstrap various clusters, other open-source projects like Gardener, or some layer on top of Terraform like cdktf, terragrunt, etc. Cluster API has some advantages over these alternatives, but you’ll need to evaluate how satisfied you are with your existing approach before deciding whether it’s worth changing.

Kubernetes Lifecycle Management with Cluster API and GitOps

To demonstrate the benefits, let’s take a real world example using CAPI and GitOps. Since Cluster API shares a lot of the same principles as Kubernetes applications that have long since embraced GitOps, we can manage Kubernetes clusters the same way via GitOps with Cluster API. 

We will follow the demo from the talk “Cluster API and GitOps: the key to Kubernetes lifecycle management” by Nic Vermande.

In this demo, Nic created a highly-available, production-ready Kubernetes cluster deployed to GCP using Cluster API and ArgoCD. 

Shortcomings and solutions

One of the problems highlighted in the demo is that Cluster API alone is not sufficient for full Kubernetes lifecycle management. That’s because Cluster API provisions a barebones cluster, but it’s not a fully-functioning Kubernetes cluster by itself. 

It does not have a CNI installed, and it is missing critical Kubernetes tooling such as an autoscaler, an ingress controller, and a CSI driver, as well as application helpers like CD tools.

In the demo, Nic goes through a manual approach of creating these with ArgoCD:

  • First, he uses the app-of-apps pattern from ArgoCD to define the Kubernetes cluster to be managed by Cluster API.
  • He then creates a Helm chart to define the cluster components for the GCP kubeadm cluster, including the machine templates for the control plane and worker nodes, as well as machine deployments.
  • He uses the cluster-autoscaler Helm chart along with Kustomize to deploy these components to multiple environments.
  • Finally, sensitive parts (e.g., kubeconfig) are encrypted and consumed via the KSOPS plugin to keep all of the components in a single Git repo.
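The app-of-apps idea in the first step can be sketched as follows. This is not the manifest set from the demo: the repository URL, paths, and application names are hypothetical, and the Python dictionaries stand in for the YAML that ArgoCD would read from Git.

```python
import json

REPO_URL = "https://example.com/platform/clusters.git"  # hypothetical repository


def application(name, path, dest_namespace="default"):
    """Build a minimal ArgoCD Application that syncs one path of the repo."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": REPO_URL, "path": path, "targetRevision": "HEAD"},
            "destination": {"server": "https://kubernetes.default.svc",
                            "namespace": dest_namespace},
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


# Root app: points at a directory that holds the child Application manifests.
root = application("clusters-root", "apps", dest_namespace="argocd")

# Child apps: one per component, each stored as a manifest under "apps/".
children = [
    application("gcp-cluster", "charts/gcp-cluster"),
    application("cluster-autoscaler", "overlays/dev/cluster-autoscaler"),
]

print(json.dumps([root] + children, indent=2))
```

Only the root Application is created by hand; ArgoCD discovers the children from the repository, so adding a cluster or component becomes a Git commit.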

While the demo successfully creates highly available Kubernetes clusters on GCP using Cluster API and GitOps principles, it also shows that the process is both time-consuming and complex, especially for small platform engineering teams.

How Palette can help

An alternative solution is to use Spectro Cloud’s Palette. Palette is a production-ready, scalable solution that tackles cluster lifecycle management whether your clusters run in the cloud, on-premises in the data center, or at the edge.

Under the hood, Palette uses Cluster API to help build and manage clusters using Kubernetes patterns. But on top of that, it adds in critical day-two operations tasks like scaling, patches/updates, monitoring and observability, as well as backup and restore operations — and support for the “full stack” of Kubernetes integrations through the Cluster Profile concept.

Instead of handcrafting repeatable templates from scratch, you can use Palette’s powerful suite of tools to simplify this process. 

Get started for free with Spectro Cloud Palette today and see the power of Cluster API at scale. Also, check out Cluster API and Declarative Kubernetes Management from Spectro Cloud founders Tenry Fu and Saad Malik on O’Reilly for a deeper dive into this topic.
