Best practices for managing bare-metal Kubernetes clusters

So you’ve made the decision to deploy Kubernetes on bare metal. Good for you. Bare-metal nodes increase security and control. They may reduce costs. They typically lead to better overall performance.

That said, getting the most out of bare-metal Kubernetes requires more than simply provisioning physical servers as nodes and going to town. To build bare-metal clusters that optimize performance, cost and manageability over the long term, it’s critical to think strategically about exactly how you design and administer bare-metal Kubernetes environments.

To that end, let’s take a look at five best practices for running Kubernetes on bare metal. The tips we discuss below will help you maximize the benefits of bare-metal clusters while minimizing the headaches.

What is bare-metal Kubernetes?

Before jumping into best practices, let’s make clear what we mean by “bare-metal Kubernetes.”

We’re referring to Kubernetes environments whose control plane (master) and worker nodes run directly on physical machines, as opposed to virtual machines. In other words, when you run Kubernetes on bare metal, you take the virtualization hypervisor out of the picture.

Why run Kubernetes on bare metal?

There are lots of compelling reasons to create bare-metal Kubernetes clusters. The most important include:

Greater control over nodes, because you can access the physical hardware directly.
Higher performance for workloads because there are no resources “wasted” on hypervisor overhead.
Simpler management – at least in some respects – because you don’t have to manage VMs and hypervisors in addition to the rest of your stack.

There are clear drawbacks to bare-metal Kubernetes, too, such as greater provisioning complexity and the fact that you can’t divide a physical server into distinct software-defined parts.

Still, if you decide that the benefits outweigh the challenges based on your infrastructure setup and workload requirements, running bare-metal Kubernetes can be a great way to get even more value out of the open source orchestrator.

Best practices for bare-metal Kubernetes

Choose your machines wisely

Not all bare-metal nodes are made the same. Beyond the obvious fact that each physical server may have different hardware specifications, where the servers are located can significantly affect how much control, performance and cost benefit you actually obtain.

For instance, using bare-metal server instances in a public cloud – which is convenient because you don’t have to acquire, set up and manage the hardware – will typically not offer the same level of control as running nodes on servers that you own yourself. You won’t have direct access to the hardware, so you won’t have full control. Cloud-based server instances are also not likely to deliver the greatest cost benefits over the long term.

Think, too, about how many resources each server has. In general, it makes less sense to use massively powerful servers as bare-metal nodes because doing so increases the risk that their resources will go under-utilized. You can’t “slice and dice” the compute and memory resources of a physical server in order to share them with multiple workloads in the way you could if you were using VMs.

That said, the best type of machines to use could vary according to your workload types and use cases. For instance, high-performance computing (HPC) or AI workloads might benefit from having very powerful machines, whereas for generic application hosting, having a larger number of less powerful nodes would typically be preferable.

Don’t overestimate performance

If you are using bare-metal Kubernetes because you think it will dramatically improve workload performance, you probably need a reality check.

[Image: bare-metal Kubernetes workload performance]

It’s true that running on bare metal does improve performance – but only by a limited amount in most cases. Factoring in hypervisor overhead (which can be as low as 2 percent) and guest operating system resource consumption, nodes that run as VMs probably only “waste” something like 10 to 20 percent of available resources compared to bare metal. Those savings only matter if your workloads actually need more than the 80 to 90 percent of capacity that a VM-based node would still leave available. For workloads that run near peak capacity on a continuous basis – as an HPC deployment might, for example – this could be a major savings. But it may be less important in contexts where average workload demand is lower.
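
To make that arithmetic concrete, here is a minimal Python sketch. The 20 percent overhead is the upper end of the estimate above; the 64-core server size and the two demand figures are illustrative assumptions, not measurements.

```python
import math


def usable_capacity(raw_cores: float, overhead_fraction: float) -> float:
    """Cores left for workloads once virtualization overhead is subtracted."""
    if not 0 <= overhead_fraction < 1:
        raise ValueError("overhead_fraction must be in [0, 1)")
    return raw_cores * (1 - overhead_fraction)


def servers_needed(demand_cores: float, raw_cores: float,
                   overhead_fraction: float) -> int:
    """Number of identical servers required to cover a given core demand."""
    per_server = usable_capacity(raw_cores, overhead_fraction)
    return math.ceil(demand_cores / per_server)


if __name__ == "__main__":
    RAW_CORES = 64        # hypothetical server size
    VM_OVERHEAD = 0.20    # upper end of the 10 to 20 percent estimate

    for demand in (200, 1000):  # hypothetical sustained demand, in cores
        bare = servers_needed(demand, RAW_CORES, 0.0)
        virtualized = servers_needed(demand, RAW_CORES, VM_OVERHEAD)
        print(f"demand={demand} cores: bare metal={bare} servers, "
              f"VMs={virtualized} servers")
```

Under these assumptions, the small deployment needs the same number of servers either way, while the large, constantly busy one needs 16 servers on bare metal versus 20 with VMs. That is a real saving, but nowhere near half.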

The point here is that, while bare-metal Kubernetes can indeed improve performance, it’s important to set realistic expectations. If you commit to bare-metal clusters in the belief that you’ll cut your infrastructure spend in half or double the performance of your workloads using the same number of servers, you will probably end up disappointed.

Automate node provisioning

If your nodes run as VMs, it’s easy to automate node provisioning using VM images and APIs. But matters become trickier when you are dealing with bare-metal servers, because you can’t as easily join them to a cluster in an automated fashion. And obviously, configuring each bare-metal node manually is not a scalable approach.

Fortunately, there’s a solution: Using tools like MAAS or Tinkerbell, you can leverage the Kubernetes Cluster API – a solution for automating cluster configuration – in conjunction with bare-metal nodes. Cluster API doesn’t support bare metal directly, but MAAS, Tinkerbell and similar tools serve as “middlemen” that allow you to provision bare-metal clusters automatically via Cluster API.

So, if you’re thinking that you can provision bare-metal Kubernetes clusters manually, think again. Use an automated approach instead. Here’s a good walk-through explaining how to deploy MAAS and Cluster API for this purpose.

Avoid OS sprawl

When you run nodes as VMs, it’s comparatively easy to keep their operating systems and configurations consistent. You can use golden images to provision them initially, and you can update them using a consistent process because the virtual hardware for each machine will be identical (if you set it up that way at the start).

With bare-metal nodes, things become more complicated. Your physical hardware may not be identical. Installing updates or modifying the OS requires more work because you can’t simply drop a new image into place.

There are no simple solutions on this front (although automated provisioning tools can help). You’ll just need to work harder to ensure software consistency across your bare-metal Kubernetes infrastructure. The effort will pay off by simplifying management, thanks to fewer variables between the software environment running on each machine.
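
One low-effort way to keep an eye on consistency is to compare the software versions that each node reports to the cluster. The Python sketch below is one possible approach, not a standard tool: it assumes you have saved the output of `kubectl get nodes -o json` to a file, and it treats the most common combination of OS image, kernel, container runtime and kubelet version as the baseline. The function names and the `nodes.json` file name are hypothetical.

```python
import json
from collections import defaultdict

# Version fields that Kubernetes reports under status.nodeInfo for each node.
FINGERPRINT_FIELDS = (
    "osImage",
    "kernelVersion",
    "containerRuntimeVersion",
    "kubeletVersion",
)


def group_nodes_by_software(node_list: dict) -> dict:
    """Map each distinct software fingerprint to the nodes that share it."""
    groups = defaultdict(list)
    for node in node_list.get("items", []):
        info = node.get("status", {}).get("nodeInfo", {})
        fingerprint = tuple(info.get(f, "unknown") for f in FINGERPRINT_FIELDS)
        groups[fingerprint].append(node["metadata"]["name"])
    return dict(groups)


def find_drift(node_list: dict) -> list:
    """Return the names of nodes that differ from the most common fingerprint."""
    groups = group_nodes_by_software(node_list)
    if not groups:
        return []
    baseline = max(groups, key=lambda fp: len(groups[fp]))
    return sorted(
        name
        for fingerprint, names in groups.items()
        if fingerprint != baseline
        for name in names
    )


if __name__ == "__main__":
    # Assumes: kubectl get nodes -o json > nodes.json
    with open("nodes.json") as handle:
        drifted = find_drift(json.load(handle))
    print("Nodes that differ from the baseline:", drifted or "none")
```

Run on a schedule, a check like this flags OS sprawl early, before a mismatched kernel or runtime version turns into a hard-to-diagnose workload problem.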

Consider using VMs and bare metal simultaneously

Perhaps the best way to make the most out of bare-metal Kubernetes is to avoid running everything on bare metal, or everything on VMs. Most Kubernetes distributions don’t care whether a node is a VM or a bare-metal server, and they will happily manage clusters that include both types of nodes at the same time.

This means it’s possible to build some clusters with bare-metal nodes and others with VMs. (You could also theoretically mix bare metal and VMs within the same cluster, but from a management perspective, that would likely become very difficult.) You could then use a multi-cluster Kubernetes strategy to manage all clusters centrally. Multi-cluster environments are certainly more complex to administer than single-cluster setups, but they can be managed effectively with the right planning and tools – such as Palette, which allows you to manage multiple clusters consistently, regardless of whether they are based on VMs, bare metal or both.

Mixing VMs and bare metal could be useful if, for example, some of your physical machines are high-end servers, while others are less powerful. It might make sense in this case to divide the high-end machines into VMs, while letting the others operate as bare-metal nodes. Running VMs and bare-metal nodes simultaneously could also be beneficial if you have some Pods that need to access specific bare-metal hardware features (like GPUs to support AI/ML workloads), while others can run on any generic VM.
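
As a rough illustration of steering hardware-dependent Pods toward bare-metal nodes, the Python sketch below builds a Pod manifest that requests GPUs and, optionally, pins the Pod to nodes carrying a given label. It assumes the NVIDIA device plugin is installed, so that GPUs are exposed as the `nvidia.com/gpu` resource. The `node-type: bare-metal` label, the Pod name and the image are hypothetical; you would need to apply your own labels to your nodes.

```python
import json
from typing import Optional


def gpu_pod_manifest(name: str, image: str, gpu_count: int = 1,
                     node_selector: Optional[dict] = None) -> dict:
    """Build a Pod manifest that requests GPUs and optionally pins to labeled nodes."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    spec = {
        "containers": [{
            "name": name,
            "image": image,
            # GPUs are requested via limits; assumes the NVIDIA device plugin.
            "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
        }],
    }
    if node_selector:
        # Only schedule onto nodes carrying these (user-defined) labels.
        spec["nodeSelector"] = dict(node_selector)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest(
        name="trainer",                          # hypothetical Pod name
        image="example.com/trainer:latest",      # hypothetical image
        gpu_count=2,
        node_selector={"node-type": "bare-metal"},  # hypothetical label
    )
    # kubectl accepts JSON manifests, e.g.: python script.py | kubectl apply -f -
    print(json.dumps(manifest, indent=2))
```

Pods that omit the GPU request and node selector remain free to land on any generic node, VM-based or otherwise.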

Conclusion

Bare-metal Kubernetes offers great potential for reducing cost and improving performance. But it doesn’t achieve those benefits automatically. To make bare-metal nodes work for you, think strategically about where and how to run those nodes, how to provision them and how to manage them on an ongoing basis. Be sure as well to be realistic about how much of a performance benefit you’ll glean from bare-metal clusters.
