May 7, 2023

Three common Kubernetes challenges — and how to solve them

Dmitry Shevrin
Infrastructure Specialist

With Scale Comes Complexity

Kubernetes has a pretty fearsome reputation for complexity in its own right (as we’ve discussed before). Learning it for the first time and standing up your first cluster, deploying your first application stack… it can be painful.

But as any seasoned operator will tell you, it’s when you expand into running Kubernetes in production at scale that you come across the real pain!

Let’s delve into three of the most common "growing pains" that we’ve seen in the field:

  • Developer productivity
  • Multicluster headaches
  • The edge learning curve

We’ll not only explore the pain, but also show you some ways to sidestep these pitfalls.

Pain 1: Developer Productivity

Infrastructure doesn’t exist for its own sake. As an operations team, the clusters you create are there to provide application delivery for the dev teams you support.

Despite the popularity of the term “DevOps,” most developers don’t have the skill set to be cloud native infrastructure or Kubernetes experts. They would much rather be coding features than managing infrastructure (as we have explored in this blog post).

Developers just want to consume infrastructure elements such as Kubernetes clusters, and they have little tolerance for delays and hurdles in their way. Unfortunately, it’s not always easy to give them what they want.

Let’s take a simple example: running a test suite on a new release of application code.

The developer wants a pristine cluster (possibly several clusters) running specific versions of Kubernetes and other software. This is vital for getting accurate test results that predict production behavior. Tempting as it is to try, we all know that it’s impossible to mirror a production infrastructure setup on a local laptop.

If a test fails for a simple reason, the dev might want to repeat the CI/CD pipeline almost immediately.

But you know it’s not that easy.

Firing up a new cluster takes work and costs money. Even if you have the capacity to jump right on the request, it also takes time, which means your developers are kept waiting.

This is a real conundrum. Now imagine it happening across dozens of dev teams pushing multiple code pipelines per day.

Giving Devs What They Need with Virtual Clusters

So, how do you deal with it? One answer is virtual clusters. In a virtual cluster setup, the infrastructure team keeps host clusters stable and under their full control, while giving developers virtual clusters. Virtual clusters are isolated, so they present minimal risk to the underlying infrastructure, and they take next to no time or resources to fire up, meaning you can give every team its own clusters to play with.

This solution is based on the open source vcluster project, but at Spectro Cloud we’ve made this technology enterprise-ready and easy to consume. That is, we’ve added a sleek UI, implemented a fully declarative approach with Cluster API behind the scenes, connected all this to role-based access controls and presented it as an easy-to-consume SaaS solution.

Using Palette, developers can order their own clusters, created specifically for them from cluster profiles that the infrastructure group defines. Each team can have its own playground. Of course, the whole process can be fully automated using the Terraform provider or REST API calls. With this approach, granular control remains in the hands of the infrastructure team, while developers gain the freedom to test their code in production-like conditions.

Pain 2: Multicluster Headaches

Everyone starts with one Kubernetes cluster. But few teams today stay that way.

This number quickly grows to three when you split development, staging and production into separate clusters.

And from there? Well, our research found that half of those using Kubernetes in production already have more than 10 clusters. Eighty percent expect to increase the number or size of their clusters in the next year.

You can probably manage a couple of clusters manually using kubectl, k9s and other open source tooling. But once you grow to a couple dozen clusters, managing Kubernetes becomes overwhelming.

How can we solve this problem?

The foundational principle of multicluster management: You shouldn’t be connecting to your clusters one by one and running kubectl by hand. Instead, you describe their configuration declaratively.

That “future state” description should cover the entire cluster, from its infrastructure to the application workloads that run on top. In other words, you should be able to recreate your whole cluster stack from scratch using only its description, without manual intervention.
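To make the declarative idea concrete, here is a minimal Python sketch of comparing a described state against what is actually running. Everything in it is an assumption for illustration: the ClusterSpec fields, the cluster names and the plan function are hypothetical, and this is not how Palette, Cluster API or any other specific tool implements reconciliation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical, heavily simplified description of one cluster."""
    kubernetes_version: str
    node_count: int
    workloads: tuple = ()


def plan(desired, actual):
    """Return the actions needed to make `actual` match `desired`.

    Both arguments map a cluster name to its ClusterSpec.
    """
    actions = []
    # Described but not running: must be created.
    for name in sorted(desired.keys() - actual.keys()):
        actions.append(f"create {name}")
    # Running but different from the description: must be updated.
    for name in sorted(desired.keys() & actual.keys()):
        if desired[name] != actual[name]:
            actions.append(f"update {name}")
    # Running but not described at all: must be removed.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(f"delete {name}")
    return actions


# The description is the single source of truth (illustrative values).
desired = {
    "dev": ClusterSpec("1.27", 3, ("ingress", "app")),
    "staging": ClusterSpec("1.27", 3, ("ingress", "app")),
    "prod": ClusterSpec("1.26", 6, ("ingress", "app")),
}

# What is actually running right now (illustrative values).
actual = {
    "dev": ClusterSpec("1.26", 3, ("ingress", "app")),   # version has drifted
    "prod": ClusterSpec("1.26", 6, ("ingress", "app")),  # already matches
    "scratch": ClusterSpec("1.25", 1),                   # created by hand
}

for action in plan(desired, actual):
    print(action)
```

The point of the sketch is that nobody decides by hand which clusters to touch. The description is the only input, and the difference between it and reality produces the work list, which is also what lets you rebuild everything from scratch by starting from an empty actual state.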

Multicluster, Multicloud, Multienvironment

As you look ahead to growing numbers of clusters, it pays to be cloud-agnostic from the start.

It’s tempting to stay with just one public cloud provider. It’s the easiest path: only one product set and terminology to learn. And cloud providers normally do their best to lock you in, often financially incentivizing loyalty.

But there are many reasons why you might find yourself in a multicloud environment. Mergers and acquisitions sometimes unexpectedly bring another provider in house. You might want to use multiple clouds to access specific features, or to spread risk and improve availability.

A number of companies we’ve seen are also taking a hybrid approach to deploying Kubernetes: some legacy applications deployed in the data center, coupled with cloud-based ones.

Whenever you’re dealing with different silos of Kubernetes, what really helps is a single pane of glass. If you’re using multiple cloud providers, or on-premises infrastructure plus a cloud provider, you need tooling that lets you make similar deployments to all of those environments. That tooling helps you achieve standardization and simplifies your overall infrastructure.
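One way to picture "similar deployments to various environments" is a shared base profile with small per-environment overrides. The Python sketch below is only an illustration under stated assumptions: the profile keys, the environment names and the render function are all hypothetical and do not reflect the schema of any particular product.

```python
import copy


def render(base, overrides):
    """Return a new profile: `base` with `overrides` merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            merged[key] = render(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# The standard every environment starts from (illustrative values).
base_profile = {
    "kubernetes_version": "1.27",
    "nodes": {"count": 3, "size": "medium"},
    "addons": ["ingress", "monitoring"],
}

# Only the differences are written down per environment (hypothetical names).
environment_overrides = {
    "cloud-a": {"nodes": {"size": "large"}},
    "cloud-b": {},
    "on-prem": {"nodes": {"count": 5}},
}

rendered = {
    env: render(base_profile, overrides)
    for env, overrides in environment_overrides.items()
}

for env, profile in rendered.items():
    print(env, profile["kubernetes_version"], profile["nodes"])
```

Keeping the overrides small is the design choice that matters here: anything not explicitly overridden stays identical everywhere, so drift between environments has to be written down to exist.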

Pain 3: The Edge Learning Curve

From the data center and cloud, you might start looking even further afield: to the edge.

Organizations are increasingly adopting edge computing to put applications right where they add value: in restaurants, factories and other remote locations.

But edge presents unique challenges. The hardware is often low power: Your clusters might be single-node devices. The connectivity to the site may be low bandwidth or intermittent, making remote management difficult. There’s a whole new frontier of security to consider, protecting against hardware tampering.

And the killer: When we’re talking about restaurant chains or industrial buildings, compute might need to be deployed to hundreds or thousands of sites. There won’t be a Kubernetes expert at each site — or even a regular IT guy — to help onboard new devices or fix any configuration issues locally.

These are big challenges, but there are solutions to help you.

Low-touch or no-touch onboarding technologies mean that someone without IT or Kubernetes knowledge should be able to take a device, plug in power and ethernet, and watch provisioning complete automatically, without any human intervention whatsoever. (Check out a demo of that right here).

One of the prerequisites for this kind of deployment is the ability to provision your whole cluster from scratch in one go, including the operating system on the underlying device. When you can do this provisioning declaratively, mass-scale deployment becomes significantly simpler. The idea is that provisioning should start automatically and build the system up to the point where it can reach HQ, from where you manage the deployment centrally.

Once your cluster is ready, you can further mitigate risk if your OS is immutable. An immutable OS simply has fewer elements that can break, so it provides a more reliable foundation for your clusters. So, pay attention to which OS you’re choosing. There are a number of both open source and commercial offerings available on the market. Check out this comparison to get you started.

Facing Challenges? You’re Not Alone

As a platform team, you will almost certainly face these challenges as you scale Kubernetes in production. They’re daunting, for sure. But there are approaches you can take to be successful.

The cloud native, open source Kubernetes ecosystem is huge, and you know you’re not alone. There are many different projects out there that you can apply. And there’s a wealth of community experience you can draw on too, at events like KubeCon.

But perhaps the biggest tip we can offer is this. If you’re building for scale, don’t DIY.

By all means start with DIY to learn and prove the concept. But you should know that the “do it yourself” path only becomes harder with time. We have moved on from needing to do “Kubernetes the hard way.”

So what’s next? If you’re interested in diving deeper into the challenges you can expect to see as your Kubernetes use grows, watch the recording of our webinar here!
