In ancient Greek, the word “Kubernetes” referred to a helmsman, someone who steered a ship (a ship full of containers, in this case!). The Kubernetes we’re interested in here is not nearly so old as the age of Homer — it was released back in 2015. But in just a few short years, a lot has already changed.
Today, defining Kubernetes conventionally as just a “container orchestrator” is certainly reductive, if not an outright fallacy. It may be fairer to say that Kubernetes has become the de facto cloud Operating System, a platform that drives the whole industry toward a more modern, reliable, scalable, and resilient infrastructure to run cloud-native applications.
If you’re a beginner looking to learn Kubernetes, this article will give you the ‘Dummies Guide to Kubernetes’ and more. We’ll explain some Kubernetes basics, cover some of what makes Kubernetes special, and explore how the platform is being used in the enterprise today.
Kubernetes is a container orchestration platform that was originally designed by Google. It is now an open-source project with contributions from a large number of companies. Kubernetes allows you to run containerized applications at scale in a production environment. It handles all the complexity of deploying and managing containers on a large number of hosts.
Kubernetes is based on a few key concepts:
- Nodes: A node is a host that runs one or more containers. A Kubernetes cluster typically consists of a large number of nodes.
- Pods: A pod is the basic unit of deployment in Kubernetes. It is a group of one or more containers that are deployed together on the same node.
- Labels: Labels are key/value pairs that can be attached to objects in Kubernetes, such as pods. They are useful for users and controllers to identify and select groups of objects.
- Selectors: Selectors are used to select a group of objects based on their labels. They are often used in conjunction with labels.
- Replication Controllers: A replication controller ensures that the specified number of pods composing a service are always running. It automatically creates a new pod if an existing pod is deleted or becomes unhealthy. (In modern clusters, this role is played by ReplicaSets, usually managed through Deployments.)
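The concepts above come together in a minimal pod manifest. This is a sketch; the names (`nginx-pod`, the `app: web` label) are illustrative:

```yaml
# A minimal pod running a single container.
# The metadata.labels block attaches key/value labels that
# selectors elsewhere (services, controllers) can match on.
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: web
spec:
  containers:
    - name: nginx
      image: nginx:1.25
      ports:
        - containerPort: 80
```

Submitting this file with `kubectl apply -f pod.yaml` asks Kubernetes to schedule the pod on a suitable node.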
Getting started with Kubernetes
Kubernetes is usually deployed as a cluster of physical or virtual machines. Each machine in the cluster runs a Kubernetes agent, called kubelet. The kubelet is responsible for starting and stopping containers on the node.
The Kubernetes control plane is responsible for managing the cluster. It exposes a REST API that can be used to manage the cluster. The state of the cluster components is stored in a distributed store, provided by etcd. Etcd typically runs on the control-plane nodes as a set of containers, but it can also be installed and managed outside of Kubernetes as VMs or bare metal servers.
Another component running on the control-plane nodes is the controller manager, responsible for running the Kubernetes controllers. Among the most critical is the replication controller (in modern clusters, the ReplicaSet controller), which ensures that a specified number of pods are always running.
The kube-proxy component runs on each node and provides network proxy services for the containers on the node. It ensures that traffic addressed to a service's virtual IP and port is forwarded to one of the backing pods, typically through iptables NAT rules.
Finally, CoreDNS provides DNS services for the Kubernetes cluster. It is responsible for resolving the DNS names of services in the cluster to IP addresses. It runs as a set of containers.
The core Kubernetes components are represented in the figure below:
The control-plane nodes are also responsible for scheduling pods on worker nodes in the cluster. This task is performed by the scheduler, which runs on every control-plane node. In a production environment, the control-plane nodes are deployed in a highly available (HA) fashion with a minimum of 3 nodes. An external load balancer is required for accessing the Kubernetes API server, and etcd data is replicated across the control-plane nodes for resiliency. Other control-plane architectures are possible, but the one we've just described is the simplest solution for HA.
Finally, there is the container runtime, which is responsible for running containers on the nodes. Kubernetes supports multiple container runtimes, including containerd and CRI-O. It is worth noting that direct Docker support (the dockershim) was deprecated in Kubernetes 1.20 and removed in 1.24; images built with Docker still run fine, since they are OCI-compliant.
Kubernetes uses a declarative approach to deployment. This means that you simply describe the desired state of your application, and Kubernetes will ensure that the application always matches that state. For example, if you configure a Kubernetes deployment with three pod replicas, Kubernetes will ensure that three pod copies are always running.
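As a sketch, the three-replica example just mentioned translates into a Deployment manifest like the one below (the name and image are illustrative):

```yaml
# Declarative desired state: three replicas of the same pod template.
# Kubernetes continuously reconciles the cluster toward this state,
# replacing pods if any of the three disappear.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
```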
The platform is designed to be highly scalable. Kubernetes can easily be deployed on a large number of nodes. It is also designed to be highly extensible. Additional Kubernetes components can be deployed to add functionality to the platform. These components include:
- Ingress controllers: An ingress controller is responsible for routing external traffic to services inside the Kubernetes cluster.
- Monitoring solutions: Many monitoring solutions can be deployed on Kubernetes to monitor the health of the cluster and its applications. Examples include Prometheus (metric collection) and Grafana (dashboards).
- Logging solutions: These can be deployed on Kubernetes to collect and aggregate logs from the applications in the cluster. Popular stacks include EFK (Elasticsearch, Fluentd, and Kibana) and ELK (Elasticsearch, Logstash, and Kibana).
- Networking and Security solutions: Kubernetes can be integrated with the existing physical or virtual network through a CNI (Container Network Interface) plugin. It can also leverage independent software-defined solutions, generally based on OVS (Open vSwitch) or eBPF (extended Berkeley Packet Filter). Security is a core feature of these solutions, with capabilities like namespace isolation and ingress/egress traffic control.
- Storage solutions: Many storage solutions can provide persistent storage for applications in a Kubernetes cluster through CSI (Container Storage Interface) drivers. These include both proprietary and open-source options (Ondat, Portworx, Rook, Amazon EBS, and the like).
- Continuous integration/delivery (CI/CD) solutions: They automate the building, testing, and deployment of applications on Kubernetes. Tekton is an example of a Kubernetes-native pipeline, where every task runs as a container.
You can think of Kubernetes workloads as emails popping up in your Gmail inbox. You can add tags to them and create a higher-level view containing all the emails with a specific tag. Kubernetes works the same way to control pods. A picture is worth a thousand words, so take a look at the diagram below:
For the sake of simplicity, we didn't represent ReplicaSets, which are automatically created by the Deployment controller.
The Deployment controller relies on pod labels to determine which pods it manages. If a pod is not reachable or destroyed, another one will take its place to ensure the desired number is reached. Similarly, service endpoints are derived from pod labels. Traffic directed to the billing-svc service will be load-balanced among the three pods with the label app=billing.
By default, pods are not accessible from outside the cluster; Kubernetes clusters are fenced off from a network perspective. A service whose type is set to NodePort opens a dynamic port (identical on every worker node) through which users can reach the service. If the type is set to ClusterIP (the default), the service is only reachable from within the cluster. In other words, Kubernetes’ default settings only allow service-to-service communication.
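As a sketch, the billing-svc service described above could look like this (the port numbers are illustrative):

```yaml
# Traffic sent to billing-svc is load-balanced across all pods
# carrying the label app=billing.
apiVersion: v1
kind: Service
metadata:
  name: billing-svc
spec:
  type: ClusterIP        # default: reachable only inside the cluster
  selector:
    app: billing         # endpoints = pods carrying this label
  ports:
    - port: 80           # port exposed by the service
      targetPort: 8080   # port the billing containers listen on
```

Changing `type: ClusterIP` to `type: NodePort` would open the same port on every worker node and make the service reachable from outside the cluster.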
The beauty of Kubernetes lies in its eventual convergence. For example, if a user changes the number of replicas of the billing Deployment, the corresponding service will update its endpoint members accordingly.
In addition, other mechanisms may influence the number of pods in a Deployment. An HPA (Horizontal Pod Autoscaler) may compute a different desired replica count, while a PDB (Pod Disruption Budget) constrains how many pods can be voluntarily disrupted at once. They could also be represented in the picture above, but we have omitted them for clarity.
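A minimal HPA targeting the billing Deployment might look like this sketch (the replica bounds and CPU threshold are illustrative):

```yaml
# Scale the billing Deployment between 3 and 10 replicas,
# targeting 70% average CPU utilization across its pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: billing-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: billing
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```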
What makes Kubernetes special?
While Kubernetes is a huge success in the enterprise today, that wasn’t always guaranteed. In fact, following the success of Docker and the rise of containers, Kubernetes emerged into an all-out container orchestration war against many competing options.
Users had their pick of Docker Swarm, Apache Mesos/Marathon, HashiCorp Nomad, and Pivotal Cloud Foundry. Although some of these competing technologies are still active, it’s clear to most observers that Kubernetes won the war.
Why? Kubernetes has several key advantages that still power its success today:
More than a container orchestrator: an extensible API platform
Saying that Kubernetes is a container orchestrator is accurate but reductive. In fact, the most powerful feature of Kubernetes is its rich and extensible REST API, combined with the automation and high availability it provides at scale.
One of the key characteristics of Kubernetes is its API extensibility. It already contains a substantial amount of native objects accessible through a standard REST API, but it also provides an easy way to extend it.
Creating Custom Resources (CRs) allows developers to add arbitrary resources as first-class citizens in Kubernetes. A CR is defined as a set of properties, encapsulated in a YAML file and ingested by Kubernetes via a simple command line or the API. Developers can then leverage existing Kubernetes procedures to build infrastructure and application services.
This turns Kubernetes into a generic cloud API that can be deployed anywhere, from customer premises to public hyperscale environments.
This applies to both infrastructure and application components. As a result, you can not only embrace modern application development patterns but also natively reap the benefits of infrastructure-as-code (or rather, infrastructure-as-YAML). The same holds for components outside of Kubernetes (e.g., an Amazon EC2 instance), which you can represent as first-class objects in the Kubernetes API server using Custom Resource Definitions (CRDs). You can further manage their lifecycle with custom controllers.
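To make the EC2 example concrete, here is a hypothetical CRD registering an `Ec2Instance` kind. The group name and schema below are illustrative, not a real project's API:

```yaml
# A hypothetical CRD so that external infrastructure (an EC2 instance)
# can be described and managed as a Kubernetes object.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: ec2instances.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    kind: Ec2Instance
    plural: ec2instances
    singular: ec2instance
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                instanceType:
                  type: string
                region:
                  type: string
```

Once the CRD is registered, `kubectl get ec2instances` works like any built-in resource, and a custom controller can reconcile these objects against the real cloud API.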
Custom controllers can also extend Kubernetes’ ability to decide how many pods a particular application needs over time. When CPU and memory are not the right signals for auto-scaling decisions, this is the way to go: other metrics, such as the number of messages in an application queue, can be factored in. These external, event-driven sources are what custom controllers watch and act on to perform automation tasks.
For example, Kubernetes Event-Driven Autoscaling (KEDA) is an operator that enables event-driven autoscaling of containerized workloads. KEDA provides a scalable and serverless event-handling architecture that can trigger arbitrary actions in Kubernetes based on events from external sources such as Kafka, RabbitMQ, NATS, Azure Storage Queues and Tables, HTTP requests, and more.
KEDA is designed to be used in concert with existing Kubernetes autoscaling mechanisms such as the HPA. It scales pods based on demand but does not replace or duplicate Kubernetes HPA functionality. KEDA joined the CNCF, where it reached incubating status, and the project is led by Red Hat and Microsoft.
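As a sketch, a KEDA ScaledObject scaling a deployment on RabbitMQ queue depth might look like this (the queue name, thresholds, and environment variable are illustrative assumptions):

```yaml
# Scale the billing Deployment between 1 and 20 replicas based on
# the depth of a RabbitMQ queue, rather than CPU or memory.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: billing-scaler
spec:
  scaleTargetRef:
    name: billing          # the Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: rabbitmq
      metadata:
        queueName: invoices
        queueLength: "50"          # target messages per replica
        hostFromEnv: RABBITMQ_URL  # connection string from the pod env
```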
Knative is another popular open-source project that also uses custom controllers. It comprises a set of components to build, deploy, and manage modern serverless workloads on Kubernetes. Knative provides developers with a set of building blocks upon which they can construct event-driven pipelines composed of several stages, from receiving events to handling them appropriately.
With Knative, you don't need to provision or manage any infrastructure, so you can focus on what matters most – building great applications. Like KEDA, Knative is also an incubating project at the CNCF. The project is being led by Google, IBM, Red Hat, VMware, and SAP.
The de facto cloud OS for cloud-native applications
Kubernetes is no stranger to the twelve-factor app patterns. In fact, Kubernetes is certainly the most appropriate infrastructure platform to run cloud-native applications adhering to these principles.
Companies that invest in this type of application usually reap a competitive advantage. They can independently manage individual components of their application (i.e., microservices) without any disruption, run A/B testing, and perform canary rolling updates. Of course, none of this can be done without adapting internal processes related to the application development lifecycle. DevOps, TDD (Test-Driven Development), Agile development methodology, and “shift left” are principles that need to be implemented, at least partially.
A word of caution here: although the close alignment of Kubernetes to modern app patterns is a huge advantage, the more Kubernetes machinery is exposed to core application developers (as opposed to SRE, DevOps, and infrastructure developers), the less likely they will be to enjoy the experience. App developers must be empowered by the platform teams abstracting away as many Kubernetes components as possible.
Since the early days of UNIX, developers have used standard libraries to interact with operating system functions. In the C or C++ programming language, the GNU C Library (glibc) is the library that provides Portable Operating System Interface (POSIX) functions and critical interfaces to the kernel.
Similarly, Kubernetes is often considered the cloud OS and provides integration services for developing and maintaining cloud-native applications. From ingress application routing to DNS services and load-balancing, Kubernetes acts as the glibc of the cloud, letting applications interact with the platform without extra application code. But there is a trade-off: a lot of glue is required to make all the pieces work together and provide end-to-end application lifecycle support.
The Swiss army knife for application delivery
Lifecycle management of cloud-native applications is a very active domain with many projects trying to help developers in their daily work. Deploying applications to Kubernetes is performed by submitting a set of YAML manifests to the Kubernetes API server. Each manifest describes a particular component or microservice making up the application.
There are many ways to interact with the REST API:
- Use kubectl as an authorized user to deploy the manifests.
- Use a release manager like Helm to make the deployment of applications more streamlined across multiple clusters and locations.
- Use GitOps principles to strictly control change requests on manifests within a version control system like Git.
- Use base and overlay Kustomize manifests to deploy applications with different parameters according to their environments. For example, the number of replicas of a particular Deployment may differ in production vs. development environments.
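As a sketch, the production overlay from the last bullet might be expressed in a `kustomization.yaml` like the one below (the directory layout and replica count are illustrative):

```yaml
# overlays/production/kustomization.yaml
# The overlay reuses the shared base manifests unchanged and
# patches only the replica count for the production environment.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: billing
    patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: billing
      spec:
        replicas: 5
```

Running `kubectl apply -k overlays/production` renders the base plus the patch, so development and production share one set of source manifests.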
Tools such as the ones listed below provide continuous delivery capabilities and can be leveraged within DevOps pipelines.
Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes. Argo CD follows the GitOps pattern of using Git repositories as the source of truth for defining the desired application state. Application manifests are declaratively defined in a Git repository, and Argo CD continuously monitors the running application state and compares it against the desired state stored in the Git repository. When differences are detected, Argo CD automatically updates the running application to match the desired application state specified in Git.
Weaveworks Flux is a tool that automatically ensures the state of a cluster matches the config in Git. Flux is a project for GitOps-style deployment on top of Kubernetes. It provides declarative updates for Kubernetes resources in a Git repo. It does this by using an operator to trigger deployments inside Kubernetes, which means you don't need a separate CD tool. Flux and Argo CD are both hosted by the CNCF as incubating projects.
Helm is the de facto package manager for Kubernetes applications. It streamlines the installation and management of Kubernetes applications. Helm uses a packaging format called charts. A chart is a collection of files that describe a related set of Kubernetes resources.
Kubernetes application manifests can be defined using Helm templates, which are then passed through the Helm engine to render into Kubernetes manifests. It enables consistent and repeatable deployments across different environments. Helm is a graduated project at the CNCF and is led by Microsoft, Google, and Bitnami.
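A Helm template is ordinary YAML with templating placeholders. This fragment is a sketch of a chart's `templates/deployment.yaml`, where the replica count and image tag come from `values.yaml` (the names and values are illustrative):

```yaml
# Rendered by the Helm engine at install/upgrade time:
# {{ .Values.* }} comes from values.yaml (or --set overrides),
# {{ .Release.Name }} from the release being installed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-web
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}
    spec:
      containers:
        - name: web
          image: "nginx:{{ .Values.image.tag }}"
```

Installing with `helm install prod ./mychart --set replicaCount=5` produces a concrete manifest per environment from the same chart.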
Kustomize is a tool designed to customize raw, template-free YAML files for multiple purposes, leaving the original YAML untouched. With Kustomize, you can traverse a Kubernetes manifest to add, remove, or update configuration options without forking the manifest. This can be useful when deploying the same application to multiple environments with different configuration options.
The center of a vibrant open source community
The projects we’ve just discussed (and there are many more…) lead us to another key advantage that Kubernetes has been able to draw on: the OSS community. From its early stages, Kubernetes could rely on a dedicated community of users and contributors who were later federated under the CNCF (Cloud Native Computing Foundation) umbrella.
This community has been very open and eager to welcome people from any personal background, geographical location, and level of knowledge. From students to non-technical contributors, Kubernetes has always managed to harness curiosity and diversity to multiply its adoption rate and to generate a whole landscape of other products that you, as a user, can draw on.
In addition, many companies that adopted Kubernetes encourage their developers to contribute back to the project. In 2022, a total of 4,682 companies contributed to the project through reviews, comments, commits, and pull requests (PRs).
Hyperscaler managed services clear adoption roadblocks
While the community has been hugely important in making Kubernetes the vibrant landscape it is today, arguably just as vital is its productization by the public cloud hyperscalers: AWS, Azure, Google Cloud and others.
The creation of public cloud services based on the Kubernetes control plane has created compelling value-add for users. In services such as EKS, AKS, and GKE, the cloud provider hosts the control plane and supports its lifecycle management. It exposes a minimal surface to customers, namely an API endpoint available via a private or public network. Users are responsible for deploying compute nodes that act as Kubernetes workers and run their applications packaged as containers. It is a managed service available on demand, and a cluster can be ready for application deployment in less than 20 minutes.
Another benefit of public cloud Kubernetes is that it gives users direct access to and integration of other cloud services, such as storage, monitoring, and user identity access management (IAM). A common set of APIs is used to integrate the required services. This reduces engineering efforts to standardize security and develop integration tools.
And as usual, the public cloud promise of multi-zone availability and economy of scale attracts businesses with strong service-level objective (SLO) requirements. This is why additional capabilities are generally provided to increase elasticity and save computing costs, such as node-group autoscaling based on consumption metrics.
The Operator Framework is driving Kubernetes maturity
Kubernetes has gained a huge amount of momentum in a short time, but it’s certainly not universal or mature to the point of commoditization — yet. It is still missing crucial components for businesses to benefit from a single infrastructure stack.
For example: Kubernetes was first designed to be the home of stateless application components. But its success also hinges on its ability to run business-critical applications, like databases and message queues. These are stateful in nature and need special care: unlike stateless workloads, they cannot be treated as disposable cattle.
New paradigms, such as the Operator Framework, are being designed to make it easier for SRE, DevOps, and infrastructure developers to deploy and upgrade these components.
When a stateful application is deployed on Kubernetes as a StatefulSet, a specific set of properties (stable network identities, ordered deployment, and per-replica persistent volumes) guarantees stateful behavior. However, the StatefulSet controller is not responsible for assembling the application components on top, such as a MongoDB cluster. This is where the Operator Framework comes into play.
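To illustrate what a StatefulSet provides out of the box, here is a sketch (the image, replica count, and volume size are illustrative):

```yaml
# Each replica gets a stable identity (mongo-0, mongo-1, ...) and
# its own persistent volume that survives pod rescheduling.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo          # headless service providing stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
        - name: mongo
          image: mongo:6
          volumeMounts:
            - name: data
              mountPath: /data/db
  volumeClaimTemplates:       # one PersistentVolumeClaim per replica
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

Note that nothing here initializes a MongoDB replica set or elects a primary; that application-specific logic is exactly what an operator adds.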
Operators are Kubernetes controllers that extend the platform’s functionality to manage specific types of applications. They need to be implemented for each managed application and generally include the following features:
- State management: The operator is responsible for the application state and ensures that the required resources are always available.
- Upgrade management: The operator can automatically upgrade an application to a new version while ensuring that the application is always available (i.e., no downtime).
- Configuration management: The operator can manage application-specific configuration and make it available inside the Kubernetes cluster.
- Application-specific knowledge: The operator encapsulates the specific domain knowledge required to manage the application.
While there are hundreds of Operators out there already, the Operator Framework is still in its early days and will undoubtedly evolve over time… you can even write your own. The Operator Framework promises to make Kubernetes the de facto platform to run any type of application, stateless or stateful, and maintain its desired state. This is a critical step toward turning Kubernetes into a commodity.
Ready to start your Kubernetes journey?
Kubernetes has become the de facto standard for container orchestration and is used by enterprises of all sizes to deploy and manage applications at scale. The Kubernetes community is very active, with many tools and projects emerging to help developers in their daily work. In this article, we’ve looked at some of the most popular tools for deploying and managing applications on Kubernetes and introduced some of the fundamental aspects of Kubernetes.
While there are many different options available, each with its own advantages and disadvantages, Kubernetes provides a consistent and extensible programming model to deploy and run modern applications. This makes it possible to switch between architecture patterns and ecosystem tools as your needs change, without having to learn a completely new set of APIs or workflow engines. Kubernetes acts as the modern standard cloud API, regardless of where you run your applications — on-premises, in the cloud, or at the edge.
New to Kubernetes? Get some hands-on time and start preparing for your first certification by trying out Spectro Cloud Palette and its Cluster Profiles. Our curated profiles will help you deploy Kubernetes in any environment. They also include all the tools and components you need to prepare for your CKA, CKAD and CKS exams.