Published
September 14, 2023

2-node edge Kubernetes clusters for HA and cost savings

Justin Barksdale
Principal Architect

Rethinking architectures for the edge

The signs are clear: the future of applications is at the edge. 

Many of the engineers we work with are focused on deploying services to edge computing locations, primarily using Kubernetes. They’re putting their code closer to the source of data, closer to the user, in order to deliver efficient and responsive user experiences and business processes. 

This trend will only accelerate as AI/ML workloads come to dominate (as we explore in this blog).

When you’re deploying applications to the edge, ensuring availability has to be a primary consideration in your architecture.

Edge applications are likely to be business critical, and outages may mean lost revenue. 

Yet infrastructure in edge locations is very different from the data center. Edge hardware may have to contend with temperature extremes, interrupted power, infrequent servicing, and other environmental factors that increase the likelihood of system failures and downtime.

In the event of a failure, inconsistent WAN or internet access at edge locations may hinder remote troubleshooting. Getting skilled staff and spare hardware to remote locations is often costly and slow, making outages challenging to solve quickly.

Achieving high availability (HA) at the edge

A single-node cluster is rarely acceptable for a production application anywhere, and the edge is no different. While there are many business continuity and disaster recovery techniques (such as implementing backup and recovery), the foundation of any high availability system is to eliminate single points of failure.

The industry consensus recommends a minimum of three nodes to ensure fault tolerance and create a baseline HA architecture.

However, edge locations are likely to be constrained by space, power, and cooling factors (as in the standard SWaP or SWaP-C metrics) and could struggle to accommodate three hardware nodes. 

Edge deployments are also often at massive scale, with hundreds or thousands of locations. This means the cost of deploying and maintaining edge Kubernetes clusters can be a significant concern for organizations.

The cost of running a 3-node HA cluster at the edge

Let's break down the cost components of a 3-node HA cluster at the edge:

1. Hardware Costs: Edge locations often require specialized hardware to withstand harsh environmental conditions. Acquiring and maintaining three such nodes can strain budgets. With three nodes at each location, you must also budget for more spares, cables, etc.

2. Operational Expenses: Running and managing three nodes involves higher operational costs, including power, cooling, and personnel for maintenance, plus more logistical costs around warehousing, shipping, and insurance.

3. Licensing Costs: Kubernetes is open source, but organizations may incur licensing fees for associated tools, software, and support contracts, which may be priced per CPU.

4. Networking Costs: Edge locations may necessitate dedicated networking infrastructure, adding to the overall cost with cabling, switching, and routing. This is especially true if you’re implementing redundant network connectivity.

The cost of each node becomes increasingly impactful as your fleet expands to hundreds or even thousands of clusters. The time has come to explore the art of the possible. How can we rethink what high-availability architectures look like at the edge?

Transitioning to a 2-node HA Cluster

What if there was a way to ensure high availability with only two nodes? It would certainly be an attractive proposition in terms of operational costs:

1. Hardware Savings: Reducing the number of nodes from three to two immediately cuts hardware costs by one-third, which is particularly significant in edge environments where specialized hardware can be expensive.

2. Operational Efficiency: Managing two nodes is more operationally efficient than managing three: it reduces power consumption, lowers maintenance costs, and shrinks spare holdings and logistics.

3. Licensing Cost Reduction: Any per-box or per-CPU license costs will be cut by a third, just like your hardware costs.

4. Networking Simplification: Managing a smaller cluster simplifies networking requirements, potentially leading to savings in network infrastructure costs.
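To make the one-third figure concrete, here is a minimal sketch of the fleet-wide arithmetic. Every number and name in it (NodeCost, fleet_cost, the per-node prices, the 1,000-site fleet) is a hypothetical placeholder rather than real pricing, and it only models costs that scale per node; networking savings are left out because they do not scale as cleanly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeCost:
    """Hypothetical per-node costs; placeholders, not vendor data."""
    hardware: float           # one-time purchase cost per node
    annual_operations: float  # power, cooling, maintenance per node per year
    annual_license: float     # per-node licence and support per year


def fleet_cost(nodes_per_site: int, sites: int, cost: NodeCost, years: int) -> float:
    """Total cost of the fleet over `years`, counting only per-node costs."""
    per_node = cost.hardware + years * (cost.annual_operations + cost.annual_license)
    return nodes_per_site * sites * per_node


def savings_fraction(cost: NodeCost, sites: int, years: int) -> float:
    """Fraction of per-node spend saved by moving from 3 nodes to 2 per site."""
    three = fleet_cost(3, sites, cost, years)
    two = fleet_cost(2, sites, cost, years)
    return (three - two) / three


# Assumed example figures: 1,000 sites over 3 years.
example = NodeCost(hardware=2000.0, annual_operations=300.0, annual_license=200.0)
three_node_total = fleet_cost(3, 1000, example, 3)  # 10_500_000.0
two_node_total = fleet_cost(2, 1000, example, 3)    # 7_000_000.0
```

Whatever per-node figures you plug in, the saving on these cost lines stays at one-third, because each of them scales linearly with node count. The absolute number is what grows with the size of the fleet.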

How can you achieve a quorum with only 2 nodes?

Aside from greater hardware redundancy, there’s a simple reason why HA architectures conventionally start at three nodes. 

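Key-value stores such as etcd rely on the Raft consensus algorithm to guard against errors and data loss, and Raft only commits a change when a majority of members (a quorum) agree. Three is the smallest cluster size that can lose a member and still hold a majority, and odd sizes are preferred because an extra even-numbered member raises the quorum without adding any fault tolerance.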

With only two peer nodes, the majority is both of them. Losing either node loses the quorum, and there is no third vote to ‘tiebreak’ and resolve a disagreement between them.
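The quorum arithmetic is easy to check for yourself. The sketch below uses the standard majority rule for Raft-style stores; the function names are our own, chosen for illustration.

```python
def quorum_size(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum_size(members)


# (cluster size, quorum, failures tolerated) for sizes 1 to 5
table = [(n, quorum_size(n), failures_tolerated(n)) for n in range(1, 6)]
# [(1, 1, 0), (2, 2, 0), (3, 2, 1), (4, 3, 1), (5, 3, 2)]
```

A 2-node quorum-based cluster tolerates zero failures, the same as a single node, while 4 nodes tolerate no more failures than 3. That is why quorum-based designs start at three and grow in odd steps.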

One possible solution would be to use an external database. However, edge clusters are distributed in remote facilities where skilled IT resources are often unavailable locally. Additionally, these facilities can have inconsistent WAN or internet connectivity. 

So it’s clear: any HA edge deployment should be able to stand independently without a continuous connection to external services.

The other approach for achieving reliable 2-node K8s clusters at the edge is to abandon a quorum model and adopt a Primary/Backup or Active/Standby model, which delivers the requisite high availability cluster control plane. 
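As a rough illustration of the Primary/Backup idea, here is a toy sketch of the promotion decision. It is not how Palette or any particular product implements failover: the Node and Role types and the reconcile function are hypothetical, health is reduced to a single boolean, and the sketch assumes exactly one node holds the primary role. It also ignores the hard part, which is how two nodes reliably detect failure without both claiming to be primary.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    STANDBY = "standby"


@dataclass
class Node:
    name: str
    role: Role
    healthy: bool = True  # stand-in for a real health check


def reconcile(a: Node, b: Node) -> Node:
    """Return the node that should serve the control plane.

    Assumes exactly one of the two nodes is currently PRIMARY.
    The standby is promoted only when the primary is unhealthy.
    """
    primary, standby = (a, b) if a.role is Role.PRIMARY else (b, a)
    if primary.healthy:
        return primary
    if standby.healthy:
        # Swap roles: the standby takes over, the failed node is demoted.
        standby.role, primary.role = Role.PRIMARY, Role.STANDBY
        return standby
    raise RuntimeError("no healthy node available")


# Hypothetical two-node site
node_a = Node("edge-a", Role.PRIMARY)
node_b = Node("edge-b", Role.STANDBY)
node_a.healthy = False
active = reconcile(node_a, node_b)  # node_b is promoted
```

Note that there is no vote here. One node is authoritative at any time, so the design trades the quorum requirement for the need to keep the standby's copy of the data current and to decide safely when promotion is allowed.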

This means looking at alternative Kubernetes backing stores beyond etcd, including Kine + NATS, Kine + RDBMS, and Kine + SQLite with streaming replication. 

Introducing Palette 2-node edge

With Spectro Cloud Palette, we have built on these concepts and developed a 2-node solution for high availability at the edge for organizations that cannot or will not justify the conventional 3-node architecture. 

Embedded in our edge solution lives an agent responsible for many activities, including the initial cluster bootstrapping, upgrade, and management of the underlying operating system (Ubuntu, openSUSE, RHEL, etc.), based on the open-source Kairos project.

We have been diligently working to put more intelligence into our agent, enabling it with even more superpowers. In conjunction with leveraging a different "raftless" key-value store, these superpowers give Palette a unique ability to provide a 2-node highly available Kubernetes cluster at the edge. The Palette agent has the intelligence to prevent many of the traditional challenges, like split brain, while providing the same functionality as an active-active, highly available 2-node Kubernetes cluster.

Today this 2-node capability is available for clusters built on the popular K3s lightweight distribution, with more variations currently in testing.

Conclusion

Edge computing is here to stay, and Kubernetes is a fundamental technology for managing containerized and AI/ML workloads at the edge. While deploying a 3-node HA cluster may provide robust reliability, it comes at a significant cost. Transitioning to a 2-node HA cluster offers a compelling alternative, delivering cost savings, operational efficiency, and scalability without compromising reliability.

In the ever-changing world of technology, adaptability is vital. Organizations that can harness the power of edge computing while making cost-conscious decisions will be well-positioned to thrive in this new era of computing.

To find out more about Palette’s solutions for edge computing, get in touch.
