Exposing the elephant at the edge: security

Imagine the scene:

A drone buzzes overhead, collecting real-time video data: maybe it’s monitoring crop health on a farm, surveying a disaster zone to direct rescuers to injured survivors, or scanning rooftops looking for broken solar panels.

The drone returns to its nest, and the data from its cameras and sensors is dumped to a small edge Kubernetes cluster, formed of three industrially hardened machines.

This is a perfect edge computing use case: insights from the survey are mission-critical and urgent, yet the video files are large and network connectivity in the field is patchy at best. So the cluster starts processing the footage locally with machine learning algorithms — perhaps automatically identifying diseased plants, locations of combatants, or failed solar panels, alerting decision-makers tens or hundreds of miles away to key action points.

The drone nest may be out in a (literal) field, in an urban area in the bed of a pickup truck, or on a ship. All very different environments, but they have one thing in common: you can’t take physical security for granted.

Drones are playing an increasingly valuable role in many industries — like agriculture — and expose the challenges of computing at the edge. Source: https://flic.kr/p/2hpLjkB

What should we worry about here?

Ultimately, the sensitive data and applications running on this edge Kubernetes environment need to be protected from a few key things:

Theft of edge data

In many edge use cases, there is an elevated risk that someone could walk up to your edge computing devices and simply take them, whether it’s a burglar or a battlefield opponent. The risk here is not the loss of the hardware, but the loss of the data and other intellectual property stored on it, and the risk of the active device being accessed and its connections used to compromise other systems.

It’s difficult to protect all data on a running system from being stolen if physical security is compromised, but there are mitigations that can be applied, and theft can be made harder. In particular, you can lock away data on machines that are powered down at night or when not in use, using one of several methods:

**A physical key, such as a *YubiKey*.** Without the key, the device won’t boot or be able to decrypt any data stored locally. Because the key is easily portable, it can simply be removed when the cluster doesn’t need to be operational. The downside is that someone needs to be on-site to insert and remove the key.
**A locally provided secret, like a password or PIN.** In this case, the device won’t boot, or the operating system won’t gain access to any encrypted data, until the secret is entered. The actual key is stored in something like a TPM (Trusted Platform Module) on the device, and cannot be unlocked without user intervention. Again, this doesn’t work well for devices that need to be able to boot and operate unattended.
**A remote system.** In this option, an external central party available via the network is able to uniquely identify the edge device and help it unlock local data. This option allows for unattended operation but maintains central control: if a device is stolen, it can effectively be bricked, and if a higher level of security is needed, the central platform may even require human interaction to approve each boot. The challenge here is how to handle scenarios where the network connection is unavailable.
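
To make the remote-system option concrete, here is a minimal sketch of the decision a central service might make when an edge device asks to unlock its data. It is an illustration only, not any particular product's API: `DeviceRecord`, `Decision`, and `decide_unlock` are hypothetical names. The sketch assumes the device has already been authenticated, and it fails closed when the central service can't be reached, which is just one possible answer to the connectivity problem.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    UNLOCK = "unlock"
    WAIT_FOR_APPROVAL = "wait_for_approval"
    STAY_LOCKED = "stay_locked"


@dataclass(frozen=True)
class DeviceRecord:
    """Hypothetical central-side record for one edge device."""
    device_id: str
    reported_stolen: bool = False
    requires_human_approval: bool = False


def decide_unlock(record: Optional[DeviceRecord],
                  operator_approved: bool = False) -> Decision:
    """Decide whether a booting device may unlock its local data."""
    # None models "central service unreachable or device unknown".
    # Assumption: fail closed. An offline grace period is another option.
    if record is None:
        return Decision.STAY_LOCKED
    # A device reported stolen is never unlocked (effectively bricked).
    if record.reported_stolen:
        return Decision.STAY_LOCKED
    # Higher-security devices need a human to approve each boot.
    if record.requires_human_approval and not operator_approved:
        return Decision.WAIT_FOR_APPROVAL
    return Decision.UNLOCK
```

Failing closed favors security over availability. A disaster-response team working without connectivity might reasonably choose the opposite trade-off.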

In security, it’s important to realize that there’s never a perfect option. There’s only a series of compromises to be made (no pun intended). The operating model you choose has to take into account the risk profile and importance of the data, and how the system is used. It’s important to balance strict security against operational usability. Imagine a farmer losing a morning’s productivity because she damaged her YubiKey, or a disaster-response team unable to access a drone system because they didn’t have the right PIN. The consequences could be catastrophic.

Tampering

Falsifying edge data or altering the way it’s processed is another important risk. Protecting the system from tampering therefore becomes critical, and it has to be addressed at two levels: physical security, and the security of the operating system and Kubernetes distribution.

Physical tampering

Physical tamper protection can be achieved in many ways, from passive security screws and locks that resist access, to more active methods such as tamper switches: if a bad actor attempts to open the server case, the switch can trigger events that result in the data being locked or the equipment destroyed.

It’s also common to lock down physical interfaces and ports (e.g. management ports) so that they aren’t operational if someone connects a device to them, either through turning them off in firmware, or by physically blocking or disconnecting them.

Enter immutability!

A fairly recent development is the rise of immutable operating systems, which protect against tampering by using read-only file systems. An attacker who gains remote access to the machine cannot simply alter the operating system, and any malicious software running in memory will be removed on reboot because it can’t install itself permanently.
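
As a small illustration of the read-only idea, the sketch below checks whether a mount point is mounted read-only by parsing text in the Linux `/proc/mounts` format. The helper names `mount_options` and `is_read_only` are ours, invented for illustration. A read-only root mount is only one ingredient of an immutable OS, so treat this as a sanity check rather than proof of immutability.

```python
from typing import Set


def mount_options(mounts_text: str, mount_point: str = "/") -> Set[str]:
    """Return the mount options for mount_point, or an empty set if absent."""
    options: Set[str] = set()
    for line in mounts_text.splitlines():
        # Fields: device, mount point, fs type, options, dump, pass
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Later entries shadow earlier ones for the same mount point.
            options = set(fields[3].split(","))
    return options


def is_read_only(mounts_text: str, mount_point: str = "/") -> bool:
    """True if mount_point is listed with the 'ro' option."""
    return "ro" in mount_options(mounts_text, mount_point)
```

On a Linux machine you could call it as `is_read_only(open("/proc/mounts").read())`.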

Likewise, field engineers or system administrators can’t alter the operating system either, meaning that untrustworthy internal actors cannot easily install malicious software. Combined with physical tamper detection, this makes it difficult to systematically take control of devices or exfiltrate data from them by compromising the operating system.

Immutable operating systems also go hand-in-hand with atomic upgrades. Because the OS or users on the OS can’t alter the OS itself, an upgrade has to be performed by installing an entirely new OS image (often on a separate ‘B’ partition) and booting over to it. As this upgrade path is the only path to make alterations — rather than incremental patches to the running OS — it is easier to enforce stricter security around the software supply chain. The A/B partition upgrade approach, incidentally, doesn’t just help with security: reliability is improved because in the event of a failed upgrade, the box will roll back to the previous known-good configuration.
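
The A/B flow described above can be sketched in a few lines. This is a simplified model under stated assumptions, not how any specific OS implements it: `BootState`, `other_slot`, and `upgrade` are hypothetical names, `write_image` stands in for installing the new image, and `health_check` stands in for rebooting into the new slot and verifying that it works.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BootState:
    active: str = "A"  # slot currently trusted as known-good


def other_slot(slot: str) -> str:
    return "B" if slot == "A" else "A"


def upgrade(state: BootState,
            write_image: Callable[[str], None],
            health_check: Callable[[str], bool]) -> bool:
    """Install to the inactive slot; commit only if it proves healthy."""
    target = other_slot(state.active)
    write_image(target)       # the running slot is never modified
    if health_check(target):  # models "boot the new slot and verify it"
        state.active = target  # commit: the new slot becomes known-good
        return True
    # Roll back: the previous known-good slot stays active.
    return False
```

Because the active slot is untouched until the new one passes its check, a failed upgrade leaves the box exactly where it started.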

Immutable operating systems are very powerful from a security and reliability perspective, but they are also more difficult to modify and configure for obvious reasons. To solve this, there are new and powerful tools like kairos.io that enable the use of GitOps to drive operating system configuration and upgrades.

Operating system and Kubernetes distro integrity

Security standards like FIPS, DISA STIG, and the NIST SP 800 series are all extremely important to consider when building edge devices. To see why, imagine you’re baking a cake, and somebody slips a mind-controlling ingredient into it. Now they could ask you for your passwords and get any other data they need from you.

Standards like those from NIST provide security controls and process best practices at every stage of the development and operational lifecycle in the software supply chain, from hardening configurations to implementation processes. They’re not perfect, but they help ensure no backdoors are left open in the solutions you’re using. In other words, they help to ensure your baker’s ingredients are verified and that the baking of the cake is done according to certain standards.

There’s no point using immutable operating systems, for example, without knowing that what’s in the operating system is fit for purpose. A single compromised crypto algorithm in the operating system can enable access to the entire cluster, or an entire fleet of clusters.
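
One basic way to verify the ingredients is to check an OS image against a known-good digest before installing it. The sketch below is a minimal example with hypothetical helper names (`sha256_hex`, `image_is_trusted`). It assumes the expected digest comes from a manifest whose own authenticity, for example a signature, has already been verified, which is outside the scope of this snippet.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def image_is_trusted(image: bytes, expected_sha256_hex: str) -> bool:
    """Compare the image digest with the expected one in constant time."""
    actual = sha256_hex(image)
    # Assumption: the expected digest was obtained from a verified source.
    return hmac.compare_digest(actual, expected_sha256_hex.lower())
```

A digest check catches a swapped or corrupted image, but it does nothing about a flaw that was already in the image when the digest was computed. That is where the standards and supply chain controls above come in.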

So what’s the answer?

Security is hard, and getting harder, across all areas of the IT and OT landscape, but edge is particularly challenging. Our research this summer found that security and compliance was the top concern for those who use Kubernetes in production and are looking at adopting edge.

The answer to security concerns, as it always has been, is twofold:

First, be realistic about risk and the threats you face in your specific environment, with your specific applications, data, processes, users and physical locations. Risk can never be cleared completely, no matter how much time and money you throw at it — and overconfidence can be dangerous. Your planning must encompass considerations for mitigation and recovery as well as prevention and detection.

Second, plan for multifaceted, multilayered defense to tackle all parts of the attack surface. The risks we’ve discussed in this blog, such as tampering, are particularly relevant to edge devices, but they’re not the only challenges. Attaining a robust security posture means many things: strong identity and access management with zero trust principles, SOC-style monitoring and event alerting, staff training and playbook practice, auditing of vendors, secure coding practices, and more. All of these ingredients help deliver an environment where security thrives, whether your cluster lives in the cloud, on the battlefield, or in a wheatfield.

It sounds like a lot, right? Don’t think that this burden is all on you. The whole industry is working hard to tackle the ‘elephant in the room’ of edge computing, making security foundational to the infrastructure stack. For example, we’ve just announced major enhancements to our Palette Edge platform that bake in tamperproof immutable edge OS capabilities, a hardened Kubernetes distro, and much more.

We believe the future is bright for security at the edge.

Sep 29, 2022