Published

July 19, 2022

How to fill the Terraform Day 2 operations gap

This blog accompanies a recent webinar. If you’d prefer to watch and listen instead of reading, check out the recording here!

Terraform: the infrastructure as code tool

HashiCorp Terraform is wildly popular, and for good reason. As an open-source infrastructure-as-code tool, it provides a declarative workflow for provisioning infrastructure resources across public clouds, private data centers, cloud managed services, and a growing number of independent software solutions.

To do this, Terraform makes use of an extensive provider network, which you can think of as plugins or abstractions of a given technology's API. Just pick the right provider for your chosen technologies, whether that’s AWS resources, Nomad, or indeed Kubernetes and related ecosystem projects like Helm.

Yes, you can compose, deploy and configure Kubernetes with Terraform

When it comes to Kubernetes, Terraform can deploy and configure all of the infrastructure necessary to build a cluster from scratch, including private/public networking and compute resources.

Terraform can then pass a cloud-init script to perform a Kubernetes installation, or call on a Kubernetes installation tool like Kubespray, kops, or kubeadm. Terraform can also deploy and configure the major cloud providers' managed K8s services, such as Amazon EKS, as well as proprietary private data center Kubernetes solutions such as OpenShift, VMware Tanzu, and Rancher.


In other words, it’s pretty versatile.

But what about managing running clusters?

Terraform is good at creating Kubernetes clusters from a Day 0 and Day 1 perspective. This makes it particularly helpful for non-production use of Kubernetes, where a cluster is created, lives a short life (for example, for testing), and is then destroyed.

But long-lived Kubernetes clusters that run and evolve in production environments pose challenges for Terraform. That’s because it’s mainly designed as an infrastructure provisioning tool.

Although Terraform can directly manage all Kubernetes resources, including Deployments, Services, Custom Resources (CRs and CRDs), Policies, Quotas and more, managing the lifecycle of Kubernetes clusters via Terraform tends to be more of a traditional operations approach.

[Figure: Terraform is mostly used for first provisioning of infrastructure]

Often a minimal Kubernetes cluster configuration is provided to development teams in either a request-driven process or self-service model where Terraform is run directly or abstracted by another tool.

This pattern leads to Kubernetes clusters that are provisioned with Terraform once, then highly customized outside of Terraform on Day 2 to Day X. Developers work on these clusters, iterating on various cluster settings, middleware, tooling, and the deployment of applications into the cluster. Once the final desired configuration is determined for a Kubernetes cluster, that fixed state can be written into a Terraform config and reused to deploy more clusters into a production environment.

The reality with Kubernetes is that developers and Kubernetes practitioners often don’t know how to use Terraform and don’t really want to learn. Can you blame them? There’s so much focus on using Kubernetes ecosystem tools to provision software, such as Kubernetes manifests, Helm charts, Kustomize, and GitOps tools like Argo CD and Flux. But when other tools are responsible for application deployment and configuration, the workflow of using Terraform to manage a Kubernetes cluster’s lifecycle breaks down beyond the core infrastructure provisioning piece.

The outcome is often what is known as ‘config drift’: changes to Terraform-deployed resources that are made or introduced outside of Terraform. Terraform has historically not been the best tool when it comes to detecting drift. Terraform users often work around this by scheduling the Terraform plan or refresh operation at regular intervals, or by using third-party tools. This only recently changed with the announcement of drift detection capabilities in the commercial Terraform Cloud Business offering. A fundamental difference between the Terraform approach to drift detection and Palette's is that Terraform state is tracked outside of the Kubernetes cluster. Palette enforces drift detection by running inside the Kubernetes cluster, so reconciliation happens locally.
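As a rough illustration of the scheduled-plan workaround described above, here is a minimal Python sketch. It relies on the `-detailed-exitcode` flag of `terraform plan`, which exits with 0 when there is nothing to change, 1 on error, and 2 when the plan contains changes. The function names and the working directory argument are hypothetical, and a non-empty plan can also reflect config edits that have not been applied yet, not only drift.

```python
import subprocess

# Exit statuses of `terraform plan -detailed-exitcode`.
PLAN_EXIT_MEANINGS = {
    0: "no-changes",       # plan succeeded, nothing to change
    1: "error",            # the plan itself failed
    2: "changes-present",  # plan succeeded with a non-empty diff
}


def interpret_plan_exit_code(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` exit status to a label."""
    return PLAN_EXIT_MEANINGS.get(code, "unknown")


def check_for_changes(workdir: str) -> str:
    """Run a plan in `workdir` (a hypothetical path) and label the outcome.

    Intended to be called from a scheduler such as cron or a CI job.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)
```

A scheduled job could call `check_for_changes` for each Terraform configuration and alert whenever the result is `"changes-present"`. The check runs outside the cluster and only as often as it is scheduled, which is the limitation this section describes.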

This is where Palette comes in.

Bridging the gap with Palette

Palette is a Kubernetes lifecycle management solution created by us here at Spectro Cloud.

Like Terraform, Palette deploys and configures the infrastructure used to build a Kubernetes cluster.

Like Terraform, Palette uses a declarative model, this time based on the open-source Cluster API (CAPI) provider network as the underlying technology.

And like Terraform, Palette is designed to support all kinds of infrastructure environments, from public cloud to bare metal.

So you might be thinking “hey, I have Terraform, why would I need this?”

Actually, Palette works well together with Terraform to solve the real-world limitations we’ve just discussed.

The key to this is Palette’s supercharged implementation of Cluster API and its Cluster Profiles. A Cluster Profile is a blueprint that defines the desired state of everything you want to control in a Kubernetes environment. With Palette, this blueprint can include every layer of the cluster: not just the OS, K8s distro, network and storage layers, but also components like ingress, monitoring, security, service mesh and the production applications scheduled into the cluster.

[Figure: Palette blueprint for all cluster layers]

It's not a problem if your teams like to use different provisioning tools, because Cluster Profiles can consume existing raw manifest files, Helm charts, and Kustomize manifests to define any layers of a Kubernetes cluster.

Nor is it a big risk if teams modify the configuration of your clusters away from the initial template. Through Cluster API controllers and active reconciliation loops, Cluster Profiles provide native, out-of-the-box drift detection for the layers declared in Palette, on any cluster that Palette deploys. Palette scans each cluster every two minutes, comparing the cluster config against the declared profile. It ensures that required components are present on the cluster, at the right version, and appropriately configured, and it automatically remediates any exceptions. So if you mandate for security and compliance reasons that your clusters should have tools like Falco, Open Policy Agent, Prisma Cloud Compute (Twistlock), or Snyk installed, or logging and monitoring tools like Elastic-Fluentd, Prometheus-Grafana, or Splunk, you can be sure they stay installed.
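The compare-and-remediate idea behind a reconciliation loop can be sketched in a few lines of Python. This is a conceptual illustration only, not Palette's implementation: the `Profile` structure (component name mapped to version), the function names, and the sample components and versions are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# A "profile" here is just {component name: version}. Real Cluster Profiles
# carry far more detail; this shape is an assumption for illustration.
Profile = Dict[str, str]


def detect_drift(declared: Profile, observed: Profile) -> List[Tuple[str, str]]:
    """Return (component, reason) pairs where the cluster departs from the profile."""
    drift = []
    for name, version in declared.items():
        if name not in observed:
            drift.append((name, "missing"))
        elif observed[name] != version:
            drift.append((name, "wrong-version"))
    return drift


def reconcile(declared: Profile, observed: Profile) -> Profile:
    """Return the observed state with every declared component restored.

    Components that the profile does not declare are left untouched, so only
    declared layers are enforced.
    """
    remediated = dict(observed)
    for name, _reason in detect_drift(declared, observed):
        remediated[name] = declared[name]
    return remediated


# Placeholder components and versions, chosen only for the example.
declared = {"falco": "1.0", "prometheus": "2.0"}
observed = {"prometheus": "1.9", "team-tool": "0.3"}
drift = detect_drift(declared, observed)
fixed = reconcile(declared, observed)
```

In this example, `drift` reports that `falco` is missing and `prometheus` is at the wrong version, while `fixed` restores both and leaves the undeclared `team-tool` alone. A real controller would repeat this comparison on a timer and apply the changes to the cluster rather than to a dictionary.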

To make changes to any layer in the cluster, you create an updated version of a Cluster Profile, which is then propagated to the clusters associated with the profile. When changes to a cluster are declared, Palette orchestrates them on the clusters and fully automates the change, whether that means applying a new manifest, performing a Helm upgrade, or, in the case of infrastructure-level changes, orchestrating rolling upgrades to perform immutable infrastructure changes.

An example of an immutable change would be upgrading the version of Kubernetes by rolling in new virtual machines or bare metal servers and draining the older nodes while maintaining a highly available cluster. Palette also provides a solution for self-service consumption or Kubernetes-as-a-Service, multi-tenancy, role-based access controls, integration with external identity providers, Kubernetes roles and role bindings, fleet management, bare metal provisioning, and various edge use cases.
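To make the rolling, immutable Kubernetes upgrade described above more concrete, here is a small Python simulation. It is a sketch under simple assumptions, not how Palette or Cluster API is implemented: nodes are modeled as (name, version) pairs, the `-new` naming scheme is hypothetical, and "draining" is reduced to removing the old node from a list.

```python
from typing import List, Tuple

Node = Tuple[str, str]  # (node name, Kubernetes version)


def rolling_upgrade(nodes: List[Node], target: str) -> List[List[Node]]:
    """Replace outdated nodes one at a time, returning the pool after each step.

    A replacement node is added before the old one is drained and removed,
    so the pool never drops below its starting size. Nodes are never
    modified in place, which is what makes the change immutable.
    """
    pool = list(nodes)
    history = []
    outdated = [node for node in nodes if node[1] != target]
    for old in outdated:
        new = (f"{old[0]}-new", target)  # hypothetical naming scheme
        pool.append(new)   # roll in the replacement first
        pool.remove(old)   # then drain and delete the old node
        history.append(list(pool))
    return history


steps = rolling_upgrade([("a", "1.23"), ("b", "1.23"), ("c", "1.24")], "1.24")
```

Here `steps` holds the node pool after each replacement. Every snapshot has three nodes, and the last one contains only nodes at the target version.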

You can keep using Terraform

The Palette solution can encompass all the elements of your Kubernetes cluster stacks that your various dev teams work with, and it can manage changes to your clusters automatically over time via a cluster profile blueprint. Does that mean we’re suggesting you replace Terraform? No, absolutely not.

We’ve built Palette to work alongside Terraform, so you can keep using the tools you’re familiar with.

Spectro Cloud maintains a verified HashiCorp Terraform Provider for Palette, which allows Terraform users to gain all the capabilities of the Palette platform.


Instead of using Terraform to directly build Kubernetes clusters on infrastructure providers or consuming managed Kubernetes services, Terraform can be used to configure and manage the Palette platform. This provides several advantages:

You can stick to the workflows you know and trust

The Palette platform provides graphical user interface workflows. However, most companies have adopted strict automation policies that prevent the use of point-and-click operations. Palette has an API, but consuming APIs directly can be challenging and can result in a lot of highly customized scripting and automation work. Terraform excels in this area with the HashiCorp Configuration Language (HCL), a vast Terraform provider network, and a vibrant user community.

With existing investments in open-source Terraform expertise or in the various commercial Terraform offerings, Terraform users are in a position to adopt Palette quickly without making significant changes to current operations, pipelines, or workflows. Terraform can be used to create separation of duties, with one set of configurations managing the creation of Cluster Profiles and another managing the creation of Kubernetes clusters. When its configurations are kept in a version control system, Terraform also provides a documented, versioned record of all operations passed to Palette.


You can orchestrate infrastructure outside of Kubernetes, too

Palette is highly specialized in the lifecycle management of Kubernetes clusters across multiple operating environments and conditions.

While some organizations have fully refactored their applications and dependent services to run containerized, the reality is that most organizations are not fully containerized: they’re at some phase of a journey that often involves a dependence on non-containerized applications or services that live outside of Kubernetes.

This is a great use case for combining Terraform and Palette, where complex infrastructure deployments can be orchestrated inside and outside of Kubernetes environments. Terraform can be used to automate Palette, and a Palette-created cluster can be used to trigger Terraform, so there are a number of automation scenarios that can be designed between both solutions.


You get centralized visibility of your fleet

Terraform is very good at provisioning infrastructure across multiple environments, but it is not designed to provide visibility into the health and status of all of the resources it has created. Palette provides a single point of visibility for all of the Kubernetes clusters it creates, as well as the health of the infrastructure they are running on. This gives you an operational view and a way to manage a fleet of Kubernetes clusters without having to jump between different cloud consoles and disparate private data center infrastructure tools, comb through hundreds or thousands of Terraform state files, or run an insane number of kubectl commands.

Terraform and Palette: better together

Terraform and Palette are a great combination when it comes to managing Kubernetes environments. Palette fills in the gaps and enhances the operational experience for Terraform and Kubernetes users. Palette also extends the reach of Terraform beyond public clouds by providing a simple-to-consume experience when deploying Kubernetes clusters into VMware, OpenStack, bare metal environments, or a variety of edge deployment scenarios.

Ready to get started? Check out our docs pages, and sign up for free access to Spectro Cloud Palette — no credit card required.

