Published 2022-11-03

Deploying complex infrastructure with a Terraform state machine

SA @ Spectro Cloud

How do you handle dependencies in complex deployments?

In today’s world of interconnected cloud services, deploying application infrastructure can get pretty complicated. For example, your Kubernetes app in EKS may need several pods to share storage, requiring you to set up Amazon EFS for your cluster. Your IT department may require you to use RFC1918 IP address conservation for any EKS clusters you deploy in their main VPC. Another app might be deployed through Flux and require retrieving a SOPS key from your company’s secret store first and adding it to the cluster as a secret. Automating all the steps of actually getting a Kubernetes application into production is not easy.

Terraform has helped many companies address all or part of this problem. But, with Terraform being a desired state language, it can be difficult to deal with complex interdependencies between resources. The Terraform language is aimed at deploying everything in a single run, using resource dependencies to create resources in the right order. However, the more complex the infrastructure gets, the more difficult it becomes to avoid circular dependencies in your TF code that make deployment difficult, or sometimes even outright impossible.

A real example — and a workaround you can use today

I personally ran into this problem while working with a customer that uses the RFC1918 IP address conservation approach for EKS. This approach requires customizing the AWS VPC CNI so that it attaches pods to dedicated pod networking subnets that use a special CIDR outside the regular RFC1918 ranges (typically 100.64.0.0/16, carved out of the RFC 6598 shared address space).

Due to limitations at the time, deploying the CNI with a custom configuration wasn't working through CAPA (the Cluster API provider for AWS), so I needed to deploy the cluster with a vanilla CNI configuration first and then redeploy the AWS VPC CNI Helm chart over it with a custom configuration. Making this work in a single Terraform run was simply impossible, so I had to figure out a solution.

I figured that if I could just run Terraform multiple times, with slightly different configurations for each run, I could deliver a working solution for the customer.

Terraform would still be able to manage the final state of the infrastructure it deployed, as well as correctly deprovision it on a TF destroy run. So, the question became: how can I make Terraform apply different configurations depending on which step of the deployment process it is on? And how can I maintain the state of which deployment step a cluster is in?

Let’s take this step by step…

To start with the easy part, maintaining state: this can be accomplished with a tag on the cluster that stores the current step. I chose a simple step tag containing a number that indicates which step of the multi-step process the cluster is on. A fresh cluster starts at step 0 and completes at an arbitrary final number, incrementing by 1 on each run.

The next challenge was determining the correct value of the step tag dynamically during a TF run. I achieved this through an external resource in Terraform:

data "external" "wait_for_cluster_state" {
  program = ["bash", "./modules/cluster/wait_for_cluster_state.sh"]
  query = {
    CLUSTER_NAME = var.name
    SC_HOST      = var.sc_host
    SC_API_KEY   = var.sc_api_key
    SC_PROJECT   = var.sc_project_name
  }
}

This will execute the wait_for_cluster_state.sh bash script that queries the Spectro Cloud Palette API to determine if the cluster exists and which step it is on. It also verifies that all pending changes on the cluster have been fully implemented and will loop until that is the case. This is helpful in scenarios where, for example, the new VPC CNI config has been applied, but it takes the cluster a while to fully implement the changes.

This is the script I use for this external resource:

#!/bin/bash

# Parse the JSON query passed by Terraform on stdin into shell variables
eval "$(jq -r '. | to_entries | .[] | .key + "=" + (.value | @sh)')"
# results in:
# $CLUSTER_NAME, $SC_HOST, $SC_API_KEY, $SC_PROJECT

# Get project UID
PROJ_UID=$(curl -s -H "ApiKey:$SC_API_KEY" https://$SC_HOST/v1/projects | jq -r --arg SC_PROJECT "$SC_PROJECT" '.items[].metadata | select(.name==$SC_PROJECT) | .uid')

# Get cluster state info
CLUSTER_STATE_DATA=$(curl -s -H "ApiKey:$SC_API_KEY" -H "projectUid:$PROJ_UID" https://$SC_HOST/v1/spectroclusters\?filters\=metadata.name\=${CLUSTER_NAME}ANDstatus.state!="Deleted"\&fields\=metadata.uid,metadata.labels,status.state,status.conditions,status.packs)

# Parse status.state field
CLUSTER_STATE=$(echo $CLUSTER_STATE_DATA | jq -r '.items[0].status.state')

if [ "$CLUSTER_STATE" = "Running" ]; then
  # Parse metadata.labels.step field
  CLUSTER_STEP=$(echo $CLUSTER_STATE_DATA | jq -r '.items[0].metadata.labels.step')

  # Loop as long as there are machine pools or packs still being created/applied
  while echo $CLUSTER_STATE_DATA | jq -r '.items[] | "status=" + .status.conditions[].status, "status=" + .status.packs[].condition.status' | grep -e "status=True" -v > /dev/null
  do
    sleep 35
    # Refresh cluster state info for next iteration
    CLUSTER_STATE_DATA=$(curl -s -H "ApiKey:$SC_API_KEY" -H "projectUid:$PROJ_UID" https://$SC_HOST/v1/spectroclusters\?filters\=metadata.name\=${CLUSTER_NAME}ANDstatus.state!="Deleted"\&fields\=status.conditions,status.packs)
  done

  # Output current cluster step
  jq -n --arg CLUSTER_STEP "$CLUSTER_STEP" '{step:$CLUSTER_STEP}'
else
  # Cluster does not exist, output step -1
  jq -n '{step:"-1"}'
fi

The script returns -1 if the cluster doesn't exist yet; otherwise it returns the value of the step tag on the cluster. We can then use the value of the current step in our code as data.external.wait_for_cluster_state.result.step. I use the following code to generate two local variables that make it easier to use the cluster state in other places in the code:

locals {
  cluster_exists = tonumber(data.external.wait_for_cluster_state.result.step) >= 0 ? true : false
  step           = local.cluster_exists == true ? (lookup(local.eks_state_map, data.external.wait_for_cluster_state.result.step).last_step == true ? tonumber(data.external.wait_for_cluster_state.result.step) : tonumber(data.external.wait_for_cluster_state.result.step) + 1) : 0
  eks_state_map = {
    # map of unique configs for each step
  }
}

The cluster_exists variable is very useful for other blocks in Terraform, where you only want to define a resource if the base cluster has been deployed. For example, I needed to retrieve the value of an AWS security group that gets auto-created when the EKS cluster is deployed. So, I used this variable to define the TF resource like this:

data "aws_security_group" "eks" {
  count = local.cluster_exists == true ? 1 : 0
  tags = {
    "aws:eks:cluster-name" = var.name
  }
}

The step variable combines several pieces of logic:

  • If the cluster does not exist, return 0
  • If the cluster exists and its current step value is not the last step in the process, increment the step value by 1 and return that value
  • If the cluster exists and its current step value is the last step in the process, return the step value as-is.
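To make that logic concrete, here is a small stand-alone sketch of the same step-selection rules as a shell function. This is a hypothetical helper for illustration only, not part of the article's Terraform code; `next_step` and its arguments are names I made up, with the last step number passed in explicitly instead of coming from the state map:

```shell
# Hypothetical sketch of the step-selection logic described above.
# Usage: next_step CURRENT LAST
#   CURRENT is -1 when the cluster does not exist yet
#   LAST is the highest step defined in the state map
next_step() {
  local current=$1 last=$2
  if [ "$current" -lt 0 ]; then
    echo 0                  # no cluster yet: start at step 0
  elif [ "$current" -lt "$last" ]; then
    echo $((current + 1))   # advance one step per Terraform run
  else
    echo "$current"         # already on the last step: stay there
  fi
}

next_step -1 2   # prints 0
next_step 0 2    # prints 1
next_step 2 2    # prints 2
```

Each Terraform run therefore applies the configuration for the *next* step, except on the final step, where re-running it simply re-applies the last configuration (keeping the run idempotent).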

The last step: the state map

To determine which step constitutes the last step in the process, we come to the final piece of the puzzle: the state map. This is a local variable that provides the ability to define a unique configuration for every individual step. The basic structure looks like this:

eks_state_map = {
  "0" = {
    last_step = false
    tags      = ["step:0"]
    # Variables for step 0
    # variable1 = value_0
  },
  "1" = {
    last_step = false
    tags      = ["step:1"]
    # Variables for step 1
    # variable1 = value_1
  },
  "2" = {
    last_step = true
    tags      = ["step:2"]
    # Variables for step 2
    # variable1 = value_2
  }
}

I found the most powerful way to leverage the state map is through using dynamic blocks. For example, I defined dynamic blocks for the spectrocloud_cluster_eks resource, so that I can dynamically set the cluster configuration based on the current step:

resource "spectrocloud_cluster_eks" "this" {
  name             = var.name
  tags             = lookup(local.eks_state_map, local.step).tags
  cloud_account_id = data.spectrocloud_cloudaccount_aws.this.id

  cloud_config {
    ssh_key_name    = var.sshKeyName
    region          = var.aws_region
    endpoint_access = "public"
    vpc_id          = data.aws_vpc.eks.id
    az_subnets = {
      "${var.aws_region}a" = "${data.aws_subnet.eks-prv-0.id},${data.aws_subnet.eks-pub-0.id}"
      "${var.aws_region}b" = "${data.aws_subnet.eks-prv-1.id},${data.aws_subnet.eks-pub-1.id}"
      "${var.aws_region}c" = "${data.aws_subnet.eks-prv-2.id},${data.aws_subnet.eks-pub-2.id}"
    }
  }

  dynamic "cluster_profile" {
    for_each = lookup(local.eks_state_map, local.step).profiles
    content {
      id = cluster_profile.value.id
      dynamic "pack" {
        for_each = cluster_profile.value.packs
        content {
          name   = pack.value.name
          tag    = pack.value.tag
          values = pack.value.values
          dynamic "manifest" {
            for_each = try(pack.value.manifests, [])
            content {
              name    = manifest.value.name
              content = manifest.value.content
            }
          }
        }
      }
    }
  }

  dynamic "machine_pool" {
    for_each = lookup(local.eks_state_map, local.step).machine_pools
    content {
      name          = machine_pool.value.name
      min           = machine_pool.value.min
      count         = machine_pool.value.count
      max           = machine_pool.value.max
      instance_type = machine_pool.value.instance_type
      disk_size_gb  = machine_pool.value.disk_size_gb
      az_subnets    = machine_pool.value.az_subnets
    }
  }
}

Which then allows me to set the desired state per step in the state map like so:

eks_state_map = {
  "0" = {
    last_step = false
    tags      = ["step:0"]
    # Variables for step 0
    profiles = [
      {
        id    = data.spectrocloud_cluster_profile.infra_base_profile.id
        packs = []
      }
    ]
    machine_pools = [
      {
        name          = "temp-worker-pool"
        min           = 1
        count         = 1
        max           = 1
        instance_type = "t3.large"
        disk_size_gb  = 60
        az_subnets = {
          "${var.aws_region}a" = data.aws_subnet.eks-prv-0.id
          "${var.aws_region}b" = data.aws_subnet.eks-prv-1.id
          "${var.aws_region}c" = data.aws_subnet.eks-prv-2.id
        }
      }
    ]
  },
  "1" = {
    last_step = false
    tags      = ["step:1"]
    # Variables for step 1
    profiles = [
      {
        id    = data.spectrocloud_cluster_profile.infra_base_profile.id
        packs = []
      },
      {
        id = data.spectrocloud_cluster_profile.infra_cni_profile.id
        packs = [
          {
            name = "cni-aws-vpc-addon"
            tag  = "1.0.0"
            values = templatefile("${path.module}/config/cni-aws-vpc.yaml", {
              eks-security-group : local.eks-security-group-id,
              eks-subnet-a : local.eks-pod-subnet-a,
              eks-subnet-b : local.eks-pod-subnet-b,
              eks-subnet-c : local.eks-pod-subnet-c
            })
          }
        ]
      }
    ]
    machine_pools = [
      {
        name          = "temp-worker-pool"
        min           = 1
        count         = 1
        max           = 1
        instance_type = "t3.large"
        disk_size_gb  = 60
        az_subnets = {
          "${var.aws_region}a" = data.aws_subnet.eks-prv-0.id
          "${var.aws_region}b" = data.aws_subnet.eks-prv-1.id
          "${var.aws_region}c" = data.aws_subnet.eks-prv-2.id
        }
      }
    ]
  },
}

Finally, we need to tie it all together and make Terraform output some useful information so that we can use a simple script to loop TF runs until the last step is reached. First, we define some useful outputs:

output "cluster_name" {
  value = spectrocloud_cluster_eks.this.name
}

output "completed_step" {
  value = local.step
}

output "last_step" {
  value = lookup(local.eks_state_map, local.step).last_step
}

We then use the last_step output to determine if we need to perform another TF run, in the following script:

terraform fmt -check
terraform init -input=false -upgrade
terraform validate

until [ "$LAST_STEP" = "true" ]
do
  echo "Applying Terraform configuration..."
  terraform apply -auto-approve -input=false
  echo "Terraform apply complete."
  LAST_STEP=$(terraform output -raw last_step)
  if [ "$LAST_STEP" = "true" ] || [ "$LAST_STEP" = "false" ]; then
    if [ "$LAST_STEP" = "false" ]; then
      echo "Waiting 2 minutes before starting next Terraform cycle..."
      sleep 120
    fi
  else
    LAST_STEP="true"
  fi
done

And with this, our Terraform state machine is complete. I hope this state machine walkthrough is useful for tackling your more complex Terraform infrastructure deployment challenges. It certainly helped me get more out of Terraform than I could before.

New CAPA developments ahead!

While my solution works, it was always meant as a temporary workaround until the technology existed to no longer need multiple runs to deploy the infrastructure.

So, while this workaround is still a useful tool to keep in your back pocket, I’m pleased to say that we have also contributed significant improvements to CAPA to get the custom VPC CNI functionality working out of the box. This capability is new in Palette 3.0 — and it means I’ll be able to move my customer’s automation back to a single Terraform run. To find out more about Palette 3.0, check out the release notes.
