Published

July 20, 2022

How to share data between containers with K8s shared volumes

A quick guide to using Kubernetes shared volumes to communicate data or config between two Kubernetes containers.

Kubernetes shared volumes between two containers

Kubernetes is a powerful container orchestrator that supports many different deployment patterns. In the vast majority of use cases, a Kubernetes deployment runs a single container per pod. This model works well for most stateless applications, as state is offloaded to a database or some other component.

However, some applications require multiple containers to work together. A common example of this design pattern is a sidecar container deployed alongside the main container in the same pod to export logs or metrics. Beyond the sidecar use case, there may also be times when sharing a data volume simplifies application design or deployment.

Since containers in Kubernetes clusters are ephemeral by design, the above use cases need a way to persist data and share storage. Otherwise, files and data stored on-disk inside the containers will be lost whenever the container crashes or is restarted.

This is where Kubernetes volumes come into play. In this guide, you will first learn about Kubernetes volumes and the main types that Kubernetes supports. Then, we will walk through an example of sharing data between containers using Kubernetes shared volumes.

Kubernetes volumes

In Kubernetes, a pod is a group of one or more containers with shared storage and network resources. This means that containers in the same pod can exchange data through storage they both mount. Kubernetes uses volumes as an abstraction layer to provide this shared storage.

Fundamentally, volumes are directories that can be mounted into the containers of a pod. A single pod can use one or more volumes: each volume is declared under .spec.volumes and mounted into a container under .spec.containers[*].volumeMounts in the pod spec.

Kubernetes broadly supports three categories of volumes:

Persistent volumes: storage in the Kubernetes cluster that is preprovisioned or created via dynamic provisioning using storage classes
Projected volumes: a type of volume that maps several existing volume sources into the same directory
Ephemeral volumes: storage that lives only as long as the pod and is deleted when the pod is removed, like emptyDir (useful for logs), configMap, or secret; data in an emptyDir survives container restarts within the pod

The various types of storage listed on the Kubernetes website all fall under one of these categories. For example, storage classes backed by cloud provider services such as AWS EBS allow pods to persist data to EBS volumes using Kubernetes syntax. Open-source file systems like GlusterFS also let complex applications share files.
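To make the projected category a little more concrete, here is a minimal sketch of a pod that maps a ConfigMap and a Secret into one directory. The pod name, the ConfigMap app-config, and the Secret app-credentials are hypothetical and are assumed to already exist in the namespace; none of them come from the examples in this guide.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: projected-example          # hypothetical name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]   # keep the container alive for inspection
      volumeMounts:
        - name: app-settings
          mountPath: /etc/app      # both sources appear under this directory
          readOnly: true
  volumes:
    - name: app-settings
      projected:
        sources:
          - configMap:
              name: app-config       # assumed to exist
          - secret:
              name: app-credentials  # assumed to exist
```

With this layout, the keys of both objects should show up as files under /etc/app inside the container.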

Now that we understand what Kubernetes volumes are, let’s walk through a simple example.

Creating a pod with a shared volume

To create shared storage, declare a volume in the pod spec under .spec.volumes:

volumes:
  - name: example-volume
    emptyDir: {}

To use a volume, simply specify the volume by name in the .spec.containers[*].volumeMounts section:

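apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  # The volume must be declared in the pod before a container can mount it
  volumes:
    - name: example-volume
      emptyDir: {}
  containers:
    - image: k8s.gcr.io/test-webserver
      name: test-container
      volumeMounts:
        - mountPath: /logs
          name: example-volume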

Now that we know how volumes and volumeMounts work, let’s take a full example of two containers sharing data via volumes.

Here is an example where a Debian container writes a simple “Hello world” HTML file to a shared volume, and an nginx container serves that file as its root index.html:

example.yaml

apiVersion: v1
kind: Pod
metadata:
  name: shared-storage-example
spec:
  volumes:
    - name: shared-data
      emptyDir: {}
  containers:
    - name: container-1
      image: nginx
      volumeMounts:
        - name: shared-data
          mountPath: /usr/share/nginx/html
    - name: container-2
      image: debian
      volumeMounts:
        - name: shared-data
          mountPath: /data
      command: ["/bin/sh"]
      args: ["-c", "echo Hello world > /data/index.html"]

Note that a volume of type emptyDir was created with the name “shared-data”. This volume is shared between two containers (container-1 and container-2). The nginx container (container-1) mounts the volume at /usr/share/nginx/html, whereas the same volume is mounted on the debian container (container-2) under /data.

When this pod is deployed to Kubernetes, the Debian container writes “Hello world” to index.html on the shared volume and exits. Because the same volume is mounted at the nginx document root, nginx returns “Hello world” when curled from inside the nginx container.

To demonstrate this example, create the pod and the two containers:

kubectl apply -f example.yaml

Then verify the state of the pod:

kubectl get pod shared-storage-example --output=yaml

The output should show that the Debian container has terminated (relevant output):

apiVersion: v1
kind: Pod
metadata:
  ...
  name: shared-storage-example
  namespace: default
  ...
spec:
  ...
status:
  ...
  containerStatuses:
  - containerID: ...
    image: debian
    ...
    lastState:
      terminated:
        ...
    name: container-2

And the nginx container should still be running:

  - containerID: docker://96c1ff2c5bb ...
    image: nginx
    ...
    name: container-1
    ...
    state:
      running:
    ...

Now exec into the nginx container:

kubectl exec -it shared-storage-example -c container-1 -- /bin/bash

Since the Debian container has written “Hello world” to index.html in the nginx root directory, we can curl the server to get the expected response:

root@shared-storage-example:/# curl localhost

The output is a web page with simple text:

Hello world
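In example.yaml, container-2 exits as soon as it has written the file, so under the pod's default restartPolicy of Always the kubelet will keep restarting it. One possible variation, sketched here as an assumption rather than as part of the walkthrough above, moves the write into an init container, which runs to completion before nginx starts. The pod and container names are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-storage-init-example   # hypothetical name
spec:
  volumes:
    - name: shared-data
      emptyDir: {}
  initContainers:
    # Runs once and must finish successfully before the main container starts
    - name: write-index
      image: debian
      command: ["/bin/sh", "-c", "echo Hello world > /data/index.html"]
      volumeMounts:
        - name: shared-data
          mountPath: /data
  containers:
    - name: nginx
      image: nginx
      volumeMounts:
        - name: shared-data
          mountPath: /usr/share/nginx/html
```

The trade-off is that an init container cannot keep running alongside nginx, so this shape only fits one-off setup work.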

While this example is trivial, think of a small-scale Django application where the static assets are generated by a Django server and hosted via nginx. Django’s collectstatic command copies static files into the directory named by the STATIC_ROOT setting. Upon startup, the Django server can run the command:

$ python manage.py collectstatic

This will collect the static files (e.g. the admin page CSS) into STATIC_ROOT. If that directory sits on a shared volume, nginx can serve these files instead of the “Hello world” text from the example above. Also, instead of using an emptyDir, a persistent volume can be used so that the static files are saved to persistent disk and are not regenerated every time the pod is recreated.
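As a rough sketch of the persistent-volume variant, the manifest below replaces the emptyDir with a PersistentVolumeClaim. Everything specific here is an assumption for illustration: the claim name, the 1Gi size, the Django image, the /app/staticfiles path (which would have to match STATIC_ROOT), and the presence of a default storage class in the cluster.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: static-files                 # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce                  # both containers run in one pod, on one node
  resources:
    requests:
      storage: 1Gi                   # assumed size
---
apiVersion: v1
kind: Pod
metadata:
  name: django-static-example        # hypothetical name
spec:
  volumes:
    - name: static-files
      persistentVolumeClaim:
        claimName: static-files
  containers:
    - name: nginx
      image: nginx
      volumeMounts:
        - name: static-files
          mountPath: /usr/share/nginx/html/static
          readOnly: true             # nginx only reads the collected files
    - name: django
      image: example.com/my-django-app:latest   # hypothetical image
      volumeMounts:
        - name: static-files
          mountPath: /app/staticfiles            # assumed to match STATIC_ROOT
```

Because the claim outlives the pod, a recreated pod that mounts the same claim should find the previously collected files still in place.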

Real-world scenarios

Taking a step back, a common scenario for using a shared volume is to support applications where there is some helper function that needs to be deployed alongside the primary application.

Let's look at more examples where using a Kubernetes shared volume makes sense:

Exporter sidecars: the main application can log to a file on a shared volume, while a sidecar container streams those logs from the volume and exports them to a centralized logging infrastructure (e.g. Elasticsearch, Splunk, etc.).
Configuration refresh: some applications may require live-reloading of configuration without restarting the entire pod. Prometheus is a good example, where a secondary container watches for changes to configuration mounted via a ConfigMap and notifies the main application (e.g. new alert rules, scrape endpoints, etc.).
Shared functionality: some workloads have functionality that’s handled more efficiently by sharing storage than by sending data over the network. For example, an image, audio, or video upload microservice may need a compression or format conversion step. Rather than sending big files over the network, the service can store them on a shared volume and another container can do the long-running task. This may be especially beneficial when the two containers are written in different languages (e.g. the compression library is available in C but the main application was written in Java).
Initialization step: a lot of open-source projects now support auto-generating self-signed TLS certificates. An initialization or helper container can start up, generate those files, and pass them on to the main application prior to startup.
Debugging: ephemeral containers have been in beta since Kubernetes v1.23. These containers, or other helper sidecars, can be deployed alongside the application to inspect it and help track down hard-to-detect bugs and errors.
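The exporter sidecar scenario from the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a production setup: the pod name, container names, and log path are hypothetical, the "application" is a shell loop that appends a line every few seconds, and the sidecar simply tails the file to its own stdout instead of shipping it to a real logging backend.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: log-sidecar-example          # hypothetical name
spec:
  volumes:
    - name: app-logs
      emptyDir: {}
  containers:
    # Stand-in for the main application: appends a log line every 5 seconds
    - name: app
      image: busybox
      command: ["/bin/sh", "-c"]
      args:
        - |
          while true; do
            echo "$(date) request handled" >> /var/log/app/app.log
            sleep 5
          done
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
    # Stand-in for the exporter: follows the shared log file
    - name: log-exporter
      image: busybox
      command: ["/bin/sh", "-c"]
      args:
        - tail -F /var/log/app/app.log
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
          readOnly: true             # the sidecar only needs to read
```

Once the pod is running, kubectl logs log-sidecar-example -c log-exporter should show the lines written by the app container.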

Takeaways

Kubernetes was designed to support running multiple containers in a single pod to share both storage and network resources. This property helps support scenarios where there are auxiliary or helper applications that support the primary application by sharing data and files (e.g., exporting logs, refreshing configurations, and more). To implement this pattern, we can leverage Kubernetes shared volumes and mount them to all the containers that need access in a pod.

