Published April 29, 2023

K8sGPT + LocalAI: Unlock Kubernetes superpowers for free!

As we all know, LLMs are trending like crazy and the hype is not unjustified. Tons of cool projects leveraging LLM-based text generation are emerging by the day — in fact, I wouldn’t be surprised if another awesome new tool was published during the time it took me to write this blog :)

To the skeptics, I’d say the hype is justified because these projects are not just gimmicks. They are unlocking real value, far beyond simply using ChatGPT to pump out blog posts 😉. For example, developers are boosting their productivity directly in their terminals via Warp AI, in their IDEs using IntelliCode, GitHub Copilot, CodeGPT (open source!), and probably 300 other tools I have yet to encounter. Furthermore, the use cases for this technology extend far beyond code generation. LLM-based chatbots and Slack bots that can be trained on an organization’s internal documentation are also emerging. In particular, GPT4All from Nomic AI is a fantastic project to check out in the open source chat space.

However, the focus of this blog is yet another use case: how does an AI-based Site Reliability Engineer (SRE) running inside your Kubernetes cluster sound? Enter K8sGPT and the k8sgpt-operator.

Here’s an excerpt from their README:

k8sgpt is a tool for scanning your Kubernetes clusters, diagnosing, and triaging issues in simple English. It has SRE experience codified into its analyzers and helps to pull out the most relevant information to enrich it with AI.

Sounds great, right? I certainly think so! If you want to get up and running as quickly as possible, or if you want access to the most powerful, commercialized models, you can install a K8sGPT server using Helm (without the K8sGPT operator) and leverage K8sGPT’s default AI backend: OpenAI.

But what if I told you that free, local (in-cluster) analysis was also a straightforward proposition?

That’s where LocalAI comes in. LocalAI is the brainchild of Ettore Di Giacinto (AKA mudler), creator of Kairos, another fast-growing open source project in the Kubernetes space. Here’s a brief excerpt from the LocalAI README:

LocalAI is a straightforward, drop-in replacement API compatible with OpenAI for local CPU inferencing, based on llama.cpp, gpt4all and ggml, including support GPT4ALL-J which is Apache 2.0 Licensed and can be used for commercial purposes.

LocalAI’s artwork inspired by Georgi Gerganov’s llama.cpp

Together, these two projects unlock serious SRE power. You can use commodity hardware, and your data never leaves your cluster! I think the community adoption speaks for itself.

There are three phases to the setup:

1. Install the LocalAI server
2. Install the K8sGPT operator
3. Create a K8sGPT custom resource to kickstart the SRE magic!

To get started, all you need is a Kubernetes cluster, Helm, and access to a model. See the LocalAI README for a brief overview of model compatibility and where to start looking. GPT4All is another good resource.
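Before cloning anything, it may be worth confirming that the required CLIs are actually installed. The sketch below is a hypothetical helper (`require_tools` is my own name, not part of either project) that reports which tools are missing from your PATH. Note that `make` and `go` are only needed for the operator step later on.

```shell
#!/usr/bin/env bash
# require_tools: hypothetical helper that checks each named CLI is on PATH.
# Prints one "missing: <tool>" line per absent tool to stderr and returns
# non-zero if anything is missing.
require_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_tools git helm kubectl make go && kubectl cluster-info` checks the tools and then confirms that your current kube-context can reach a cluster.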

Ok… now that you’ve got a model in hand, let’s go!

# Clone the repo
git clone https://github.com/TylerGillson/helm-charts
cd helm-charts

# Check out the branch with support for HTTP basic auth.
# This approach supersedes the one currently suggested in the LocalAI
# README: https://github.com/go-skynet/LocalAI#helm-chart-installation-run-localai-in-kubernetes
# Things are evolving quickly!
git checkout tyler/replace-datavolume-with-init-container

Next, customize the chart’s values.yaml (charts/local-ai/values.yaml):

deployment:
  image: quay.io/go-skynet/local-ai:latest
  env:
    # CPU threads used for inference; roughly the node's core count
    threads: 14
    # prompt context size, in tokens
    contextSize: 512
    modelsPath: "/models"

# Optionally create a PVC, mount the PV to the LocalAI Deployment,
# and download a model to prepopulate the models directory
modelsVolume:
  enabled: true
  url: "https://gpt4all.io/models/ggml-gpt4all-j.bin"
  pvc:
    size: 6Gi
    accessModes:
      - ReadWriteOnce
  auth:
    # Optional value for HTTP basic access authentication header
    basic: "" # 'username:password' base64 encoded
service:
  type: ClusterIP
  annotations: {}
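The `basic` field expects the `username:password` pair base64-encoded. A common pitfall is encoding a trailing newline (which plain `echo` adds), silently producing a different value. A minimal sketch, using a hypothetical helper name:

```shell
#!/usr/bin/env bash
# basic_auth_value: hypothetical helper that base64-encodes "user:password"
# for the modelsVolume.auth.basic field. printf (unlike plain echo) does not
# append a newline, so no newline ends up inside the encoded value.
basic_auth_value() {
  local user="$1" password="$2"
  # tr strips the line wrapping some base64 implementations add to long output
  printf '%s:%s' "$user" "$password" | base64 | tr -d '\n'
}
```

Instead of editing the file, the value could also be passed at install time, e.g. `helm install local-ai charts/local-ai -n local-ai --create-namespace --set-string modelsVolume.auth.basic="$(basic_auth_value alice 's3cret')"`, assuming the key path matches the values.yaml shown above (the credentials here are placeholders).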

And create a new Helm release:

# Install the helm chart
helm install local-ai charts/local-ai -n local-ai --create-namespace

Assuming all is well, a local-ai Pod will be scheduled and you will see a pretty Fiber banner in the logs 🤗

And the init container is happily downloading your model…
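Because the init container has to finish downloading a multi-gigabyte model before the API container starts, the Pod may take a while to become ready. One way to wait on it and then peek at the logs is sketched below. It assumes the chart names the Deployment `local-ai` (matching the Service name used later in this post), which you can confirm with `kubectl -n local-ai get deploy`; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# wait_for_localai: hypothetical helper. Blocks until the LocalAI Deployment
# has rolled out (i.e. the init container finished and the Pod is ready),
# then prints the most recent log lines.
# Assumes namespace and Deployment are both named "local-ai" unless overridden.
wait_for_localai() {
  local ns="${1:-local-ai}" deploy="${2:-local-ai}"
  # Generous timeout: model downloads can take several minutes
  kubectl -n "$ns" rollout status "deployment/$deploy" --timeout=15m &&
    kubectl -n "$ns" logs "deployment/$deploy" --tail=20
}
```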

Now for part two: installing the K8sGPT operator. We’ll use another fork here — apologies for that — hopefully this will be upstreamed soon and I can simplify the blog!

# Clone my fork of the k8sgpt-operator repo
git clone https://github.com/TylerGillson/k8sgpt-operator
cd k8sgpt-operator

# Check out the branch with support for K8sGPT's LocalAI backend
git checkout feat/local-ai

Next, install the K8sGPT operator CRDs and the operator itself:

# Requires make and a local Go installation.
# But that will go away soon once the PR is merged. Stupid pun intended ;)
make install   # install the K8sGPT CRDs into the cluster
make deploy    # deploy the operator (controller-manager) itself

Once that happens, you will see the K8sGPT operator Pod come online:

The k8sgpt-operator-controller-manager Pod is healthy!

And the K8sGPT operator CRDs are installed!
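If you prefer the terminal to screenshots, the same two checks can be run with kubectl. This is a sketch under simple assumptions: it looks for CRDs in the `k8sgpt.ai` API group (taken from the `apiVersion` used later in this post) and for the controller-manager Deployment in any namespace. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# check_k8sgpt_operator: hypothetical helper that lists the K8sGPT CRDs and
# the operator's controller-manager Deployment, failing if either is absent.
check_k8sgpt_operator() {
  echo "CRDs:"
  # CRD names end with their API group, e.g. <plural>.core.k8sgpt.ai
  kubectl get crd -o name | grep 'k8sgpt\.ai' || return 1
  echo "Operator Deployment:"
  kubectl get deployment --all-namespaces | grep 'k8sgpt-operator-controller-manager' || return 1
}
```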

Next, manually edit the k8sgpt-operator-controller-manager Deployment and change the image used by the manager container to tylergillson/k8sgpt-operator:latest.
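If you’d rather not open an editor, `kubectl set image` should achieve the same thing. The namespace `k8sgpt-operator-system` below is an assumption: it is the default that kubebuilder-style projects (which `make install`/`make deploy` and the `controller-manager` naming suggest) typically deploy into. Check yours with `kubectl get deploy -A | grep k8sgpt-operator`. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# point_operator_at_fork: hypothetical helper equivalent to manually editing
# the Deployment. Swaps the image of the "manager" container and waits for
# the new controller Pod to roll out.
# Assumption: the operator lives in the kubebuilder-default namespace
# "k8sgpt-operator-system"; pass a different namespace as $1 if needed.
point_operator_at_fork() {
  local ns="${1:-k8sgpt-operator-system}"
  local image="${2:-tylergillson/k8sgpt-operator:latest}"
  kubectl -n "$ns" set image deployment/k8sgpt-operator-controller-manager \
    "manager=$image" &&
    kubectl -n "$ns" rollout status deployment/k8sgpt-operator-controller-manager
}
```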

Cool, we’re almost there: just one more step. To finish it off, we create a K8sGPT custom resource, which triggers the K8sGPT operator to install a K8sGPT server and start periodically querying the LocalAI backend to assess the state of your K8s cluster.

kubectl apply -f - <<EOF
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
 name: k8sgpt-local
spec:
 namespace: local-ai
 backend: localai
 # use the same model name here as the one you plugged
 # into the LocalAI helm chart's values.yaml
 model: ggml-gpt4all-j.bin
 # kubernetes-internal DNS name of the local-ai Service
 baseUrl: http://local-ai.local-ai.svc.cluster.local:8080/v1
 # allow K8sGPT to store AI analyses in an in-memory cache,
 # otherwise your cluster may get throttled :)
 noCache: false
 version: v0.2.7
 enableAI: true
EOF
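If nothing seems to happen, the `baseUrl` is a good first suspect. Since LocalAI exposes an OpenAI-compatible API, listing models via `GET /v1/models` from inside the cluster makes a cheap smoke test, and you would expect `ggml-gpt4all-j.bin` to appear in the response. The sketch below assumes the `curlimages/curl` image can be pulled in your cluster, that the cluster uses the default `cluster.local` domain, and that the Service listens on port 8080 as in the CR above. Both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# localai_base_url: hypothetical helper that builds the in-cluster URL used
# for spec.baseUrl (<service>.<namespace>.svc.cluster.local convention;
# assumes the default "cluster.local" cluster domain).
localai_base_url() {
  local svc="${1:-local-ai}" ns="${2:-local-ai}" port="${3:-8080}"
  printf 'http://%s.%s.svc.cluster.local:%s/v1\n' "$svc" "$ns" "$port"
}

# probe_localai: hypothetical helper that runs a throwaway curl Pod and asks
# LocalAI which models it can see. --rm deletes the Pod when curl exits.
probe_localai() {
  local ns="${1:-local-ai}"
  kubectl -n "$ns" run localai-probe --rm -i --restart=Never \
    --image=curlimages/curl -- curl -s "$(localai_base_url local-ai "$ns")/models"
}
```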

As soon as the K8sGPT CR hits your cluster, the K8sGPT operator will deploy K8sGPT and you should see some action in the LocalAI Pod’s logs.

LocalAI server loading a local model into memory

Alright — that’s it! Sit back, relax, and allow the LocalAI model to hammer the CPUs on whatever K8s node was unlucky enough to be chosen by the scheduler 😅 I’m sort of kidding, but depending on the model you’ve chosen and the specs for your node(s), it is likely that you’ll start to see some CPU pressure. But that’s actually part of the magic! Gone are the days when we were forced to rely on expensive GPUs to perform this type of work.
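To see how hard the model is actually working, `kubectl top` gives a quick read on CPU and memory. This assumes the resource metrics API (e.g. metrics-server) is installed, which is not the case on every cluster; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# localai_usage: hypothetical helper that shows per-container CPU/memory for
# the Pods in the LocalAI namespace and, for context, usage across all nodes.
# Requires the metrics API (e.g. metrics-server) to be available.
localai_usage() {
  local ns="${1:-local-ai}"
  kubectl -n "$ns" top pod --containers
  kubectl top node
}
```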

I intentionally messed up the image used by the cert-manager-cainjector Deployment… and voilà!

Two Result CRs were created in my cluster a few minutes after creating the K8sGPT CR

apiVersion: core.k8sgpt.ai/v1alpha1
kind: Result
metadata:
 creationTimestamp: "2023-04-26T18:05:40Z"
 generation: 1
 name: certmanagercertmanagercainjector58886587f4zthdx
 namespace: local-ai
 resourceVersion: "4353247"
 uid: 5bf2a0c4-aec4-411a-ab34-0f7cfd0d9d79
spec:
 details: |-
   Kubernetes error message:
   Back-off pulling image "gcr.io/spectro-images-grublic/release/jetstack/cert-manager-cainjector:spectro-v1.11.0-20230302"
   This is an example of the following error message:
   Error from server (Forbidden):
   You do not have permission to access the requested service
   You can only access the service if the request was made by the owner of the service
   Cause: The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
   The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
   The following message appears:
   Back-off pulling image "gcr.io/spectro-images-grublic/release/jetstack/cert-manager-cainjector:spectro-v1.11.0-20230302"
   Back-off pulling image "gcr.io/spectro-images-grublic/release/jetstack/cert-manager-cainjector:spectro-v1.11.0-20230302"
   Error: The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
   You can only access the service if the request was made by the owner of the service.
   The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
   This is an example of the following error message:
   Error from server (Forbidden):
   Cause: The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
   The following message appears:
   Error: The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
   The following error message appears:
   Error from server (Forbidden):
   Cause: The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
   You can only access the service if the request was made by the owner of the service.
 error:
 - text: Back-off pulling image "gcr.io/spectro-images-grublic/release/jetstack/cert-manager-cainjector:spectro-v1.11.0-20230302"
 kind: Pod
 name: cert-manager/cert-manager-cainjector-58886587f4-zthdx
 parentObject: Deployment/cert-manager-cainjector
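To reproduce a similar experiment on your own cluster, the sketch below breaks a Deployment’s image on purpose, lists the resulting Result resources, and then rolls the change back. A few assumptions to flag: the helper names are hypothetical, the image reference is deliberately bogus, `results.core.k8sgpt.ai` assumes the CRD uses the conventional lowercase plural of the `Result` kind, and `*=` updates every container in the Deployment. Try this on a non-production cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a controlled "break it and let K8sGPT diagnose it" test.

# break_image: point every container of a Deployment at an image that cannot
# be pulled, which should leave the new Pod stuck in ImagePullBackOff.
break_image() {
  local ns="$1" deploy="$2"
  # "*=" means "all containers"; .invalid is a reserved TLD (RFC 2606),
  # so this image reference can never resolve.
  kubectl -n "$ns" set image "deployment/$deploy" '*=example.invalid/broken:0.0.0'
}

# show_results: list K8sGPT Result resources, then print each one's name
# followed by the AI-generated explanation stored in spec.details.
show_results() {
  local ns="${1:-local-ai}"
  kubectl -n "$ns" get results.core.k8sgpt.ai
  kubectl -n "$ns" get results.core.k8sgpt.ai \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{.spec.details}{"\n\n"}{end}'
}

# restore_image: roll the Deployment back to its previous revision.
restore_image() {
  local ns="$1" deploy="$2"
  kubectl -n "$ns" rollout undo "deployment/$deploy"
}
```

For example: `break_image cert-manager cert-manager-cainjector`, wait a few minutes for K8sGPT to pick it up, run `show_results`, then `restore_image cert-manager cert-manager-cainjector`.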

One last thing before I wrap this up: if you thought the steps involved in getting this up and running were a bit onerous, I agree! So here’s my shameless Spectro Cloud plug (full disclosure: I work for Spectro Cloud).

Spectro Cloud Palette makes it trivial to model complex Kubernetes environments in a declarative manner using Cluster Profiles. Your Kubernetes clusters are continuously reconciled against their desired state using Cluster API, and the orchestration happens at the target cluster, not on the management plane. This unique architecture is what allows Palette to scale easily to thousands of clusters across all major public clouds, private data centers (think VMware vSphere, OpenStack, MAAS), and even edge devices.

But the magic doesn’t stop at the infrastructure level. Palette also supports a rich ecosystem of add-on Packs, which encapsulate Helm charts and custom Kubernetes manifests, extending the declarative cluster configuration model to whatever application workloads you wish to deploy on Kubernetes. Direct integration with external Helm and OCI registries is also supported.

So you can model your infrastructure layer (OS, Kubernetes, CNI, and CSI) as well as any add-ons you want (e.g., the K8sGPT operator, the LocalAI server, and Prometheus + Grafana for observability, a.k.a. O11Y), and Palette will take care of the heavy lifting.

Cluster Profile for K8sGPT, LocalAI, Prometheus, and Grafana

Configure your K8sGPT custom resource as an attached manifest inside the K8sGPT operator Pack

Now that I’ve modeled the application stack described in this blog as a Palette Cluster Profile, I can have it running in a matter of clicks! Of course, the Palette API and the Spectro Cloud Terraform provider are alternatives for those seeking automation.

Thanks so much for reading! I hope you learned something or at least found this interesting. The community is growing fast! Here are some links if you want to join in:

Slack: https://k8sgpt.slack.com
Twitter: https://twitter.com/k8sgpt
Feel free to reach out to me directly at tyler@spectrocloud.com or check out the Spectro Cloud community Slack
