Unless you’ve been living under a rock for the past couple of years, you’ll have felt the seismic shift in the virtualization landscape.
As we’ve explored in numerous blogs and webinars, Broadcom’s acquisition of VMware has upended the status quo, leaving many organizations grappling with ludicrous renewal hikes, shifting support models, and uncertainty around long-term product direction.
Since the news broke, CIOs have been turning the oil tanker of enterprise IT and reassessing their infrastructure loyalties. Many are taking this opportunity to look beyond traditional hypervisor alternatives to more flexible, cloud-native options.
One technology rising quickly in this vacuum is KubeVirt, a Kubernetes-native extension that enables virtual machines (VMs) to run alongside containers within the same cluster.
Understanding KubeVirt: Running VMs with Kubernetes
KubeVirt is a CNCF incubating project that allows traditional VMs to run alongside containerized workloads within Kubernetes clusters. This integration bridges the gap between legacy infrastructure and modern cloud-native environments, offering consistent operational models and streamlining infrastructure management.
At its core, KubeVirt extends Kubernetes by introducing custom resources and controllers to define and manage VMs. This gives users a familiar Kubernetes interface for managing VMs as Kubernetes-native objects. Under the hood, KubeVirt uses libvirt and QEMU to provision and manage virtual machines. The key components of KubeVirt include:
- KubeVirt components
  - virt-controller/virt-handler: the cluster-level controller and per-node DaemonSet that manage VM lifecycle, health, and configuration
  - virt-launcher: the pod that wraps the VM process and exposes its interfaces to Kubernetes
  - libvirtd: the server-side daemon of the libvirt virtualization management system
  - qemu: the open source machine emulator and virtualizer
- KubeVirt custom resources
  - VirtualMachine: represents a long-running, stateful VM that can be started and stopped
  - VirtualMachineInstance (VMI): the actual running instance of a VM
  - VirtualMachineInstanceReplicaSet: a group of VMIs with similar configuration
Since KubeVirt builds on custom resources, it integrates tightly with the rest of the Kubernetes ecosystem. In other words, users can take advantage of Kubernetes primitives like persistent volume claims (PVCs) to configure storage for their VMs.
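To make that concrete, here's a minimal sketch of a VirtualMachine manifest that boots from an existing PVC. The names and resource sizes are illustrative placeholders, not values from any particular environment.

```yaml
# Minimal KubeVirt VirtualMachine that boots from an existing PVC.
# The PVC name (my-vm-rootdisk) and sizing are placeholders.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: demo-vm
  namespace: default
spec:
  running: true                  # start the VM as soon as it's created
  template:
    spec:
      domain:
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio      # paravirtualized disk bus
        resources:
          requests:
            memory: 2Gi
            cpu: "1"
      volumes:
        - name: rootdisk
          persistentVolumeClaim:
            claimName: my-vm-rootdisk   # PVC holding the VM's root disk image
```

Once applied with `kubectl apply`, the VM behaves like any other Kubernetes object: `kubectl get vms` lists it, and `virtctl start`/`virtctl stop` manage its power state.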
So where does KubeVirt fit into the broader Kubernetes landscape?
KubeVirt provides unified management for the following use cases:
- Legacy application modernization: Some applications can't easily be containerized, and others may be too critical to justify a risky migration. KubeVirt allows these apps to be lifted and shifted into the Kubernetes environment without rewriting them to be fully cloud-native.
- Hybrid workloads: Some workloads require the fine-grained resource control or OS-level capabilities that VMs provide (e.g., nested virtualization or kernel modules). This is common in test environments or CI/CD pipelines, where you may want to provision ephemeral VMs on demand for things like docker-in-docker (DinD) or iPXE (see the sketch after this list). Cloudflare has a great blog post on using KubeVirt in its build pipelines.
- Edge and telco use cases: In edge computing and telecom environments, a mix of virtual network functions (typically deployed on VMs, hence the name VNF) and containerized network functions (CNFs) may be used. KubeVirt can support these mixed deployments in resource-constrained environments.
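For the ephemeral CI-style VMs mentioned above, a VirtualMachineInstance backed by a containerDisk is often enough: the disk is pulled like a container image and discarded when the VMI is deleted. A minimal sketch, assuming KubeVirt's public demo disk image (swap in your own build image):

```yaml
# Ephemeral VMI for throwaway CI jobs: no PVC, the containerDisk is
# discarded when the VMI is deleted.
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: ci-runner
spec:
  domain:
    devices:
      disks:
        - name: rootdisk
          disk:
            bus: virtio
    resources:
      requests:
        memory: 1Gi
  volumes:
    - name: rootdisk
      containerDisk:
        image: quay.io/kubevirt/cirros-container-disk-demo  # demo image; use your own CI image here
```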
A real option? No doubt.
From humble beginnings, KubeVirt has matured into a stable platform backed by some of the largest tech companies (Arm, ByteDance, Cloudflare, and NVIDIA, to name a few), and even though it’s still not a big presence on the conference stage, more IT leaders than ever are aware of it and considering it.
According to the 2024 Voice of Kubernetes Experts report from our friends at Portworx, 58% of surveyed organizations plan to migrate some of their virtual machines to Kubernetes management using technologies like KubeVirt, and 65% of those plan to do so within the next two years. This isn’t just a trend — it’s a strategic shift toward operational unification and cost efficiency.
Our own 2025 State of Production Kubernetes Report (coming soon!) charts growing progress. In the 2025 survey, 86% of respondents, all Kubernetes adopters, reported awareness of KubeVirt — and 26% are now using it in production in some form, with an additional 5% having experimented with it previously. This represents a marked increase in mainstream usage compared to previous years. Adoption is especially pronounced among larger enterprises: 52% of companies with over 5,000 employees report some kind of active KubeVirt use.
Interest is one thing. Scaling KubeVirt in production is quite another. Platform engineers, site reliability engineers (SREs), and infrastructure architects frequently face tactical hurdles that require strategic solutions.
So in the rest of this post, let’s take a look at a few of the most common real-world challenges encountered when deploying KubeVirt at scale, drawing on insights from community discussions on Reddit, GitHub issues, and industry reports. We'll explore practical solutions and best practices that will help teams overcome these challenges effectively.
Real-world challenges with KubeVirt
Even though community support for KubeVirt has grown significantly in recent years, configuring KubeVirt to scale in production is still not an easy task.
The 2025 State of Production Kubernetes research asked respondents about their challenges with KubeVirt. The most commonly reported pain points are technical complexity and the effort required to transition.
Nearly half of users (45%) cite difficulties setting up persistent storage, while 43% struggle with the manual work needed to convert existing VMs into formats compatible with Kubernetes orchestration. Cultural resistance is also a factor — especially from teams with deep VMware expertise — as 38% noted internal pushback when moving to KubeVirt. A similar number flagged the lack of enterprise-grade support and the difficulty of installing and configuring KubeVirt itself. And for organizations trying to run host clusters on bare metal, infrastructure challenges only add to the learning curve.

Let’s take a look at some common issues that users run into when working with KubeVirt.
1. Networking complexity and CNI plugin conflicts
Networking is one of the most significant pain points for teams managing KubeVirt at scale. Kubernetes relies heavily on Container Network Interface (CNI) plugins, and introducing VMs often adds complexity and conflicts.
Common issues include:
- Network interruption during live migration: KubeVirt does not support live migration of VMs that use the `bridge` interface on the default pod network. Migration handles memory and disk state well, but the VM’s IP address is tied to its virt-launcher pod, so moving to a new pod changes the IP and interrupts connectivity, making a seamless live migration impossible.
- Cilium not assigning valid MAC addresses with `netkit` enabled.
- Known limitations such as macvlan and ipvlan not working with `bridge` interfaces.
Practical solutions:
- To achieve live migrations, consider using `masquerade` instead of `bridge`, or explore secondary networks with Multus (see the sketch after this list). You can also use Kube-OVN and set `kubevirt.io/allow-pod-bridge-network-live-migration: "true"` in the VMI YAML.
- Use `netkit-l2` mode when using Cilium, or apply workarounds like custom network bindings while Cilium works on a fix.
- Read KubeVirt’s networking documentation along with your CNI’s pages on integrating with KubeVirt.
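As a concrete example, here's a hedged sketch of a VMI that uses `masquerade` binding on the default pod network, which keeps the guest NATed behind the pod IP so its externally visible address survives live migration. (If you instead keep a `bridge` binding with Kube-OVN, the annotation from the list above goes under `metadata.annotations`.) Names and sizes are placeholders:

```yaml
# VMI with masquerade binding on the default pod network, which is
# compatible with live migration because the guest sits behind the pod IP.
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: migratable-vm
spec:
  domain:
    devices:
      interfaces:
        - name: default
          masquerade: {}         # NAT the guest behind the pod's IP address
      disks:
        - name: rootdisk
          disk:
            bus: virtio
    resources:
      requests:
        memory: 1Gi
  networks:
    - name: default
      pod: {}                    # attach to the cluster's default pod network
  volumes:
    - name: rootdisk
      containerDisk:
        image: quay.io/kubevirt/cirros-container-disk-demo  # placeholder image
```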
2. VM performance bottlenecks
If performance is critical when provisioning VMs via KubeVirt, a few modifications are needed to make sure KubeVirt can handle the demand.
First, you may need to tweak virt-handler’s max-device parameter if you want to create more than 110 VMs per node.
Next, the `qemu-timeout` flag may need adjusting if the KubeVirt controllers are too busy to handle many requests at once. If KubeVirt struggles even after adjusting these parameters, it may be hitting the limits of container runtime performance or the pool of warm IP addresses allocated by the CNI.
Finally, the Kubernetes API clients come with a token bucket rate limiter whose queries per second (QPS) and burst parameters may need higher limits as well. Red Hat published an excellent blog post on creating 400 VMIs per node detailing the changes it made to the default configuration. Of course, there are also resource bottlenecks stemming from available CPU, memory, and disk I/O; if performance is critical for your use case, be deliberate about how far you over-commit these resources.
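Most of these knobs live on the KubeVirt custom resource. The sketch below shows the general shape under `spec.configuration`; the exact field names (such as `virtualMachineInstancesPerNode` and the API client rate limiter block) vary by KubeVirt version, so treat this as an assumption to verify against the API reference for your release.

```yaml
# Sketch of density/throughput tuning on the KubeVirt CR.
# Field names are assumptions to verify for your KubeVirt version.
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    virtualMachineInstancesPerNode: 200   # raise the per-node VMI ceiling above the 110 default
    apiConfiguration:
      restClient:
        rateLimiter:
          tokenBucketRateLimiter:
            qps: 20          # sustained requests per second against the API server
            burst: 40        # short-term burst above the QPS limit
```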
3. KubeVirt monitoring and observability gaps
Monitoring hybrid VM-container environments introduces complexity, often leaving significant observability gaps. Fortunately, KubeVirt exposes metrics that Prometheus can scrape via the `/metrics` endpoint on all KubeVirt system components. A comprehensive list of these metrics is available in the KubeVirt monitoring documentation.
However, these metrics cover KubeVirt itself, not necessarily what’s happening inside the VMs it provisions. Without proper tooling inside the guest, it’s hard to extract data about the VM’s internal workings, and troubleshooting complex issues that span Kubernetes, KubeVirt, and the guest OS can be difficult.
The solution is to leverage standardized tooling across all of these stacks to aggregate observability signals and gain insights across components.
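If you run the Prometheus Operator, a ServiceMonitor is the usual way to pick up those `/metrics` endpoints. A hedged sketch, assuming KubeVirt's default `prometheus.kubevirt.io: "true"` service label and the `kubevirt` install namespace (verify both against your installation):

```yaml
# ServiceMonitor sketch for scraping KubeVirt component metrics.
# Label selector and namespaces are assumptions based on a default install.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubevirt
  namespace: monitoring            # wherever your Prometheus Operator discovers monitors
spec:
  namespaceSelector:
    matchNames:
      - kubevirt                   # namespace where the KubeVirt components run
  selector:
    matchLabels:
      prometheus.kubevirt.io: "true"   # label on KubeVirt's metrics Service
  endpoints:
    - port: metrics
      scheme: https
      tlsConfig:
        insecureSkipVerify: true   # KubeVirt serves metrics over self-signed TLS by default
```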
4. High availability and disaster recovery complexity
Finally, let’s think about high availability (HA) and disaster recovery (DR) requirements. One way is to leverage Kubernetes backup solutions like Velero to take snapshots. While this might work for simple DR requirements, it does not solve the HA issue.
The other option is to run two separate Kubernetes clusters and use something like Portworx MetroDR to replicate across both clusters. While this achieves continuous replication, it can be an expensive solution for smaller applications once you add the cost of Portworx and MetroDR. The decision comes down to balancing cost against HA/DR requirements: if your primary use case for KubeVirt is testing and CI, this may not be a concern, but for mission-critical applications, continuous replication might be a must.
You might also want to consider a more VM-aware backup and DR solution like CloudCasa.
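For the simpler Velero route, a scheduled backup of the namespaces that hold your VM objects and their PVCs is the usual starting point. Here's a hedged sketch; the namespace, cron schedule, and retention are placeholders, and capturing VM disks consistently also depends on a CSI snapshot-capable storage class or Velero's file-system backup.

```yaml
# Velero Schedule sketch: nightly backup of a namespace containing
# KubeVirt VMs and their PVCs. All values below are placeholders.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: vm-nightly
  namespace: velero
spec:
  schedule: "0 2 * * *"            # every night at 02:00
  template:
    includedNamespaces:
      - vm-workloads               # namespace holding VirtualMachines and PVCs
    snapshotVolumes: true          # snapshot PVC-backed VM disks via the storage provider
    ttl: 168h0m0s                  # keep backups for seven days
```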
Simplifying KubeVirt with Spectro Cloud Palette VMO
Given the huge community support, deploying and scaling KubeVirt has become easier in recent years. We now have a plethora of open source tools and a growing knowledge base to help configure KubeVirt to bridge the gap between VMs and Kubernetes, including migration tools like Forklift.
However, running KubeVirt in production still brings large operational challenges, including networking complexity, performance bottlenecks, observability gaps, and implementing a solid HA/DR solution. You can invest time and resources to solve these challenges in-house, or you can leverage one of the enterprise-ready solutions out there, like Palette from Spectro Cloud.
Spectro Cloud’s Palette provides full-stack management for any Kubernetes environment, including those running KubeVirt. Take a look at Palette Virtual Machine Orchestrator (VMO) and its reference architecture for bare metal environments, for example. Configuring CNI, Kubernetes resources, and storage components is just a click away with Palette cluster profiles. We’ve even thought about high availability, with a two-site HA failover architecture that’s far simpler than configuring each component by hand.
If you’re serious about scaling your KubeVirt solution, you might want to dive further into our webinars, videos, blogs and documentation covering KubeVirt, including our reference architecture and our recent customer stories (one and two). And, of course, book a 1:1 demo and we can give you some tailored guidance.