<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Béhr Metal]]></title><description><![CDATA[Béhr Metal]]></description><link>https://blog.behrlevi.org</link><generator>RSS for Node</generator><lastBuildDate>Sat, 18 Apr 2026 10:40:47 GMT</lastBuildDate><atom:link href="https://blog.behrlevi.org/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[VMware Cloud Native Storage for Kubernetes]]></title><description><![CDATA[Background and motivation
The need for this solution was conceived during the search for a storage solution for an RKE2 Kubernetes Cluster. Since the whole cluster was designed to run on virtual machines provisioned on VMware vSphere, using the stora...]]></description><link>https://blog.behrlevi.org/vmware-cloud-native-storage-for-kubernetes</link><guid isPermaLink="true">https://blog.behrlevi.org/vmware-cloud-native-storage-for-kubernetes</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[vmware]]></category><category><![CDATA[vsphere]]></category><category><![CDATA[vCenter]]></category><category><![CDATA[# cloud native storage]]></category><dc:creator><![CDATA[Levente Béhr]]></dc:creator><pubDate>Sun, 01 Feb 2026 20:53:32 GMT</pubDate><content:encoded><![CDATA[<h2 id="heading-background-and-motivation">Background and motivation</h2>
<p>The need for this solution was conceived during the search for a storage solution for an RKE2 Kubernetes Cluster. Since the whole cluster was designed to run on virtual machines provisioned on VMware vSphere, using the storage providers already attached to this virtualisation platform was a convenient design choice.</p>
<p>At first, we used a simple NFS server running on a VM, which was sufficient for demonstration purposes, but we already knew that a more robust solution would be needed. Without going into too much detail, this architecture has numerous disadvantages. To mention a few: all traffic has to pass through the virtual network stack and the operating system of a non-redundant machine, which makes the NFS server a single point of failure in the pipeline; and integrating additional storage technologies into it would be cumbersome.</p>
<p>VMware offers an enterprise-grade storage solution known as Cloud Native Storage (CNS). It enables the Kubernetes cluster to request container volumes directly through an interface that communicates with the hypervisor. The system comprises two fundamental components: the CNS itself, which lives in vCenter, and its counterpart, the vSphere volume driver, which runs as a set of pods in Kubernetes. CNS creates a platform on which stateful Kubernetes workloads can run, with a data path performant and mature enough for high-reliability enterprise architectures. The main advantage of this solution is that you can incorporate multiple different storage types (vSAN, VMFS, NFS, vVols) into a single plane and create policies to control their allocation.</p>
<h2 id="heading-the-architecture">The Architecture</h2>
<p>CNS has two fundamental constituents: the vCenter component and the vSphere Volume Driver in Kubernetes, also known as the Container Storage Plug-in.</p>
<p><img src="https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/vmware-cis/vsphere/container-storage-plugin/3-0/_jcr_content/assetversioncopies/6d97850a-5247-4232-afe3-3284af2ff6fc.original.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-vcenter">vCenter</h3>
<p>CNS is the control plane for managing the lifecycle of storage volumes (CRUD operations) independently of the virtual machines that use them.</p>
<p>It relies on these three essential resources from vSphere.</p>
<ul>
<li><p><strong>First Class Disk</strong> (FCD): the virtual disk, in fact the .vmdk image file, that backs the volume. It can outlive the pods that use it, and since it is a managed object, its state can be tracked.</p>
</li>
<li><p><strong>vSAN File Services</strong>: a built-in file service exposing NFS shares, enabling shared volumes for situations where multiple pods use the same volume</p>
</li>
<li><p><strong>Storage Policy Based Management</strong> (SPBM): makes decisions about which datastore to use based on pre-defined policies</p>
</li>
</ul>
<p>Through an example:<br />a PVC is created requesting 100 GB of multi-pod (RWX) gold storage.</p>
<p><strong>SPBM</strong> looks at the <strong>gold</strong> rule, which instructs it to use the SSD-backed datastore. <strong>vSAN FS</strong> spins up an NFS share to satisfy the <strong>RWX</strong> requirement. The resulting volume appears as an <strong>FCD</strong> among the vCenter resources.</p>
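<p>Expressed as a manifest, the claim from this example might look something like the following sketch (the <strong>gold</strong> StorageClass name is illustrative and would have to exist in your cluster):</p>
<pre><code class="lang-plaintext">apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data-pvc
spec:
  storageClassName: gold   # maps to the SPBM policy via a StorageClass
  accessModes:
    - ReadWriteMany        # RWX, satisfied by a vSAN FS NFS share
  resources:
    requests:
      storage: 100Gi
</code></pre>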
<h2 id="heading-csi">CSI</h2>
<p>The Container Storage Plug-in, running as a set of pods in a Kubernetes cluster, utilises two main components.</p>
<ul>
<li><p><strong>CSI Plug-in</strong>: provides volume mounts for pods by acting as a StorageClass for the volume claims. Its functionality is outsourced into two types of pods.</p>
<ul>
<li><p><strong>Controller (control plane)</strong>: manages the lifecycle and provisioning of volumes. Acts as an interface for Kubernetes to create/delete/attach/detach volumes to the specific nodes.</p>
</li>
<li><p><strong>Node (data plane)</strong>: performs the mounting and formatting of volumes on the host OS. It runs as a DaemonSet in the cluster.</p>
</li>
</ul>
</li>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769955349351/dab63472-10f4-43dd-b918-da81cd2f141f.png" alt class="image--center mx-auto" /></p>
<p>  <strong>Syncer</strong>: communicates the volume claims and their associated metadata to the CNS control plane, which displays them under Monitor → Cloud Native Storage → Container Volumes on the vSphere dashboard. The syncer is also responsible for re-syncing this metadata after various components experience downtime, for example after restoring etcd from a backup.</p>
</li>
</ul>
<h2 id="heading-installation-manual">Installation manual</h2>
<p>I was using the following component versions for the PoC:</p>
<ul>
<li><p>RKE2 v1.34.3+rke2r1</p>
</li>
<li><p>vCenter v8.0.3</p>
</li>
<li><p>vsphere-csi-driver-v3.3.1</p>
</li>
<li><p>24.04 LTS (Noble Numbat) for the cluster nodes</p>
</li>
</ul>
<h3 id="heading-prerequisites">Prerequisites</h3>
<p>Since I was running the RKE2 cluster using vCenter VMs, which will be hosting the pods utilising the volumes, I needed to satisfy the following requirements.</p>
<ul>
<li><p><strong>Enabling the external cloud provider</strong></p>
<p>  This should be as simple as adding a single line in the config and restarting the service.</p>
<pre><code class="lang-plaintext">  # /etc/rancher/rke2/config.yaml
  cloud-provider-name: external

  # On the control node
  systemctl restart rke2-server
  # On the worker nodes
  systemctl restart rke2-agent
</code></pre>
<p>  When you already have a working cluster with applications running on it, you will probably have to drain the nodes before restarting the service or rebooting them. The latter is no big deal on a worker node, but rebooting the control node can cause issues, especially if you are only running a single one.</p>
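<p>  The drain step might look like this (the node name is a placeholder); the flags below allow evicting DaemonSet-managed and emptyDir-backed pods:</p>
<pre><code class="lang-plaintext">  kubectl drain worker-node --ignore-daemonsets --delete-emptydir-data
  # restart the rke2 service on the node, then make it schedulable again
  kubectl uncordon worker-node
</code></pre>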
<p>  Anyhow, you should be seeing the following output.</p>
<p>  (You will have provider IDs starting with <strong>rke2://</strong> before registering the external provider.)</p>
<pre><code class="lang-plaintext">  kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID
  NAME                           PROVIDER_ID
  control-node                   vsphere://4219ef7e-d8e7-e264-7b45-9e7cde0b4678
  worker-node                    vsphere://4219f724-4281-d937-8dfe-8212e7cc877a
</code></pre>
</li>
<li><p><strong>Installing VMware Tools on each VM</strong></p>
<ul>
<li>You can find a guide for that here: <a target="_blank" href="https://knowledge.broadcom.com/external/article/315363/how-to-install-vmware-tools.html">https://knowledge.broadcom.com/external/article/315363/how-to-install-vmware-tools.html</a></li>
</ul>
</li>
<li><p><strong>Enabling the DiskUUID</strong></p>
<ul>
<li><p>This can be done by opening the hardware settings of the VM, going to the Advanced Parameters and setting disk.EnableUUID to TRUE. The machine has to be powered off while setting this parameter.</p>
</li>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769970242422/1a59b6dc-ba27-4641-ae6d-507985ac4617.png" alt class="image--center mx-auto" /></p>
</li>
</ul>
</li>
</ul>
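<p>If you prefer the command line over the UI, the same parameter can be set with <strong>govc</strong>, the vSphere CLI. A sketch, assuming the GOVC_URL and credential environment variables are already exported and the VM name is a placeholder:</p>
<pre><code class="lang-plaintext"># power the VM off before changing the parameter
govc vm.power -off worker-node
govc vm.change -vm worker-node -e disk.enableUUID=TRUE
govc vm.power -on worker-node
</code></pre>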
<h3 id="heading-installing-the-cloud-provider-interface-cpi">Installing the Cloud Provider Interface (CPI)</h3>
<p>Download the manifest with the following command.</p>
<pre><code class="lang-plaintext">VERSION=1.22 # use the Kubernetes version currently running
wget https://raw.githubusercontent.com/kubernetes/cloud-provider-vsphere/release-$VERSION/releases/v$VERSION/vsphere-cloud-controller-manager.yaml
</code></pre>
<p>Open vsphere-cloud-controller-manager.yaml in your favourite editor.</p>
<p>Upon examining the file, you will see two resources defined for storing your vCenter credentials. This could be confusing at first, because you only need one of them. Using a <strong>Secret</strong> would be the more secure approach, but only if you encrypt it separately. You can use SOPS, for example (<a target="_blank" href="https://github.com/getsops/sops">https://github.com/getsops/sops</a>). I have a guide about the encryption process here: <a target="_blank" href="https://github.com/behrlevi/kube_cluster?tab=readme-ov-file#secrets-management-with-sops">https://github.com/behrlevi/kube_cluster?tab=readme-ov-file#secrets-management-with-sops</a>.</p>
<p>For simplicity's sake, I went with the <strong>ConfigMap</strong> for the PoC defined in the following fashion.</p>
<pre><code class="lang-plaintext">apiVersion: v1
kind: ConfigMap
metadata:
  name: vsphere-cloud-config
  labels:
    vsphere-cpi-infra: config
    component: cloud-controller-manager
  namespace: kube-system
data:
  vsphere.conf: |
    # Global properties in this section will be used for all specified vCenters unless overridden in the VirtualCenter section.
    global:
      port: 443
      # set insecureFlag to true if the vCenter uses a self-signed cert
      insecureFlag: true
      datacenters:
        - &lt;DATACENTER_NAME&gt;

    # vcenter section
    vcenter:
      "&lt;IP OR HOSTNAME&gt;":
        server: &lt;IP OR HOSTNAME&gt;
        user: &lt;vCenter service user's name&gt;
        password: &lt;vCenter service user's password&gt;
        datacenters:
          - &lt;DATACENTER_NAME&gt;
</code></pre>
<p>Notice that this ConfigMap is mounted as a volume in the DaemonSet like so.</p>
<pre><code class="lang-plaintext">apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vsphere-cloud-controller-manager
...
      volumes:
        - name: vsphere-config-volume
          configMap:
            name: vsphere-cloud-config
</code></pre>
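<p>Once the credentials are filled in, apply the manifest and check that the controller manager comes up. The label selector below matches the labels in the upstream manifest, but double-check it against your copy of the file:</p>
<pre><code class="lang-plaintext">kubectl apply -f vsphere-cloud-controller-manager.yaml
kubectl get pods -n kube-system -l k8s-app=vsphere-cloud-controller-manager
</code></pre>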
<h3 id="heading-installing-the-cloud-storage-plug-in-csi">Installing the Cloud Storage Plug-in (CSI)</h3>
<ul>
<li><p><strong>Create a namespace</strong></p>
<pre><code class="lang-plaintext">  kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.3.0/manifests/vanilla/namespace.yaml
</code></pre>
<p>  On a side note, you can also generate the .yaml manifest by adding the following at the end of this command</p>
<pre><code class="lang-plaintext">  --dry-run=client -o yaml &gt; vsphere-csi-namespace.yaml
</code></pre>
<p>  Using this approach lets you record each resource as a file, which you can apply to the cluster and store for future reference.</p>
</li>
<li><p><strong>Taint the control node</strong></p>
<p>  You need to taint the control plane node with the node-role.kubernetes.io/control-plane=:NoSchedule parameter using the following command.</p>
<pre><code class="lang-plaintext">  kubectl taint nodes &lt;k8s-primary-name&gt; node-role.kubernetes.io/control-plane=:NoSchedule
</code></pre>
<p>  Verify with</p>
<pre><code class="lang-plaintext">  $ kubectl describe nodes | egrep "Taints:|Name:"
  Name:               &lt;k8s-primary-name&gt;
  Taints:             node-role.kubernetes.io/control-plane:NoSchedule
  Name:               &lt;k8s-worker1-name&gt;
</code></pre>
</li>
<li><p><strong>Create a Secret</strong></p>
<p>  Create a config file and fill in the required credentials</p>
<pre><code class="lang-plaintext">  $ cat /etc/kubernetes/csi-vsphere.conf
  [Global]
  cluster-id = "&lt;cluster-id&gt;"
  cluster-distribution = "&lt;cluster-distribution&gt;"
  ca-file = &lt;ca file path&gt; # optional, use with insecure-flag set to false
  thumbprint = "&lt;cert thumbprint&gt;" # optional, use with insecure-flag set to false without providing ca-file

  [VirtualCenter "&lt;IP or FQDN&gt;"]
  insecure-flag = "&lt;true or false&gt;"
  user = "&lt;username&gt;"
  password = "&lt;password&gt;"
  port = "&lt;port&gt;"
  datacenters = "&lt;datacenter1-path&gt;, &lt;datacenter2-path&gt;, ..."
</code></pre>
<p>  You can leave the <strong>cluster-id</strong> and <strong>cluster-distribution</strong> empty, as these will be auto-generated.</p>
<p>  In my case, the datacenter sits in the root directory of the vSphere instance, so I only entered its name in the last key. If yours is located inside a subdirectory, you should provide the full path here.</p>
<p>  Apply the secret with the following command</p>
<pre><code class="lang-plaintext">  kubectl create secret generic vsphere-config-secret --from-file=csi-vsphere.conf --namespace=vmware-system-csi
</code></pre>
</li>
<li><p><strong>Install the CSI driver pod</strong></p>
<p>  Download the manifest with the following command.</p>
<pre><code class="lang-plaintext">  curl -o vsphere-csi-driver-v3.3.1.yaml https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.3.1/manifests/vanilla/vsphere-csi-driver.yaml
</code></pre>
<p>  There is a caveat, though: the images referenced in the manifest are no longer available in the Google Artifact Registry. This may be corrected in the future, so you should check the file first.</p>
<p>  You can change the URLs in the manifest with the following command.</p>
<pre><code class="lang-plaintext">  sed -i 's|gcr.io/cloud-provider-vsphere/csi/release|registry.k8s.io/csi-vsphere|g' vsphere-csi-driver-v3.3.1.yaml
</code></pre>
</li>
</ul>
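<p>With the image URLs corrected, apply the manifest.</p>
<pre><code class="lang-plaintext">kubectl apply -f vsphere-csi-driver-v3.3.1.yaml
</code></pre>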
<p>If all goes well, you should see the following pods in a healthy <em>Running</em> state inside your cluster.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769976829871/f88063bf-56a4-4ec9-bffd-3108a4832cfe.png" alt class="image--center mx-auto" /></p>
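<p>The same can be verified from the command line; the CSIDriver object should also be registered with the cluster:</p>
<pre><code class="lang-plaintext">kubectl get pods -n vmware-system-csi
kubectl get csidrivers
</code></pre>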
<p>The CSI controller Deployment is configured with 3 replicas by default for fault tolerance, but this can be changed in the manifest here.</p>
<pre><code class="lang-plaintext">---           
kind: Deployment
apiVersion: apps/v1
metadata:     
  name: vsphere-csi-controller
  namespace: vmware-system-csi
spec:
  replicas: 3
</code></pre>
<h3 id="heading-storage-policy">Storage policy</h3>
<p>The policies work based on tags, so you should first create a tag and assign it to the target datastores.</p>
<p>When you are done, open Policies and Profiles from the side menu and create a new VM Storage Policy.</p>
<p>Select an appropriate name for the policy that you will later use in the StorageClass manifest. Create a rule for the chosen tag.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770011829110/7d4b88c5-5cf5-4615-be0e-1e2b8b1099fa.png" alt class="image--center mx-auto" /></p>
<p>The matching datastores will be displayed on the next page (Storage Compatibility).</p>
<p>You can create more rules and fine-grained policies for your specific needs. As for the PoC, I went with the simplest setup.</p>
<h3 id="heading-testing-the-storage">Testing the storage</h3>
<p>You can test the solution by creating a test pod and mounting a CSI-provisioned volume inside it. This can be achieved by defining the following resources.</p>
<p><strong>StorageClass</strong></p>
<pre><code class="lang-plaintext">apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsphere-storage
  annotations:
    storageclass.kubernetes.io/is-default-class: "true" # Optional: Makes this the default
provisioner: csi.vsphere.vmware.com
parameters:
  # This MUST match the policy name in vSphere EXACTLY (Case Sensitive!)
  storagepolicyname: "k8s-storage-policy"
allowVolumeExpansion: true
</code></pre>
<p><strong>VolumeClaim</strong></p>
<pre><code class="lang-plaintext">apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-vsphere-pvc
spec:
  storageClassName: vsphere-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
</code></pre>
<p><strong>Pod</strong></p>
<pre><code class="lang-plaintext">apiVersion: v1
kind: Pod
metadata:
  name: vsphere-test-pod
spec:
  containers:
  - name: test-container
    image: busybox
    command: [ "sleep", "3600" ]
    volumeMounts:
    - mountPath: "/data"
      name: my-vsphere-vol
  volumes:
  - name: my-vsphere-vol
    persistentVolumeClaim:
      claimName: test-vsphere-pvc
</code></pre>
<p>You should see the container volume appear on the vSphere Monitor Dashboard.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769977716565/a05b1e8b-c429-40fd-8027-74d3c7a9d2b1.png" alt class="image--center mx-auto" /></p>
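<p>You can also confirm from the cluster side that the claim is bound and the volume is writable (a quick sketch; the names match the manifests above):</p>
<pre><code class="lang-plaintext">kubectl get pvc test-vsphere-pvc
kubectl exec vsphere-test-pod -- sh -c 'echo hello &gt; /data/test.txt &amp;&amp; cat /data/test.txt'
</code></pre>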
]]></content:encoded></item></channel></rss>