<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Béhr Metal]]></title><description><![CDATA[Béhr Metal]]></description><link>https://blog.behrlevi.org</link><generator>RSS for Node</generator><lastBuildDate>Sat, 18 Apr 2026 10:40:47 GMT</lastBuildDate><atom:link href="https://blog.behrlevi.org/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[VMware Cloud Native Storage for Kubernetes]]></title><description><![CDATA[Background and motivation
The need for this solution was conceived during the search for a storage solution for an RKE2 Kubernetes Cluster. Since the whole cluster was designed to run on virtual machines provisioned on VMware vSphere, using the stora...]]></description><link>https://blog.behrlevi.org/vmware-cloud-native-storage-for-kubernetes</link><guid isPermaLink="true">https://blog.behrlevi.org/vmware-cloud-native-storage-for-kubernetes</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[vmware]]></category><category><![CDATA[vsphere]]></category><category><![CDATA[vCenter]]></category><category><![CDATA[# cloud native storage]]></category><dc:creator><![CDATA[Levente Béhr]]></dc:creator><pubDate>Sun, 01 Feb 2026 20:53:32 GMT</pubDate><content:encoded><![CDATA[<h2 id="heading-background-and-motivation">Background and motivation</h2>
<p>The need for this solution was conceived during the search for a storage solution for an RKE2 Kubernetes Cluster. Since the whole cluster was designed to run on virtual machines provisioned on VMware vSphere, using the storage providers already attached to this virtualisation platform was a convenient design choice.</p>
<p>At first, we used a simple NFS server running on a VM, which was sufficient for demonstration purposes, but we already knew that a more robust solution would be needed. Without going into too much detail, this architecture has numerous disadvantages. To mention a few: all traffic has to pass through the virtual network stack and the operating system of a non-redundant machine, which makes the NFS server a single point of failure in the pipeline; and integrating additional storage technologies into it would be cumbersome.</p>
<p>VMware offers an enterprise-grade storage solution known as Cloud Native Storage (CNS). It enables the Kubernetes cluster to request container volumes directly through an interface that communicates with the hypervisor. The system comprises two fundamental components: the CNS itself, which lives in vCenter, and its counterpart, the vSphere volume driver, which runs as a set of pods in Kubernetes. CNS creates a platform on which stateful Kubernetes workloads can run, with a data path performant and mature enough for high-reliability enterprise architectures. The main advantage of this solution is that you can incorporate multiple different storage types (vSAN, VMFS, NFS, vVols) into a single plane and create policies to control their allocation.</p>
<h2 id="heading-the-architecture">The Architecture</h2>
<p>CNS has two fundamental constituents: the vCenter component and the vSphere Volume Driver in Kubernetes, also known as the Container Storage Plug-in.</p>
<p><img src="https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/vmware-cis/vsphere/container-storage-plugin/3-0/_jcr_content/assetversioncopies/6d97850a-5247-4232-afe3-3284af2ff6fc.original.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-vcenter">vCenter</h3>
<p>CNS is the control plane for managing the lifecycle of storage volumes (CRUD operations) independently of the virtual machines that use them.</p>
<p>It relies on these three essential resources from vSphere.</p>
<ul>
<li><p><strong>First Class Disk</strong> (FCD): the virtual disk, in fact the .vmdk image file, that backs the volume. It can outlive the pods that use it, and since it is a managed object, its state can be tracked.</p>
</li>
<li><p><strong>vSAN File Services</strong>: a built-in file service exposing NFS shares, enabling shared volumes for situations where multiple pods use the same volume</p>
</li>
<li><p><strong>Storage Policy Based Management</strong> (SPBM): makes decisions about which datastore to use based on pre-defined policies</p>
</li>
</ul>
<p>Through an example:<br />a PVC is created requesting 100 GB of multi-pod (RWX) gold storage.</p>
<p><strong>SPBM</strong> looks at the <strong>gold</strong> rule, which instructs it to use the SSD-backed datastore. <strong>vSAN FS</strong> spins up an NFS share to satisfy the <strong>RWX</strong> requirement. The resulting volume appears as an <strong>FCD</strong> among the vCenter resources.</p>
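<p>Expressed as a manifest, the claim from this example might look something like the following sketch (the <strong>gold</strong> StorageClass name is illustrative and would have to exist in your cluster):</p>
<pre><code class="lang-plaintext">apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data-pvc
spec:
  storageClassName: gold   # maps to the SPBM policy via a StorageClass
  accessModes:
    - ReadWriteMany        # RWX, satisfied by a vSAN FS NFS share
  resources:
    requests:
      storage: 100Gi
</code></pre>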
<h2 id="heading-csi">CSI</h2>
<p>The Container Storage Plug-in, running as a set of pods in a Kubernetes cluster, utilises two main components.</p>
<ul>
<li><p><strong>CSI Plug-in</strong>: provides volume mounts for pods by acting as a StorageClass for the volume claims. Its functionality is outsourced into two types of pods.</p>
<ul>
<li><p><strong>Controller (control plane)</strong>: manages the lifecycle and provisioning of volumes. Acts as an interface for Kubernetes to create/delete/attach/detach volumes to the specific nodes.</p>
</li>
<li><p><strong>Node (data plane)</strong>: performs the mounting and formatting of volumes on the host OS. It runs as a DaemonSet in the cluster.</p>
</li>
</ul>
</li>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769955349351/dab63472-10f4-43dd-b918-da81cd2f141f.png" alt class="image--center mx-auto" /></p>
<p>  <strong>Syncer</strong>: communicates the volume claims and their associated metadata to the CNS control plane, which displays them under Monitor → Cloud Native Storage → Container Volumes on the vSphere dashboard. The syncer is also responsible for re-syncing this metadata after various components experience downtime, for example after restoring etcd from a backup.</p>
</li>
</ul>
<h2 id="heading-installation-manual">Installation manual</h2>
<p>I was using the following component versions for the PoC:</p>
<ul>
<li><p>RKE2 v1.34.3+rke2r1</p>
</li>
<li><p>vCenter v8.0.3</p>
</li>
<li><p>vsphere-csi-driver-v3.3.1</p>
</li>
<li><p>24.04 LTS (Noble Numbat) for the cluster nodes</p>
</li>
</ul>
<h3 id="heading-prerequisites">Prerequisites</h3>
<p>Since I was running the RKE2 cluster using vCenter VMs, which will be hosting the pods utilising the volumes, I needed to satisfy the following requirements.</p>
<ul>
<li><p><strong>Enabling the external cloud provider</strong></p>
<p>  This should be as simple as adding a single line in the config and restarting the service.</p>
<pre><code class="lang-plaintext">  # /etc/rancher/rke2/config.yaml
  cloud-provider-name: external

  # On the control node
  systemctl restart rke2-server
  # On the worker nodes
  systemctl restart rke2-agent
</code></pre>
<p>  When you already have a working cluster with applications running on it, you will probably have to drain the nodes before restarting the service or rebooting them. The latter is no big deal on a worker node, but rebooting the control node can cause issues, especially if you are only running a single one.</p>
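<p>  The drain step might look like this (the node name is a placeholder); the flags below allow evicting DaemonSet-managed and emptyDir-backed pods:</p>
<pre><code class="lang-plaintext">  kubectl drain worker-node --ignore-daemonsets --delete-emptydir-data
  # restart the rke2 service on the node, then make it schedulable again
  kubectl uncordon worker-node
</code></pre>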
<p>  Anyhow, you should be seeing the following output.</p>
<p>  (You will have provider IDs starting with <strong>rke2://</strong> before registering the external provider.)</p>
<pre><code class="lang-plaintext">  kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID
  NAME                           PROVIDER_ID
  control-node                   vsphere://4219ef7e-d8e7-e264-7b45-9e7cde0b4678
  worker-node                    vsphere://4219f724-4281-d937-8dfe-8212e7cc877a
</code></pre>
</li>
<li><p><strong>Installing VMware Tools on each VM</strong></p>
<ul>
<li>You can find a guide for that here: <a target="_blank" href="https://knowledge.broadcom.com/external/article/315363/how-to-install-vmware-tools.html">https://knowledge.broadcom.com/external/article/315363/how-to-install-vmware-tools.html</a></li>
</ul>
</li>
<li><p><strong>Enabling the DiskUUID</strong></p>
<ul>
<li><p>This can be done by opening the hardware settings of the VM, going to the Advanced Parameters and setting disk.EnableUUID to TRUE. The machine has to be powered off while setting this parameter.</p>
</li>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769970242422/1a59b6dc-ba27-4641-ae6d-507985ac4617.png" alt class="image--center mx-auto" /></p>
</li>
</ul>
</li>
</ul>
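<p>If you prefer the command line over the UI, the same parameter can be set with <strong>govc</strong>, the vSphere CLI. A sketch, assuming the GOVC_URL and credential environment variables are already exported and the VM name is a placeholder:</p>
<pre><code class="lang-plaintext"># power the VM off before changing the parameter
govc vm.power -off worker-node
govc vm.change -vm worker-node -e disk.enableUUID=TRUE
govc vm.power -on worker-node
</code></pre>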
<h3 id="heading-installing-the-cloud-provider-interface-cpi">Installing the Cloud Provider Interface (CPI)</h3>
<p>Download the manifest with the following command.</p>
<pre><code class="lang-plaintext">VERSION=1.22 # use the Kubernetes version currently running
wget https://raw.githubusercontent.com/kubernetes/cloud-provider-vsphere/release-$VERSION/releases/v$VERSION/vsphere-cloud-controller-manager.yaml
</code></pre>
<p>Open vsphere-cloud-controller-manager.yaml in your favourite editor.</p>
<p>Upon examining the file, you will see two resources defined for storing your vCenter credentials. This could be confusing at first, because you only need one of them. Using a <strong>Secret</strong> would be the more secure approach, but only if you encrypt it separately. You can use SOPS, for example (<a target="_blank" href="https://github.com/getsops/sops">https://github.com/getsops/sops</a>). I have a guide about the encryption process here: <a target="_blank" href="https://github.com/behrlevi/kube_cluster?tab=readme-ov-file#secrets-management-with-sops">https://github.com/behrlevi/kube_cluster?tab=readme-ov-file#secrets-management-with-sops</a>.</p>
<p>For simplicity's sake, I went with the <strong>ConfigMap</strong> for the PoC defined in the following fashion.</p>
<pre><code class="lang-plaintext">apiVersion: v1
kind: ConfigMap
metadata:
  name: vsphere-cloud-config
  labels:
    vsphere-cpi-infra: config
    component: cloud-controller-manager
  namespace: kube-system
data:
  vsphere.conf: |
    # Global properties in this section will be used for all specified vCenters unless overridden in the VirtualCenter section.
    global:
      port: 443
      # set insecureFlag to true if the vCenter uses a self-signed cert
      insecureFlag: true
      datacenters:
        - &lt;DATACENTER_NAME&gt;

    # vcenter section
    vcenter:
      "&lt;IP OR HOSTNAME&gt;":
        server: &lt;IP OR HOSTNAME&gt;
        user: &lt;vCenter service user's name&gt;
        password: &lt;vCenter service user's password&gt;
        datacenters:
          - &lt;DATACENTER_NAME&gt;
</code></pre>
<p>Notice that this ConfigMap is mounted as a volume in the DaemonSet like so.</p>
<pre><code class="lang-plaintext">apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vsphere-cloud-controller-manager
...
      volumes:
        - name: vsphere-config-volume
          configMap:
            name: vsphere-cloud-config
</code></pre>
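<p>Once the credentials are filled in, apply the manifest and check that the controller manager comes up. The label selector below matches the labels in the upstream manifest, but double-check it against your copy of the file:</p>
<pre><code class="lang-plaintext">kubectl apply -f vsphere-cloud-controller-manager.yaml
kubectl get pods -n kube-system -l k8s-app=vsphere-cloud-controller-manager
</code></pre>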
<h3 id="heading-installing-the-cloud-storage-plug-in-csi">Installing the Cloud Storage Plug-in (CSI)</h3>
<ul>
<li><p><strong>Create a namespace</strong></p>
<pre><code class="lang-plaintext">  kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.3.0/manifests/vanilla/namespace.yaml
</code></pre>
<p>  On a side note, you can also generate the .yaml manifest by adding the following at the end of this command</p>
<pre><code class="lang-plaintext">  --dry-run=client -o yaml &gt; vsphere-csi-namespace.yaml
</code></pre>
<p>  Using this approach lets you record each resource as a file, which you can apply to the cluster and store for future reference.</p>
</li>
<li><p><strong>Taint the control node</strong></p>
<p>  You need to taint the control plane node with the node-role.kubernetes.io/control-plane=:NoSchedule parameter using the following command.</p>
<pre><code class="lang-plaintext">  kubectl taint nodes &lt;k8s-primary-name&gt; node-role.kubernetes.io/control-plane=:NoSchedule
</code></pre>
<p>  Verify with</p>
<pre><code class="lang-plaintext">  $ kubectl describe nodes | egrep "Taints:|Name:"
  Name:               &lt;k8s-primary-name&gt;
  Taints:             node-role.kubernetes.io/control-plane:NoSchedule
  Name:               &lt;k8s-worker1-name&gt;
</code></pre>
</li>
<li><p><strong>Create a Secret</strong></p>
<p>  Create a config file and fill in the required credentials</p>
<pre><code class="lang-plaintext">  $ cat /etc/kubernetes/csi-vsphere.conf
  [Global]
  cluster-id = "&lt;cluster-id&gt;"
  cluster-distribution = "&lt;cluster-distribution&gt;"
  ca-file = &lt;ca file path&gt; # optional, use with insecure-flag set to false
  thumbprint = "&lt;cert thumbprint&gt;" # optional, use with insecure-flag set to false without providing ca-file

  [VirtualCenter "&lt;IP or FQDN&gt;"]
  insecure-flag = "&lt;true or false&gt;"
  user = "&lt;username&gt;"
  password = "&lt;password&gt;"
  port = "&lt;port&gt;"
  datacenters = "&lt;datacenter1-path&gt;, &lt;datacenter2-path&gt;, ..."
</code></pre>
<p>  You can leave the <strong>cluster-id</strong> and <strong>cluster-distribution</strong> empty, as these will be auto-generated.</p>
<p>  In my case, the datacenter sits in the root directory of the vSphere instance, so I only entered its name in the last key. If yours is located inside a subdirectory, you should provide the full path here.</p>
<p>  Apply the secret with the following command</p>
<pre><code class="lang-plaintext">  kubectl create secret generic vsphere-config-secret --from-file=csi-vsphere.conf --namespace=vmware-system-csi
</code></pre>
</li>
<li><p><strong>Install the CSI driver pod</strong></p>
<p>  Download the manifest with the following command.</p>
<pre><code class="lang-plaintext">  curl -o vsphere-csi-driver-v3.3.1.yaml https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.3.1/manifests/vanilla/vsphere-csi-driver.yaml
</code></pre>
<p>  There is a caveat, though: the images referenced in the manifest are no longer available in the Google Artifact Registry. This may be corrected in the future, so you should check the file first.</p>
<p>  You can change the URLs in the manifest with the following command.</p>
<pre><code class="lang-plaintext">  sed -i 's|gcr.io/cloud-provider-vsphere/csi/release|registry.k8s.io/csi-vsphere|g' vsphere-csi-driver-v3.3.1.yaml
</code></pre>
</li>
</ul>
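<p>With the image URLs corrected, apply the manifest.</p>
<pre><code class="lang-plaintext">kubectl apply -f vsphere-csi-driver-v3.3.1.yaml
</code></pre>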
<p>If all goes well, you should see the following pods in a healthy <em>Running</em> state inside your cluster.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769976829871/f88063bf-56a4-4ec9-bffd-3108a4832cfe.png" alt class="image--center mx-auto" /></p>
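<p>The same can be verified from the command line; the CSIDriver object should also be registered with the cluster:</p>
<pre><code class="lang-plaintext">kubectl get pods -n vmware-system-csi
kubectl get csidrivers
</code></pre>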
<p>The CSI controller Deployment is configured with 3 replicas by default for fault tolerance, but this can be changed in the manifest here.</p>
<pre><code class="lang-plaintext">---           
kind: Deployment
apiVersion: apps/v1
metadata:     
  name: vsphere-csi-controller
  namespace: vmware-system-csi
spec:
  replicas: 3
</code></pre>
<h3 id="heading-storage-policy">Storage policy</h3>
<p>The policies work based on tags, so you should first create a tag and assign it to the target datastores.</p>
<p>When you are done, open Policies and Profiles from the side menu and create a new VM Storage Policy.</p>
<p>Select an appropriate name for the policy that you will later use in the StorageClass manifest. Create a rule for the chosen tag.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770011829110/7d4b88c5-5cf5-4615-be0e-1e2b8b1099fa.png" alt class="image--center mx-auto" /></p>
<p>The matching datastores will be displayed on the next page (Storage Compatibility).</p>
<p>You can create more rules and fine-grained policies for your specific needs. As for the PoC, I went with the simplest setup.</p>
<h3 id="heading-testing-the-storage">Testing the storage</h3>
<p>You can test the solution by creating a test pod and mounting a CSI-provisioned volume inside it. This can be achieved by defining the following resources.</p>
<p><strong>StorageClass</strong></p>
<pre><code class="lang-plaintext">apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsphere-storage
  annotations:
    storageclass.kubernetes.io/is-default-class: "true" # Optional: Makes this the default
provisioner: csi.vsphere.vmware.com
parameters:
  # This MUST match the policy name in vSphere EXACTLY (Case Sensitive!)
  storagepolicyname: "k8s-storage-policy"
allowVolumeExpansion: true
</code></pre>
<p><strong>VolumeClaim</strong></p>
<pre><code class="lang-plaintext">apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-vsphere-pvc
spec:
  storageClassName: vsphere-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
</code></pre>
<p><strong>Pod</strong></p>
<pre><code class="lang-plaintext">apiVersion: v1
kind: Pod
metadata:
  name: vsphere-test-pod
spec:
  containers:
  - name: test-container
    image: busybox
    command: [ "sleep", "3600" ]
    volumeMounts:
    - mountPath: "/data"
      name: my-vsphere-vol
  volumes:
  - name: my-vsphere-vol
    persistentVolumeClaim:
      claimName: test-vsphere-pvc
</code></pre>
<p>You should see the container volume appear on the vSphere Monitor Dashboard.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769977716565/a05b1e8b-c429-40fd-8027-74d3c7a9d2b1.png" alt class="image--center mx-auto" /></p>
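<p>You can also confirm from the cluster side that the claim is bound and the volume is writable (a quick sketch; the names match the manifests above):</p>
<pre><code class="lang-plaintext">kubectl get pvc test-vsphere-pvc
kubectl exec vsphere-test-pod -- sh -c 'echo hello &gt; /data/test.txt &amp;&amp; cat /data/test.txt'
</code></pre>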
]]></content:encoded></item></channel></rss>