vSphere with Tanzu Backup with Dell PowerProtect

Lately, I got a customer request on how to back up their Kubernetes workloads within vSphere with Tanzu using Dell PowerProtect Data Manager (PPDM).
In theory, the installation is quite straightforward. But there are some undocumented caveats.

Overview

PPDM uses Velero under the hood. Velero supports different plugins/providers to back up K8s persistent volumes. Here, Velero uses the velero-plugin-for-vsphere, which, when used with vSphere with Tanzu, requires the Velero vSphere Operator.
Until recently, installing the Velero vSphere Operator on the SupervisorCluster required vSphere native Pods, which were only available when using NSX(-T) as the network provider.
But since vSphere 8U1 you can deploy Supervisor Services like Velero (or Harbor, Contour, …) with an NSX ALB based installation as well.
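
Once the Supervisor Service is activated, a quick sanity check is to verify that the operator pods are running on the SupervisorCluster. The namespace name below is a placeholder; the exact name depends on your setup, hence the grep:

❯ kubectl get namespaces | grep velero
❯ kubectl -n <velero-operator-namespace> get pods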

Backup Flow

High level, the backup flow with PPDM is as follows:

  • A backup gets triggered through PPDM (Protect Now / Scheduled / …) on a Workload Cluster (Tanzu Kubernetes Cluster v1alphaX / Cluster v1beta1)
  • Velero backs up the metadata through the K8s API
  • Velero calls velero-plugin-for-vsphere to create a snapshot of the PVC
  • Workload Cluster backup driver creates a corresponding snapshot on SupervisorCluster
  • SupervisorCluster backup driver notices that snapshot and creates a vSphere based snapshot for the corresponding FCD
  • SupervisorCluster creates an upload resource (datamover.cnsdp.vmware.com/v1alpha1) which tells the DataMover (or, in the case of PPDM, the vProxy) to upload the snapshot to the backup target (we will take a quick look at these resources right after this list)
    • This can either happen via HotAdd or NBD
    • When the upload is completed, the snapshot gets deleted and the backup job has finished
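
To follow along during a running backup, you can watch these upload resources on the SupervisorCluster. This is purely a convenience check and not part of the setup:

❯ kubectl get uploads.datamover.cnsdp.vmware.com -A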

Issues

Compatibility

I am using pretty much the latest versions:

  • vSphere 8U2 (22380479)
  • PPDM 19.5

And technically, right now there is no officially supported combination … great, huh? Luckily, it works anyway 🙂

Velero vSphere Operator

According to the official compatibility list, you have to use Velero vSphere Operator version 1.5.1 when running vSphere 8.0c or later. This is due to the K8s version of your SupervisorCluster.
From K8s v1.25 onwards, Pod Security Policies (PSP) were removed completely and the Pod Security Admission (PSA) controller became the default mechanism (check it out at https://kubernetes.io/docs/concepts/security/pod-security-policy/).
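
For reference: PSA is configured entirely through namespace labels. There are three modes (enforce, audit, warn) which can be set independently per namespace, for example:

apiVersion: v1
kind: Namespace
metadata:
  name: example
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted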

The catch is that the Velero vSphere Operator v1.4.x doesn’t incorporate these changes, so it doesn’t work on those clusters.

PPDM to Velero vSphere Operator

PPDM, in turn, only supports Velero vSphere Operator 1.4.x, but not 1.5.x yet.

Implementation

PPDM RBAC Files – Namespace Powerprotect

Before you can add a K8s cluster as an Asset Source in PPDM, you have to create certain resources on that cluster (Namespaces, ServiceAccounts, Roles, RoleBindings, …). You can download the manifests from PowerProtect through System Settings –> Downloads –> then choose Kubernetes and Download RBAC.

After applying these files to a K8s cluster running v1.25 or later, you will face the same issue as on the SupervisorCluster before. These files create a Namespace called “powerprotect” which has no Pod Security Admission configuration. This prevents the PPDM pods from being started.
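
You can verify that the Namespace carries no pod-security.kubernetes.io labels at all (the restriction then comes from the cluster-wide default):

❯ kubectl get namespace powerprotect --show-labels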

> kubectl -n powerprotect get pods,replicasets
NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/powerprotect-controller-df6dddb4c   1         0         0       28s

As you can see, there are no pods and the ReplicaSet shows zero ready replicas. If we check the ReplicaSet, the issue becomes pretty clear:

> kubectl -n powerprotect describe replicasets.apps powerprotect-controller-df6dddb4c
[...]
Events:
  Type     Reason        Age                From                   Message
  ----     ------        ----               ----                   -------
  Warning  FailedCreate  69s                replicaset-controller  Error creating: pods "powerprotect-controller-df6dddb4c-cs4sm" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "k8s-controller" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "k8s-controller" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "k8s-controller" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "k8s-controller" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")

By default, PSA enforcement is set to restricted, which obviously does not work for PPDM. Let’s set PSA to baseline:

> kubectl label namespaces powerprotect pod-security.kubernetes.io/enforce=baseline
namespace/powerprotect labeled

And check again:

❯ kubectl -n powerprotect get pods,replicasets.apps
NAME                                          READY   STATUS    RESTARTS   AGE
pod/powerprotect-controller-df6dddb4c-pn2zv   1/1     Running   0          110s

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/powerprotect-controller-df6dddb4c   1         1         1       110s

Now it works, but we are not done yet.

Adding Asset Source – Namespace velero-ppdm

As soon as we add the Workload Cluster as an Asset Source to PPDM, it will create another Namespace called “velero-ppdm”. And once again, the PSA configuration is missing.

❯ kubectl -n velero-ppdm get pods,replicasets.apps
NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/backup-driver-5b888dfb9c   1         0         0       11s
replicaset.apps/velero-84f9d78cc           1         0         0       11s

And like before, after adding the correct label, the pods will start:

❯ kubectl label namespaces velero-ppdm pod-security.kubernetes.io/enforce=baseline
namespace/velero-ppdm labeled

❯ kubectl -n velero-ppdm get pods,replicasets.apps
NAME                                 READY   STATUS     RESTARTS   AGE
pod/backup-driver-5b888dfb9c-s7pn6   1/1     Running    0          70s
pod/velero-84f9d78cc-zsvxr           1/1     Running    0          70s

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/backup-driver-5b888dfb9c   1         1         1       70s
replicaset.apps/velero-84f9d78cc           1         1         1       70s
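
Since this is needed for both Namespaces, you can also label them in one go (kubectl accepts multiple Namespace names, and --overwrite makes the command safe to repeat):

❯ kubectl label namespaces powerprotect velero-ppdm pod-security.kubernetes.io/enforce=baseline --overwrite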

Note:
Although neither this combination of versions nor the proposed fixes are officially supported, it works nonetheless.
Additionally, I could not get HotAdd running, neither with NSX ALB nor with NSX. But NBD worked great.
