vSphere with Tanzu Backup with Dell PowerProtect
Lately, I got a customer request on how to back up their Kubernetes workloads within vSphere with Tanzu using Dell PowerProtect Data Manager (PPDM).
In theory, the installation is quite straightforward. But there are some undocumented caveats.
Overview
PPDM uses Velero under the hood. Velero can use different plugins/providers to back up K8s persistent volumes. One of them is the velero-plugin-for-vsphere, which, when used with vSphere with Tanzu, requires the Velero vSphere Operator.
Until recently, installing the Velero vSphere Operator on the SupervisorCluster required vSphere Native Pods, which were only available when using NSX(-T) as the network provider.
Since vSphere 8 U1, however, you can also deploy Supervisor Services like Velero (or Harbor, Contour, …) with an NSX ALB based installation.
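Once the Velero vSphere Operator Supervisor Service is enabled, a quick sanity check is to look for its namespace and pods on the SupervisorCluster. A minimal sketch, assuming you are logged in to the SupervisorCluster context; the operator namespace name varies per environment, so the placeholder below is just illustrative:

> kubectl get namespaces | grep -i velero          # find the namespace the operator was deployed into
> kubectl -n <velero-operator-namespace> get pods  # the operator pods should all be Running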
Backup Flow
At a high level, the backup flow with PPDM is as follows:
- A backup gets triggered through PPDM (Protect Now / Scheduled / …) on a Workload Cluster (Tanzu Kubernetes Cluster v1alphaX / Cluster v1beta1)
- Velero backs up the metadata through the K8s API
- Velero calls the velero-plugin-for-vsphere to create a snapshot of the PVC
- The Workload Cluster backup driver creates a corresponding snapshot on the SupervisorCluster
- The SupervisorCluster backup driver notices that snapshot and creates a vSphere-based snapshot of the corresponding FCD
- The SupervisorCluster creates an upload resource (datamover.cnsdp.vmware.com/v1alpha1), which tells the DataMover (or, in the case of PPDM, the vProxy) to upload the snapshot to the backup target (see the commands after this list)
- This can happen either via HotAdd or NBD
- When the upload is completed, the snapshot gets deleted and the backup job is finished
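If you want to watch this flow during a backup, the custom resources created by the velero-plugin-for-vsphere can be inspected with kubectl. A small sketch, assuming the plugin's default CRD group names (backupdriver.cnsdp.vmware.com / datamover.cnsdp.vmware.com); resource names and namespaces will differ in your environment:

> kubectl get snapshots.backupdriver.cnsdp.vmware.com -A    # snapshot CRs created by the backup driver
> kubectl get uploads.datamover.cnsdp.vmware.com -A         # upload CRs handed to the DataMover / vProxy
> kubectl describe uploads.datamover.cnsdp.vmware.com <upload-name> -n <namespace>   # shows the current phase of a single upload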
Issues
Compatibility
I am using pretty much the latest versions:
- vSphere 8U2 (22380479)
- PPDM 19.5
And technically, right now there is no officially supported combination … great, huh? Luckily, it works anyway 🙂
Velero vSphere Operator
According to the official compatibility list, you have to use Velero vSphere Operator version 1.5.1 when running vSphere 8.0c or later. This is due to the K8s version of your SupervisorCluster.
From K8s v1.25 onwards, Pod Security Policies (PSP) were removed completely and the Pod Security Admission (PSA) controller became the default (check it out at https://kubernetes.io/docs/concepts/security/pod-security-policy/).
The catch is that the Velero vSphere Operator v1.4.x does not incorporate these changes, so it does not work.
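A quick way to check which side of this cut-off your SupervisorCluster is on (a sketch, run against the SupervisorCluster context):

> kubectl version                                    # a server version of v1.25 or later means PSP is gone
> kubectl api-resources | grep podsecuritypolicies   # empty output on v1.25+, PSA namespace labels are used instead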
PPDM to Velero vSphere Operator
PPDM in turn only supports Velero vSphere Operator 1.4.x but not 1.5.x yet.
Implementation
PPDM RBAC Files – Namespace Powerprotect
Before you can add a K8s Cluster as an Asset Source in PPDM, you have to create certain resources on that cluster (Namespaces, ServiceAccounts, Roles, RoleBindings, …). You can download the manifests from PowerProtect through System Settings –> Downloads –> then choose Kubernetes and Download RBAC.
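Applying them is a plain kubectl apply. A sketch, assuming you extracted the downloaded archive into a local folder called ppdm-rbac (the folder name is illustrative and the file names inside may differ between PPDM versions):

> kubectl apply -f ppdm-rbac/    # creates the powerprotect namespace, ServiceAccounts, Roles and RoleBindings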
After applying these files to a K8s Cluster running v1.25 or later, you will face the same issues as on the SupervisorCluster before. These files create a Namespace called “powerprotect” which has no Pod Security Admission configuration. This prevents the PPDM pods from being started.
> kubectl -n powerprotect get pods,replicasets
NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/powerprotect-controller-df6dddb4c   1         0         0       28s
As you can see, there are no pods and the ReplicaSet has 0 ready replicas too. If we check the ReplicaSet, the issue becomes pretty clear:
> kubectl -n powerprotect describe replicasets.apps powerprotect-controller-df6dddb4c
[...]
Events:
  Type     Reason        Age   From                   Message
  ----     ------        ----  ----                   -------
  Warning  FailedCreate  69s   replicaset-controller  Error creating: pods "powerprotect-controller-df6dddb4c-cs4sm" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "k8s-controller" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "k8s-controller" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "k8s-controller" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "k8s-controller" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
By default, PSA enforcement is set to restricted, which obviously does not work for PPDM. Let’s set PSA to baseline:
> kubectl label namespaces powerprotect pod-security.kubernetes.io/enforce=baseline
namespace/powerprotect labeled
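Optionally, you can align the warn and audit modes with the enforce mode and verify the result; these are standard Pod Security Admission labels, nothing PPDM-specific:

> kubectl label namespaces powerprotect pod-security.kubernetes.io/warn=baseline pod-security.kubernetes.io/audit=baseline
> kubectl get namespace powerprotect --show-labels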
And check again:
❯ kubectl -n powerprotect get pods,replicasets.apps
NAME                                          READY   STATUS    RESTARTS   AGE
pod/powerprotect-controller-df6dddb4c-pn2zv   1/1     Running   0          110s

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/powerprotect-controller-df6dddb4c   1         1         1       110s
Now it works, but we are not done yet.
Adding Asset Source – Namespace velero-ppdm
As soon as we add the Workload Cluster as an Asset Source to PPDM, it will create another Namespace called “velero-ppdm”. And once again, the PSA configuration is missing.
kubectl -n velero-ppdm get pods,replicasets.apps
NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/backup-driver-5b888dfb9c   1         0         0       11s
replicaset.apps/velero-84f9d78cc           1         0         0       11s
And like before, after adding the correct label, the pods will start:
❯ kubectl label namespaces velero-ppdm pod-security.kubernetes.io/enforce=baseline
namespace/velero-ppdm labeled

❯ kubectl -n velero-ppdm get pods,replicasets.apps
NAME                                 READY   STATUS    RESTARTS   AGE
pod/backup-driver-5b888dfb9c-s7pn6   1/1     Running   0          70s
pod/velero-84f9d78cc-zsvxr           1/1     Running   0          70s

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/backup-driver-5b888dfb9c   1         1         1       70s
replicaset.apps/velero-84f9d78cc           1         1         1       70s
Note:
Although neither the combination of versions nor the proposed fixes are officially supported, it still works.
Additionally, I could not get HotAdd running, neither with NSX ALB nor with NSX. NBD, however, worked great.