vSphere with Tanzu – Multi T0 Gateway
General description
If you are utilizing vSphere with Tanzu in conjunction with NSX, you can implement multiple T0 Gateways to further segregate your network traffic. To do this, you must set one T0 Gateway as the default during installation. Later, you can configure additional T0 Gateways and selectively override the default T0 Gateway on a per-vSphere-Namespace basis.
Difference between NAT and No-NAT
During both the installation process and vSphere Namespace creation, you have the option to enable or disable Network Address Translation (NAT). Enabling NAT entails two important considerations:
- Specifying an Egress Network, which must be routed. A dedicated Egress IP is assigned to each vSphere Namespace. When objects such as K8s nodes or Pods within the vSphere Namespace initiate communication with external entities (DNS, Active Directory, NTP, …), they are source-NAT'ed to that Namespace's dedicated Egress IP (a sketch of how to inspect the resulting SNAT rules follows after this list).
- Avoiding the need for Namespace Network routing: Since the communication is source-NAT'ed, there is no requirement to route the entire Namespace Network (which contains the K8s nodes) outside of NSX. To prevent such routing from happening, a RouteMap is automatically installed on the T0 Gateway.
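If you want to see the first point in action, you can list the SNAT rules that NCP creates on a Namespace's T1 Gateway. A minimal sketch, assuming my lab's NSX Manager and credentials; the T1 Gateway ID is a placeholder, since NCP generates it:
# List the SNAT rules created for a vSphere Namespace (the T1 ID below is a placeholder)
❯ curl --insecure -u admin:'Password123!' -X GET \
  "https://nsxt-1.vraccoon.lab/policy/api/v1/infra/tier-1s/<t1-of-your-namespace>/nat/USER/nat-rules"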
The Problem
If you now create a vSphere Namespace on a T0 Gateway other than the default and try to create a K8s cluster, only the first control plane VM is created, and it never finishes provisioning. Consequently, no further nodes are created and the cluster never completes either.
❯ kubectl get cluster,machine,vm -A
NAMESPACE    NAME                                PHASE         AGE     VERSION
vns-edge-1   cluster.cluster.x-k8s.io/c-edge-1   Provisioned   2d13h   v1.24.9+vmware.1
vns-edge-2   cluster.cluster.x-k8s.io/c-edge-2   Provisioned   4m19s   v1.24.9+vmware.1

NAMESPACE    NAME                                                           CLUSTER    NODENAME                              PROVIDERID                                       PHASE         AGE     VERSION
vns-edge-1   machine.cluster.x-k8s.io/c-edge-1-9sphb-7xcbv                  c-edge-1   c-edge-1-9sphb-7xcbv                  vsphere://4237b2f8-926a-e5b8-de9d-8a94a5e12748   Running       2d13h   v1.24.9+vmware.1
vns-edge-1   machine.cluster.x-k8s.io/c-edge-1-np1-fxcwf-5c8cf77545-ks95v   c-edge-1   c-edge-1-np1-fxcwf-5c8cf77545-ks95v   vsphere://4237e712-fc19-b3b4-1d26-fd4c3f53e269   Running       2d13h   v1.24.9+vmware.1
vns-edge-2   machine.cluster.x-k8s.io/c-edge-2-4fmhw-fdgsl                  c-edge-2                                         vsphere://4237b573-f30d-9348-d82e-d3ad765a346d   Provisioned   4m4s    v1.24.9+vmware.1
vns-edge-2   machine.cluster.x-k8s.io/c-edge-2-np1-gz5np-fdcbc5cf6-gv8td    c-edge-2                                                                                          Pending       4m9s    v1.24.9+vmware.1

NAMESPACE    NAME                                                                        POWER-STATE   AGE
vns-edge-1   virtualmachine.vmoperator.vmware.com/c-edge-1-9sphb-7xcbv                   poweredOn     2d13h
vns-edge-1   virtualmachine.vmoperator.vmware.com/c-edge-1-np1-fxcwf-5c8cf77545-ks95v    poweredOn     2d13h
vns-edge-2   virtualmachine.vmoperator.vmware.com/c-edge-2-4fmhw-fdgsl                   poweredOn     4m3s
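The Machine and VirtualMachine objects alone don't tell you why provisioning is stuck. To dig further, describing the hanging cluster and its control plane is a reasonable starting point (a sketch; the exact conditions you see will vary):
❯ kubectl -n vns-edge-2 describe cluster c-edge-2
❯ kubectl -n vns-edge-2 get kubeadmcontrolplane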
The Cause
In most cases, enabling NAT for your supervisor cluster is the preferred approach. Routing a large network such as the Namespace network is a rather rare scenario.
However, when creating a K8s cluster (whether of kind Cluster or TanzuKubernetesCluster), specific controllers such as tkg-controller or capv need to establish connections to the K8s cluster. These controllers operate as Pods running on the Supervisor Cluster control plane nodes, and they utilize a vNIC connected to NSX within one of the Namespace Networks.
When network routing is not enabled (due to NAT being enabled), these controllers are unable to establish connections to K8s clusters residing on another T0 Gateway.
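To convince yourself that this really is a routing problem, you can test the path manually. A sketch, assuming SSH access to the vCenter appliance; the workload cluster's API server address (10.244.1.34) is a placeholder from my lab:
# On the VCSA: print the Supervisor control plane node IP and SSH password
❯ /usr/lib/vmware-wcp/decryptK8Pwd.py
# Then, from a Supervisor control plane node, probe the workload cluster's API server
❯ curl -vk --connect-timeout 5 https://10.244.1.34:6443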
The “too-easy” fix
Based on our current understanding, one might suggest disabling NAT on the Supervisor during the installation process. While this approach may seem obvious, it is worth noting that in vSphere 8U1 (I haven't tested other versions), it does not work. If NAT is disabled during installation, the wizard fails to validate the provided input and instead generates a generic error message:
Invalid field ‘workloads’ in structure ‘com.vmware.vcenter.namespace_management.supervisor.enable_on_compute_cluster_spec’
The real fix (maybe?)
Another approach is to remove or modify the RouteMap. This comes with another challenge: since the RouteMap was created during installation, it is owned by an NSX superuser, hence it can't be deleted from the GUI.
Instead, you have to do it via the API.
The first step is to get the current RouteMap configuration as JSON:
❯ curl --insecure -u admin:'Password123!' -X GET "https://nsxt-1.vraccoon.lab/policy/api/v1/infra/tier-0s/T0-Tanzu-1/route-maps"
{
  "results" : [ {
    "entries" : [ {
      "prefix_list_matches" : [ "/infra/tier-0s/T0-Tanzu-1/prefix-lists/pl_domain-c1006:5c194008-2cf0-467f-af69-6786a1daf1c6_deny_t1_subnets" ],
      "action" : "DENY"
    }, {
      "prefix_list_matches" : [ "/infra/tier-0s/T0-Tanzu-1/prefix-lists/prefixlist-out-default" ],
      "action" : "PERMIT"
    } ],
    "resource_type" : "Tier0RouteMap",
    "id" : "rm_deny_t1_subnets",
    "display_name" : "rm-deny-t1-subnets",
    "tags" : [ {
      "scope" : "ncp/created_for",
      "tag" : "ncp/subnets_deny_rule"
    } ],
    "path" : "/infra/tier-0s/T0-Tanzu-1/route-maps/rm_deny_t1_subnets",
    "relative_path" : "rm_deny_t1_subnets",
    "parent_path" : "/infra/tier-0s/T0-Tanzu-1",
    "remote_path" : "",
    "unique_id" : "c1b634d8-de7d-43ac-b538-6207e6a0f3a1",
    "realization_id" : "c1b634d8-de7d-43ac-b538-6207e6a0f3a1",
    "owner_id" : "2b6402b9-75bf-4b9e-b93f-57791008257c",
    "origin_site_id" : "2b6402b9-75bf-4b9e-b93f-57791008257c",
    "marked_for_delete" : false,
    "overridden" : false,
    "_system_owned" : false,
    "_protection" : "REQUIRE_OVERRIDE",
    "_create_time" : 1688406759421,
    "_create_user" : "wcp-cluster-user-5ce7261a-4769-4a09-a459-b655fa948c1f-544e1c9b-9ac4-41c5-ab64-1ef0a72ca3ff",
    "_last_modified_time" : 1688406759421,
    "_last_modified_user" : "wcp-cluster-user-5ce7261a-4769-4a09-a459-b655fa948c1f-544e1c9b-9ac4-41c5-ab64-1ef0a72ca3ff",
    "_revision" : 0
  } ],
  "result_count" : 1,
  "sort_by" : "display_name",
  "sort_ascending" : true
}
Lines 5-6 of the output show the DENY entry, matching the prefix list mentioned earlier. Whatever networks are listed in that prefix list will be denied from being advertised. Thus, we will remove this prefix list match from the RouteMap.
To do so, I'll save the output, remove the DENY entry (lines 3-5 of the saved file), and re-upload it.
❯ curl --insecure -u admin:'Password123!' -X GET "https://nsxt-1.vraccoon.lab/policy/api/v1/infra/tier-0s/T0-Tanzu-1/route-maps/rm_deny_t1_subnets" > rm_deny_t1_subnets.json
❯ sed -i '3,5d' rm_deny_t1_subnets.json
❯ cat rm_deny_t1_subnets.json
{
  "entries" : [ {
    "prefix_list_matches" : [ "/infra/tier-0s/T0-Tanzu-1/prefix-lists/prefixlist-out-default" ],
    "action" : "PERMIT"
  } ],
  "resource_type" : "Tier0RouteMap",
  "id" : "rm_deny_t1_subnets",
  "display_name" : "rm-deny-t1-subnets",
  "tags" : [ {
    "scope" : "ncp/created_for",
    "tag" : "ncp/subnets_deny_rule"
  } ],
  "path" : "/infra/tier-0s/T0-Tanzu-1/route-maps/rm_deny_t1_subnets",
  "relative_path" : "rm_deny_t1_subnets",
  "parent_path" : "/infra/tier-0s/T0-Tanzu-1",
  "remote_path" : "",
  "unique_id" : "c1b634d8-de7d-43ac-b538-6207e6a0f3a1",
  "realization_id" : "c1b634d8-de7d-43ac-b538-6207e6a0f3a1",
  "owner_id" : "2b6402b9-75bf-4b9e-b93f-57791008257c",
  "origin_site_id" : "2b6402b9-75bf-4b9e-b93f-57791008257c",
  "marked_for_delete" : false,
  "overridden" : false,
  "_system_owned" : false,
  "_protection" : "REQUIRE_OVERRIDE",
  "_create_time" : 1688406759421,
  "_create_user" : "wcp-cluster-user-5ce7261a-4769-4a09-a459-b655fa948c1f-544e1c9b-9ac4-41c5-ab64-1ef0a72ca3ff",
  "_last_modified_time" : 1688406759421,
  "_last_modified_user" : "wcp-cluster-user-5ce7261a-4769-4a09-a459-b655fa948c1f-544e1c9b-9ac4-41c5-ab64-1ef0a72ca3ff",
  "_revision" : 0
}
Now we can re-apply the RouteMap:
❯ curl --insecure -u admin:'Password123!' -H "X-Allow-Overwrite:true" -H "Content-Type: application/json" --data @rm_deny_t1_subnets.json -X PUT "https://nsxt-1.vraccoon.lab/policy/api/v1/infra/tier-0s/T0-Tanzu-1/route-maps/rm_deny_t1_subnets"
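To verify the change, you can fetch the RouteMap once more; it should now contain only the PERMIT entry, with an incremented _revision:
❯ curl --insecure -u admin:'Password123!' -X GET \
  "https://nsxt-1.vraccoon.lab/policy/api/v1/infra/tier-0s/T0-Tanzu-1/route-maps/rm_deny_t1_subnets"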
NOTE 1: Overwriting the automagically installed RouteMap is probably not supported.
Also, if I were really doing this in production, I'd probably modify the prefix list to only propagate the NSX-based SupervisorControlPlane segment.
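Such a prefix list change could look roughly like the following. This is only a sketch I haven't validated: the 10.244.0.0/28 network stands in for your SupervisorControlPlane segment, the 10.244.0.0/20 aggregate for the Namespace Networks, and overwriting the NCP-owned prefix list carries the same supportability caveat as the RouteMap change:
# Hypothetical: exclude the SupervisorControlPlane segment from the deny match,
# keep matching (and thus suppressing) all other Namespace Networks
❯ curl --insecure -u admin:'Password123!' -H "X-Allow-Overwrite:true" \
  -H "Content-Type: application/json" -X PATCH \
  "https://nsxt-1.vraccoon.lab/policy/api/v1/infra/tier-0s/T0-Tanzu-1/prefix-lists/pl_domain-c1006:5c194008-2cf0-467f-af69-6786a1daf1c6_deny_t1_subnets" \
  --data '{ "prefixes" : [ { "network" : "10.244.0.0/28", "action" : "DENY" }, { "network" : "10.244.0.0/20", "le" : 28, "action" : "PERMIT" } ] }'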
It might take a few minutes, but the cluster provisioning will eventually continue.
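To follow along, you can watch the Machine objects of the affected Namespace:
❯ kubectl -n vns-edge-2 get machine -w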
❯ kubectl get cluster,machine,vm -A
NAMESPACE    NAME                                PHASE         AGE     VERSION
vns-edge-1   cluster.cluster.x-k8s.io/c-edge-1   Provisioned   2d14h   v1.24.9+vmware.1
vns-edge-2   cluster.cluster.x-k8s.io/c-edge-2   Provisioned   61m     v1.24.9+vmware.1

NAMESPACE    NAME                                                           CLUSTER    NODENAME                              PROVIDERID                                       PHASE     AGE     VERSION
vns-edge-1   machine.cluster.x-k8s.io/c-edge-1-9sphb-7xcbv                  c-edge-1   c-edge-1-9sphb-7xcbv                  vsphere://4237b2f8-926a-e5b8-de9d-8a94a5e12748   Running   2d14h   v1.24.9+vmware.1
vns-edge-1   machine.cluster.x-k8s.io/c-edge-1-np1-fxcwf-5c8cf77545-ks95v   c-edge-1   c-edge-1-np1-fxcwf-5c8cf77545-ks95v   vsphere://4237e712-fc19-b3b4-1d26-fd4c3f53e269   Running   2d14h   v1.24.9+vmware.1
vns-edge-2   machine.cluster.x-k8s.io/c-edge-2-4fmhw-fdgsl                  c-edge-2   c-edge-2-4fmhw-fdgsl                  vsphere://4237b573-f30d-9348-d82e-d3ad765a346d   Running   61m     v1.24.9+vmware.1
vns-edge-2   machine.cluster.x-k8s.io/c-edge-2-np1-gz5np-fdcbc5cf6-gv8td    c-edge-2   c-edge-2-np1-gz5np-fdcbc5cf6-gv8td    vsphere://42372d8a-3ca6-a160-2246-a8b44b324df2   Running   61m     v1.24.9+vmware.1

NAMESPACE    NAME                                                                        POWER-STATE   AGE
vns-edge-1   virtualmachine.vmoperator.vmware.com/c-edge-1-9sphb-7xcbv                   poweredOn     2d14h
vns-edge-1   virtualmachine.vmoperator.vmware.com/c-edge-1-np1-fxcwf-5c8cf77545-ks95v    poweredOn     2d14h
vns-edge-2   virtualmachine.vmoperator.vmware.com/c-edge-2-4fmhw-fdgsl                   poweredOn     61m
vns-edge-2   virtualmachine.vmoperator.vmware.com/c-edge-2-np1-gz5np-fdcbc5cf6-gv8td     poweredOn     7m44s
To check connectivity, you can try to ping the corresponding segment's gateway IP. The K8s nodes themselves do not reply to ICMP requests, but you can do port scans, e.g. with netcat, as shown below.
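For example (the addresses are placeholders from my lab; .1 is the segment's gateway):
# The segment gateway replies to ping; the nodes themselves do not
❯ ping -c1 10.244.1.1
# Probe a node's API server and SSH ports instead
❯ nc -zv 10.244.1.34 6443
❯ nc -zv 10.244.1.34 22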
NOTE 2: There is a prefix list on your T0 Gateway named like “pl-domain-<clusterMOBID>:<supervisor-uuid>-deny-t1-subnets”. Creating a vSphere Namespace with NAT enabled adds an entry (containing the Namespace Network) to that prefix list. This prefix list is then used as matching criteria in the route map “rm-deny-t1-subnets” to prevent these networks from being advertised.
For some reason, these entries are always added to the default T0 Gateway's prefix list (set up during Supervisor installation). This means that Namespace Networks from vSphere Namespaces on other T0 Gateways (created by using “Override Supervisor network settings“) will always be advertised, unless you create a route map manually. This is probably a bug.
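You can confirm this by listing the prefix lists on both gateways after creating a Namespace on the non-default T0; the new Namespace Network shows up on the default T0 only. A sketch, where T0-Tanzu-2 is a placeholder name for the non-default gateway:
❯ curl --insecure -u admin:'Password123!' -X GET \
  "https://nsxt-1.vraccoon.lab/policy/api/v1/infra/tier-0s/T0-Tanzu-1/prefix-lists"
❯ curl --insecure -u admin:'Password123!' -X GET \
  "https://nsxt-1.vraccoon.lab/policy/api/v1/infra/tier-0s/T0-Tanzu-2/prefix-lists"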