vSphere with Tanzu – Multi T0 Gateway

General description

If you are utilizing vSphere with Tanzu in conjunction with NSX, you can implement multiple T0 Gateways to further segregate your network traffic. To do this, you must set one T0 Gateway as the default during installation. Later, you can configure additional T0 Gateways and selectively override the default one on a per-vSphere-Namespace basis.

Difference between NAT and No-NAT

During both the installation process and vSphere Namespace creation, you have the option to enable or disable Network Address Translation (NAT). Enabling NAT entails two important considerations:

  1. Specifying an Egress Network, which must be routable. A dedicated Egress IP is assigned to each vSphere Namespace. When objects such as K8s Nodes or Pods within the vSphere Namespace initiate communication with external entities (DNS, Active Directory, NTP, …), they are source NAT’ed to that dedicated Egress IP (one dedicated IP per vSphere Namespace; see the sketch after this list).
  2. Avoiding the need for Namespace Network routing: Since the communication is source NAT’ed, there is no requirement to route the entire Namespace Network (which contains the K8s Nodes) outside of NSX. To prevent such routing, a RouteMap is automatically installed on the T0 Gateway.
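
A quick way to observe the SNAT behaviour (a sketch; the Namespace name, Pod image and the ifconfig.me echo service are just examples, and it assumes vSphere Pods are enabled, otherwise run it from inside a workload cluster):

❯ kubectl -n vns-edge-1 run egress-test --rm -it --restart=Never --image=curlimages/curl -- curl -s ifconfig.me
# Expected output: the dedicated Egress IP of vns-edge-1, not the Pod IP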

The Problem

If you now create a vSphere Namespace on a T0 Gateway other than the default and try to create a K8s cluster, only the first control plane VM is created, and it never finishes provisioning. Consequently, no further nodes are created and the cluster never completes either.

❯ kubectl get cluster,machine,vm -A
NAMESPACE    NAME                                PHASE         AGE     VERSION
vns-edge-1   cluster.cluster.x-k8s.io/c-edge-1   Provisioned   2d13h   v1.24.9+vmware.1
vns-edge-2   cluster.cluster.x-k8s.io/c-edge-2   Provisioned   4m19s   v1.24.9+vmware.1

NAMESPACE    NAME                                                           CLUSTER    NODENAME                              PROVIDERID                                       PHASE         AGE     VERSION
vns-edge-1   machine.cluster.x-k8s.io/c-edge-1-9sphb-7xcbv                  c-edge-1   c-edge-1-9sphb-7xcbv                  vsphere://4237b2f8-926a-e5b8-de9d-8a94a5e12748   Running       2d13h   v1.24.9+vmware.1
vns-edge-1   machine.cluster.x-k8s.io/c-edge-1-np1-fxcwf-5c8cf77545-ks95v   c-edge-1   c-edge-1-np1-fxcwf-5c8cf77545-ks95v   vsphere://4237e712-fc19-b3b4-1d26-fd4c3f53e269   Running       2d13h   v1.24.9+vmware.1
vns-edge-2   machine.cluster.x-k8s.io/c-edge-2-4fmhw-fdgsl                  c-edge-2                                         vsphere://4237b573-f30d-9348-d82e-d3ad765a346d   Provisioned   4m4s    v1.24.9+vmware.1
vns-edge-2   machine.cluster.x-k8s.io/c-edge-2-np1-gz5np-fdcbc5cf6-gv8td    c-edge-2                                                                                          Pending       4m9s    v1.24.9+vmware.1

NAMESPACE    NAME                                                                       POWER-STATE   AGE
vns-edge-1   virtualmachine.vmoperator.vmware.com/c-edge-1-9sphb-7xcbv                  poweredOn     2d13h
vns-edge-1   virtualmachine.vmoperator.vmware.com/c-edge-1-np1-fxcwf-5c8cf77545-ks95v   poweredOn     2d13h
vns-edge-2   virtualmachine.vmoperator.vmware.com/c-edge-2-4fmhw-fdgsl                  poweredOn     4m3s
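
To dig into why the machine is stuck, you can describe it and check the recent events (names taken from the output above):

❯ kubectl -n vns-edge-2 describe machine c-edge-2-4fmhw-fdgsl
❯ kubectl -n vns-edge-2 get events --sort-by=.lastTimestamp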

The Cause

In most cases, enabling NAT for your Supervisor is the preferred approach; routing a large network such as the Namespace Network is a rather rare scenario.
However, when a K8s cluster is created (whether of kind Cluster or TanzuKubernetesCluster), specific controllers such as tkg-controller or capv need to establish connections to it. These controllers run as Pods on the Supervisor control plane nodes, and they utilize a vNIC connected to NSX within one of the Namespace Networks.
Since these Namespace Networks are not advertised (a consequence of NAT being enabled), the controllers are unable to establish connections to K8s clusters residing behind another T0 Gateway.
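
You can verify this from one of the Supervisor control plane nodes (reachable e.g. via /usr/lib/vmware-wcp/decryptK8Pwd.py on the vCenter appliance). The endpoint IP below is a placeholder for the stuck cluster’s control plane address:

# From a Supervisor control plane node, try the workload cluster’s API server
❯ curl -vk --connect-timeout 5 https://10.244.1.34:6443/version
# As long as the Namespace Network is not advertised, this times out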

The “too-easy fix”

Based on our current understanding, one might suggest simply disabling NAT on the Supervisor during the installation process. While this approach may seem obvious, it does not work in vSphere 8U1 (I haven’t tested other versions). If NAT is disabled during installation, the wizard fails to validate the provided input and instead generates a generic error message:

Invalid field ‘workloads’ in structure ‘com.vmware.vcenter.namespace_management.supervisor.enable_on_compute_cluster_spec’

The real fix (maybe?)

Another approach is to remove or modify the RouteMap. This comes with its own challenge: since the RouteMap was created during installation, it is owned by an NSX superuser and hence can’t be deleted from the GUI.

Instead, you have to modify it via the API.
The first step is to get the current RouteMap configuration as JSON:

❯ curl --insecure -u admin:'Password123!' -X GET "https://nsxt-1.vraccoon.lab/policy/api/v1/infra/tier-0s/T0-Tanzu-1/route-maps"
{
  "results" : [ {
    "entries" : [ {
      "prefix_list_matches" : [ "/infra/tier-0s/T0-Tanzu-1/prefix-lists/pl_domain-c1006:5c194008-2cf0-467f-af69-6786a1daf1c6_deny_t1_subnets" ],
      "action" : "DENY"
    }, {
      "prefix_list_matches" : [ "/infra/tier-0s/T0-Tanzu-1/prefix-lists/prefixlist-out-default" ],
      "action" : "PERMIT"
    } ],
    "resource_type" : "Tier0RouteMap",
    "id" : "rm_deny_t1_subnets",
    "display_name" : "rm-deny-t1-subnets",
    "tags" : [ {
      "scope" : "ncp/created_for",
      "tag" : "ncp/subnets_deny_rule"
    } ],
    "path" : "/infra/tier-0s/T0-Tanzu-1/route-maps/rm_deny_t1_subnets",
    "relative_path" : "rm_deny_t1_subnets",
    "parent_path" : "/infra/tier-0s/T0-Tanzu-1",
    "remote_path" : "",
    "unique_id" : "c1b634d8-de7d-43ac-b538-6207e6a0f3a1",
    "realization_id" : "c1b634d8-de7d-43ac-b538-6207e6a0f3a1",
    "owner_id" : "2b6402b9-75bf-4b9e-b93f-57791008257c",
    "origin_site_id" : "2b6402b9-75bf-4b9e-b93f-57791008257c",
    "marked_for_delete" : false,
    "overridden" : false,
    "_system_owned" : false,
    "_protection" : "REQUIRE_OVERRIDE",
    "_create_time" : 1688406759421,
    "_create_user" : "wcp-cluster-user-5ce7261a-4769-4a09-a459-b655fa948c1f-544e1c9b-9ac4-41c5-ab64-1ef0a72ca3ff",
    "_last_modified_time" : 1688406759421,
    "_last_modified_user" : "wcp-cluster-user-5ce7261a-4769-4a09-a459-b655fa948c1f-544e1c9b-9ac4-41c5-ab64-1ef0a72ca3ff",
    "_revision" : 0
  } ],
  "result_count" : 1,
  "sort_by" : "display_name",
  "sort_ascending" : true
}

The first RouteMap entry references the prefix list mentioned earlier (the one with action DENY). Whatever networks are listed there will be denied from being advertised. Thus, we will remove this prefix list entry from the RouteMap.
To do so, I’ll save the output, remove the corresponding lines, and re-upload it.

❯ curl --insecure -u admin:'Password123!' -X GET "https://nsxt-1.vraccoon.lab/policy/api/v1/infra/tier-0s/T0-Tanzu-1/route-maps/rm_deny_t1_subnets" > rm_deny_t1_subnets.json

❯ sed -i '3,5d' rm_deny_t1_subnets.json

❯ cat rm_deny_t1_subnets.json
{
  "entries" : [ {
    "prefix_list_matches" : [ "/infra/tier-0s/T0-Tanzu-1/prefix-lists/prefixlist-out-default" ],
    "action" : "PERMIT"
  } ],
  "resource_type" : "Tier0RouteMap",
  "id" : "rm_deny_t1_subnets",
  "display_name" : "rm-deny-t1-subnets",
  "tags" : [ {
    "scope" : "ncp/created_for",
    "tag" : "ncp/subnets_deny_rule"
  } ],
  "path" : "/infra/tier-0s/T0-Tanzu-1/route-maps/rm_deny_t1_subnets",
  "relative_path" : "rm_deny_t1_subnets",
  "parent_path" : "/infra/tier-0s/T0-Tanzu-1",
  "remote_path" : "",
  "unique_id" : "c1b634d8-de7d-43ac-b538-6207e6a0f3a1",
  "realization_id" : "c1b634d8-de7d-43ac-b538-6207e6a0f3a1",
  "owner_id" : "2b6402b9-75bf-4b9e-b93f-57791008257c",
  "origin_site_id" : "2b6402b9-75bf-4b9e-b93f-57791008257c",
  "marked_for_delete" : false,
  "overridden" : false,
  "_system_owned" : false,
  "_protection" : "REQUIRE_OVERRIDE",
  "_create_time" : 1688406759421,
  "_create_user" : "wcp-cluster-user-5ce7261a-4769-4a09-a459-b655fa948c1f-544e1c9b-9ac4-41c5-ab64-1ef0a72ca3ff",
  "_last_modified_time" : 1688406759421,
  "_last_modified_user" : "wcp-cluster-user-5ce7261a-4769-4a09-a459-b655fa948c1f-544e1c9b-9ac4-41c5-ab64-1ef0a72ca3ff",
  "_revision" : 0
}
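
Deleting hard-coded line numbers with sed is a bit brittle. Assuming jq is available, the same edit can be expressed by filtering out all DENY entries instead:

❯ jq '.entries |= map(select(.action != "DENY"))' rm_deny_t1_subnets.json > tmp.json && mv tmp.json rm_deny_t1_subnets.json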

Now we can re-apply the RouteMap:

❯ curl --insecure -u admin:'Password123!' -H "X-Allow-Overwrite:true" -H "Content-Type: application/json" --data @rm_deny_t1_subnets.json -X PUT "https://nsxt-1.vraccoon.lab/policy/api/v1/infra/tier-0s/T0-Tanzu-1/route-maps/rm_deny_t1_subnets"
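
To confirm the change landed, fetch the RouteMap again; the DENY entry should be gone and _revision should have increased:

❯ curl --insecure -u admin:'Password123!' -X GET "https://nsxt-1.vraccoon.lab/policy/api/v1/infra/tier-0s/T0-Tanzu-1/route-maps/rm_deny_t1_subnets"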

NOTE 1: Overwriting the automagically installed RouteMap is probably not supported.
Also, if I really had to do this in production, I’d probably modify the prefix list to only advertise the NSX-based SupervisorControlPlane segment instead of the whole Namespace Network.

It might take a few minutes, but the cluster provisioning will eventually continue.

❯ kubectl get cluster,machine,vm -A
NAMESPACE    NAME                                PHASE         AGE     VERSION
vns-edge-1   cluster.cluster.x-k8s.io/c-edge-1   Provisioned   2d14h   v1.24.9+vmware.1
vns-edge-2   cluster.cluster.x-k8s.io/c-edge-2   Provisioned   61m     v1.24.9+vmware.1

NAMESPACE    NAME                                                           CLUSTER    NODENAME                              PROVIDERID                                       PHASE     AGE     VERSION
vns-edge-1   machine.cluster.x-k8s.io/c-edge-1-9sphb-7xcbv                  c-edge-1   c-edge-1-9sphb-7xcbv                  vsphere://4237b2f8-926a-e5b8-de9d-8a94a5e12748   Running   2d14h   v1.24.9+vmware.1
vns-edge-1   machine.cluster.x-k8s.io/c-edge-1-np1-fxcwf-5c8cf77545-ks95v   c-edge-1   c-edge-1-np1-fxcwf-5c8cf77545-ks95v   vsphere://4237e712-fc19-b3b4-1d26-fd4c3f53e269   Running   2d14h   v1.24.9+vmware.1
vns-edge-2   machine.cluster.x-k8s.io/c-edge-2-4fmhw-fdgsl                  c-edge-2   c-edge-2-4fmhw-fdgsl                  vsphere://4237b573-f30d-9348-d82e-d3ad765a346d   Running   61m     v1.24.9+vmware.1
vns-edge-2   machine.cluster.x-k8s.io/c-edge-2-np1-gz5np-fdcbc5cf6-gv8td    c-edge-2   c-edge-2-np1-gz5np-fdcbc5cf6-gv8td    vsphere://42372d8a-3ca6-a160-2246-a8b44b324df2   Running   61m     v1.24.9+vmware.1

NAMESPACE    NAME                                                                       POWER-STATE   AGE
vns-edge-1   virtualmachine.vmoperator.vmware.com/c-edge-1-9sphb-7xcbv                  poweredOn     2d14h
vns-edge-1   virtualmachine.vmoperator.vmware.com/c-edge-1-np1-fxcwf-5c8cf77545-ks95v   poweredOn     2d14h
vns-edge-2   virtualmachine.vmoperator.vmware.com/c-edge-2-4fmhw-fdgsl                  poweredOn     61m
vns-edge-2   virtualmachine.vmoperator.vmware.com/c-edge-2-np1-gz5np-fdcbc5cf6-gv8td    poweredOn     7m44s

To check connectivity, you can try to ping the corresponding segment’s gateway IP. The K8s nodes themselves do not reply to ICMP requests, but you can probe open ports, e.g. with netcat.
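
For example (both IPs are placeholders, take the segment’s gateway and node addresses from NSX):

# The segment’s gateway replies to ICMP
❯ ping -c 3 10.244.1.33
# The nodes do not, but open ports such as the API server port can be probed
❯ nc -zv 10.244.1.34 6443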

NOTE 2: There is a prefix list on your T0 Gateway named like “pl-domain-<clusterMOBID>:<supervisor-uuid>-deny-t1-subnets”. Creating a vSphere Namespace with NAT enabled adds an entry (containing the Namespace Network) to that prefix list. The prefix list is then used as matching criterion in the route map “rm-deny-t1-subnets” to prevent these networks from being advertised.
For some reason, these entries are always added to the default T0 Gateway’s prefix list (the one set up during Supervisor installation). This means that Namespace Networks from vSphere Namespaces on other T0 Gateways (using “Override Supervisor network settings”) will always be advertised, unless you create a route map manually. This is probably a bug.
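
You can check where the entries actually land by listing the prefix lists of each gateway (T0-Tanzu-2 stands in for the non-default T0 here):

❯ curl --insecure -u admin:'Password123!' -X GET "https://nsxt-1.vraccoon.lab/policy/api/v1/infra/tier-0s/T0-Tanzu-1/prefix-lists"
❯ curl --insecure -u admin:'Password123!' -X GET "https://nsxt-1.vraccoon.lab/policy/api/v1/infra/tier-0s/T0-Tanzu-2/prefix-lists"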
