Getting Started with NSX-T: Logical Routing and Switching: The Basic Principles of Building Software-Defined Network Architectures with VMware NSX-T [1 ed.] 1484267079, 9781484267073


Iwan Hoogendoorn

Getting Started with NSX-T: Logical Routing and Switching: The Basic Principles of Building Software-Defined Network Architectures with VMware NSX-T, 1st ed.

Iwan Hoogendoorn Rotterdam, The Netherlands

Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub via the book’s product page, located at www.apress.com/978-1-4842-6707-3. For more detailed information, please visit http://www.apress.com/source-code. ISBN 978-1-4842-6707-3 e-ISBN 978-1-4842-6708-0 https://doi.org/10.1007/978-1-4842-6708-0 © Iwan Hoogendoorn 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Distributed to the book trade worldwide by Springer Science+Business Media New York, 1 New York Plaza, Suite 4600, New York, NY 100041562, USA. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail [email protected], or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

I dedicate this to my wife Shareen, the one who supports me the most in life. You always allow me to follow my dreams and ambitions, and because of this, I know I am not around most of the time. Thank you for giving me the space to become a better version of myself. Without you, I could not perform and operate the way I do and achieve my goals and dreams. I also dedicate this to my five-year-old daughter Zaara, who asks me the same questions every evening when I am typing "stuff" on my laptop. "Daddy, how long do you have to work?" "Daddy, do you have a meeting?" "Daddy, how many meetings do you have?" "Daddy, are you coming to sleep after these meetings?"

Introduction
Chapter 1 provides a primer on how virtualization is used to create an SDDC. One of the central pillars of an SDDC is the use of software-defined networking (SDN). To better understand NSX-T (VMware's SDN solution), it is best to first understand the components involved.

Chapter 2 explains the NSX-T components and services that form the full network virtualization (NV) stack and dives into the main use cases of NSX-T. By the end of this chapter, you will understand how the NSX-T components interact across the control, management, and data planes.

Chapter 3 explains the installation steps of the NSX-T Manager on a vSphere infrastructure. Once the NSX-T Manager cluster is fully installed, you will also learn how to replace the self-signed certificates, navigate the GUI, and perform some basic CLI commands.

Chapter 4 explains the new terms used when configuring the data plane components. To prepare ESXi hosts to become NSX-T host transport nodes, or to create edge transport nodes in the form of virtual machines or bare metal servers, we first need to configure IP pools, uplink profiles, and transport zones. I explain what these components are and how they relate to each other, as well as the differences between a VDS and an N-VDS.

Chapter 5 teaches you about the logical switching use cases and how logical switching is performed using NSX-T segments. You will learn how stretched layer 2 forwarding relies on the TEP, MAC, and ARP tables, how GENEVE encapsulation works, and how segments are configured on the data plane using the NSX-T GUI. The chapter also covers the segment profiles that can be applied to a segment or a segment port for more granular settings, and explains how BUM traffic is handled using two different replication methods.

Chapter 6 is dedicated to the NSX-T edge transport nodes and NSX-T edge clusters.

Chapter 7 helps you understand the architecture, routing features, and components needed to design and implement a layer 3 network using NSX-T.

Chapter 8 explains VRF Lite's use cases, benefits, and limitations, as well as the configuration and validation of VRF Lite. You will also learn about the EVPN architecture and understand the use cases and benefits related to EVPN.

Chapter 9 explains multicast technology and covers its benefits and use cases at a general level. You will also learn about the multicast capabilities that NSX-T offers and how to configure them.

Chapter 10 describes the use cases and operation of logical bridging. It explains the differences between routing and bridging, and shows how to configure logical bridging by creating a bridge profile and a bridge-backed segment to bridge a virtual NSX-T segment with a physical network.

Acknowledgments Thank you, VMware, for allowing me to write this book. Thank you, VMware #vExpert community, for the amount of knowledge you share. Thank you, Babak and Jerry, for working together with me on this book and making it happen. Thank you, Apress, for allowing me to write this book and publishing it on my behalf.

Table of Contents

Chapter 1: NSX-T and the VMware SDDC Ecosystem
Software-Defined Data Center
Clouds
Computer Virtualization
Before Virtualization
Use Cases for Virtualization
After Virtualization
Storage Virtualization (Software Defined Storage | SDS)
Network Virtualization (Software Defined Networking | SDN)
VMware SDDC
VMware vSphere
VMware Virtual Storage Area Network (VSAN)
NSX-v and NSX-T
VMware VCF
Summary

Chapter 2: An Introduction to NSX-T
NSX-T Features
Platform-Related Features
Networking-Related Features
Security and Services-Related Features
Automation-Related Features
Operations-Related Features
Troubleshooting-Related Features
NSX-T Use Cases
Security Use Cases
Automation Use Cases
Multi-Cloud Networking Use Cases
High-Level Architecture of NSX-T
NSX-T Management and Control Planes
NSX-T Policy Role
NSX-T Manager Role
NSX-T Controller Role
NSX-T Data Plane
Summary

Chapter 3: NSX-T Installation in a vSphere Environment
Implementation Steps
Deployment Considerations
Node Size
Deploying the First NSX-T Manager Using the OVF Template
Accessing the NSX-T Manager
Adding the NSX-T License
vCenter Server Registration with the NSX-T Manager
Completing the NSX-T Manager Cluster
Changing the NSX-T Node Size After the Deployment of the Full Cluster
Configure NSX-T Manager Virtual IP Address (VIP)
Replacing the Self-Signed Certificates
NSX-T GUI Navigation
Policy vs. Manager
Networking Tab
Security Tab
Inventory Tab
Plan & Troubleshoot Tab
System Tab
Summary

Chapter 4: The NSX-T Data Plane
Data Plane Functions and Components
Transport Node Overview
Transport Node Architecture
Transport Node Physical Connectivity
IP Address Pools
Transport Zones
Transport Node Switch Configuration
N-VDS
VDS
N-VDS vs. VDS on the ESXi Host Transport Nodes
Transport Zone and N-VDS/VDS Mapping
Transport Zone Creation
N-VDS and VDS Modes of Operation
Physical Network Interface Cards (NICs), Link Aggregation Groups (LAGs), and Uplinks
Uplink Profiles
Teaming Policy Types
Link Layer Discovery Protocol (LLDP)
Network I/O Control (NIOC) Profile
Transport Node Profiles
ESXi Host Preparation
(Bare Metal) Physical Server
(Bare Metal) Physical Server "Modes"
Summary

Chapter 5: Logical Switching
Logical Switching Use Cases
Logical Switching Terminology
Segments
Tunneling
GENEVE
GENEVE Header Format
Low-Level Packet Walk Using GENEVE (End-To-End Communication)
Creating Segments Under the Hood (Management, Control, and Data Planes)
Creating Segments
Attaching a Segment to a Virtual Machine
Verifying the Segment Port Status
Attaching Segments Under the Hood (Management, Control, and Data Planes)
Segment Profiles
IP Discovery Segment Profile
MAC Discovery Segment Profile
Quality of Service (QoS) Segment Profile
Segment Security Profile
Applying Segment Profiles to Segments
Applying Segment Profiles to Segment Ports
Logical Switching Packet Forwarding
TEP Table Updates
MAC Table Updates
ARP Table
Unicast Packet Forwarding Across Hosts
Broadcast, Unknown Unicast, and Multicast (BUM) Traffic
BUM Traffic: Head Replication
BUM Traffic: Hierarchical Two-Tier Replication
Summary

Chapter 6: Edge and Edge Clusters
Edge Node
Edge Node Cluster
Edge Node Form Factors
Edge VM Form Factor
Edge VM Sizing
Edge VM Deployment Prerequisites
Edge VM Interface Considerations
Edge VM on Multiple Virtual Switches
Edge VM on One Virtual Switch
Deployment Options for Virtual Edge Nodes
Virtual Edge Node Deployment from the NSX-T GUI
Virtual Edge Node Deployment from the vCenter Server GUI
Edge VM Resource Reservation
Bare Metal Edge Form Factor
Bare Metal Edge and Virtual Switches
Bare Metal Edge Deployment Prerequisites
Bare Metal Edge Server Hardware Requirements
Edge Node Deployment on Bare Metal Edges
Edge Node Settings
Verifying the Edge After the Initial Deployment (Before Configuration)
Verifying the Edge Transport Node (After Deployment and Registration)
Creating an Edge Cluster
Summary

Chapter 7: Logical Routing
Logical Routing Use Cases
Logical Routing
Tier-0 and Tier-1 Gateways
Edge Transport Nodes
Edge Clusters
Topologies
Single-Tier Topologies
Multi-Tier Topologies
Tier-0 Gateway Uplinks
Gateway Components
Gateway Components of a Single-Tier Topology
Gateway Components of a Multi-Tier Topology
Gateway Interfaces
Gateway Configuration Steps
Tier-1 Gateway
Creating a Tier-1 Gateway
Attach a Segment to a Tier-1 Gateway
East-West Connectivity Test
Tier-0 Gateway
Creating a Tier-0 Gateway
Tier-0 Gateway Uplink Segment Creation
Routing on a Tier-0 Gateway
Attaching a Tier-1 Gateway to a Tier-0 Gateway
Route Advertisement on the Tier-1 Gateway to a Tier-0 Gateway
Route Redistribution on the Tier-0 Gateway to the Upstream Routing Device
North-South Connectivity Test
Static and Dynamic Routing
Tier-0 Gateway Routing Configurations
Static Routing on a Tier-0 Gateway
Dynamic Routing on a Tier-0 Gateway with BGP
Verifying Dynamic Routing on a Tier-0 Gateway with BGP
Advanced Dynamic Routing on a Tier-0 Gateway
Inter-SR Routing with iBGP
Bidirectional Forwarding Detection (BFD)
Turning BFD On
BGP Allowas-in
Multipath Relax
IP Prefix Lists
IP Prefix List Configuration
Communities and Community Lists
Community List Configuration
Route Maps
Route Map Configuration
Using Route Maps for BGP Route Advertisements
Using Route Aggregation
Route Aggregation Configuration
Equal-Cost Multi-Path Routing
Equal-Cost Multi-Path Configuration
High Availability
High Availability Configuration
Active/Active Mode
Active/Standby Mode
Failover Detection Methods
BFD Failover Scenario
BGP Failover Scenario
Failover Modes
Logical Routing Packet Walks
Single-Tier Routing: Inside NSX-T
Egress to the Physical Network
Multi-Tier Routing: Inside NSX-T
Egress to the Physical Network
Summary

Chapter 8: VRF Lite and EVPN
VRF Lite Use Cases
VRF Lite Terminology
VRF Lite
VRF Lite Requirements
VRF Lite Limitations
VRF Lite Topologies
VRF Lite Gateway Interfaces
VRF Lite Control and Data Planes
VRF Lite Configuration
Tier-0 Gateway Deployment
Tier-0 Gateway Uplink Interfaces
Tier-0 Gateway BGP
Tier-0 Gateway Uplink Interface for VRF Gateway
VRF Gateway
VRF Gateway Uplink Interfaces
VRF Gateway BGP
Add and Connect a Tier-1 Gateway to a VRF Gateway
Enable Route Advertisement and Redistribution
Network Topology Map Verification
VRF Lite (Routing Table) Validation
Ethernet VPN (EVPN)
Challenges that Telco Companies Have with NFV
The Telco NFV Solution
Telco Virtual Router
EVPN Topology
EVPN Data Plane
EVPN Control Plane
EVPN Route Advertisement
Route Distinguishers (RDs)
Route Targets (RT)
EVPN Requirements and Limitations
Comparing EVPN and VRF
Summary

Chapter 9: Multicast in NSX-T
Multicast
Multicast Benefits and Use Cases
Multicast Protocols
Multicast Topology
Multicast IGMP Snooping and Forwarding Tables
Multicast IGMP Queries
Multicast IGMP Reports
Multicast PIM Sparse Mode
Multicast PIM Sparse Mode Forwarding
Multicast PIM Bootstrap
Multicast North-South Flows
Multicast South-North Flows
Multicast Source Replication Flows
Multicast Destination Replication Flows
Multicast Limitations
Multicast Configuration Steps
Multicast IGMP Profile Configuration
Multicast PIM Profile Configuration
Multicast Tier-0/Tier-1 Gateway Configuration
Multicast Tier-0 Uplink Configuration
Multicast Tier-0/Tier-1 Downlink Configuration
Summary

Chapter 10: Logical Bridging
Logical Bridging
Logical Bridging Use Cases
Physical to Virtual Connectivity: Bridging vs. Routing
Bridging
Routing
Physical to Virtual Routing
Physical to Virtual Bridging
Logical Bridging Components
Multiple Bridge Profiles
Configure NSX-T Logical Bridging (Single Profile)
Summary

Index

About the Author
Iwan Hoogendoorn started his IT career in 1999 as a helpdesk agent. Soon thereafter, Iwan started to learn Microsoft products and that resulted in earning his MCP, MCSA, MCDBA, and MCSE certifications. While working as a Microsoft Systems Engineer, Iwan developed additional skills and knowledge in computer networking. Networking became his passion and this passion resulted in learning networking with Cisco products. Like most network engineers, one of his dreams was to work for Cisco. But before this could happen, he needed to finish his bachelor's degree in ICT, and he completed this in 2009. In early 2010, he started working for Cisco. After finishing his Master's degree (part time) in Computer Science at the University of Amsterdam and becoming a CCIE (#13084) in six different technology areas, Iwan, as ambitious as he sometimes is, wanted to learn something new—virtualization. Because networking ran through his veins, network virtualization was the next logical step. So he decided to learn VMware NSX. He got the opportunity to work for VMware in 2016 as a Senior NSX PSO Consultant. During his time at VMware, he gained more knowledge of private and public clouds and the related products that VMware developed to build the software defined data center (SDDC). As new technology grows at an exponentially high pace (especially in the VMware and the VMware cloud space), Iwan is always playing catch-up and trying to keep up with the new VMware offerings. After working for four years as a Senior NSX PSO consultant (primarily with VMware NSX-v and NSX-T), Iwan was promoted to a Staff SDDC consultant, focusing on the full SDDC stack, including hyperscaler offerings on the main public clouds like AWS (VMC on AWS), Microsoft (Azure VMware Solution), and Google (Google Cloud VMware Engine). Iwan is certified on multiple VMware products, including NSX, and he is actively working with VMware Certification to develop new network-related exams for VMware. In addition to his VMware certifications, Iwan is also AWS and TOGAF certified.

About the Technical Reviewers Babak Dorrani is a Senior consultant with extensive knowledge of physical and virtual networking. He has more than 20 years of experience in various networking fields, such as WAN, Voice, Datacenter, and Enterprise. Babak has designed, implemented, and maintained large-scale, multi-tenant networks based on VMware, Cisco, and Juniper technologies and products. His previous role as an R&D architect within an international service provider gave him the chance to work with new and cutting-edge networking and SDN solutions. He likes to get involved in challenging design and deployment projects, as he thinks that “building secure and scalable networks” is the most exciting work there is.

Jerry Zechiel is a technical consultant with extensive knowledge of virtualization, network infrastructure, and security aspects. He has expertise in Microsoft, VMware, Citrix, and Cisco. He has experience with implementations, migrations, and transitions within complex IT environments in government and commercial companies. He did a lot of work on improving information security and he loves to come up with smart solutions that help the business further. Because of his positive mindset, Jerry thinks in terms of solutions and opportunities instead of limitations!

© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021 I. Hoogendoorn, Getting Started with NSX-T: Logical Routing and Switching https://doi.org/10.1007/978-1-4842-6708-0_1

1. NSX-T and the VMware SDDC Ecosystem Iwan Hoogendoorn1 (1) Rotterdam, The Netherlands

This first chapter is a primer on how virtualization is used to create a software-defined data center (SDDC). One of the central pillars of the software-defined data center is the use of software-defined networking (SDN). To better understand NSX-T (VMware's SDN solution), it is best to first understand the components involved.

Software-Defined Data Center
SDDC is a collective term often used in IT when working with virtualization technology. In an SDDC, the data center infrastructure is virtualized: the compute, storage, and network infrastructure components are all delivered as software. With virtualization, it is possible to abstract the physical hardware into virtual pools and use your resources efficiently.

Figure 1-1 Software-defined data center components

Because the data center components are now software-based (Figure 1-1), the configuration is agile, and parts and services can be provisioned faster than ever before with a physical data center infrastructure.

Clouds
The SDDC is typically used to build a cloud, which generally consists of one or multiple SDDCs. A private cloud refers to an SDDC infrastructure that is typically hosted on-premises (on-site), while a public cloud refers to an SDDC infrastructure that is generally hosted off-premises (off-site) somewhere remote, whether in a rented data center facility or as a fully hosted Infrastructure as a Service (IaaS) offering.

Computer Virtualization
Compute virtualization is a way to abstract the hardware layer so that multiple virtual components can share the same hardware. The CPU and RAM can now be shared across multiple virtual machines, each running its own operating system.

Before Virtualization Before virtualization, physical hardware could typically host only one operating system (OS) at a time (Figure 1-2). Even though multiple applications could run on top of this available operating system instance, using the physical hardware was still inefficient.

Figure 1-2 Before virtualization

Use Cases for Virtualization Some of the main use cases to virtualize your physical compute resources are described in this section.

Efficient Use of Resources
The abstraction from the physical hardware through virtualization makes it possible to run multiple virtual servers on the same hardware. With this, resources can be shared and still be consumed to their full potential.

Security
With the possibility of creating multiple virtual servers, it is also possible to segregate applications from each other. Let's say you have a critical application that you need to run in a secured, isolated environment, where no other apps can negatively impact this application. You can simply run it inside a separate virtual server (virtual machine).

Automation Using a virtualized server infrastructure built primarily with software, another use case is that you will save time (eventually resulting in cost savings) by automating complete application tiers consisting of multiple virtual machines.

After Virtualization
Figure 1-3 illustrates how virtual machines share the same physical hardware. Each virtual machine is capable of running its own operating system, isolated from the operating systems running on other virtual machines. The hypervisor can be seen as the (software) translation layer between the physical and virtual components: a hypervisor mimics or emulates the available physical resources and translates them into virtual components that the virtual machines can leverage. Figure 1-3 shows an example of a so-called type-1 hypervisor. With a type-1 hypervisor, the hypervisor is the operating system. (This is different from the operating system that we use for a virtual machine.) This type of hypervisor is known for its performance, as there is no additional overhead from another operating system consuming resources.

Figure 1-3 After virtualization | Type 1

Examples of type-1 hypervisors currently on the market are VMware ESXi, Oracle VM, Microsoft Hyper-V, KVM (Kernel-based Virtual Machine), and Citrix Hypervisor (formerly known as XenServer). A type-2 hypervisor, also known as a "hosted hypervisor," requires an operating system to run. This operating system can be Apple's macOS, Microsoft Windows, or a Linux distribution. The type-2 hypervisor then runs on top of this operating system, as you can see in Figure 1-4. Because you are hosting a hypervisor and the virtual machines on top of this hypervisor, you need to take good care of your resources. There is a risk that the virtual machines consume too many compute resources, leaving too little for the operating system that holds everything together, and your overall system will become slow. Examples of type-2 hypervisors currently on the market are VMware Workstation, VMware Fusion, and Oracle VM VirtualBox.

Figure 1-4 After virtualization | Type 2

Containers
Now that you know what a virtual machine is, I can tell you what a container is. A container is another virtualization level, where you virtualize at the operating system level (Figure 1-5). The idea is that you can develop and run your applications inside a container, and this container can be moved around quickly and efficiently when necessary, resulting in a boost in scalability. When the operating system, for whatever reason, is slow or not available, a new container can be brought up to host your application, and your code can be back up and running in seconds. The container engine takes care of the container management for you. The rule of thumb here is that virtual machines isolate operating systems by using hypervisors, and containers isolate applications by using container engines (on top of an operating system).

Figure 1-5 Traditional containers

In Figure 1-6, you can see that it is also possible to combine the usage of virtual machines and containers to have the best of both worlds. Here, you can see that the container engine runs on top of the hypervisor, and this container engine can manage containers that are hosted inside different virtual machines.

Figure 1-6 Virtual machines and containers

Storage Virtualization (Software Defined Storage | SDS)
With software-defined storage, it is possible to pool different physical storage devices from multiple sources and present them as one volume to the operating system and/or the application.

Network Virtualization (Software Defined Networking | SDN) Software-defined networking (SDN), also often referred to as network virtualization (NV), is a way to abstract physical network resources and deliver network resources and services in a virtualized manner.

Decoupling the network services from the underlying physical network allows a new level of scalability and flexibility. Network services like routing, switching, VPN, and load balancing can now be offered at the software layer instead of the physical layer. This allows you to configure most of these services programmatically (through automation), resulting in less administrative overhead and faster provisioning with fewer errors than manual provisioning of network services.

VMware SDDC
VMware's implementation of the software-defined data center is described in Table 1-1.
Table 1-1 VMware Implementation of the Software-Defined Data Center
Software-Defined Compute: VMware vSphere (ESXi Server and vCenter Server)
Software-Defined Storage: VMware VSAN
Software-Defined Network: VMware NSX

VMware vSphere VMware vSphere is a collection of two (software) products that let you virtualize the computing layer. These products are VMware ESXi Server and VMware vCenter Server.

VMware ESXi Server (Hypervisor) VMware ESXi Server is VMware’s compute (type-1) hypervisor. ESXi server (a lightweight piece of software) is installed on a physical server, where ESXi is the actual operating system and hypervisor in one (Figure 1-7).

Figure 1-7 VMware ESXi hypervisor

VMware vCenter Server
vCenter Server is the management component of VMware vSphere. With VMware vCenter Server, it is possible to administer all the ESXi servers in your environment. The vCenter Server gives you a single pane of glass to centrally manage your full vSphere environment (Figure 1-8). The vCenter Server also enables you to configure virtualization-related features for compute and storage virtualization (VSAN).

Figure 1-8 VMware vCenter Server

VMware Virtual Storage Area Network (VSAN) VSAN is VMware’s implementation of software-defined storage (SDS). VSAN fully integrates with vSphere and acts as one single data store from a virtual perspective, but the storage is built with multiple ESXi hosts that have multiple physical disks installed. The physical disks are all pooled together, and one single data store is presented to the vSphere infrastructure (Figure 1-9).

Figure 1-9 VMware VSAN

NSX-v and NSX-T
Finally, NSX is VMware's implementation of the software-defined network (SDN) (Figure 1-10). NSX is currently available in two flavors: NSX-v (NSX for vSphere) and NSX-T. NSX offers routing, switching, load balancing, VPN, firewalling, and bridging as its primary services; additional network services are discussed later in this book. The next chapters explain NSX-T and guide you through the installation, configuration, and management of NSX-T inside an enterprise infrastructure.

Figure 1-10 VMware NSX

Note: NSX-v has been announced as end of sale/end of support. NSX-T is its successor and is the NSX software that should be used for future deployments.

VMware VCF
VMware's full SDDC software is also available as a bundle in which the software versions and their interoperability are thoroughly checked and tested for compatibility. This product is called VMware Cloud Foundation (VCF), and inside VCF you will find vSphere, VSAN, and NSX at the core (Figure 1-11). Other VMware software packages are also included, but these are out of scope for this book.

Figure 1-11 VMware VCF

Summary
This chapter explained the software-defined data center (SDDC) and the different virtualized components of an SDDC. It explained what compute, storage, and network virtualization are, and the difference between a virtual machine and a container. At the end of the chapter, I showed you VMware's specific implementation of the SDDC and the VMware products that form it. The next chapter introduces NSX-T.

© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021 I. Hoogendoorn, Getting Started with NSX-T: Logical Routing and Switching https://doi.org/10.1007/978-1-4842-6708-0_2

2. An Introduction to NSX-T Iwan Hoogendoorn1 (1) Rotterdam, The Netherlands

NSX-T is VMware's implementation of software-defined networking (SDN), as described in the previous chapter. This chapter explains the NSX-T components and services that form the full network virtualization (NV) stack. It also dives into the main use cases of NSX-T. By the end of this chapter, you will understand how the NSX-T components interact with each other across the control, management, and data planes.

NSX-T Features This chapter provides an overview of the features that are currently offered by NSX-T.

Platform-Related Features
Table 2-1 provides an overview of NSX-T's platform-related features.
Table 2-1 Platform-Related Features
Policy-driven configuration: NSX-T offers automation by using the REST API, which allows simplified policy-driven configuration so that administrators can simplify the design of configurable items (a sketch follows this table).
Multi-hypervisor support: NSX-T currently supports the ESXi and KVM (type-1) hypervisors.
Bare-metal server support: NSX-T (currently) supports bare-metal native compute workloads running on Red Hat Enterprise Linux (RHEL), CentOS, Oracle Linux, SUSE, Windows Server, and Ubuntu operating systems.
Support for VMs and containers: NSX-T supports network and security services on virtual machine workloads and container workloads.
Support for Amazon AWS and Microsoft Azure instances: NSX Cloud supports networking and security for your applications running natively in public clouds.
vCenter Server agnostic: With NSX-T, you can configure multiple vCenter Servers as endpoint managers and allow multiple hosts, residing in multiple clusters spread across multiple vCenter Servers, to be configured as "NSX-T enabled" host transport nodes.
HTML5 UI for the management plane: The management of network and security services can be done through an HTML5 graphical user interface.
DPDK-based NSX edge nodes in VM or bare-metal form factor: NSX-T supports the Data Plane Development Kit (DPDK). DPDK provides data plane libraries and network interface controller drivers that offer fast packet processing capabilities, optimizing CPU and memory use to increase packet processing speed.
Multisite and Federation: Multisite and Federation allow NSX-T to offer stretched network and security services across multiple sites. This simplifies disaster recovery across various locations and allows you to offer unified network and security services across multiple sites.
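The policy-driven configuration listed in the first row can be illustrated with a short sketch. The following minimal Python example (not taken from this book) declares an overlay segment by sending the desired state to the Policy API in a single PATCH call. The manager address, credentials, segment name, and transport zone path are placeholder lab values, and certificate verification is disabled only to keep the illustration compact.

    import requests

    NSX_MANAGER = "https://nsx-manager.lab.local"   # placeholder VIP or node address
    AUTH = ("admin", "VMware1!VMware1!")            # placeholder credentials

    # Desired state of the segment; the policy role reconciles it with the running state.
    segment = {
        "display_name": "web-segment",
        "transport_zone_path": "/infra/sites/default/enforcement-points/default/"
                               "transport-zones/OVERLAY-TZ",  # placeholder transport zone path
        "subnets": [{"gateway_address": "172.16.10.1/24"}],
    }

    resp = requests.patch(
        f"{NSX_MANAGER}/policy/api/v1/infra/segments/web-segment",
        json=segment,
        auth=AUTH,
        verify=False,  # lab only; use a trusted certificate in production
    )
    resp.raise_for_status()
    print("Segment declared, HTTP status:", resp.status_code)

Because the call is declarative, running it again with the same body is harmless; the policy role simply reconciles the desired state with what already exists.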

Networking-Related Features
Table 2-2 provides an overview of NSX-T's networking-related features.
Table 2-2 Networking-Related Features
Overlay or VLAN-backed logical switching: Layer 2 segments that are overlay (GENEVE) backed or VLAN backed.
Layer 2 bridging: Layer 2 bridging capabilities where VLAN and overlay (GENEVE) segments are bridged.
QoS: NSX-T supports QoS capabilities with configurable DSCP and CoS values.
Distributed routing: NSX-T supports distributed routing by injecting (distributed) routing functionality into all the host transport nodes (hypervisors).
Static routing and equal-cost multipath (ECMP) routing: The NSX-T Tier-0 gateway supports ECMP with the upstream routing devices for egress and ingress traffic. Tier-0 gateways do not directly influence the ECMP of ingress traffic; they only advertise BGP routes to the upstream routers. The upstream routers need to support ECMP for ingress traffic to have multiple paths to different Tier-0 gateways.
Border Gateway Protocol (BGP) support: NSX-T supports BGP to allow dynamic route exchange between the same or different autonomous systems (AS). Both iBGP and eBGP are supported (see the sketch after this table).
Duplicate IP detection: NSX-T offers duplicate IP address detection by verifying whether a newly discovered IP address is already present. This check is only done for logical switch ports on the same segment.
Bidirectional Forwarding Detection (BFD) for fast convergence: BFD is a protocol that can detect forwarding path failures and allows fast failover in case of any losses.
Virtual Routing and Forwarding (VRF) Lite: NSX-T supports VRF Lite, where multiple separate routing instances can coexist inside a single Tier-0 gateway.
Ethernet Virtual Private Network (EVPN): NSX-T supports EVPN to extend Layer 2 and Layer 3 connectivity between different data centers. You can advertise all VRF networks on a Tier-0 gateway using Multi-Protocol BGP (MP-BGP) to the provider edge (PE) and still keep the isolation on the data plane through VXLAN encapsulation, using a one-to-one mapping between a VNI and a VRF.
Rate limiting: NSX-T supports rate limiting on a segment or a virtual machine to protect against broadcast storms. A rate limit can be set for ingress or egress broadcast and multicast traffic.
Layer 3 multicast: NSX-T supports multicast on a Tier-0 gateway and can send the same multicast data to a group of destinations (receivers).
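As an illustration of the BGP support listed in Table 2-2, the following sketch enables BGP on an existing Tier-0 gateway and adds one upstream neighbor through the Policy API. The gateway name (T0-GW), the locale-services ID (default), the AS numbers, and the neighbor address are assumptions for a lab topology, not values taken from this book.

    import requests

    NSX_MANAGER = "https://nsx-manager.lab.local"   # placeholder
    session = requests.Session()
    session.auth = ("admin", "VMware1!VMware1!")    # placeholder credentials
    session.verify = False                          # lab only

    # Enable the BGP process on the (assumed) Tier-0 gateway "T0-GW" with the
    # default locale-services ID and a placeholder local AS number.
    bgp_config = {"enabled": True, "local_as_num": "65001", "ecmp": True}
    session.patch(
        f"{NSX_MANAGER}/policy/api/v1/infra/tier-0s/T0-GW/locale-services/default/bgp",
        json=bgp_config,
    ).raise_for_status()

    # Add an upstream eBGP neighbor (placeholder peer address and remote AS).
    neighbor = {"neighbor_address": "10.10.50.1", "remote_as_num": "65000"}
    session.patch(
        f"{NSX_MANAGER}/policy/api/v1/infra/tier-0s/T0-GW/locale-services/default/"
        f"bgp/neighbors/upstream-1",
        json=neighbor,
    ).raise_for_status()
    print("BGP enabled and upstream neighbor configured")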

Security and Services-Related Features
Table 2-3 provides an overview of the security and services-related features.
Table 2-3 Security and Services-Related Features
Gateway firewall: With the NSX-T gateway firewall feature, it is possible to provide north/south firewalling.
Distributed firewall: With the NSX-T distributed firewall feature, it is possible to provide east/west firewalling.
Time-based firewall policy: With NSX-T, it is possible to schedule firewall rules that are only valid and active during specific time ranges.
Network introspection: With network introspection, it is possible to integrate third-party network services (firewall or IDS/IPS) from selected third-party vendors, for functionality that is not offered as a native NSX-T feature.
Endpoint protection: With endpoint protection, it is possible to leverage guest introspection to provide (agentless) antivirus and antimalware protection to virtual machines.
Distributed intrusion detection/prevention: With the distributed intrusion detection/prevention feature, NSX-T makes it possible to provide IDS/IPS services in a distributed manner, just like routing and firewalling services, without hairpinning the traffic to a separate IDS/IPS hardware appliance.
URL analysis: With URL analysis, you gain insight into the websites that are accessed in your environment.
NSX Intelligence: NSX Intelligence gives you continuous, data center-wide visibility into the type of traffic running through your NSX-T environment. It provides recommendations on the firewall rules that should be implemented.
NAT and NAT64: NAT is supported, where IPv4 addresses can be translated to other IPv4 addresses. NSX-T also supports NAT64, which translates IPv6 to IPv4 addresses and vice versa.
DNS: NSX-T allows you to configure different DNS zones with dedicated DNS forwarders per DNS zone.
DHCP server and relay: NSX-T offers dynamic allocation of IP addresses by providing DHCP services. When you do not use the local DHCP server, you can specify your own DHCP server's IP address so that DHCP requests are relayed.
Load balancer: NSX-T supports standalone and distributed load balancing, where network traffic sent to a specific virtual IP address is distributed to different servers in the load balancing server pool based on particular requirements and parameters.
L2 VPN and IPSec VPN: NSX-T supports Layer 2 VPN (L2 VPN) and Layer 3 (IPSec VPN) extensions, which allow network connectivity from a remote site into your network.

Automation-Related Features
Table 2-4 provides an overview of NSX-T's automation-related features.
Table 2-4 Automation-Related Features
REST/JSON API support: NSX-T offers consumption through a REST API with JSON payloads.
Upstream OpenStack support and partner ecosystem: OpenStack integration is supported, together with other partners such as Terraform and Ansible.
Tag-based security grouping: NSX-T supports security tags, where a virtual machine can receive a tag and, based on this tag, become part of a certain security group. These groups can be used during automation to place virtual machines in the correct groups so that they leverage existing security policies (see the sketch after this table).
Inventory support: With NSX-T, you can create an inventory by creating services, groups, domains, and context profiles.
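To show how tag-based security grouping can be consumed through automation, here is a minimal, hypothetical sketch that creates a group whose membership is every virtual machine carrying the tag "web"; distributed firewall policies can then reference the group by its path. The manager address, credentials, group name, and tag value are all placeholders.

    import requests

    NSX_MANAGER = "https://nsx-manager.lab.local"   # placeholder
    AUTH = ("admin", "VMware1!VMware1!")            # placeholder

    # Group membership is expressed as a condition on the VM tag, so any VM that is
    # tagged "web" (manually or via automation) automatically becomes a member.
    # When a tag scope is used, the value takes the form "scope|tag".
    group = {
        "display_name": "web-servers",
        "expression": [
            {
                "resource_type": "Condition",
                "member_type": "VirtualMachine",
                "key": "Tag",
                "operator": "EQUALS",
                "value": "web",
            }
        ],
    }

    resp = requests.patch(
        f"{NSX_MANAGER}/policy/api/v1/infra/domains/default/groups/web-servers",
        json=group,
        auth=AUTH,
        verify=False,  # lab only
    )
    resp.raise_for_status()
    print("Group declared, HTTP status:", resp.status_code)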

Operations-Related Features
Table 2-5 provides an overview of NSX-T's operations-related features.
Table 2-5 Operations-Related Features
Getting Started wizards: With the Getting Started wizards, it is easy to perform tasks like preparing clusters for VLAN micro-segmentation.
Dashboards: NSX-T offers dashboards where you can see what is configured and verify the overall health of the NSX-T architecture. The Alarm dashboard provides information on all the alarms generated by the NSX-T components. The Network Topology Visualization dashboard gives a graphical overview of the virtual network infrastructure and can be exported as a PDF.
Role-based access control (RBAC): With RBAC, it is possible to allow specific roles with specific network access profiles to have different rights.
Upgrade coordinator: With the NSX-T upgrade coordinator, it is possible to coordinate the upgrade of the different NSX-T components from one version to a newer version of NSX-T.
Backup and Restore feature: The Backup and Restore feature allows you to perform backups and restores to and from a (FIPS-compliant) SFTP server.
vRealize Log Insight: The integration with vRealize Log Insight allows you to create logging dashboards based on specific NSX-T log messages. The vRealize Log Insight content packs ensure that the log messages generated by NSX-T are all parsed correctly.

Troubleshooting-Related Features

Table 2-6 provides an overview of NSX-T's troubleshooting-related features.
Table 2-6 Troubleshooting-Related Features
IP Flow Information Export (IPFIX): IPFIX allows the export of flows to another tool, like vRealize Network Insight, where you can verify all the network flows in and out of your virtual NSX-T network infrastructure.
Port mirroring: With port mirroring, it is possible to send network traffic to a packet-capturing system (software or hardware). This can be useful for troubleshooting purposes.
Traceflow: Traceflow allows you to inject network traffic from a specific source to a particular destination and follow the complete packet flow to see where traffic is dropped or blocked, or to look at the NSX-T components used in the routing path.
Network Topology views: With the NSX-T Network Topology views, it is possible to get an overview of your network's logical entities. The graphical overview represents your network's topology, which helps you verify parts of your network configuration and troubleshoot errors.
Selective technical support logs: NSX-T supports the export of technical support logs for various parts of NSX-T, so you don't end up with one large log support bundle that is not targeted at a specific part of the NSX-T infrastructure.
Monitoring dashboards and statistics: The NSX-T Manager GUI offers several monitoring dashboards that display details about the current system, network, and security status, as well as compliance reporting.
Alarms: NSX-T generates alarms to make sure you know about issues with the NSX-T components related to performance or operation.

NSX-T Use Cases NSX-T is a network virtualization platform that offers network and security services. NSX-T is part of the Virtual Cloud Network (VCN), a software-defined approach that extends the network across data centers, multiple clouds, and various application tiers and frameworks. NSX-T brings networking and security together by supporting multiple hypervisors, containers, bare-metal servers, and public clouds. The main use cases for NSX-T are described in the upcoming sections.

Security Use Cases In this section, the NSX-T security use cases are discussed.

Micro-Segmentation
With micro-segmentation, it is possible to achieve Zero-Trust security within your virtual network. Micro-segmentation is only one part of the Zero-Trust security model; Zero-Trust also includes Multi-Factor Authentication (MFA), least-privilege access, and so on. NSX-T provides a distributed firewall that offers granular protection with centralized management.

DMZ Anywhere
With DMZ Anywhere, it is possible to host your DMZ workloads on the same underlying hardware as your other production workloads, with a separate security policy dedicated to the DMZ environment. The security policy is enforced at the VM level or VM group level. This way, you can create a DMZ security group and apply the security policy to this specific group. While your production workloads and DMZ workloads are mixed and hosted on the same hardware, each is still protected by its own security policy.

Secure End-User
Security can be applied at several layers in the network. The secure end-user use case allows access to network resources based on a unique username and password. When a user logs in to the (virtual) desktop with Active Directory credentials, it is possible to apply a security policy based on this user, which determines what this user can and cannot access on the network.

Automation Use Cases In this section, the IT automation use cases are discussed.

Automating IT
With the NSX-T REST API, it is possible to automate Create, Read, Update, and Delete (CRUD) operations on the network and security services offered by NSX-T. The automation can be done using the native REST API of NSX-T or using automation providers like Ansible, Terraform, PowerCLI, or any other scripting language that supports communication with the NSX-T REST API. VMware also offers native integration with its automation software components vRealize Orchestrator (vRO) and vRealize Automation (vRA).
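To complement the create example shown after Table 2-1, the following sketch covers the read and delete parts of CRUD against the Policy API. The manager address and credentials are placeholders, and the segment name is the same hypothetical one used earlier.

    import requests

    NSX_MANAGER = "https://nsx-manager.lab.local"   # placeholder
    session = requests.Session()
    session.auth = ("admin", "VMware1!VMware1!")    # placeholder credentials
    session.verify = False                          # lab only

    # Read: list all configured segments and print their names and policy paths.
    segments = session.get(f"{NSX_MANAGER}/policy/api/v1/infra/segments").json()
    for seg in segments.get("results", []):
        print(seg["display_name"], seg["path"])

    # Delete: remove the (placeholder) segment created earlier, if it exists.
    resp = session.delete(f"{NSX_MANAGER}/policy/api/v1/infra/segments/web-segment")
    print("Delete status:", resp.status_code)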

Developer Cloud Developers typically require resources on-demand. NSX-T can spin up network and security services in sub-seconds and tear them down if they are no longer needed. It is also possible to provide networking and security via OpenStack, where CI/CD environments can also mirror the production environment.

Multi-Tenant Infrastructure
With multi-tenancy, you can automatically spin up complete network topologies that are dedicated to different tenants. With Network Address Translation (NAT), you can even use overlapping IP space between the tenants. Combined with the developer cloud use case, this can be useful if a developer or a group of developers requires an immediate, full development environment with multiple network tiers. Another way to create multi-tenant networks is to use the Virtual Routing and Forwarding (VRF) features offered by NSX-T.

Multi-Cloud Networking Use Cases In this section, the application continuity use cases are discussed.

Disaster Recovery When (parts of) the data center are unavailable due to failing components, the network and security services provided by NSX-T can be deployed redundantly, so that workloads are migrated or restarted in another location (data center). The network and security services are also automatically moved with that workload, making disaster recovery of the related network and security services possible.

Metro Data Center Pooling
With the capability of having stretched vSphere clusters across multiple sites, NSX-T can join this stretch by extending the network and security services it provides across those sites.

Hybrid Cloud Networking
NSX-T is also available where VMware is offered as a solution in the public cloud; VMC on AWS, Azure VMware Solution, Google Cloud VMware Engine, or any other public cloud where VMware is offered are examples of this. In every cloud where you place your workloads, the configuration and management look and feel the same.

Cloud-Native Applications With NSX-T, it is possible to deliver an integrated full-stack network and security for your containerized applications and microservices. You can run cloud-native applications with granular, per-container policies, native container-to-container Layer 3 networking, micro-segmentation, and load balancing for microservices. You can also monitor and manage policies across traditional and modern applications.

High-Level Architecture of NSX-T This section explains NSX-T’s high-level architecture.

NSX-T Management and Control Planes
The NSX-T Manager provides collapsed policy, management, and central control services within a cluster of three NSX-T Manager nodes. The NSX-T Manager nodes are responsible for the management plane cluster services and API services. (It is also possible to use one NSX-T Manager node, but this reduces availability and recoverability.) NSX-T separates the management, control, and data planes (Figure 2-1). These planes are implemented as a set of processes, modules, and agents residing on two types of nodes: the NSX-T Manager nodes and the NSX-T transport nodes. The transport nodes can be split into two types: the host transport nodes (ESXi, KVM) and the edge transport nodes (virtual or bare metal). This book covers ESXi hosts and virtual and bare metal edges.

Figure 2-1 High-level NSX-T architecture

NSX-T Management Cluster
The NSX-T Manager is the first piece of software that you start with when you deploy NSX-T in your environment. When you deploy the NSX-T Manager, you get the choice of selecting a size: small, medium, or large. The small appliance should only be used for lab or Proof of Concept (PoC) environments. The medium appliance supports up to 64 host transport nodes. The large appliance is for enterprise deployments with large-scale environments. The NSX-T Manager cluster consists of three NSX-T Manager nodes (Figure 2-2); there are three to enhance availability and recoverability.

Figure 2-2 Software-defined data center components

NSX-T Management Cluster with an Internal Virtual IP Address (VIP)
The NSX-T cluster has the option to configure a VIP address across the three NSX-T Manager nodes. When the NSX-T Manager VIP address option is chosen, the IP addresses of the NSX-T Manager nodes need to be in the same subnet. The following example shows that all the NSX-T Manager IP addresses, including the VIP address, are in the 10.10.10.0/24 network. Incoming requests are not balanced across the managers when the VIP is used. The NSX-T cluster will choose one NSX-T Manager to be the "leader manager." Once the leader has been selected, the VIP address forwards all traffic to the leader; the other two NSX-T Managers do not receive incoming queries, as the leader handles all management communication sent to the VIP. If the leader node that owns the VIP fails, the two remaining managers elect a new leader. This new leader sends out a Gratuitous Address Resolution Protocol (GARP) request to take ownership of the VIP and then receives the new API and GUI requests from the users. Even though the NSX-T Manager cluster uses a VIP address, each individual NSX-T Manager can still be accessed directly. It does not matter which NSX-T Manager you access; the database is consistent between all of the NSX-T Manager nodes. Figure 2-3 shows that a single IP address (the virtual IP address) is always used to access the NSX-T management cluster from the administrator's perspective.

Figure 2-3 NSX-T Manager cluster with NSX-T Manager VIP IP address
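Because the VIP always points at the current leader, querying the VIP is a quick way to check the health of the whole management cluster. The sketch below reads the /api/v1/cluster/status endpoint of the Manager API; the VIP address and credentials are placeholders, and the exact field names in the response can differ slightly between NSX-T versions, so they are read defensively.

    import requests

    VIP = "https://10.10.10.10"                    # placeholder cluster VIP
    AUTH = ("admin", "VMware1!VMware1!")           # placeholder credentials

    status = requests.get(
        f"{VIP}/api/v1/cluster/status",
        auth=AUTH,
        verify=False,  # lab only
    ).json()

    # Print the overall management and control cluster status plus the nodes
    # currently reported as online.
    mgmt = status.get("mgmt_cluster_status", {})
    ctrl = status.get("control_cluster_status", {})
    print("Management cluster:", mgmt.get("status"))
    print("Control cluster:", ctrl.get("status"))
    for node in mgmt.get("online_nodes", []):
        print("Online node:", node.get("ip_address", node.get("uuid")))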

NSX-T Management Cluster with a Load Balancer Offering the Virtual IP Address (VIP)
To balance the load between the NSX-T Managers inside the NSX-T Manager cluster, you must use a load balancer. This load balancer can be either a third-party load balancer or an NSX load balancer from a different NSX-T deployment. Because the VIP address is now terminated on the load balancer, you can place each NSX-T Manager node in a different subnet. The example in Figure 2-4 shows that the NSX-T Manager IP addresses are in the 10.10.20.0/24, 10.10.30.0/24, and 10.10.40.0/24 networks. Each NSX-T Manager node is active, and the load balancer distributes traffic to the nodes based on its load-balancing algorithm.

Figure 2-4 NSX-T Manager cluster with a load balancer VIP IP address

NSX-T Policy Role The NSX-T policy role that is embedded inside the NSX-T Manager cluster members is a central location where the configuration of network and security is done across the NSX-T system (Figure 2-5). The policy role allows you to enter the configuration required into the NSX GUI. The NSX-T policy role also enables you to specify the desired state as final, without taking the current configuration state into account.

Figure 2-5 NSX-T policy role

The policy is deployed in each NSX-T Manager node and is the only supported point of entry. The mapping between the policy and manager is always a one-to-one mapping. Figure 2-6 shows the communication flow, where the policy is the first point of entry when a configuration change is made, and this is sent to the manager (role). Eventually, the manager enforces the change through the control plane (consisting of the central control plane (CCP) and local control plane (LCP)).

Figure 2-6 Policy communication characteristics

NSX-T Manager Role
The NSX-T Manager role that is embedded inside the NSX-T Manager cluster members receives the configuration from the policy and validates it. When the check is done, the configuration is sent to the Central Control Plane (CCP). The NSX-T Manager role also retrieves statistical data from the data plane components and takes care of the installation and preparation of the data plane components (Figure 2-7).

Figure 2-7 NSX-T Manager role

NSX-T Policy and NSX-T Manager Interaction Before the NSX-T Manager verifies the configuration that is retrieved from the policy, the first entry point is the “reverse proxy” with authentication and authorization capabilities. The NSX-T Policy Manager and Proton are internal web applications that communicate with each other through the HTTP protocol. The configuration is entered into the policy, and the policy sends the configuration into the manager role, where the “Proton component” sits. The Proton component is “the” core component of the NSX-T Manager node. The Proton manager has several functionalities, like logical switching (through segments), logical routing, distributed firewalling, and much more. Figure 2-8 shows you that the NSX-T policy and the Proton component both store persistent data in the CorfuDB. CorfuDB is a persistent in-memory object-store. Persistence is achieved by writing each transaction in a shared transaction log file. Queries are served from memory and provide better performance and scalability.

Figure 2-8 The NSX-T policy and Proton component store persistent data in the CorfuDB

NSX-T Controller Role
The NSX-T controller maintains the current state of the NSX-T system and configures the data plane components. The NSX-T controller role (Figure 2-9) provides control plane functionality for features such as logical switching, logical routing, and the distributed firewall. The controller computes all the ephemeral runtime state based on the configuration provided by the management role/plane. When information about the NSX-T infrastructure is received, the controller processes it and pushes the resulting stateless configuration to the forwarding engines.

Figure 2-9 NSX-T controller role

NSX-T Control Plane Components The control plane is divided into two components: the Central Control Plane (CCP) and the Local Control Plane (LCP). The CCP exists inside the NSX-T Manager nodes, and the LCP resides in the transport nodes (host and edge) (Figure 2-10).

Figure 2-10 CCP and LCP

The NSX-RPC messaging protocol is used for all communication between the management plane, the CCP, and the data plane. Remote procedure call (RPC) is a protocol that one program can use to request a service from another program on another computer, without having to understand the network's details.

NSX-T Control Plane Information Exchange The LCP on the transport nodes reports local runtime changes to its master CCP node. The master CCP nodes receive the changes and propagate these changes to the other (CCP) controllers in the cluster. All controllers responsible for a specific transport node then propagate these changes to the appropriate transport nodes (Figure 2-11).

Figure 2-11 CCP and LCP data exchange

NSX-T Control Plane Sharding
The CCP uses a so-called sharding mechanism to split the responsibility for the transport nodes across the controllers (Figure 2-12). Each transport node is assigned to a specific controller for the distribution of segment, logical routing, and distributed firewall information. Each CCP node in the cluster receives updates through the management plane, but only maintains the information for the transport nodes it is responsible for.

Figure 2-12 NSX-T controller sharding

NSX-T Controller Failure
Figure 2-13 shows the controller functionality on NSX-T Manager node 3 failing. Note that three controllers are running in the cluster, each assigned to 12 transport nodes. When controller 3 fails, its responsibility is moved to the remaining NSX-T Managers running controller 1 and controller 2.

Figure 2-13 NSX-T controller sharding with one node failing

NSX-T Data Plane
The NSX-T data plane has various components, responsibilities, and functions. The NSX-T edge, ESXi host, and KVM host are considered data plane endpoints. The data plane also contains workloads such as virtual machines (VMs), containers, and applications running on bare metal servers, and it forwards data plane traffic between the endpoints. The data plane implements the actual configuration of logical switching, distributed and centralized routing, and the (distributed) firewall security policy (Figure 2-14).

Figure 2-14 Private and public cloud data plane components

NSX-T Data Plane Functions The data plane forwards packets based on the configuration that was enforced by the CCP and propagates topology information to the overall control plane components. The data plane maintains the availability and health between multiple links and overlay (GENEVE) tunnels and will initiate a failover when this is required.

NSX-T Data Plane Components The data plane components (Table 2-7) or transport nodes can be split into two types—host transport nodes and edge transport nodes—as shown in Figure 2-15.

Figure 2-15 Private cloud data plane components
Table 2-7 Data Plane Components
Hypervisor transport node: The hypervisor transport node acts as a forwarding plane for the virtual machine network traffic. There is support for ESXi and KVM hypervisors. For packet forwarding, ESXi uses the N-VDS or VDS v7, depending on the hypervisor version, and KVM uses Open vSwitch.
Bare metal transport node: These transport nodes are responsible for hosting Tier-0 and Tier-1 gateways with their corresponding network and security services. There is also support for Linux and Windows workloads running on a bare metal server and container workloads running on a bare metal server without a hypervisor.
Edge VM transport node: These are virtual machine based transport nodes responsible for hosting Tier-0 and Tier-1 gateways with their corresponding network and security services.
NSX-T edge (cluster): Supports edge VMs and bare metal edges. The cluster provides stateful and gateway services.

NSX-T Data Plane Information Exchange The NSX-T Manager management plane communicates with the transport nodes using the Appliance Proxy Hub (APH) server over NSX-RPC/TCP port 1234 (Figure 2-16).

Figure 2-16 Data plane communication channels

The CCP communicates with the transport nodes by using the APH server over NSX-RPC/TCP port 1235. TnProxy (NSX-Proxy) on the transport node receives the NSX-RPC messages from the NSX-T Manager and CCP.
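If you ever need to confirm that a transport node can reach these channels, a quick TCP reachability test against ports 1234 and 1235 on the NSX-T Manager is a useful first check. The following Python sketch does exactly that; the manager IP address is a placeholder taken from the lab examples later in this book, and the test only verifies raw TCP reachability, not the NSX-RPC session itself.

import socket

# Placeholder address of an NSX-T Manager node; replace with your own.
NSX_MANAGER = "10.203.230.8"

# TCP 1234 = management plane (APH) channel, TCP 1235 = CCP channel,
# as described above. This only verifies that the port is reachable.
for port in (1234, 1235):
    try:
        with socket.create_connection((NSX_MANAGER, port), timeout=5):
            print(f"TCP {port} reachable on {NSX_MANAGER}")
    except OSError as err:
        print(f"TCP {port} NOT reachable on {NSX_MANAGER}: {err}")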

Summary In this chapter, you learned about the primary use cases for NSX-T and which components form the NSX-T network virtualization infrastructure in the control, management, and data planes. In the next chapter, I explain how the NSX-T components are installed.

© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021 I. Hoogendoorn, Getting Started with NSX-T: Logical Routing and Switching https://doi.org/10.1007/978-1-4842-6708-0_3

3. NSX-T Installation in a vSphere Environment Iwan Hoogendoorn1 (1) Rotterdam, The Netherlands

This chapter explains the installation steps of the NSX-T Manager on a vSphere infrastructure. Once the NSX-T Manager cluster is fully installed in a clustered form, you will also see how to replace the self-signed certificates, navigate through the GUI, and perform some basic CLI commands.

Implementation Steps Deploying the management node, configuring the host transport nodes, and installing and configuring the edge transport nodes can all be done using the following nine steps:

Table 3-1 Deployment steps to deploy the NSX-T Manager node

1. Deploy the NSX-T Manager OVF template on a vSphere environment.
2. Log in to the NSX-T Manager GUI.
3. Add the license key.
4. Register at least one vCenter Server (end point manager) to the NSX-T Manager.1
5. Deploy two additional NSX-T Manager nodes to form a three-node NSX-T Manager cluster.
6. Configure the host and edge transport node prerequisites.
7. Prepare the ESXi host transport nodes (with NSX-T bits).
8. Deploy the NSX-T edge transport nodes (bare metal or VM form factor).
9. Create an NSX-T edge cluster and add the required edge transport nodes to the cluster.

Deployment Considerations The NSX-T Manager can be deployed on a standalone ESXi host or an ESXi host managed by a vCenter Server. It is also possible to use supporting (automation) tools, whereby you will be able to script the deployment with Ansible, for example.

Node Size The NSX-T Manager can be deployed in various form factors. These form factors are extra small, small, medium, and large (Figure 3-1).

Figure 3-1 NSX-T deployment sizes

The extra small appliance only deploys the NSX Cloud Service Manager (CSM) role and will not deploy the full NSX-T Manager with all the required roles. The small appliance should only be deployed in a Proof of Concept (PoC) environment. The medium appliance can be used in a production environment with up to 64 ESXi host transport nodes. The large appliance can be used in a large production environment with more than 64 ESXi host transport nodes. At the time of writing, NSX-T 3.x supports up to 1024 ESXi host transport nodes, but this is subject to change in other versions. The latest information on configuration maximums can be found at https://configmax.vmware.com/.

Deploying the First NSX-T Manager Using the OVF Template The first NSX-T Manager node needs to be deployed by deploying an OVA/OVF file from the vCenter Server or ESXi host. Figures 3-2 through 3-8 show the steps needed to deploy the NSX-T Manager OVA using the vCenter Server deployment method.

Figure 3-2 Select the NSX-T Manager OVA template

Figure 3-3 Specify a virtual machine

Figure 3-4 Select the compute resources you want to deploy the NSX-T Manager node virtual machine on

Figure 3-5 Review the details

Figure 3-6 Select a deployment size

Figure 3-7 Select the data store resources you want to deploy the NSX-T Manager node virtual machine on

Figure 3-8 Select the network port group with the network (VLAN) you want to deploy the NSX-T Manager node virtual machine on

The next step is to customize the OVA template with some mandatory parameters to finish the deployment (Figure 3-9). The required parameters that you need to specify are listed in Table 3-2.

Figure 3-9 Customize the OVA template

Table 3-2 NSX-T Deployment Parameters for the "Customize Template" Installation Step (Figure 3-9)

Root user password: VMware!VMware!
Admin user password: VMware!VMware!
Audit user password: VMware!VMware!
Hostname: NSXT-MANAGER
Rolename: NSX Manager
Default IPv4 gateway: 10.203.100.1
Management network IPv4 address: 10.203.100.12
Management network netmask: 255.255.255.0
DNS server list: 10.203.0.5
Domain search list: home.local
NTP server list: 10.203.0.5
Enable SSH: Yes
Allow root SSH logins: Yes

The final two steps are to verify in the summary that all the deployment parameters are correct (Figure 3-10) and then power on the NSX-T Manager node virtual machine (Figure 3-11).

Figure 3-10 Verify in the summary if all the deployment parameters are correct

Figure 3-11 Power on the NSX-T Manager node virtual machine after the deployment is successful

Accessing the NSX-T Manager The NSX-T Manager can be accessed after it is deployed and powered on, through the GUI, the Command Line Interface (CLI), or the API.

Graphical User Interface (GUI) To access the NSX-T GUI, you simply browse to the Fully Qualified Domain Name (FQDN) or the IP address using a supported browser with the HTTPS protocol on port 443. I typically use Google Chrome to do this. You will be presented with a screen to specify your login credentials (Figure 3-12).

Figure 3-12 Specify the username and password to log in to the NSX-T Manager node

When you log in for the first time, you will be presented with an end user license agreement (Figure 3-13) that you need to accept before you can start.

Figure 3-13 Accept the end user license agreement

You can optionally join the VMware Customer Experience Improvement Program (CEIP) (Figure 3-14).

Figure 3-14 Join the customer experience program (CEIP)

Eventually, you will make it to the Welcome screen (Figure 3-15) with some information that may be useful to you.

Figure 3-15 NSX-T welcome screen

You can choose the “Don’t Show This Again” check box so the next time you log in, you won’t see the welcome screen.

Command Line Interface (CLI) The NSX-T CLI can be accessed using the vSphere virtual machine console or SSH. The easiest way is SSH, but you need to make sure SSH is enabled when you deploy the OVF file. When SSH is not enabled, you can enable it from the console with this command:
NSXT-MANAGER> start service ssh
To make sure that SSH is also started on boot, issue this command:
NSXT-MANAGER> set service ssh start-on-boot
To verify whether SSH is enabled, issue this command:
NSXT-MANAGER> get service ssh

When SSH is enabled, you can use any SSH client to set up an SSH session to access the NSX-T Managers Command Line Interface (CLI). My preferred client is Royal TSX and it can be downloaded at https://www.royalapps.com/. You can use the admin user credentials specified during the deployment of the OVF file. Once you’re logged in, you will be able to use the “list” command to view all available commands. login as: admin [email protected]'s password: NSX CLI (Manager, Policy, Controller 3.0.0.0.0.15946739). Press ? for command list or enter: help pod-230-nsxt-lm> pod-230-nsxt-lm> get version VMware NSX Software, Version 3.0.0.0.0.15946739 Product Version 3.0.0.0.0.15946738 Technical Support: http://www.vmware.com/support.html Copyright © 2014-2020 VMware, Inc. All rights reserved. This product is protected by copyright and intellectual property laws in the United States and other countries as well as by international treaties. VMware products are covered by one or more patents listed at http://www.vmware.com/go/patents. VMware is a registered trademark or trademark of VMware, Inc. in the United States and other jurisdictions. All other marks

and names mentioned herein may be trademarks of their respective companies. pod-230-nsxt-lm> Note The NSX-T CLI is also available on the edge and host transport nodes.

Application Programming Interface (API) The NSX-T API can perform specific tasks that cannot be done through the GUI or CLI. The API can also automate particular tasks by either using the native API or API wrappers like PowerCLI, Terraform, and Ansible. The API accepts requests over HTTPS on TCP port 443. The API commands listed in Table 3-3 can be used to perform actions.

Table 3-3 NSX-T API Commands

HTTP GET: Read and retrieve objects
HTTP PUT, PATCH, POST: Create and update objects
HTTP DELETE: Delete objects
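As a quick illustration of how these HTTP verbs are consumed, the following Python sketch performs a simple read with a GET request. It assumes the requests library is installed, reuses the lab VIP FQDN and admin credentials from this chapter, and uses the cluster status endpoint as a read-only example; adapt the URL and credentials to your own environment.

import requests

# Lab values from this chapter; adjust for your environment.
NSX_FQDN = "nsxt-vip.lab.local"
AUTH = ("admin", "VMware!VMware!")

# HTTP GET: read and retrieve objects. verify=False is only acceptable
# while the NSX-T Manager still uses its self-signed certificate.
response = requests.get(
    f"https://{NSX_FQDN}/api/v1/cluster/status",
    auth=AUTH,
    verify=False,
    timeout=30,
)
response.raise_for_status()
print(response.json())

The same pattern applies to PUT, PATCH, POST, and DELETE; only the verb and (for writes) a JSON body change.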

Adding the NSX-T License Before you can fully use NSX-T, you need to add a valid license to the system. It is also possible to use the 30-day trial option when you don’t have a valid license. You can add the license by choosing System ➤ Licenses ➤ Add. You will see a screen like the one shown in Figure 3-16, where you can add your license.

Figure 3-16 Add an NSX-T license key

When you add a valid license, it will be displayed in the License section (Figure 3-17).

Figure 3-17 License key summary

vCenter Server Registration with the NSX-T Manager To start the host preparation (making an ESXi host an NSX host transport node and completing the NSX-T node cluster build), you need to add a Compute Manager. The Compute Manager will enable you to perform specific tasks that require vSphere Clusters. One of these tasks is to finish the NSX-T node cluster build, and another task is the host installation with NSX-T bits. To add a Compute Manager (Figure 3-18), choose System ➤ Fabric ➤ Compute Managers ➤ Add.

Figure 3-18 Add a new compute manager

When you click Add, you will be asked to accept the certificate thumbprint of the vCenter Server. When the Compute Manager has been added, it will be displayed in the Compute Managers section (Figure 3-19).

Figure 3-19 Compute Manager summary

You can then click the compute manager name for more details, and you will be presented with more information (Figure 3-20).

Figure 3-20 Compute Manager details

Completing the NSX-T Manager Cluster When the first NSX-T Manager node is fully deployed (Figure 3-21), we need to deploy two additional nodes. These two other nodes will improve the availability, recoverability, and performance of your deployment. You can add the new nodes by choosing System ➤ Appliances ➤ Add NSX Appliance.

Figure 3-21 NSX-T Manager appliance summary

When you deploy a new appliance, you need to use the parameters described in Table 3-4.

Table 3-4 Parameters Required to Deploy a New NSX Manager Appliance

Hostname: The hostname or FQDN of the new appliance. Example: pod-230-nsxt-lm
Management IP/netmask: The management IP address and netmask of the new appliance. Example: 10.203.230.8/24
Management gateway: The default gateway of the management IP address. Example: 10.203.230.1
DNS servers: The DNS servers that will offer DNS services to the appliance. Example: 10.11.11.201
NTP servers: The NTP servers that will offer NTP services to the appliance. Example: 10.11.11.202
Node size: The node sizes to choose from are small, medium, and large. Example: Medium
Compute Manager: The Compute Manager should be configured before you install a second or third appliance from the GUI. Make sure the Compute Manager is configured with the corresponding vSphere Clusters to deploy the new appliance on. Example: vcsa01.lab.local
Compute cluster: The vSphere Cluster where you want to deploy the new appliance. Example: Edge
Resource pool: Optional: The resource pool where you want to deploy the new appliance. Example: N/A
Host: Optional: The host where you want to deploy the new appliance. Example: N/A
Data store: The data store where you want to deploy the new appliance. Example: N/A
Virtual disk format: The virtual disk format can be either Thin or Thick. Example: Thin provision
Network: The port group where the management IP address needs to be configured. Example: Management
Enable SSH: Toggle to either enable SSH or leave SSH disabled. Example: Yes
Enable root access: Toggle to either enable root access or leave root access disabled. Example: Yes
System root credentials: Provide a password for the root user. Example: VMware1!VMware1!
Admin CLI credentials: Provide a password for the admin user. Example: Check the Same as Root Password box.
Audit CLI credentials: Provide a password for the audit user. Example: Check the Same as Root Password box.

When the second NSX-T Manager node is deployed, you can watch the progress on the Appliances summary screen (Figure 3-22).

Figure 3-22 Deployment progress of the second NSX-T Manager node

When the second appliance is fully deployed and the cluster is marked as stable, you can deploy the third appliance. Note Each NSX-T Manager appliance will reserve the required CPU and RAM resources on the vSphere host/cluster. The cluster services are divided into the following (service) group types: DATASTORE, CLUSTER_BOOT_MANAGER, CONTROLLER, MANAGER, POLICY, HTTPS, MONITORING, IDPS_REPORTING, and CORFU_NONCONFIG. The following output shows a snippet of the cluster status for the MANAGER group type.
pod-230-nsxt-lm> get cluster status

Cluster Id: fbe45219-0fbe-474c-bad8-7e09fd68316a Overall Status: STABLE

Group Type: MANAGER
Group Status: STABLE
Members:
UUID                                   FQDN
ba6e4d56-1848-459f-35bc-d6784d52f0b6   pod-230-nsxt
7f9445e3-ba64-47ae-ab80-68104055ebe8   pod-230-nsxt
a0ffd4d0-0c16-46f7-9af1-5063f4f21c3c   pod-230-nsxt
pod-230-nsxt-lm>

Changing the NSX-T Node Size After the Deployment of the Full Cluster When you have a full cluster running with a “small” form factor and you need to move it to “medium,” this can be done by deploying a new node with a new size (Figure 3-23). This node will be added to the cluster automatically. When the full deployment is done, you can remove one of the smaller size appliances. You need to repeat this process until there are no small appliances left and you only have medium-sized appliances in the cluster.

Figure 3-23 NSX-T Manager node size options

Configure NSX-T Manager Virtual IP Address (VIP) The NSX-T Manager’s virtual IP address is an IP address shared across the NSX-T Manager nodes that are part of the NSX-T Manager cluster.

When you configure the virtual IP address (Figure 3-24), one of the prerequisites is that all the NSX-T Manager appliances’ IP addresses need to be in the same layer 2 subnet, including the virtual IP address (Figure 3-25).

Figure 3-24 Set the NSX-T Manager node virtual IP

Figure 3-25 NSX-T Manager node cluster with the VIP IP configured

You can configure the VIP by choosing System ➤ Appliances ➤ Set Virtual IP. One of the NSX-T Manager appliances is selected as the "leader node," and the virtual IP address is attached to it, so all GUI and API related requests are forwarded to that leader node.

In Figure 3-26, the VIP is 10.10.10.10, and the leader node is configured with IP address 10.10.10.11. This means that all the HTTPS and REST API calls will be forwarded to that node, and the other two will not get any requests unless they are directly referenced with their configured IP address.

Figure 3-26 NSX-T Manager node cluster with the VIP IP and leader node selected

The following output shows a snippet of the cluster status, where you can verify what NSX-T Manager the leader node is.

pod-230-nsxt-lm> get cluster status verbose Cluster Id: fbe45219-0fbe-474c-bad8-7e09fd68316a Overall Status: STABLE

Members:
UUID                                   FQDN
ba6e4d56-1848-459f-35bc-d6784d52f0b6   pod-230-nsxt-l
7f9445e3-ba64-47ae-ab80-68104055ebe8   pod-230-nsxt-l
a0ffd4d0-0c16-46f7-9af1-5063f4f21c3c   pod-230-nsxt-l
Leaders:
SERVICE                                  LEADER          VERSION
ActivityPurger                           ba6e4d56-1848-
NSGROUP_MEMBER_CACHE_SERVICE             ba6e4d56-1848-
NSGROUP_MEMBER_EVENT_PURGE_SERVICE       ba6e4d56-1848-
PROTON_INTELLIGENCE_AGENT_SERVICE        ba6e4d56-1848-
INTELLIGENCE_CERTIFICATE_SERVICE         ba6e4d56-1848-
ShardingMaster                           ba6e4d56-1848-
DISCOVERY_MESSAGING_SINGLETON_SERVICE    ba6e4d56-1848-
TraceflowCacheManager                    ba6e4d56-1848-
SERVICE_CONFIG_PRECEDENCE_PROVIDER       ba6e4d56-1848-
CentralConfigProcessor                   ba6e4d56-1848-
AutoHealthCheckMonitor                   ba6e4d56-1848-
QuartzSetupServiceImpl                   ba6e4d56-1848-
INTELLIGENCE_ALARM_SERVICE               ba6e4d56-1848-
CoordinatedDataCollectionLeader          ba6e4d56-1848-
RealizationServiceMaintenanceManager     ba6e4d56-1848-
StateSyncShardingCoordinator             ba6e4d56-1848-
ManualHealthCheckMonitor                 ba6e4d56-1848-
HealthCheckRequestMonitor                ba6e4d56-1848-

pod-230-nsxt-lm>

This output also shows you that there can be a different leader for a specific service.

Replacing the Self-Signed Certificates Each NSX-T Manager node deployment will provide you with a self-signed SSL certificate by default. You can replace this self-signed certificate with a certificate that is signed by a Certificate Authority (CA).

Generate Certificate Signing Request (CSR) To replace the certificates, you first need to generate a new Certificate Signing Request (CSR). Once these CSRs are ready, you need to make sure you get them signed by the Certificate Authority. For NSX-T, you need to generate four CSRs (Table 3-5): one for each NSX-T Manager node FQDN and one for the virtual IP FQDN. You can also choose to only create a CSR for the VIP.

Table 3-5 CSR Details per FQDN

NSX-T Node 1 (nsxt-01.lab.local):
'Country' = US
'Name' = CA
'Organization' = VMware
'OrgUnit' = VMware Engineering
'State' = California
'Locality' = Palo Alto
'IPAddress' [optional] = 10.10.10.11
'Email' [Default value : [email protected]] = [email protected]
'Hostname' = nsxt-01.lab.local

NSX-T Node 2 (nsxt-02.lab.local):
'Country' = US
'Name' = CA
'Organization' = VMware
'OrgUnit' = VMware Engineering
'State' = California
'Locality' = Palo Alto
'IPAddress' [optional] = 10.10.10.12
'Email' [Default value : [email protected]] = [email protected]
'Hostname' = nsxt-02.lab.local

NSX-T Node 3 (nsxt-03.lab.local):
'Country' = US
'Name' = CA
'Organization' = VMware
'OrgUnit' = VMware Engineering
'State' = California
'Locality' = Palo Alto
'IPAddress' [optional] = 10.10.10.13
'Email' [Default value : [email protected]] = [email protected]
'Hostname' = nsxt-03.lab.local

NSX-T VIP (nsxt-vip.lab.local):
'Country' = US
'Name' = CA
'Organization' = VMware
'OrgUnit' = VMware Engineering
'State' = California
'Locality' = Palo Alto
'IPAddress' [optional] = 10.10.10.10
'Email' [Default value : [email protected]] = [email protected]
'Hostname' = nsxt-vip.lab.local,nsxt-01.lab.local,nsxt-02.lab.local,nsxt-03.lab.local

Note It is also possible to only create a single CSR with all four subject alternative names (SANs) included and get this signed and apply it to all NSX-T Manager nodes and the VIP. Note For the VIP FQDN, you will need to create a CSR with subject alternative names (SANs), where you also include the FQDNs for all other NSX-T Manager nodes.
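If you prefer to generate the keys and CSRs outside of NSX-T, the following Python sketch (using the widely available cryptography library) builds a private key and a CSR for the VIP FQDN with the manager node FQDNs as subject alternative names, mirroring the values in Table 3-5. The file names are arbitrary, and how you submit the resulting CSR to your CA is environment specific.

from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

# 2048-bit RSA key for the CSR.
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

csr = (
    x509.CertificateSigningRequestBuilder()
    .subject_name(x509.Name([
        x509.NameAttribute(NameOID.COUNTRY_NAME, "US"),
        x509.NameAttribute(NameOID.STATE_OR_PROVINCE_NAME, "California"),
        x509.NameAttribute(NameOID.LOCALITY_NAME, "Palo Alto"),
        x509.NameAttribute(NameOID.ORGANIZATION_NAME, "VMware"),
        x509.NameAttribute(NameOID.ORGANIZATIONAL_UNIT_NAME, "VMware Engineering"),
        x509.NameAttribute(NameOID.COMMON_NAME, "nsxt-vip.lab.local"),
    ]))
    # SANs for the VIP and all three NSX-T Manager nodes (see Table 3-5).
    .add_extension(
        x509.SubjectAlternativeName([
            x509.DNSName("nsxt-vip.lab.local"),
            x509.DNSName("nsxt-01.lab.local"),
            x509.DNSName("nsxt-02.lab.local"),
            x509.DNSName("nsxt-03.lab.local"),
        ]),
        critical=False,
    )
    .sign(key, hashes.SHA256())
)

# Write the unencrypted private key and the PEM-encoded CSR to disk.
with open("nsxt-vip.key", "wb") as f:
    f.write(key.private_bytes(
        serialization.Encoding.PEM,
        serialization.PrivateFormat.TraditionalOpenSSL,
        serialization.NoEncryption(),
    ))
with open("nsxt-vip.csr", "wb") as f:
    f.write(csr.public_bytes(serialization.Encoding.PEM))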

Signing the CSRs When the certificate authority has signed the CSRs, you should receive three types of files back:
The signed certificate (for each FQDN)
The key file associated with the signed certificate (for each FQDN)
The root CA certificate (one single file)
When you have received these files, you are ready to import them into the NSX-T Manager nodes.

Import the Certificate You can use any NSX-T Manager node to import the signed certificate content, the corresponding keys, and the CA root certificate. You can import the new certificates by choosing System ➤ Certificates ➤ Import. When you click Import, you can either import a normal certificate or import a (root) CA certificate. The first step is to import the root CA certificate (Figure 3-27).

Figure 3-27 Import the CA certificate

After you import the root CA certificate, you can import the other certificates with the corresponding keys in any order you want. When you import the actual certificate, it is essential that you not only upload the (Base64) ASCII text of the FQDN certificate, but also add the ASCII text of the root certificate (the whole chain needs to be included). This example is for the nsxt-01.lab.local FQDN:
-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----
It is essential to add the root CA certificate after the FQDN certificate and paste the full ASCII content into the certificate field when you import the certificate (Figure 3-28).

Figure 3-28 Import the certificate

Repeat this process for all four certificates you have. Note Make sure you import the certificate ASCII data signed by the CA and not the actual CSR ASCII data.

Select the New CA-Signed Certificate Once all the certificates are imported, you need to tell the NSX-T Manager to use the newly imported certificates. This can be done with REST API calls, where you need to reference the certificate ID. The certificate ID can be found in the column next to the actual certificate name you just imported. You must execute the REST API call toward the NSX-T Manager node you have signed the certificate for, for that specific FQDN. In Table 3-6, you will find a sample REST API call for each node to verify whether your imported certificates are valid.

Table 3-6 Certificate Validation API Calls

NSX-T Manager Node 1 (nsxt-01.lab.local):
GET https://nsxt-vip.lab.local/api/v1/trust-management/certificates/e59840b6-c73c-4fbf-a780-1c5adbb3fc6b?action=validate

NSX-T Manager Node 2 (nsxt-02.lab.local):
GET https://nsxt-vip.lab.local/api/v1/trust-management/certificates/95a7491e-2b41-49e0-a924-e554ca798536?action=validate

NSX-T Manager Node 3 (nsxt-03.lab.local):
GET https://nsxt-vip.lab.local/api/v1/trust-management/certificates/ec3dd6e3-fc55-45e4-8dae-f494e6085858?action=validate

NSX-T VIP (nsxt-vip.lab.local):
GET https://nsxt-vip.lab.local/api/v1/trust-management/certificates/be71271e-cf6c-44bf-9597-8214891c64e2?action=validate

When the certificate is valid, you should receive the following status:

{
  "status": "OK"
}

When all the certificates are valid and return a status of "OK", it is time to apply the certificates to the actual NSX-T Manager nodes. You do this with the sample REST API calls provided in Table 3-7.

Table 3-7 Certificate Apply API Calls

NSX-T Manager Node 1 (nsxt-01.lab.local):
POST https://nsxt-01.lab.local/api/v1/node/services/http?action=apply_certificate&certificate_id=e59840b6-c73c-4fbf-a780-1c5adbb3fc6b

NSX-T Manager Node 2 (nsxt-02.lab.local):
POST https://nsxt-02.lab.local/api/v1/node/services/http?action=apply_certificate&certificate_id=95a7491e-2b41-49e0-a924-e554ca798536

NSX-T Manager Node 3 (nsxt-03.lab.local):
POST https://pod-230-nsxt-03.lab.local/api/v1/node/services/http?action=apply_certificate&certificate_id=ec3dd6e3-fc55-45e4-8dae-f494e6085858

NSX-T VIP (nsxt-vip.lab.local):
POST https://nsxt-vip.lab.local/api/v1/cluster/api-certificate?action=set_cluster_certificate&certificate_id=be71271e-cf6c-44bf-9597-8214891c64e2

Note For the verification check, you can run the REST API call against any NSX-T Manager URL (the VIP was used in the previous examples). When you apply the certificates, however, you need to target the particular NSX-T Manager URL that corresponds to the FQDN the certificate was signed for (just like the previous example shows).
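These two REST calls can also be scripted. The sketch below strings the validation and apply steps together for one node, reusing the certificate ID and FQDN values from Tables 3-6 and 3-7; it assumes the Python requests library and the admin credentials from the deployment, and it disables certificate verification only because the managers are still running self-signed certificates at this point.

import requests

AUTH = ("admin", "VMware!VMware!")                 # admin credentials from the deployment
CERT_ID = "e59840b6-c73c-4fbf-a780-1c5adbb3fc6b"   # certificate ID for nsxt-01 (Table 3-6)
NODE_FQDN = "nsxt-01.lab.local"
VIP_FQDN = "nsxt-vip.lab.local"

# Step 1: validate the imported certificate (any manager URL will do).
validate = requests.get(
    f"https://{VIP_FQDN}/api/v1/trust-management/certificates/{CERT_ID}?action=validate",
    auth=AUTH, verify=False, timeout=30,
)
validate.raise_for_status()
print(validate.json())  # expect {"status": "OK"}

# Step 2: apply the certificate on the node the certificate was signed for.
apply = requests.post(
    f"https://{NODE_FQDN}/api/v1/node/services/http"
    f"?action=apply_certificate&certificate_id={CERT_ID}",
    auth=AUTH, verify=False, timeout=30,
)
apply.raise_for_status()
print(f"Certificate {CERT_ID} applied to {NODE_FQDN}")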

Verification When you have successfully applied the certificates, you can open a web browser and browse one of the FQDNs that you imported for the certificates. When you click the lock, you can see that your connection is secure with the certificate details (Figure 3-29).

Figure 3-29 Verify if the connection to your NSX-T Manager is using the certificate to provide a secure connection

NSX-T GUI Navigation Now that we have our management plane ready, it is time to look at the NSX-T GUI in more detail.

Policy vs. Manager We can configure NSX-T resources/objects through Policy mode or Manager mode when we use the GUI. Policy mode is currently the recommended mode and is the default mode for new installations. Policy mode is also supported in the NSX-T Global Manager (with the federation configuration items). The downside of Policy mode is that it currently does not support some features; those features need to be configured in Manager mode. Manager mode supports all types of configurations. Policy mode is also called "Simple" mode, and Manager mode is often called "Advanced" mode. You can change between the two modes by clicking the Policy or Manager button in the upper-right corner of the GUI (Figure 3-30).

Figure 3-30 NSX-T Manager Policy/Manager toggle button

When you don’t want users to have the option to toggle between the two modes, you can set the default to Policy and remove the toggle option from the GUI. You can do this by choosing System ➤ User Interface Settings.

Figure 3-31 User interface settings

Networking Tab When you click the Networking tab, you will configure all routing and switching related features, including some Layer 3 services like NAT, VPN, load balancing, as well as much more. The Networking tab can be viewed in Policy mode (Figure 3-32) and in Manager mode (Figure 3-33).

Figure 3-32 Networking tab in Policy mode

When you create objects in Policy mode, the Manager mode will display them, but you will not be able to edit these objects.

Figure 3-33 Networking tab in Manager mode

When you look closer at the "Logical Switches" (these are called "Segments" in Policy mode), you see that there is a symbol in front of the name that indicates that this object is created in Policy mode. Therefore, this is a protected object (Figure 3-34).

Figure 3-34 Protected object

Security Tab The Security tab is used when you want to create firewall rules and endpoint security policies. The security content looks different when using the two different modes—Policy mode (Figure 3-35) vs. Manager mode (Figure 3-36).

Figure 3-35 Security tab in Policy mode

Figure 3-36 Security tab in Manager mode

Inventory Tab

When you click the Inventory tab, you can find information about Groups, Virtual Machines, Containers, Physical Servers, Context Profiles, and Services. There are also two different views for the Inventory tab when you toggle between Policy mode (Figure 3-37) and Manager mode (Figure 3-38).

Figure 3-37 Inventory tab in Policy mode

Figure 3-38 Inventory tab in Manager mode

Plan & Troubleshoot Tab The Plan & Troubleshoot tab has several functions, like Port Mirroring, Traceflow, IPFIX, and Consolidated Capacity, that you can use for monitoring and troubleshooting. Consolidated Capacity is only available in Manager mode (Figure 3-39 and Figure 3-40).

Figure 3-39 Consolidated Capacity in Policy mode

Figure 3-40 Consolidated Capacity in Manager mode

System Tab The System tab is where you prepare/deploy all the host and edge transport nodes and where you do all the NSX-T system core configuration. You can manage the NSX-T Management nodes here, as well as add licenses, register compute managers, add transport zones, create uplink profiles, and much more. We will discuss these options throughout this book. The System tab has a single mode that is not dedicated to the Policy or the Manager mode (Figure 3-41).

Figure 3-41 System tab

Summary The chapter started with the initial deployment of one single NSX-T Manager node. When this was done, it continued by adding the license and a first Compute Manager. These are the necessary steps before we can continue with any other NSX-T related tasks. The chapter then explained how to create a full NSX-T Manager cluster consisting of three NSX-T Manager nodes, including the virtual IP address configuration. The GUI, CLI, and API consumption methods were discussed. You also learned how to configure a virtual IP address across the NSX-T Manager cluster and how to replace the SSL certificates from selfsigned to CA-signed certificates. At the end of the chapter, you had a high-level walkthrough of the GUI and learned about the differences between the Policy and Manager modes. The next chapter explains which components are in the NSX-T data plane and how to configure them.

Footnotes 1 This is a requirement only if you want to deploy the second and third manager from the first one. You still have the option to deploy the other two managers from OVF and link them together with CLI commands to form a cluster.

© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021 I. Hoogendoorn, Getting Started with NSX-T: Logical Routing and Switching https://doi.org/10.1007/978-1-4842-6708-0_4

4. The NSX-T Data Plane Iwan Hoogendoorn1 (1) Rotterdam, The Netherlands

This chapter defines new terms that are used when we configure data plane components. To prepare ESXi hosts to become NSX-T host transport nodes or create edge transport nodes in the form of virtual machines or bare metal servers, we first need to configure IP pools, uplink profiles, and transport zones. The chapter explains what these components are and how they relate to each other. It also explains the difference between a VDS and an N-VDS.

Data Plane Functions and Components The data plane's primary function is to encapsulate, de-encapsulate, and forward data packets based on various tables hosted on the control plane. The data plane of NSX-T consists of multiple components that are spread across various endpoints. The data plane uses a distributed forwarding model to transport data packets across dedicated VLANs on the physical network. The data plane performs logical switching, distributed and centralized routing, and packet filtering across multiple components and levels. The endpoint types that NSX-T supports are depicted in Figure 4-1.

Figure 4-1 NSX-T supported endpoints

Transport Node Overview As you can see in Figure 4-1, NSX-T has a few different transport nodes. These nodes can be split into two categories—host transport nodes and edge transport nodes. The host transport nodes can be a KVM or ESXi host, and the edge transport nodes can be a virtual machine (in different sizes) or a bare metal server. ESXi and KVM transport nodes can work together. Networks and topologies can extend to both ESXi and KVM environments, regardless of the hypervisor type. NSX-T Data Center 3.0 introduces Windows bare metal support as a transport node. The transport nodes will perform network and security functionality. A transport node is mainly responsible for forwarding the network packets on the data plane, which are sent and received by virtual machines, containers, or applications running on bare metal servers.

Transport Node Architecture The virtual distributed switch managed by NSX-T can be a vSphere Distributed Switch (VDS) or an NSX Virtual Distributed Switch (N-VDS). These distributed switches are the main data plane components on the transport nodes. The NSX Proxy is an agent that runs on all transport nodes and receives its configuration from the Central Control Plane (CCP). If you take a closer look inside the virtual switches, you’ll see the internal components depicted in Figure 4-2.

Figure 4-2 Virtual switch components

Transport Node Physical Connectivity When you connect the host transport node to the network, you have the option to fully segregate your overlay network traffic (GENEVE transport) from the management traffic with the use of dedicated physical interfaces. You can also split the networks off with the help of logical separation and VLANs (Figure 4-3).

Figure 4-3 Host transport node physical connectivity network segregation

IP Address Pools An IP address pool is a bucket created for handing out IP addresses for tunnel endpoint interfaces (TEPs). IP address pools can be configured manually, and if you have a mixed ESXi and KVM environment, you can use two different IP pools per host transport node (as long as they are routed). The ESXi host transport node has one or multiple TEP interfaces. Each TEP interface has an IP address that presents itself in a VMkernel interface (vmk). These IP addresses can be in the same subnet or different subnets, depending on the IP pools or DHCP configured for the transport nodes. The KVM host transport node will only support one single TEP interface with one TEP IP address. The edge transport nodes use a dedicated IP address pool to make troubleshooting more straightforward (this is not a requirement, but a recommendation) and allow the edge VM to be hosted on an ESXi host transport node. To create an IP address pool, choose Networking ➤ IP Address Pools ➤ Add Address Pool. You then need to provide a name and click Set to specify an IP address range or an IP block, where a range is a range of IP addresses inside a subnet and an IP block is a full subnet.
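IP address pools can also be created programmatically instead of through the GUI. The following Python sketch uses the Policy API paths as I understand them for NSX-T 3.x (a PATCH on /policy/api/v1/infra/ip-pools/... plus a static subnet child object); treat the exact paths, field names, and the example address range as assumptions and verify them against the NSX-T API guide for your version.

import requests

AUTH = ("admin", "VMware!VMware!")
NSX = "https://nsxt-vip.lab.local"

# Assumed Policy API paths and payloads; verify against the API guide for your release.
POOL_ID = "tep-pool-esxi"
pool = {"display_name": "TEP-Pool-ESXi"}
subnet = {
    "resource_type": "IpAddressPoolStaticSubnet",
    "cidr": "192.168.130.0/24",  # example TEP subnet, not from this book's lab
    "allocation_ranges": [{"start": "192.168.130.10", "end": "192.168.130.100"}],
}

requests.patch(f"{NSX}/policy/api/v1/infra/ip-pools/{POOL_ID}",
               json=pool, auth=AUTH, verify=False, timeout=30).raise_for_status()
requests.patch(f"{NSX}/policy/api/v1/infra/ip-pools/{POOL_ID}/ip-subnets/tep-range",
               json=subnet, auth=AUTH, verify=False, timeout=30).raise_for_status()
print("IP pool and subnet created (or updated)")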

Transport Zones A transport zone holds a collection of transport nodes that are capable of communicating with each other using their TEP interfaces. NSX-T has two types of transport zone: a VLAN transport zone and an overlay transport zone. A transport zone defines the span of a logical network across the physical network infrastructure. It tells the transport nodes what logical switches and VLANs they can see, limiting the reach of the transport nodes. The VLAN transport zone is typically used to provide connectivity between the NSX edge uplinks and the upstream physical Layer 3 routers (using VLANs) to make north-south connectivity possible. Regular 802.1Q tagged traffic can also flow across the physical underlay network using the VLAN transport zone. The overlay transport zone is typically used to provide tunnels between the transport nodes (across the TEP interfaces) and is used as the primary transport mechanism to allow GENEVE-encapsulated traffic to flow across the physical underlay network. Transport nodes are hypervisor hosts (ESXi or KVM), NSX-T edge nodes, and/or bare metal servers that participate in an NSX-T overlay. Thus, a hypervisor host can host virtual machines that can communicate across logical switches. An NSX-T edge node can have logical router uplinks and downlinks configured. A single hypervisor transport node can belong to multiple transport zones (VLAN and overlay). A segment, however, can belong to only one transport zone. The NSX-T edge nodes can belong to multiple transport zones: one overlay transport zone and multiple VLAN transport zones (Figure 4-4).

Figure 4-4 Transport zones

Note A transport zone is not a security boundary and should not be treated as one.

Transport Node Switch Configuration The virtual switch on a transport node can either be a VDS or an N-VDS.

The N-VDS is supported on all types of transport nodes (edge, ESXi, and KVM). The VDS, however, is only supported on ESXi version 7 host transport nodes that are managed by a vCenter Server (VDS version 7) and have the physical interfaces governed by the VDS version 7 (vSphere version 7). Note VDS v7 is the supported version for the ESXi hosts in NSX-T Data Center 3.0.

N-VDS The N-VDS, previously called the host switch, is the software component that operates within the hypervisors to form a software abstraction layer between the hosts and the physical network infrastructure. The N-VDS is based on the vSphere Distributed Switch (VDS), which provides uplinks for the host connectivity to the physical top of rack switches. When an ESXi host is prepared for NSX-T, an N-VDS is created. The N-VDS performs the switching functionality on a transport node:
The N-VDS typically owns several physical NICs of the transport node.
The N-VDS instances are created on the host or on edge transport nodes.
The N-VDS instances configured on different transport nodes are independent.
The N-VDS has a name assigned for grouping and management.
For example, Figure 4-5 shows two N-VDS switches configured on the transport nodes: an N-VDS named Test and an N-VDS named Production.

Figure 4-5 Design with two N-VDS switches

The networks configured by NSX Manager are opaque to compute managers, such as vCenter Server: the vCenter Server has visibility into these networks but cannot manage them. From the vSphere Client, an administrator can see the network components and select them, but cannot edit their settings; this can be done only through the NSX-T Manager.

VDS When an ESXi host is prepared for NSX-T, there is also an option to use the existing VDS (version 7). This VDS (Figure 4-6) is managed by the vCenter Server. The VDS (v7) performs the switching functionality on a transport node:
The VDS typically owns several physical NICs of the transport node.
The VDS (v7) is created on the vCenter Server.

Figure 4-6 VDS design

The networks configured by NSX Manager are still opaque to compute managers, such as vCenter Server: the vCenter Server has visibility into these networks but cannot manage them. From the vSphere Client, an administrator can see the network components and select them, but cannot edit their settings; this can be done only through the NSX-T Manager. When a network is configured through the NSX-T Manager and appears on the vCenter Server, it will have a little "N" label attached (Figure 4-7).

Figure 4-7 VDS port groups with the “N” label

N-VDS vs. VDS on the ESXi Host Transport Nodes When you use an N-VDS, NSX-T does not require vCenter Server to operate or configure it. The NSX-T Manager is responsible for creating the N-VDS, and it is independent of vCenter Server. The N-VDS can coexist with vSphere distributed switches (VDS) and standard switches (VSS). The vCenter Server sees N-VDS-created networks as opaque network groups. This means that the vCenter Server is aware of their existence but cannot manage them. The N-VDS and VDS both perform Layer 2 forwarding and support VLANs, port mirroring, and NIC teaming. The teaming configuration is applied across the switch. Link aggregation groups are implemented as ports. The VDS depends on the vCenter Server and relies on the configuration of the VDS v7. When using a VDS version 7 with NSX-T, its MTU size must be set to 1,600 bytes or greater from the vSphere Client.

Transport Zone and N-VDS/VDS Mapping When you create a transport zone, you always need to map it to an N-VDS or a VDS. When you have multiple environments like Test and Production, you also need multiple transport zones dedicated to different host transport nodes (Figure 4-8), or you can choose to provide isolation or segregation on different levels, for example using VLANs or (logical) routers (Tier-0 or Tier-1 Gateways). When you have multiple switching environments, it is good practice to use proper names for the transport zone and N-VDS/VDS so they correspond to each other.

Figure 4-8 Multiple transport zones

Transport Zone Creation A transport zone dictates which transport nodes and which workloads can participate in a network. The overlay transport zone allows you to create logical switch segments on the overlay network, whereas the VLAN transport zone allows you to use your VLAN network that is configured in the underlay. An NSX-T environment can contain one or more transport zones, depending on your requirements. A transport node can belong to multiple transport zones, and a logical switch (or segment) can belong to only one (overlay) transport zone. The NSX Manager has the following transport zones preconfigured for you:
nsx-overlay-transportzone: For the overlay traffic type
nsx-vlan-transportzone: For the VLAN traffic type
The span of a logical switch or segment is always limited to a transport zone, so virtual machines in different transport zones can never be on the same Layer 2 network. When you use ESXi transport nodes, transport zones can span one or multiple vSphere Clusters. To create a transport zone (Figure 4-9), choose System ➤ Fabric ➤ Transport Zones ➤ Add. You then need to provide a name and the traffic type.

Figure 4-9 Create a new transport zone

When you have successfully configured a transport zone, you can verify it by looking at the Transport Zones overview page (Figure 4-10). Focus on the status column; the label should say "up," "degraded," or "unknown."

Figure 4-10 Transport zone overview
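Transport zones can also be created through the API instead of the GUI. The sketch below uses the manager API endpoint /api/v1/transport-zones as I recall it for NSX-T 3.0, where a transport type and host switch name are supplied in the payload; the zone and switch names here are made up, so treat the payload as an assumption and check the API guide for your release.

import requests

AUTH = ("admin", "VMware!VMware!")
NSX = "https://nsxt-vip.lab.local"

# Assumed manager API payload for an overlay transport zone; verify for your version.
payload = {
    "display_name": "TZ-Overlay-Production",
    "transport_type": "OVERLAY",
    "host_switch_name": "N-VDS-Production",
}

response = requests.post(f"{NSX}/api/v1/transport-zones",
                         json=payload, auth=AUTH, verify=False, timeout=30)
response.raise_for_status()
print("Created transport zone with ID:", response.json()["id"])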

N-VDS and VDS Modes of Operation The N-VDS and VDS virtual switches can be configured in the modes described in Table 4-1.

Table 4-1 N-VDS and VDS Operation Modes

N-VDS Standard mode
Specification: Provides forwarding capabilities on KVM and ESXi host transport nodes and does not require specialized Network Interface Cards (NICs).
Additional information: Provides switching functionality comparable to the vSphere standard switch (VSS) or vSphere distributed switch (VDS).

N-VDS Enhanced Datapath mode
Specification: Provides the basic virtual switch features, such as vSphere vMotion, vSphere HA, and vSphere DRS.
Additional information: This mode provides Data Plane Development Kit (DPDK) packet-processing performance to the east-west flows within the data center. Enhanced Datapath mode is only available on ESXi transport nodes, and you will require specialized (supported) Network Interface Cards (NICs) for this mode. This switch mode is designed to support Network Functions Virtualization (NFV) type applications and is not suitable for generic data center applications or deployments where traditional VM-based or bare metal NSX-T edge nodes must be used.

VDS
Specification: Provides forwarding capabilities only on ESXi host transport nodes and does not require specialized Network Interface Cards (NICs).
Additional information: The VDS mode is only supported with VDS v7 running under vSphere version 7.

Data Plane Development Kit (DPDK) DPDK is a set of data plane libraries and Network Interface Card (NIC) drivers that provide fast packet processing. DPDK uses various optimizations around the CPU and memory usage management to improve the packet-processing speed. When you compare the standard way of packet processing with DPDK, it helps you decrease the CPU cost and increase the number of packets processed per second. DPDK can be tuned to match the desired performance for general or specific use cases.

Network Function Virtualization (NFV) With NFV, the focus shifts from raw throughput to the packet-processing speed. In these types of workloads, the applications do not send a small number of large packets; they send many smaller packets that can be as small as 128 bytes. The built-in TCP optimizations do not help with these workloads. DPDK is used to deliver performance for these types of packet-processing workloads.

Physical Network Interface Cards (NICs), Link Aggregation Groups (LAGs), and Uplinks N-VDS allows for the virtual-to-physical data packet flow by binding the logical router uplinks and downlinks to physical NICs on the host transport node. A host transport node can have different physical ports, called physical NICs. These physical NICs can be bundled to form a bundled link called a Link Aggregation Group (LAG) port. LAGs use the Link Aggregation Control Protocol (LACP) to transport these data packets. The “uplinks” of an N-VDS are assigned to physical NICs or LAGs. Note Uplinks are logical interfaces from NSX-T’s point of view, so one uplink can consist of multiple ports (physical interfaces). In Figure 4-11, you see an example of where one logical NSX-T uplink translates into two separate physical ports (physical NICs), and these NICs are both configured as a channel or LAG toward the top rack switch.

Figure 4-11 Host transport node LAG configuration

Figure 4-12 shows an example of where two logical NSX-T uplinks translate into two separate physical ports (physical NICs). These NICs are separately configured towards the Top of Rack (ToR) switch.

Figure 4-12 Host transport node dedicated link configuration

The VDS maps the uplinks on NSX-T to "uplinks on VDS" and not to physical NICs directly, just like the traditional way of uplink mapping you are used to with a VDS.

Uplink Profiles Within an uplink profile, you store properties (or settings) related to the NICs. These uplink profiles are applied to the transport nodes, whereby the interface settings trickle down to the physical NICs. With the uplink profile, you can set the following properties:
Teaming policy
Active and standby uplinks
Transport VLAN ID
Maximum Transmission Unit (MTU)
Figure 4-13 shows two examples of how the profiles relate to the interfaces and host transport nodes. The ESXi host transport node has an uplink profile attached where two physical NICs are active, and the KVM host transport node has an uplink profile linked where only one physical NIC is active.

Figure 4-13 Uplink profiles

To create an uplink profile, choose System ➤ Fabric ➤ Profiles ➤ Uplink Profiles ➤ Add. When you browse to these uplink profiles, you will see that there are already some uplink profiles configured by default. Note Uplink profiles must also be applied to edge transport nodes.

Teaming Policy Types One of the uplink profile settings is the teaming policy, which defines how the traffic is forwarded to the interfaces. There are three types of teaming policies, and they differ with regard to transport node support, as shown in Table 4-2.

Table 4-2 Teaming Policies

Failover Order
Description: An active uplink is specified with an optional list of standby uplinks. If the active uplink fails, the next uplink in the standby list replaces the active uplink. No actual load balancing is performed with this option.
Transport node support: Edge transport node, ESXi host transport node, KVM host transport node

Load Balanced Source
Description: A list of active uplinks is specified, and each interface on the transport node is pinned to one active uplink based on the source port ID. This configuration allows the use of several active uplinks at the same time.
Transport node support: Edge transport node, ESXi host transport node

Load Balanced Source MAC
Description: This option determines the uplink based on the source VM's MAC address.
Transport node support: Edge transport node, ESXi host transport node

Note An “uplink” is not the same as a “port.” The Load Balanced Source and Load Balanced Source MAC teaming policies do not allow configuration of standby uplinks, nor are they supported on KVM transport nodes. The KVM hosts are limited to the failover order teaming policy and single LAG support. For LACP, multiple LAGs are not supported on the KVM hosts.

In Figure 4-14, UPLINK 1 is active and UPLINK 2 is standby. This means that all virtual machines will be pinned to UPLINK 1. In the third image on the far right (the one titled as LAG), even though the failover order has been configured, an additional layer of redundancy is added to the physical network level using LAGs. When VM1 needs to send data to VM4 or VM5, the physical network interfaces used to send and receive the traffic will be the same. This is not the case when you are using LAG. The physical network will handle the traffic balancing with LAG.

Figure 4-14 Uplink profiles - failover order

In Figure 4-15, UPLINK 1 and UPLINK 2 are both active. The virtual machine pinning, however, is done based on the port ID. This means that each virtual machine will be pinned to a different uplink. When VM7 needs to send data to VM4 or VM5, the physical network interfaces used to send and receive the traffic will not be the same. This is also not the case when you are using LAG. The physical network will handle the traffic balancing with LAG. Sending and receiving the traffic will be balanced across the available interfaces (based on the load-balancing algorithm used). By adding LAG to the mix, the traffic will also be balanced on the physical NIC level based on the LAG load-balancing algorithm.

Figure 4-15 Uplink profiles - port ID

In Figure 4-16, the UPLINK 1 and UPLINK 2 are both active. The virtual machine pinning, however, is done based on the MAC address. This means that each virtual machine will be pinned to a different uplink. When VM7 needs to send data to VM4 or VM5, the physical network interfaces used to send and receive the traffic will not be the same. This is also not the case when you are using LAG. The physical network will handle the traffic balancing with LAG. Sending and receiving the traffic will be balanced across the available interfaces (based on the load-balancing algorithm used). By adding LAG to the mix, the traffic will also be balanced on the physical NIC level based on the LAG load-balancing algorithm.

Figure 4-16 Uplink profiles - MAC
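To summarize the difference between the three policies in code form, the following Python sketch models how a virtual machine's traffic could be pinned to an uplink under each policy. It is a conceptual illustration only (the real pinning is done by the virtual switch), and the uplink names, port IDs, and MAC address are invented for the example.

import zlib

UPLINKS = ["uplink-1", "uplink-2"]

def failover_order(active, standby):
    """Failover order: always use the active uplink; fall back to standby on failure."""
    return active if active else standby

def load_balanced_source(port_id):
    """Load Balanced Source: pin each virtual switch port to one active uplink."""
    return UPLINKS[port_id % len(UPLINKS)]

def load_balanced_source_mac(mac):
    """Load Balanced Source MAC: pin based on a hash of the VM's MAC address."""
    return UPLINKS[zlib.crc32(mac.encode()) % len(UPLINKS)]

# Invented example values.
print(failover_order("uplink-1", "uplink-2"))            # all VMs share uplink-1
print(load_balanced_source(7), load_balanced_source(8))  # ports spread over both uplinks
print(load_balanced_source_mac("00:50:56:aa:bb:01"))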

Link Layer Discovery Protocol (LLDP) The Link Layer Discovery Protocol (LLDP) is a multi-vendor protocol used to announce your local identity to another network device (capability, identity, and neighbor information). With NSX-T, you can use this protocol to send your local network-related details to the network device you connect to and receive information about the network device you are connecting to. NSX-T will typically receive information about what type of switch and switch port the host transport node is connected to. NSX-T will typically send information to the switch about its virtual switch (N-VDS or VDS). By using LLDP, you will be able to simplify troubleshooting, and you will make day-to-day operations easier for yourself. LLDP within NSX-T can be “enabled” or “disabled” when you either create a host transport node profile or configure a standalone ESXi host transport node, as you can see in Figure 4-17.

Figure 4-17 LLDP configuration

Network I/O Control (NIOC) Profile With NIOC, you can prioritize or reserve the system traffic bandwidth based on the capabilities of your NIC. NIOC version 3 (NIOCv3) uses “shares, limits, and reservations” to allocate available bandwidth and can only be configured on ESXi host transport nodes. The traffic types supported for prioritization are displayed in Figure 4-18.

Figure 4-18 NIOC profile

The bandwidth allocation for the system traffic is divided into the parameters described in Table 4-3.

Table 4-3 NIOC Bandwidth Allocation Parameters

Shares: The shares value can be set between 1 and 100. This value reflects the priority of a specific traffic type against the other system traffic types that share the same physical network card.
Reservation: The reservation value sets the minimum bandwidth (as a percentage) that must be guaranteed on the physical network interface card for the specific system traffic type where it is configured. You can only reserve up to 75 percent in NIOCv3.
Limit: The limit value sets the maximum bandwidth (as a percentage) that a system traffic type can use on a single physical adapter.

Transport Node Profiles A transport node profile is used to apply the same configuration required to configure the ESXi host transport nodes to multiple hosts inside a vSphere Cluster. This means that all the ESXi host transport nodes inside a vSphere Cluster can be configured consistently, and the process of preparing ESXi hosts to become host transport nodes is faster. Once a transport node profile is created, you can reuse it with other (future) clusters as well. Transport node profiles can only be applied to vSphere Clusters and not to hosts individually (Figure 4-19).

Figure 4-19 Transport node profile

Before you can create a host transport node profile, you need to make sure your host transport node is an ESXi host that is part of a vSphere Cluster managed by a vCenter Server. This vCenter Server is added as a Compute Manager to the NSX-T architecture. You also need to make sure that your transport zone and IP pools (or a DHCP server) are configured, because all this information needs to be specified in the transport node profile (Figure 4-20 and Figure 4-21).

Figure 4-20 Add a transport node profile (N-VDS) (1)

Figure 4-21 Add a transport node profile (VDS) (2)

To create a host transport node profile, choose System ➤ Fabric ➤ Profiles ➤ Transport Node Profiles ➤ Add. Once the transport node profile has successfully been configured, you can apply it to a vSphere Cluster (Figure 4-22). To do so, choose System ➤ Fabric ➤ Nodes ➤ Host Transport Nodes. Then follow these steps:

Table 4-4 Attaching a Transport Node Profile to a vSphere Cluster

1. Select the vCenter Server (the Managed By field).
2. Select the host transport node (vSphere) cluster.
3. Configure NSX.
4. From the dropdown menu, select the Transport Node Profile.

Figure 4-22 Configure host transport node preparation with a host transport node application on a vSphere Cluster

ESXi Host Preparation The process where you install the NSX-T bits (the .vib installation) is also called host preparation. The host preparation process can be done either by applying a transport node profile, where you install the bits on a collection of hosts inside a vSphere Cluster, or by installing the bits on a single ESXi host (one-by-one) (Figure 4-23).

Figure 4-23 Configure standalone ESXi host preparation

To start the host preparation process on a single host, choose System ➤ Fabric ➤ Nodes ➤ Host Transport Nodes and then follow these steps:

Table 4-5 Configuration of the Host Transport Nodes (Preparation)

1. Select the vCenter Server (the Managed By field).
2. Select the host transport node (vSphere) cluster.
3. Click the host you want to prepare.
4. Configure NSX.

You will see the screen shown in Figure 4-23. To start the host preparation, the same information is required that you saw in the transport node profile process. However, this example configures it on the host directly instead of through a profile. When the profile is applied successfully (when you are using the profile method), you can verify this by looking at the summary screen, where you need to pay attention to the NSX Configuration, NSX Version, and Node Status columns. Figure 4-24 shows you an example of a successfully prepared host transport node and a failed host transport node.

Figure 4-24 Host transport nodes overview

You can also verify the status of the host transport nodes in the System view of the dashboard. To browse to the dashboard, choose Home ➤ Monitoring Dashboards ➤ System. The colors in the dashboard are explained in Table 4-6.

Table 4-6 Host Transport Node Dashboard Explanation

Green: The environment elements are healthy, and everything is working as expected, without any issues.
Red: Elements in the environment have a critical issue that needs to be resolved.
Orange: Elements in the environment have degraded performance and possibly need attention.
Gray: Elements in the environment are not configured.

Figure 4-25 System monitoring

When you point to the circle (Figure 4-25), messages will appear that provide details about the nodes. In Figure 4-25, you can see five nodes; out of these five nodes, one is configured as a transport node and three are not configured for NSX-T. If you look closely, you even see that one is red and has an issue that needs to be resolved. Another way to verify whether the host preparation is successful is to check through the vCenter Server (Figure 4-26). Here, you confirm whether the N-VDS is installed, or whether the VDS is converted into a VDS managed by NSX, and whether your physical network interface cards are correctly assigned to this NSX-enabled virtual switch.

Table 4-7 Verification of the NSX-Enabled Virtual Switch

1. Log in to the vCenter Server.
2. Open Hosts and Clusters.
3. Select one of the hosts that are prepared for NSX-T.
4. Click Configure.
5. Click Virtual Switches.
6. Verify that the name is prefixed with NSX Switch.

Figure 4-26 Virtual switch verification in vSphere

If you take a closer look inside the host on the Command Line Interface (CLI) of the ESXi host, you can verify whether the kernel modules packed in the .vib files are installed successfully. During the host preparation process, the NSX-T Manager uploads these files to the host that is being prepared. The kernel modules (or .vib files) are responsible for NSX-T network and security features, like distributed firewalling, logical routing, and much more.

[root@pod-240-esxi-12:~] esxcli software vib list | grep nsx
nsx-adf                3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsx-cfgagent           3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsx-context-mux        3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsx-cpp-libs           3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsx-esx-datapath       3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsx-exporter           3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsx-host               3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsx-idps               3.0.0.0.0-7.0.15928666   VMware   VMwareCertified   2020-07-07
nsx-monitoring         3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsx-mpa                3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsx-nestdb             3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsx-netopa             3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsx-opsagent           3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsx-platform-client    3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsx-proto2-libs        3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsx-proxy              3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsx-python-gevent      1.1.0-15366959           VMware   VMwareCertified   2020-07-07
nsx-python-greenlet    0.4.14-15670904          VMware   VMwareCertified   2020-07-07
nsx-python-logging     3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsx-python-protobuf    2.6.1-15670901           VMware   VMwareCertified   2020-07-07
nsx-python-utils       3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsx-sfhc               3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsx-shared-libs        3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsx-vdpi               3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
nsxcli                 3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07

[root@pod-240-esxi-12:~] esxcli software vib list | grep vsip
vsipfwlib              3.0.0.0.0-7.0.15945993   VMware   VMwareCertified   2020-07-07
[root@pod-240-esxi-12:~]
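Beyond the VIB list, you can also check whether the NSX kernel modules are actually loaded and inspect an individual VIB in more detail. This is a minimal sketch using standard ESXi commands; the exact module and VIB names vary per NSX-T and ESXi version, so treat the grep pattern and the nsx-esx-datapath name as illustrative.

# Show loaded/enabled kernel modules whose names contain "nsx".
esxcli system module list | grep nsx

# Show detailed metadata (version, install date, dependencies) for a single NSX VIB.
esxcli software vib get -n nsx-esx-datapath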

(Bare Metal) Physical Server
It is also possible to directly attach a physical server as a transport node to NSX-T (Figure 4-27). Table 4-8 explains the terminology.

Figure 4-27 Physical (bare metal) server architecture

Table 4-8 Physical Server (Transport Node) Terminology
• Application: Represents the actual application running on the physical (bare metal) server (web server, application server, or database server).
• Application Interface: Represents the physical NIC that the application uses to send and receive network traffic. NSX-T supports only one application interface per physical (bare metal) server.
• Management Interface: Represents the physical NIC that the physical (bare metal) server uses for management.
• VIF: Represents the peer of the "application interface." This is attached to the logical switch.

(Bare Metal) Physical Server "Modes"
NSX-T (bare metal) physical servers are supported in the three modes explained in Table 4-9.

Table 4-9 Physical Server Modes
• VLAN Mode: Single or dual physical network card supported. Application and management interfaces need to have different IP addresses. The NSX-T Segment is configured with an underlay VLAN tag.
• Overlay Mode: Single or dual physical network card supported. Application and management interfaces need to have different IP addresses.
• Underlay (VLAN 0) Mode: Single or dual physical network card supported. Application and management interfaces share the same IP addresses. The NSX-T Segment is configured with VLAN 0.

Summary
This chapter described the NSX-T data plane components. You now should know the difference between an N-VDS and a VDS and how they relate to the transport nodes. The chapter also explained related objects like uplink profiles, transport node profiles, network I/O control, LLDP, the difference between an uplink and a port, and how the physical network interface cards relate to this. This chapter ended with an explanation of what the host transport node preparation means and does to the host when the NSX-T bits are installed, and it explained the support of physical (bare metal) servers. The next chapter covers the Logical Switching feature, which is one of the main network services offered by NSX-T.

© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021 I. Hoogendoorn, Getting Started with NSX-T: Logical Routing and Switching https://doi.org/10.1007/978-1-4842-6708-0_5

5. Logical Switching Iwan Hoogendoorn1 (1) Rotterdam, The Netherlands

This chapter covers logical switching use cases and how logical switching is performed using NSX-T Segments. It explains how Layer 2 can be stretched across transport nodes with the help of TEP, MAC, and ARP tables. This chapter also explains how the GENEVE encapsulation works and how the segments are configured on the data plane using the NSX-T GUI. This chapter ends by explaining the different Segment profiles that can be applied to a Segment or a Segment port to configure more granular settings related to a specific segment or Segment port. It also explains how Broadcast, Unknown unicast, and Multicast (BUM) traffic is handled using the two different replication methods.

Logical Switching Use Cases
Before I explain the use cases of logical switching, I want to mention that a logical switch is nothing more than a Layer 2 broadcast domain, comparable to a VLAN in the traditional data center world. There are specific challenges when dealing with VLANs and virtual machines. Table 5-1 lists these challenges.

Table 5-1 Logical Switching Use Cases and Challenges
• Dividing applications across multiple tiers and/or various tenants: For each tenant and/or tier, a new set of VLANs is required. The lead time to get a new VLAN created in a production environment is not the only challenge; the VLAN limit of 4094 can also be an issue.
• Virtual machine mobility: When a virtual machine is moved, the network should be moved as well. Thus, the same VLAN needs to be available on the destination location, which is another physical data center. You typically don't want to stretch Layer 2 VLANs across multiple sites because you want to keep your failure domains small. That way, if something goes wrong (with the VLAN) in the first data center, the second will not go down as well.
• Hardware memory tables: You always have MAC and FIB limitations that are hardware bound.

Having said that, NSX-T offers a solution (Table 5-2) to some of these challenges in the form of NSX-T Segments (also sometimes called logical switches).

Table 5-2 Logical Switching Benefits
• Dividing applications across multiple tiers and/or various tenants: With the use of logical switching, the application architecture is now much more scalable.
• Reusing existing hardware: A segment is capable of running on top of your existing physical network infrastructure.
• Running Layer 2 over Layer 3: Layer 2 broadcast domains can now be extended across Layer 3 boundaries.
• Optimal Segment Span: Layer 2 broadcast domains now span not only across multiple switches, but also across hypervisor hosts.

Logical Switching Terminology
Before I continue explaining logical switching within NSX-T, I will first explain some new and essential terms (Figure 5-1 and Table 5-3) around logical switching.

Note In logical switching, the terminology used is sometimes different, depending on whether you use the "policy" or "manager" mode for the configuration.

Figure 5-1 Logical switching components

Table 5-3 Logical Switching Terminology
• Virtual Machine Application: The virtual machine hosts the application or service.
• Container(ized) Application: This is the container that hosts the application or service.
• Segments (Policy) / Logical Switches (Manager): In Figure 5-1, the Web, App, and DB segments represent your Layer 2 broadcast domains.
• Segment Ports (Policy) / Logical Switch Ports (Manager): In Figure 5-1, you see 12 "Segment ports" connected to a Segment. These are the switch ports that interface with the virtual machines or containers.
• Segment Profiles (Policy) / Switching Profiles (Manager): These are profiles that can be applied to change the settings within a segment. The currently available Segment profile types are Quality of Service (QoS), IP Discovery, SpoofGuard, Switch Security, and MAC Management.
• T0/T1 Gateway(s): The Tier-0 or Tier-1 gateway is responsible for routing between the segments. The Tier-0 gateway routes the virtual segments to the physical VLANs.

Segments
I mentioned earlier that a segment is a Layer 2 broadcast domain that is stretched across one or multiple host transport nodes (comparable with a VLAN). Virtual machines attached to Segment ports that are part of the same segment can communicate without the traffic being routed between the host transport nodes. NSX-T will assign a unique virtual network identifier (VNI) to each segment, which is comparable to a VLAN ID (Figure 5-2).

Figure 5-2 Segment stretched across two host transport nodes

Within NSX-T, a segment can either be a VLAN segment or a GENEVE segment. In the previous chapter, you learned about a VLAN and an overlay transport zone, where the overlay transport zone managed the GENEVE segments. The transport zone also defines the span of the segments, so only the hosts that are part of a particular transport zone can offer switching services on that segment. Whenever a segment is created, it will be visible in the vCenter Server and ESXi hosts (vSphere), but you cannot manage or delete it because this network will be an opaque object. Segments can be created through the GUI and by using REST APIs.
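As an illustration of the REST API option, the sketch below creates an overlay segment through the NSX-T Policy API. The manager FQDN, credentials, segment name, subnet, and transport zone path are placeholders for your own environment; the call uses the Policy API endpoint /policy/api/v1/infra/segments/<segment-id>.

# Create (or update) a segment called Web-Segment; the transport zone path must
# point to an existing overlay transport zone in your environment.
curl -k -u admin:'VMware1!VMware1!' -X PUT \
  -H 'Content-Type: application/json' \
  https://nsx-manager.lab.local/policy/api/v1/infra/segments/Web-Segment \
  -d '{
        "display_name": "Web-Segment",
        "transport_zone_path": "/infra/sites/default/enforcement-points/default/transport-zones/<tz-uuid>",
        "subnets": [ { "gateway_address": "172.16.10.254/24" } ]
      }'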

Tunneling
The segments are stretched across multiple host transport nodes with the use of GENEVE encapsulation. Each host will have a tunnel endpoint (TEP) interface so that tunnels can be built to the other TEP interfaces. The virtual machine frames are then encapsulated with the GENEVE tunnel headers, and the traffic is sent through the GENEVE tunnel (Figure 5-3).

Figure 5-3 GENEVE tunnel between two host transport nodes
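Before relying on a GENEVE tunnel like the one in Figure 5-3, it is worth confirming that the two TEP interfaces can actually reach each other with large packets. The following is a minimal sketch run from the ESXi shell of one transport node; vmk10 is assumed to be the TEP VMkernel interface, 192.168.20.1 is the remote TEP from the figure, and the dedicated overlay TCP/IP stack is assumed to be named vxlan (the name NSX-T typically uses for its TEP stack).

# Ping the remote TEP from the local TEP VMkernel interface, using the overlay
# TCP/IP stack and a large, non-fragmentable packet to also validate the underlay MTU.
vmkping ++netstack=vxlan -I vmk10 -d -s 1572 192.168.20.1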

GENEVE
GENEVE is an overlay tunneling mechanism, developed within the IETF, that allows you to send Layer 2 network traffic across Layer 3 networks in the data plane. GENEVE-encapsulated packets use the following communication flow (use Figure 5-3 as a reference):
1. The source virtual machine (172.16.10.1) sends data to the destination virtual machine (172.16.10.2).
2. The source TEP interface (192.168.10.1) encapsulates the virtual machine frame in the GENEVE header.
3. The encapsulated packet is then sent to the destination TEP interface (192.168.20.1) using UDP port 6081.
4. The destination TEP interface (192.168.20.1) decapsulates the GENEVE header and delivers the frames sent by the sending virtual machine (172.16.10.1) to the destination virtual machine (172.16.10.2).

GENEVE Header Format
The GENEVE protocol provides a new approach to encapsulation. It provides control plane independence between the tunnel endpoints. GENEVE runs over UDP port 6081 and uses a 24-bit identifier to identify the NSX-T Segment (Figure 5-4). GENEVE is also supported by packet-capture applications like tcpdump and Wireshark.

Figure 5-4 GENEVE header
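Because common packet-capture tools understand GENEVE, you can watch the encapsulated traffic directly on a transport node. This sketch uses the pktcap-uw and tcpdump-uw utilities that ship with ESXi; vmnic2 and vmk10 are placeholder names for the overlay uplink and the TEP VMkernel interface.

# Capture overlay traffic on the uplink and pipe it into tcpdump-uw;
# GENEVE uses UDP port 6081.
pktcap-uw --uplink vmnic2 -o - | tcpdump-uw -r - -nn udp port 6081

# Alternatively, capture directly on the TEP VMkernel interface.
tcpdump-uw -i vmk10 -nn udp port 6081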

Low-Level Packet Walk Using GENEVE (End-To-End Communication)
Figure 5-5 shows the following details:
• ESXi host A is configured as a transport node with TEP IP 192.168.10.1, and the N-VDS/VDS is installed on the hypervisor during the transport node creation. The VMkernel interface VMK10 (not displayed in this figure) is created on the ESXi host.
• ESXi host B is configured as a transport node with TEP IP 192.168.20.1, and the N-VDS/VDS is installed on the hypervisor during the transport node creation. The VMkernel interface VMK10 (not displayed in this figure) is created on the ESXi host.
• The ESXi transport nodes are configured in the transport zone named TEST-OVERLAY-TZ.
• Transport node A is running VM1 with IP address 172.16.10.1 and MAC address AAAA-AAAA-AAAA. Transport node B is running VM2 with IP address 172.16.10.2 and MAC address BBBB-BBBB-BBBB.
• VM1 and VM2 are connected to Segment ports on the web segment. This web segment is an overlay-based segment configured in the transport zone named TEST-OVERLAY-TZ.
• VM1 communicates with VM2. The source hypervisor encapsulates the packet with the GENEVE header and sends it to the destination transport node. The destination transport node decapsulates the packet and forwards it to the destination VM2.

During VM1 to VM2 communication:
1. VM1 sends the traffic to the web segment.
2. The source hypervisor (A) TEP (192.168.10.1) encapsulates the packet with the GENEVE header.
3. The source transport node (A) forwards the packet to the physical network.
4. The destination transport node (B) receives the packet and performs the decapsulation.
5. The destination TEP (192.168.20.1) forwards the L2 frame to the destination VM2.

Figure 5-5 GENEVE encapsulation process

Creating Segments Under the Hood (Management, Control, and Data Planes)
When a segment is created, the following happens under the hood:
1. The segment is created using the GUI or API by a user.
2. The Policy role forwards the information/configuration to the Manager role.
3. The Manager role translates the segment information/configuration and updates the Corfu database.
4. The Manager role forwards the information to configure the segment to the Central Control Plane (CCP).
5. The Central Control Plane (CCP) sends the information/configuration to the Appliance Proxy Hub (APH).
6. The Appliance Proxy Hub (APH) sends the information/configuration to the Local Control Plane (LCP) (NSX-proxy) using an NSX-RPC communications channel across TCP port 1234.
7. The NSX-proxy receives the information/configuration and updates the NestDB.
8. The nsx-cfgAgent service running on the ESXi host retrieves the information/configuration from the NestDB.
9. The nsx-cfgAgent service finally uses the nsx-t-vdl2 kernel module to create and configure the segments as opaque objects.
This whole process is shown in Figure 5-6.

Figure 5-6 Segment creation process

Creating Segments
To create a new segment, choose Networking ➤ Segments ➤ Add Segment (see Figure 5-7). The parameters that are required to configure a segment are explained in Table 5-4.

Figure 5-7 Create a segment

Table 5-4 Segment Parameters
• Segment Name: The descriptive name of the segment you give it for a specific purpose.
• Connectivity: You can choose not to attach the segment to any gateway, or you can connect it to a Tier-0 or Tier-1 gateway.
• Transport Zone: The transport zone is either the overlay or the VLAN transport zone you created earlier.
• Subnets (Optional): When you attach the segment to a gateway, you can specify the subnet and gateway IP address of this segment here.
• VLAN ID (Optional): When you use the VLAN transport zone, you can specify a VLAN ID to stretch the VLAN ID you have on the physical infrastructure into NSX-T.

When a segment has been successfully created, you can verify it on the Segments summary page. In the example in Figures 5-8 and 5-9, you see that the Status is set to “Success” and the Admin State is “Up.”

Figure 5-8 Segment creation verification

Figure 5-9 Segment creation verification

Another way to verify this is to look at the vCenter Server in the Networking tab, as shown in Figures 5-10 and 5-11.

Figure 5-10 Segment creation verification in vCenter Server - VDS

Figure 5-11 Segment creation verification in vCenter Server - N-VDS

Attaching a Segment to a Virtual Machine
When your segment is ready, you can attach it to a virtual machine's virtual NIC (vNIC). There are multiple ways to do this:
1. Attach the segment through the Virtual Machine Settings (using vCenter Server) (Figure 5-12).
2. Attach the segment through the Segments Settings (using vCenter Server) (Figure 5-13).

Figure 5-12 Virtual machine attachment using the VM Edit option (from the VM level)

Figure 5-13 Virtual machine attachment using the VM Migration option (from the Port Group level)

Verifying the Segment Port Status
In the Segment summary screen, you can also verify the ports that are in use on the segments. When you click the ports, you can see the port-ID that's attached to the Virtual Machine vNIC (Figure 5-14).

Figure 5-14 Virtual machine’s port-ID attachment
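The Segment ports can also be listed through the Policy API, which is handy when you want to script this verification. The sketch below assumes the segment created earlier is called Web-Segment and that the manager FQDN and credentials are placeholders; each returned port includes its attachment information.

# List the ports (and their attachments) of the Web-Segment segment.
curl -k -u admin:'VMware1!VMware1!' \
  https://nsx-manager.lab.local/policy/api/v1/infra/segments/Web-Segment/ports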

When you set up the segment and attach the virtual machines to it, you can also test the virtual machine connectivity on the same segment. You can do this with the following two steps:
1. Using SSH or the Virtual Machine console, log in to Virtual Machine Web-01 (172.16.10.1), attached to the Web Segment segment.
2. Ping Virtual Machine Web-02 (172.16.10.2), which resides on another ESXi host. This virtual machine is also attached to the Web Segment segment (Figure 5-15).

Figure 5-15 Virtual machine attachment verification

Attaching Segments Under the Hood (Management, Control, and Data Planes)
When you attach a segment to a virtual machine vNIC (Figure 5-16), the following happens:
1. First, you attach a virtual machine vNIC to a segment from vCenter Server.
2. The vCenter Server sends this (attachment) request to the ESXi host that contains this virtual machine.
3. The ESXi host sends the (attachment) request to the NSX-T management plane.
4. The NSX-T Manager configures a (new) logical interface 1 (LIF-1) on the segment with a new virtual interface (VIF-1) attachment.
5. The NSX-T Manager advertises the attachment configuration to the CCP.
6. The master controller sends the request to the host on which the virtual machine resides.
7. LIF-1 is created on the Web Segment segment with VIF-1 attached.

Figure 5-16 Segment attachment process

Segment Profiles With Segment profiles, you can apply more granular settings to a segment or a Segment port. The Segment profile types that can be configured through the NSX-T GUI are shown in Figure 5-17.

Figure 5-17 Segment profile types

To create a new Segment profile, choose Networking ➤ Segments ➤ Segment Profile ➤ Add Segment Profile. There will be a default Segment profile available for each type (Figure 5-18).

Figure 5-18 Default Segment profiles

IP Discovery Segment Profile The IP Discovery profile uses DHCP snooping, address resolution

protocol (ARP) snooping, or VMware tools to learn the VM MAC and IP addresses. When you create an IP Discovery Segment profile, you can change the settings displayed in Figure 5-19. The default operating mode that ARP snooping and ND snooping perform in is called Trust On First Use (TOFU) . When an IP address is discovered in TOFU mode and added to the binding list, it will be there forever. The first IP address, the MAC address, and the VLAN discovered using ARP/ND snooping will be added to the list. TOFU for ARP/ND snooping can be disabled inside the IP Discovery Segment profile if you want. When you do this, TOFU changes to Trust On Every Use (TOEU). In this TOEU mode, the discovered IP addresses are placed in the binding list and deleted when they expire. DHCP snooping and VMware tools always operate in TOEU mode.

Figure 5-19 IP Discovery profile

The IP Discovery profile works in the following ways: DHCP/DHCPv6 snooping: Inspects the packets exchanged between the virtual machine’s DHCP/DHCPv6 client and the DHCP/DHCPv6 server to learn the virtual machine’s IP addresses and MAC addresses. ARP Snooping: Inspects a virtual machine’s outgoing ARP messages and GARP messages to learn the virtual machine’s IP addresses and MAC addresses. VMware Tools: Run on a virtual machine hosted on an ESXi host and can provide the virtual machine’s configuration information (IP

addresses). ND Snooping: The IPv6 equivalent of ARP snooping. It inspects Neighbor Solicitation (NS) and Neighbor Advertisement (NA) messages to learn the IP addresses and MAC addresses. Duplicate Address Detection: Checks whether an IP address that is newly discovered is already in the system on the realized binding list for a different Segment port. The IP Discovery profile may be used when the distributed firewall depends on the IP-to-port mapping to create the firewall rules. When IP Discovery is not in place, the distributed firewall must find the IP address of a logical port through SpoofGuard and manual address bindings, which is a slow process that can quickly lead to errors.

MAC Discovery Segment Profile The MAC Discovery profile supports two functionalities: MAC learning and MAC address change. When you create a MAC Discovery Segment profile, you can change the settings displayed in Figure 5-20.

Figure 5-20 MAC Discovery profile

The MAC Discovery profile supports source MAC address learning (on the overlay segment): Source MAC address-based learning is a feature used on physical networks to determine a physical server/device’s MAC address. This MAC learning feature is useful when you have a virtual machine with multiple MAC addresses behind one virtual Network Interface Card (vNIC). Unknown unicast flooding is also supported by MAC learning. This means that when a unicast packet is received with an unknown MAC

address, this packet will be flooded out to each available Segment port with MAC learning and unknown unicast flooding enabled. This is enabled by default when MAC learning is enabled. The MAC Discovery profile also allows a virtual machine to change its MAC address: When a virtual machine is connected with the MAC change setting enabled, it is possible to change the MAC address on the vNIC and still allow traffic on that vNIC. The MAC change feature is disabled by default. The maximum number of MAC addresses that can be learned inside the segment is 4096. This means that you should not create segments with a larger subnet than /21. You can set the behavior on a segment when this limit of 4096 is reached to perform the following actions: DROP Packets that are coming from an unknown source MAC address are dropped. Packets coming inbound to this MAC address are treated as unknown unicast. The Segment port will receive the packets only if it has unknown unicast flooding enabled. ALLOW Packets coming from an unknown source MAC address are forwarded even though the MAC address is not learned. Packets coming inbound to this MAC address are treated as unknown unicast. The Segment port will receive the packets only if it has unknown unicast flooding enabled. When you decide to enable the MAC Learning and MAC Change feature, you should also enable the SpoofGuard feature to enhance security.

Quality of Service (QoS) Segment Profile The QoS profile provides a way to prefer high-quality and dedicated

network performance. The QoS Segment profile can be used to configure the average inbound and outbound bandwidth values to set the “transmit limit rate.” When you want to prevent congestion on the north-facing network links, the “peak bandwidth rate” setting can be used to specify a limit that network traffic is allowed to burst. All the settings you can use inside the QoS Segment profile will not guarantee a certain amount of bandwidth, but will limit network bandwidth usage. The actual bandwidth is always determined by the link speed of the Segment port of the Segment profile values, where the lowest is the determining factor. When you create a QoS Segment profile, you can change the settings displayed in Figure 5-21.

Figure 5-21 QoS Discovery profile

QoS will improve network performance by allowing bandwidth priority, by controlling the latency and jitter, and by reducing packet drops for data you mark as preferred. The QoS Segment profile supports the two methods described in Table 5-5.

Table 5-5 Supported QoS Methods
• Class of Service (CoS): CoS prioritizes Layer 2 packets that are buffered due to network congestion.
• Differentiated Services Code Point (DSCP): DSCP prioritizes Layer 3 packets based on the DSCP values set at the borders of the network or virtual machine. It sets the DSCP value at the segment level if a new marking needs to be applied. The DSCP value is applied to the outer header of the encapsulated frames, which allows the physical network to prioritize this traffic type. DSCP can be in trusted or untrusted mode. In trusted mode, the DSCP value is also copied into the inner header of the packet, whereas in untrusted mode this is not the case. These DSCP settings within this QoS Segment profile are only dedicated to tunneled traffic and not to traffic that flows on the same segment within the same ESXi host.

Segment Security Profile The Segment Security profile provides stateless Layer 2 and Layer 3 security by checking the ingress traffic of the segment and matching the IP address, MAC address, and protocols to a set of allowed addresses and protocols. Unauthorized packets will be dropped. When you create a Segment Security profile, you can change the settings displayed in Figure 5-22.

Figure 5-22 Segment Security profile

With the Segment Security profile, you can use the stateless Layer 2 and Layer 3 security features. These security features will block unauthorized network traffic to and from the virtual machines on the segment's Segment port. You can match IP addresses, MAC addresses, and protocols to be allowed by configuring the Bridge Protocol Data Unit (BPDU) filter, DHCP snooping, DHCP server block, and rate-limiting options. Table 5-6 describes the different filtering options available on this Segment profile.

Table 5-6 Supported Filtering on the Segment Security Profile
• BPDU Filter: The BPDU Filter blocks all BPDU traffic coming through each Segment port on the segment. This disables spanning tree on the segment.
• BPDU Filter Allow List: You can make an exception for specific MAC addresses where you want to receive BPDU traffic by placing the MAC addresses in the allowed list.
• DHCP Filtering: There are two types of DHCP Filtering options, the server block and the client block. The server block will block the traffic from a DHCP server to a DHCP client. The client block will block DHCP traffic from the client to the server. With both of these options, traffic is never blocked from a DHCP server to a relay agent. The server block-IPv6 and client block-IPv6 options will provide the same DHCP filtering options for IPv6 network traffic.
• Non-IP Traffic Block: When this option is turned on, the Segments (or Segment ports) will only allow IPv4, IPv6, ARP, GARP, and BPDU traffic. Non-IP traffic will be blocked. The traffic that is allowed is based on policies that are set in the address binding and SpoofGuard configurations. This option is disabled by default.
• RA Guard: RA Guard will filter incoming IPv6 route advertisements by filtering out ICMPv6 type 134 packets. This option is enabled by default.
• Rate Limits: With this option, you can configure rate limits for incoming and outgoing broadcast and multicast traffic to protect your virtual machines on that segment against broadcast storms. The minimum value must be set to 10 packets per second (PPS) or higher to prevent connectivity problems.

SpoofGuard Segment Profile SpoofGuard prevents traffic with incorrect source IPs and MAC addresses from being transmitted. When you create a SpoofGuard Segment profile, you can change the settings displayed in Figure 5-23.

Figure 5-23 SpoofGuard Segment profile

SpoofGuard protects against spoofing with MAC, IP, and VLAN bindings. When the virtual machine’s IP address does not match the IP

address on the Logical Segment port where this virtual machine is connected, SpoofGuard will prevent this vNIC from connecting to the network. You can configure SpoofGuard on the Segment and Segment port level. Reasons to use SpoofGuard include: Ensures rogue virtual machines cannot impersonate another virtual machine by using the same IP address of an existing virtual machine. Ensures that virtual machine IP addresses cannot be changed on the fly without proper controls in place. Ensures that distributed firewall rules cannot be bypassed. The distributed firewall uses source and destination IP addresses to enforce a specific security policy. When an IP address is forged, this means that the firewall can be bypassed.

Applying Segment Profiles to Segments
By default, Segment profiles (Figure 5-24) will be used for each newly created segment. If you have a specific (user-created) Segment profile that you want to apply to a segment, choose Networking ➤ Segments, then follow these steps:
1. Select the segment (that you want to apply the new Segment profile to).
2. Click the three dots and select Edit.
3. Look for Segment Profiles inside the Segment settings.
4. Select the new Segment profile (per type).

Figure 5-24 Segment profile application

Only one Segment profile can be selected per Segment profile type.

Applying Segment Profiles to Segment Ports Suppose you don’t want to apply a specific Segment profile to a full segment. In that case, you can also apply a Segment profile to one particular parent’s Segment port (Figure 5-25) for more granular control. Only one Segment profile can be selected per Segment profile type.

Figure 5-25 Segment profile to Segment port attachment

When the Segment profile is associated or disassociated from a segment, the Segment profile for the child Segment ports is applied based on the following rules: When the parent segment has a profile, the child Segment port inherits the Segment profile from the parent by default. When the parent segment does not have a Segment profile, a default Segment profile is assigned to the segment. The Segment port inherits that default Segment profile by default. When you explicitly associate a custom profile with a Segment port, this custom (user-created) profile overrides the existing Segment profile. You can associate a custom (user-created) Segment profile with a segment and retain the default Segment profile for one of the child Segment ports. To do this, you must make a copy of the default Segment

profile and associate it with the specific Segment port.

Logical Switching Packet Forwarding This section gives more insight into the different tables involved with packet forwarding between virtual machines on other host transport nodes.

TEP Table Updates
The TEP table population process is described in the following steps (Figure 5-26). VM A, B, and C are powered on:
1. An entry is created with a VNI to TEP IP address mapping on all the transport nodes with the local TEP IP address and VNI (local TEP table).
2. Each host transport node will send the local TEP table entries to the Central Control Plane (CCP).
3. The CCP will create a TEP table that contains all the VNI to TEP IP address mappings.
4. The CCP will update all host transport nodes with all the VNI to TEP IP address mappings of all the hosts that are part of the VNI where the VMs are.

Figure 5-26 TEP tables

MAC Table Updates
The MAC table population process is described in the following steps (Figure 5-27). VM A, B, and C are powered on:
1. An entry is created with a VNI/MAC/TEP IP address mapping on all the transport nodes with the local TEP IP address, VNI, and MAC address (local MAC table).
2. Each host transport node will send the local MAC table entries to the Central Control Plane (CCP).
3. The CCP will create a MAC table that contains all the VNI/MAC/TEP IP address mappings.
4. The CCP will update all host transport nodes with all the VNI/MAC/TEP IP address mappings of all the hosts that are part of the VNI where the VMs are.

Figure 5-27 MAC tables

ARP Table
The process of the ARP table population is described in the following steps (Figure 5-28). VM A, B, and C are powered on:
1. An entry is created with a VNI/MAC/VM IP address/TEP IP address mapping on all the transport nodes with the local TEP IP address, VNI, MAC address, and VM IP address (local MAC table).
2. Each host transport node will send the local ARP table entries to the Central Control Plane (CCP).
3. The CCP will create an ARP table that contains all the VNI/MAC/VM IP address/TEP IP address mappings.
4. The CCP will update all host transport nodes with all the VNI/MAC/VM IP address/TEP IP address mappings of all the hosts that are part of the VNI where the VMs are.

Figure 5-28 ARP tables
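These central tables can be inspected from the NSX-T CLI, which is useful when you troubleshoot reachability between VMs on the same segment. The sketch below is run on an NSX-T Manager node (which hosts the central control plane); 6000 stands for the VNI of the web segment, and the exact command keywords can differ slightly between NSX-T versions, so treat this as an assumption to verify against your version's CLI reference.

nsx-manager> get logical-switches
# Shows the logical switches (segments) known to the control plane, including their VNIs.

nsx-manager> get logical-switch 6000 mac-table
# MAC address to TEP mappings learned for VNI 6000.

nsx-manager> get logical-switch 6000 arp-table
# IP to MAC (ARP) entries reported for VNI 6000.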

Unicast Packet Forwarding Across Hosts
When the ARP is resolved, VM A and VM B want to communicate with each other in Figure 5-28. This process is described in the following list (Figure 5-29):
1. VM A sends traffic to VM B.
2. ESXi A encapsulates the original packets in the GENEVE header.
3. The encapsulated packet is sent to ESXi B.
4. ESXi B decapsulates the GENEVE header and delivers the (original) traffic to VM B.

Figure 5-29 Unicast packet forwarding (when MAC address is known)

Broadcast, Unknown Unicast, and Multicast (BUM) Traffic
BUM traffic stands for Broadcast, Unknown unicast, and Multicast, and it represents traffic flooded from one source virtual machine to all or multiple selected destination virtual machines in the same segment. BUM traffic must be replicated to remote transport nodes when there is a transport node with a TEP interface in another Layer 3 subnet (or on another network). Figure 5-30 shows six host transport nodes, each of which is hosting a virtual machine. These virtual machines are all connected to the same NSX-T segment (web segment VNI = 6000). When VM A needs to send traffic to VM D and the MAC address is not known, BUM traffic needs to be sent/replicated across the physical Layer 3 network (TEP). An ARP request will be broadcasted, and a broadcast is a type of BUM traffic. The following BUM traffic replication modes are supported:
• Head replication (also called source replication)
• Hierarchical two-tier replication (also called MTEP replication)

Figure 5-30 Broadcast, Unknown Unicast, Multicast (BUM) traffic

BUM Traffic: Head Replication With head replication, “source-based” replication is performed. The source transport node replicates the BUM packet to all other transport nodes that have a virtual machine that uses the NSX-T web segment (VNI=6000). In Figure 5-31, VM A sends traffic to VM D, and ESXi A initiates the replication to all hosts with a virtual machine that uses the NSX-T web segment (VNI=6000). VM D will reply with its MAC address, and VM A will know where to send the packets.

Figure 5-31 Head replication

BUM Traffic: Hierarchical Two-Tier Replication
With hierarchical two-tier replication, "destination-based" replication is performed. The source transport node replicates the BUM packet to one single transport node hosted on another TEP subnet. This transport node will hold the MTEP that acts as a proxy TEP responsible for the replication to all other transport nodes in that TEP subnet that have a virtual machine that uses the NSX-T web segment (VNI=6000). In Figure 5-32, VM A sends traffic to VM D, and ESXi A initiates the replication to all the MTEP hosts. In this case, the traffic will be forwarded to ESXi C, which was elected to be the "proxy" with an MTEP, and ESXi C will replicate the packet to ESXi D because ESXi D has a virtual machine that uses the NSX-T web segment (VNI=6000). VM D will reply with its MAC address, and VM A will know where to send the packets.

Figure 5-32 Hierarchical two-tier replication

Summary In this chapter, you learned how logical switching is performed using NSX-T Segments and what the use cases are for logical switching. You have learned that virtual machines are capable of communicating with each other even when they are spread across different hosts using TEP interfaces and GENEVE encapsulation and using TEP, MAC, and ARP tables. You have also learned about different Segment profiles and how to change segment-related parameters using these other Segment profiles. This chapter ended with how BUM traffic is handled using two different replication methods. The next chapter explains more about edge transport nodes and edge clusters and how they are used to host Tier-0 and Tier-1 gateways where the logical segments are typically attached, as discussed in this chapter.

© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021 I. Hoogendoorn, Getting Started with NSX-T: Logical Routing and Switching https://doi.org/10.1007/978-1-4842-6708-0_6

6. Edge and Edge Clusters Iwan Hoogendoorn1 (1) Rotterdam, The Netherlands

This chapter is dedicated to the NSX-T edge transport nodes and NSX-T edge clusters.

Edge Node
The NSX-T edge (also referred to as edge transport node) is an essential component of the NSX-T transport zone. Edge transport nodes are appliances that provide pools of capacity that can host distributed routing and non-distributed services. An edge transport node is responsible for hosting the Tier-1 and Tier-0 gateways. Edge nodes are also used to host network and security services like NAT, load balancing, gateway firewalling, VPN, and so on through the Tier-0 and Tier-1 gateways. High performance is possible with the support of DPDK, and the edge transport nodes are the direct tunnel endpoints. The control plane of an edge transport node is Linux-based, and the routing tables for management and overlay traffic are fully separated because they use dedicated interfaces. Edge transport nodes provide:
• Administrative background
• Computational power for dynamic routing and network services
• High availability by using active/active and active/standby operation models to provide resiliency
Because the edge transport node typically offers north-south routing, these are generally deployed close to the network's border. This can be a border leaf (in the case of a spine/leaf network architecture) or in the DMZ.

Edge Node Cluster
The edge node cluster contains edge transport nodes to offer high availability. Also, an edge cluster is mandatory if you plan to configure stateful network and security services provided by an edge transport node (Tier-0 and Tier-1 gateways). A maximum of ten edge transport nodes can be part of one single edge cluster, and an edge transport node can only be part of one edge cluster at a time. The maximum number of edge clusters that NSX-T can host is 160. The configuration maximums can be different per NSX-T version, and this can be verified on the https://configmax.vmware.com website. One edge cluster can provide ECMP across eight paths northbound. Another cluster can offer tenants centralized services to maximize the network throughput northbound, as ECMP cannot be combined with stateful services.

Edge Node Form Factors Edge transport nodes are provided in two form factors: Virtual machine edge transport nodes Bare metal edge transport nodes It is not possible to mix these two in an edge cluster. So when you create an edge cluster, it can only contain either virtual machine edge transport nodes or bare metal edge transport nodes.

Edge VM Form Factor The Virtual NSX edge transport node is deployed as a virtual machine on an ESXi host hypervisor. The virtual machine edge transport nodes can be installed in different sizes: The small size is for proof-of-concept deployments. The medium size is suitable for a typical production environment and

can support up to 64 hypervisors. The large size is for large-scale deployments with more than 64 hypervisors and deployments that require load balancing. The extra-large size is for environments that require heavy load balancing and advanced services such as URL filtering or L7 firewall.

Edge VM Sizing
Table 6-1 lists the Edge VM resource requirements per virtual machine size.

Table 6-1 Edge VM Resource Requirements
• Small: 4GB memory, 2 vCPU, 200GB disk space, VM hardware version 11 or later (vSphere 6 or later)
• Medium: 8GB memory, 4 vCPU, 200GB disk space, VM hardware version 11 or later (vSphere 6 or later)
• Large: 32GB memory, 8 vCPU, 200GB disk space, VM hardware version 11 or later (vSphere 6 or later)
• Extra Large: 64GB memory, 16 vCPU, 200GB disk space, VM hardware version 11 or later (vSphere 6 or later)

This sizing may be different based on the NSX-T version you are using.

Edge VM Deployment Prerequisites To deploy a virtual NSX edge transport node, you can use an OVA, OVF, an ISO file, and an ISO file together with a Preboot Execution Environment (PXE). The virtual NSX edge transport node can only be deployed on an ESXi hypervisor (so not on KVM), and when you use the PXE boot method with the ISO file, you must encrypt the passwords for root and admin users with SHA-512 encryption. Also, the hostname must not contain unsupported characters or spaces. VMware tools will be installed by default, and this version cannot be removed or replaced with a newer/older version. The virtual NSX edge transport node requires some TCP/UDP/IP ports to be opened to operate correctly. You can find the necessary ports on the https://ports.vmware.com/ website. All virtual NSX edge transport nodes need to be configured with the same NTP server.

Edge VM Interface Considerations

The virtual NSX edge transport node will leverage the physical network interfaces in the ESXi host, and therefore two physical interfaces are required. The virtual NSX edge transport node itself has four virtual interfaces (Figure 6-1). The first interface is always used for management traffic and therefore needs to be connected to a VSS/VDS/N-VDS port group dedicated to management traffic. The other three virtual interfaces are called fast-path interfaces; they carry GENEVE overlay traffic and BGP uplink traffic and can be connected to trunk ports to allow flexible use of these interfaces. Note An edge VM can only be attached to VSS port-groups when the edge VM is deployed manually (not through the NSX-T Manager GUI) and registered manually.

Figure 6-1 Fast-path to VSS/VDS/N-VDS connectivity

Note The N-VDS (inside the edge VM) will be used together with the fast-path interfaces to allow GENEVE overlay traffic and uplink traffic.

Edge VM on Multiple Virtual Switches When you host the virtual NSX edge transport node on an ESXi host, this host can have multiple virtual switches (VSS/VDS/N-VDS). Figure 6-2 shows an example of a VSS and VDS, both dedicated for

different purposes. Vmnic0/vmnic1 are part of a VSS/VDS responsible for management traffic (or maybe even vMotion and vSAN traffic), and vmnic2/vmnic3 are part of the VDS/N-VDS that will carry NSX overlay and uplink network traffic. You can mix and match the virtual switches as long as you have dedicated physical NICs on the ESXi hosts for the virtual switches.

Figure 6-2 Edge VM connected to multiple virtual switches

It is also better to attach the virtual interfaces to a virtual switch that is not prepared by NSX-T. This means that this needs to be a VSS or VDS switch and not an N-VDS switch. Unfortunately, this is not always possible. Note Attaching the virtual edge transport node interfaces to an NVDS (on an NSX-T prepared host) is still a supported and valid option, but less recommended. The networks and VLANS of that switch are responsible for the overlay (GENEVE) traffic that can be the same between the host transport node and the virtual edge transport node (as long as you do not host a virtual edge transport node on an NSX-T prepared switch). This example is given in the next section.

Edge VM on One Virtual Switch When you deploy the virtual edge transport node on an ESXi host

prepared for NSX and have only one VDS/N-VDS, you need to consider some ground rules. In some cases, the virtual edge transport node is also deployed on a Compute host with two physical interfaces available. In this case, you need to make sure that the TEP network/VLAN that is used for your host transport nodes is different from the TEP network/VLAN used for the virtual edge transport nodes. Starting with NSX-T 3.1 and later, this VLAN can be the same. In Figure 6-3, you see that VLAN 10 is used for the host transport nodes and VLAN 20 is used for the virtual edge transport nodes, both for the TEP network connectivity example explained previously. I used the same VLAN (10) for the virtual edge and host transport nodes in the example in Figure 6-3.

Figure 6-3 TEP VLAN assignment across host and edge transport nodes

Note Starting from vSphere 7 and VDS 7, it is only possible to use a VDS that works directly together with NSX-T when you plan a single VDS deployment with two or more physical NICs attached to this VDS.

Deployment Options for Virtual Edge Nodes
The edge transport nodes can be installed or deployed using various methods. The deployment options that can be used to deploy an edge transport node are listed in Table 6-2.

Table 6-2 Edge Transport Node Deployment Options
• Virtual Machine: NSX GUI, OVF deployment through the vCenter Server, OVF deployment through the OVF tool command-line utility, ISO file installation, or ISO file PXE boot.
• Bare Metal: ISO file installation or ISO file PXE boot.
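For the OVF tool option listed above, a deployment can be scripted from a workstation or jump host. The sketch below is assumption-heavy: the OVA filename, vCenter path, datastore, and network names are placeholders, and the edge-specific OVF properties (passwords, management IP settings, and so on) must be added as extra --prop: options whose exact names you can list by probing the OVA first.

# Probe the OVA to list its deployment options and OVF properties.
ovftool nsx-edge.ova

# Deploy the OVA to a vCenter target (placeholders throughout); add the required
# --prop: options for the edge properties reported by the probe above.
ovftool --acceptAllEulas --name=edge-nsxt-01 --datastore=datastore1 \
  --net:"Network 0"=Management-PG \
  nsx-edge.ova 'vi://administrator@vsphere.local@vcenter.lab.local/Datacenter/host/Compute-Cluster'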

Virtual Edge Node Deployment from the NSX-T GUI To create a new virtual edge transport node (Figures 6-4 through 6-8), choose System ➤ Fabric ➤ Nodes ➤ Edge Transport Nodes ➤ Add Edge VM. The parameters described in Tables 6-3 through 6-7 are required to configure a virtual edge transport node.

Figure 6-4 Name and description for the new edge VM

Table 6-3 Virtual Edge Transport Node Parameters | Name and Description
• Name: The descriptive name you assign to the virtual edge.
• Hostname/FQDN: The hostname or FQDN used to access and manage the edge transport node.
• Description (optional): A description of the edge transport node.
• Form Factor: The size of the edge transport node you want to deploy, depending on the responsibilities and features you will configure.
• Advanced Resource Reservations: The resources you want to allocate on the vCenter Server side.

Figure 6-5 Credentials for the new edge VM

Table 6-4 Virtual Edge Transport Node Parameters | Credentials
• CLI Credentials: You will use these credentials to log in to the NSX GUI and NSX CLI through the console or SSH. The default admin username can be changed here.
• Root Credentials: The root password.
• Audit Credentials (optional): The audit credentials are optional but need to be specified when you need a user with read-only access.
• Allow SSH Login: When you enable this option, SSH will be enabled on the virtual edge transport node. SSH can also be enabled later.
• Allow Root SSH Login: By default, the "root" user will not be allowed to log in through SSH for security reasons, but this setting overrides this.

Figure 6-6 Configure the new edge VM deployment

Table 6-5 Virtual Edge Transport Node Parameters | Configure Deployment
• Compute Manager: The Compute Manager is the vCenter Server that holds the resources to deploy this specific virtual edge transport node.
• Cluster: The vSphere cluster where you want to deploy this virtual edge transport node.
• Resource Pool (Optional): When you have vSphere resource pools configured, you can select a resource pool for the virtual edge transport node deployment.
• Host (Optional): When you want to deploy your virtual edge transport node on a specific host, you can select it here.
• Data Store: The data store where you want to deploy the virtual edge transport node.

Figure 6-7 Configure the new edge VM's node settings

Table 6-6 Virtual Edge Transport Node Parameters | Configure Node Settings
• IP Assignment: The management IP address can either be assigned statically or with DHCP.
• Management IP/Default Gateway: In case of a static assignment, you need to specify the IP address and default gateway here.
• Management Interface: The management interface is the port group that provides networking to the management network.
• Search Domain Names: The DNS search domain of the management network that the node belongs to.
• DNS Servers: The DNS server that can resolve this virtual edge transport node.
• NTP Servers: The NTP server that can provide the correct time to this virtual edge transport node. This NTP server needs to be the same across all edge transport nodes.

Figure 6-8 Configure the NSX for the new edge VM

Table 6-7 Virtual Edge Transport Node Parameters | Configure NSX
• Edge Switch Name: The NSX internal N-VDS switch that is used to provide overlay networking.
• Transport Zone: The transport zones define where this virtual edge transport node can provide network and security services.
• Uplink Profile: The uplink profile specifies how the interfaces should behave on the virtual edge transport node. It also defines the load balancing mechanism and the VLAN for the GENEVE overlay network, including the MTU.
• Teaming Policy Switch Mapping: The mapping between a port group and the virtual fast-path interfaces of the virtual edge transport node.

Virtual Edge Node Deployment from the vCenter Server GUI
If you prefer an edge installation that is more interactive, you can use a UI-based VM management tool, such as the vSphere Client connected to vCenter Server. Figures 6-9 and 6-10 show the deploy option through the vSphere Client. The wizard will guide you through the steps so that you can provide the required details. The parameters that you have to enter in the wizard are similar to the parameters you entered when you deployed the virtual edge transport node using the NSX-T GUI. This wizard does not register the virtual edge transport node with the NSX-T management plane; you will need to run an additional command on the command line to do that.

Figure 6-9 Deploy an edge VM using the vSphere Client | Deploy OVF Template

Figure 6-10 Deploy an edge VM using the vSphere Client | Select an OVF Template

You must join the virtual edge transport node with the management plane so they can communicate. When you join the virtual edge transport node to the management plane, it ensures that the edge transport nodes are available from the management plane as managed nodes. Step 1 is to verify that you have the administration privileges to access NSX edge nodes and the NSX UI. When you have this, you can use the CLI to join the NSX edge nodes to the management plane. On the NSX-T Manager, use:

nsx-manager> get certificate api thumbprint
5a343eca5dc659427dc4eb9da05b38c27cc615db104cd85073e95823079c9

On the virtual edge transport node, use:

edge> join management-plane 10.10.10.100 username admin password VMware1!VMware1! thumbprint 5a343eca5dc659427dc4eb9da05b38c27cc615db104cd85073e95823079c9

The IP address 10.10.10.100 is one of the NSX-T Manager's addresses, and the thumbprint is the thumbprint you just retrieved from the NSX-T Manager. Also, you need to use your admin user credentials to complete the registration.
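After the join command completes, you can confirm from the edge CLI that the node is talking to the management plane. This is a short sketch; the output format differs per version, but the manager IP should be listed with a connected status.

edge> get managers
# Lists the NSX-T Managers this node is registered with and their connection state.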

Edge VM Resource Reservation
When you deploy a virtual edge transport node from the NSX-T GUI, you can select the number of resources you want reserved on the vCenter Server side. The resource reservation settings can be changed afterward, but only when the virtual edge transport node was deployed through the NSX-T GUI. In Figure 6-11, the edge called edge-nsxt-05 was deployed using the NSX-T GUI. You can see that it is possible to change its resources.

Figure 6-11 The Change Edge VM Resource Reservations menu (available)

Figure 6-12 The Change Edge VM Resource Reservations options

In Figure 6-13, the edge-nsxt-01 edge was deployed using an OVF

file directly from the vCenter Server. You can see that it is not possible to change its resources in this case (it’s grayed out).

Figure 6-13 The Change Edge VM Resource Reservations menu (not available)

Bare Metal Edge Form Factor
The bare metal edge transport node is typically used when you have high bandwidth throughput requirements. The bare metal edge transport node only supports specific CPU types and has some specific physical NIC requirements. This list of specific requirements can be found on the VMware website in the (official) NSX-T installation guide. When the hardware you are trying to use to set up the bare metal edge transport node is not listed, you risk that the storage, video adapter, or motherboard components will not work correctly. One essential CPU requirement is that it must have AES-NI and 1GB huge page capabilities.

Bare Metal Edge and Virtual Switches The bare metal edge transport node can support multiple virtual

switches in the form of N-VDSs. Each N-VDS will require at least two physical NICs, so if you plan on configuring more than one N-VDS, you need multiple physical NICs on your bare metal edge transport node.

Bare Metal Edge Deployment Prerequisites To deploy a bare metal NSX edge transport node, you can use an ISO file for OS installation and the ISO file together with a Preboot Execution Environment (PXE) . The bare metal NSX edge transport node hardware requirements need to be satisfied, and when you use the PXE Boot method with the ISO file, you must encrypt the passwords for root and admin users using SHA-512 encryption. Also, the hostname must not contain strange characters or spaces. The virtual NSX edge transport node requires some TCP/UDP/IP ports to be opened in order to operate correctly. You can find the necessary ports on the https://ports.vmware.com/ website. All virtual NSX edge transport nodes also need to be configured with the same NTP server.

Bare Metal Edge Server Hardware Requirements The hardware requirements for a bare metal edge transport node can be found on this VMware website: https://docs.vmware.com/en/VMware-NSX-T-DataCenter/3.0/installation/GUID-14C3F618-AB8D-427EAC88-F05D1A04DE40.html The hardware requirements may change when the NSX-T version evolves, but this information is publicly available on the NSX-T installation guide; it’s version-specific.

Edge Node Deployment on Bare Metal Edges
Manual installation is also available when you install NSX edge nodes on a bare metal server. After you satisfy the requirements mentioned in the previous section, the installation process should start automatically from the installation media using an ISO file. After the bootup and power-on processes are complete, the system requests an IP address manually or with DHCP. The default root password is vmware, and the default admin password is default. During the setup procedure, you need to enable the interfaces and join the bare metal edge transport node to the management plane. To install the bare metal edge transport node using an ISO file, follow these steps:
1. Set the system BIOS to Legacy BIOS.
2. Download the NSX edge ISO file and either create a physical CD-ROM or use a method to mount the ISO file to a virtual drive.
3. Boot the physical machine from the ISO.
4. Select Automated Install (Figure 6-14).

Figure 6-14 Bare metal edge transport node installation screen

You can use a PXE Boot to automate the network configuration installation process. The PXE Boot can be used to install NSX edge nodes on a bare metal or virtual platform. PXE Boot automates the installation process of the edge transport nodes. You can preconfigure the deployment with all the required network settings for the edge transport node. The password for root and admin users must be encrypted with SHA-512. Note The PXE method supports only the NSX edge node deployment. It does not support NSX Manager deployments. By using PXE, the networking settings—like the IP address, the default gateway, the network mask, the NTP server, and the DNS server—are automatically set. The PXE Boot process requires several components to be active, including DHCP, HTTP, and TFTP servers, as you can see in Figure 6-15.

Figure 6-15 Edge transport node PXE boot components

Edge Node Settings
Settings that you specify during the deployment (like hostname/FQDN, enable SSH, DNS search domains, DNS servers, and NTP servers) can be changed by using the Change Node Settings menu option (Figures 6-16 and 6-17).

Figure 6-16 Edge Transport Node | Change Node Settings menu

Figure 6-17 Edge Transport Node | Change Node Settings

Verifying the Edge After the Initial Deployment (Before Configuration)
After you have deployed an edge transport node, you need to perform some validation steps to ensure it is operating properly before you start using it (or before you configure it, if the configuration is not done yet). The verification steps are listed here, and a short CLI sketch follows this list:
• When you enable SSH, you can set up an SSH session to the edge transport node and verify that access is working.
• You can try to ping the management IP address of the edge transport node.
• When you're logged in to the CLI (using the console or an SSH session), try these steps:
  • Ping the default gateway of the management subnet where the edge transport node is hosted.
  • Ping some of the ESXi hosts' management interfaces (vmk0) that will be part of your NSX-T infrastructure.
  • Ping the DNS and NTP servers that are configured on the edge transport node.
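As referenced in the list above, here is a minimal CLI sketch of these checks run from the edge node console or an SSH session; the gateway, host, and server IP addresses are placeholders for your own environment.

edge> get interface eth0
# Shows the management interface configuration (IP address, gateway, MTU).

edge> ping 10.10.20.1
# Management default gateway (placeholder address).

edge> ping 10.10.20.11
# An ESXi host management interface (vmk0) that will be part of the NSX-T infrastructure.

edge> ping 10.10.10.53
# The DNS (and similarly the NTP) server configured on the edge transport node.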

Verifying the Edge Transport Node (After Deployment and Registration)
When the edge transport node is successfully deployed and configured, you can verify this by looking at the edge overview screen (Figure 6-18). Check out the:
• Configuration state
• Node status
• Switches (N-VDS)
• NSX version
When you click the little "i" (information icon) next to the node status, you will see more information about the given status. When you click the number after the N-VDS, you'll see information about the attached transport zones.

If you want to validate the data path interfaces, you can edit the corresponding edge transport node.

Figure 6-18 Edge transport node overview screen

Creating an Edge Cluster
You need to create an edge cluster for the following reasons:
- When you have multiple edge transport nodes active, you want to make sure these nodes are highly available.
- When you want to use stateful services such as NAT, load balancing, VPN, gateway firewalling, and so on.
To create a new edge cluster, choose System ➤ Fabric ➤ Nodes ➤ Edge Clusters. To configure your edge cluster, follow these steps:
1. Click +ADD.
2. Create an edge cluster profile, either before or during the Add Edge Cluster wizard.
3. Include the edge transport node members that will be part of the edge cluster (Figure 6-20).

An NSX edge transport node can be added to only one NSX edge cluster at a time. When the NSX edge cluster (Figure 6-19) is created, you can edit it later and add NSX edge transport nodes.

Figure 6-19 Edge cluster overview screen

Figure 6-20 Edge transport node member selection to be part of the edge cluster

Summary
In this chapter, you learned what types of edge transport nodes you can deploy. You also learned the prerequisites for edge transport node deployment, how the actual deployment is done, and how to verify it properly. The next chapter explains how to leverage the edge transport nodes to create logical routers and perform north-south and east-west (logical) routing.


7. Logical Routing
Iwan Hoogendoorn, Rotterdam, The Netherlands

Logical routing in NSX-T provides a scalable way of performing east-west and north-south routing. This chapter explains the architecture, routing features, and components needed to design and implement a Layer 3 network using NSX-T.

Logical Routing Use Cases
Table 7-1 explains the NSX-T logical routing use cases.
Table 7-1 Logical Routing Use Cases

Use Case: Single-tenant or multi-tenant routing architecture models.
Description: Multiple gateway instances (Tier-0 and Tier-1) support the separation of tenants where fully isolated and separated tenant networks are required.

Use Case: Offer logical routing for applications running in containers.
Description: Offer routed logical ports up to the container level.

Use Case: Provide logical routing across multiple hypervisors and hypervisor types.
Description: Support for ESXi and KVM logical routing across hypervisors without touching a physical routing gateway.

Use Case: Simplified and optimized routing paths between virtual segmented networks.
Description: The routing is done closest to the source. Traffic between two virtual machines that have interfaces on different subnets, both hosted on the same physical ESXi host, can be routed in the hypervisor's kernel. The network traffic between the virtual machines does not need to leave the ESXi host to get routed. This method avoids network traffic hair-pinning on the physical network.

Use Case: Centralized services offered in combination with distributed routing.
Description: NSX-T edge transport nodes that host gateways can provide network services (like NAT, load balancing, and gateway firewalling) that cannot be distributed across ESXi hosts.

Use Case: Extend logical segments to the physical network.
Description: With bridging, it is possible to extend GENEVE overlay networks to VLAN networks on the physical network.

Logical Routing
To make logical routing work, the NSX-T Manager cluster must be in place, the transport zones must be configured, and all hypervisors must be prepared for NSX-T (that is, marked as host transport nodes). All host transport nodes must be attached to a transport zone. The NSX-T edge transport nodes must also be deployed so that north-south connectivity is possible by leveraging Tier-0 gateways (Figure 7-1). Logical routing within NSX-T is fully distributed and decoupled from the physical routing infrastructure, and all forwarding decisions are made locally on the transport nodes. Next to north-south and east-west connectivity, the gateways also provide centralized services like NAT, load balancing, VPN, and gateway firewalling. It is also possible to create multiple tenant environments (multitenancy) by deploying dedicated gateways and services per isolated tenant, thereby separating the routing architecture.

Figure 7-1 Logical north-south and east-west routing

Tier-0 and Tier-1 Gateways
An NSX-T gateway is either a Tier-0 or a Tier-1 gateway. Each gateway has its own characteristics, as described in Table 7-2.
Table 7-2 Tier-0 and Tier-1 Gateway Characteristics

Provider/Tenant
- Tier-0: Owned by the provider or infrastructure administrator.
- Tier-1: Owned by the tenant.

Routing Protocols
- Tier-0: Dynamic and static routing are supported, using the Border Gateway Protocol (BGP) toward upstream physical network devices.
- Tier-1: Does not support dynamic routing.

ECMP Support
- Tier-0: Upstream physical network devices support equal-cost multipath (ECMP).
- Tier-1: Does not support equal-cost multipath (ECMP). Auto-plumbing is used to send traffic to the connected Tier-0 gateway, and a Tier-1 gateway has only one single connection toward a Tier-0 gateway.

Routing Direction
- Tier-0: Responsible for east-west and north-south network traffic.
- Tier-1: Responsible for east-west network traffic.

Edge Cluster Member
- Tier-0: Each NSX-T edge transport node needs to be a member of an edge cluster to operate as a north-south gateway. When you deploy a Tier-0 gateway on an edge node, that edge node needs to be a member of an edge cluster.
- Tier-1: When an edge transport node hosts Tier-1 gateways that offer stateful network services, that edge node needs to be a member of an edge cluster. An edge node that hosts a Tier-1 gateway has to be a member of an edge cluster only if the Tier-1 gateway runs stateful services.

Each gateway is configured in a distributed way on each host transport node. The Tier-0 and Tier-1 gateways both support stateful services, such as NAT, load balancing, VPN, and gateway firewalling.

Edge Transport Nodes
An edge transport node hosts network and security services that can't be distributed across host transport nodes (such as NAT, VPN, etc.). An edge transport node can be a virtual machine or a bare metal server; the bare metal server is typically used when high throughput and fast packet switching are required. An edge transport node cannot be part of multiple edge clusters; it can only be part of one single edge cluster. There is also a limitation that an edge transport node can only host one single Tier-0 gateway, so if you require five Tier-0 gateways, you also need five edge transport nodes.

Edge Clusters
An edge cluster is a group of edge transport nodes that takes care of the high availability of the edge transport nodes inside that edge cluster. When you plan to create a Tier-0 gateway with uplinks toward northbound-facing physical network devices, the edge transport node needs to be inside an edge cluster, even if you only have one single edge transport node (Figure 7-2). When you plan to configure Tier-0 or Tier-1 gateways with stateful services enabled, you also need to place your edge transport nodes inside an edge cluster. The maximum number of edge transport nodes allowed in an edge cluster is ten, and it is not possible to mix virtual (VM) edge transport nodes with bare metal (BM) edge transport nodes inside a single edge cluster.

Figure 7-2 Edge cluster

Topologies
With NSX-T, you can build various topology types, as listed in Table 7-3.
Table 7-3 NSX-T Topologies

Single-tier deployment: A single Tier-0 gateway deployment.
Multi-tier deployment: A single Tier-0 gateway combined with multiple Tier-1 gateways.

Single-Tier Topologies
In a single-tier topology, all segments connect to a single Tier-0 gateway (Figure 7-3).

Figure 7-3 Single-tier topology

Multi-Tier Topologies
In a multi-tier topology, all segments connect to a Tier-1 gateway, and multiple Tier-1 gateways are connected to a single Tier-0 gateway. Figure 7-4 shows Tier-1 Gateway A, where all A segments are connected, and Tier-1 Gateway B, where all B segments are connected.

Figure 7-4 Multi-tier topology

If the provider and tenant do not need to be separated, you can use a single-tier topology. In most use cases, the provider owns the Tier-0 gateway, and the tenants own the Tier-1 gateways. Cloud management platforms (CMPs) typically provision the Tier-1 gateways using REST API calls.
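As an illustration of such an API-driven workflow, the sketch below creates a Tier-1 gateway through the NSX-T Policy API from Python. The manager address, credentials, and object IDs are placeholders, and the endpoint path and payload fields follow the Policy API conventions but should be verified against the API guide for your NSX-T version.

# Hedged sketch: create a Tier-1 gateway via the NSX-T Policy API (names and fields assumed).
import requests

NSX = "https://nsx-manager.lab.local"     # placeholder manager FQDN
AUTH = ("admin", "VMware1!VMware1!")      # placeholder credentials

tier1 = {
    "display_name": "T1-TENANT-A",
    "tier0_path": "/infra/tier-0s/T0-GW-01",   # assumed ID of an existing Tier-0 gateway
}

resp = requests.patch(f"{NSX}/policy/api/v1/infra/tier-1s/T1-TENANT-A",
                      json=tier1, auth=AUTH, verify=False)   # verify=False: lab only
resp.raise_for_status()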

Tier-0 Gateway Uplinks
Each Tier-0 gateway can have a single uplink toward the physical network or multiple uplinks. To provide optimal redundancy and throughput, it is better to use multiple uplinks when possible. Figures 7-5 and 7-6 show the two different configuration options. In Figure 7-5, the Tier-0 gateway has one uplink mapped to one VLAN to connect to the outside world. The redundancy is handled on the physical side using a first-hop redundancy protocol such as HSRP, VRRP, or GLBP. In Figure 7-6, the Tier-0 gateway has two uplinks mapped to different VLANs; both a two-VLAN and a four-VLAN uplink option are possible.

Figure 7-5 Tier-0 gateway with a single VLAN uplink

Figure 7-6 Tier-0 gateway with multiple VLAN uplinks

In both configurations, the NSX edge cluster contains two NSX edge transport nodes. The Tier-0 gateway deployment can be either active/active (ECMP enabled) or active/standby.

Gateway Components
An NSX (virtual) gateway (Tier-1 and/or Tier-0) consists of two main components: the distributed router (DR) and the service router (SR) (Figure 7-7).

Figure 7-7 Tier-0 and Tier-1 gateway SR and DR components

The DR component:
- Provides distributed routing for east-west network traffic.
- Provides basic network packet-forwarding functions.
- Spans across all host and edge transport nodes.
- Has a local gateway active on all host transport nodes, so first-hop routing is performed on the host transport nodes.
The SR component:
- Provides routing for north-south network traffic (BGP peering).
- Is created when a Tier-0 gateway is configured with uplinks to the physical network.
- Provides dynamic and static routing services, including centralized stateful network and security services like NAT, load balancing, gateway firewalling, etc.
- Is only created on the edge transport nodes that are part of an edge cluster.

Gateway Components of a Single-Tier Topology
Figure 7-8 shows a single-tier topology, where the Tier-0 gateway has two uplinks configured to the physical network. This means the Tier-0 gateway has two SRs (SR1 and SR2), one on each edge node.

Figure 7-8 Single-Tier topology with SR and DR components

When you look at this from the physical view, the DR component of the Tier-0 gateway is distributed across all transport nodes, and the SR components are only located on the edge transport nodes. The Tier-0 gateway is configured as active/active, and because of this, SR1 is located in Edge Transport Node-1, and SR2 is located in Edge Transport Node-2. This would be different if the Tier-0 gateway were configured as active/standby. In that case, SR1 and SR2 would still be located on the corresponding edge transport node, but only one would be active.

Gateway Components of a Multi-Tier Topology
Figure 7-9 shows a multi-tier topology where the Tier-0 gateway has two uplinks configured to the physical network. There are also two Tier-1 gateways that have no configured services and, for this reason, were not configured to be members of an edge cluster.

Figure 7-9 Multi-tier topology with SR and DR components

For this reason, the Tier-1 gateways will not have any SR components. On the physical view, the DR component of the Tier-0 gateway is visible. The DR component of Tier-1 Gateway A and the DR component of Tier-1 Gateway B are also distributed across all transport nodes. The Tier-0 gateway is configured as active/active, and because of this, SR1 is located in Edge Transport Node-1, and SR2 is located in Edge Transport Node-2. This would be different if the Tier-0 gateway were configured as active/standby. In that case, the SR1 and SR2 would still be located on the corresponding edge transport node, but only one would be active. Figure 7-10 shows a multi-tier topology where the Tier-0 gateway has two uplinks configured to the physical world, just like in the previous figure. The two Tier-1 gateways this time have some services configured, and therefore, each of these gateways will have an SR component.

Figure 7-10 Tier-1 gateways with the edge clusters configured (non-DR-only)

In the physical view, the DR component of the Tier-0 gateway is visible. The DR component of Tier-1 Gateway A and the DR component of Tier-1 Gateway B are distributed across all transport nodes. The Tier-0 gateway is configured as active/active, and because of this, SR1 is located in Edge Transport Node-1, and SR2 is located in Edge Transport Node-2. This would be different if the Tier-0 gateway were configured as active/standby; in that case, SR1 and SR2 would still be located on the corresponding edge transport nodes, but only one would be active. The difference from Figure 7-9 is that in Figure 7-10 both Tier-1 gateways are configured with some services, and because of this, each gateway has an SR component. These are configured as active/standby. In the example in Figure 7-10:
- Edge Transport Node A is the active node for Tier-1 Gateway A.
- Edge Transport Node B is the active node for Tier-1 Gateway B.
Note It could also be that Edge Transport Node A is active for Tier-1 Gateway B, or the other way around.

Gateway Interfaces
When we deal with Tier-1 and Tier-0 gateways and their corresponding DR and SR components, the different interfaces have specific names, which are used for future reference (Table 7-4 and Figure 7-11).
Table 7-4 Gateway Interface References

Uplink Interface

The uplink interface provides connections to the external physical network. VLAN and overlay interface types are supported. The uplink interface is where the external BGP peering can be established between the Tier-0 gateway and your physical L3 BGP peer. External service connections, such as IPSec VPN and load balancing, can also be used through this uplink interface.

Downlink Interface

The downlink interface is responsible for the connectivity of workload networks (where your virtual machines are running) to the routing infrastructure. A downlink interface is configured to connect to a segment. This interface is the default gateway for the virtual machines hosted on that subnet/segment.

RouterLink Port

The RouterLink port is the interface that provides connectivity between Tier-0 and Tier-1 gateways. This interface is created automatically when Tier-0 and Tier-1 gateways are connected. It uses a subnet assigned from the 100.64.0.0/10 IPv4 address space by default.

Intra-Tier Transit Link

The intra-tier transit link is also automatically created when a service router is created for whatever reason. This link is an internal link between the DR and SR components on a gateway. The intra-tier transit link has an IP address in the 169.254.0.0/28 subnet range by default.

Service Interface

The service interface is a special port to enable services for VLAN-based networks. North-south service insertion is another use case to connect a partner appliance and redirect north-south traffic for third-party partner services. Service interfaces are supported on both active/standby Tier-0 and Tier-1 gateways. Firewall, NAT, and VPNs are supported on this interface. The service interface is also a downlink.

Figure 7-11 Gateway interface references

Gateway Configuration Steps
To achieve full network connectivity between the NSX-T segments and the physical network, the following high-level configuration steps need to be performed (Table 7-5).
Table 7-5 Configuration Steps for Full Network Connectivity Between the NSX-T Segments and the Physical Network
1. Create a Tier-1 gateway.
2. Create the segments where you place your virtual machines.
3. Attach the segments to the Tier-1 gateway.
4. Create the (VLAN) uplink segments.
5. Create a Tier-0 gateway with uplink connections.
6. Configure routing on the Tier-0 gateway (dynamic or static).
7. Attach the Tier-1 gateway to the Tier-0 gateway.
8. Configure route advertisement and route redistribution on the Tier-1 and Tier-0 gateways.
9. Verify that the routes are advertised on the Tier-1 gateway, Tier-0 gateway, and physical network.

The order of these configuration tasks can be different depending on the state of your environment. In some cases, you may want or need to create the Tier-0 gateway first, and then create the Tier-1 gateways and segments and attach them in a different order. Before a Tier-1 and/or Tier-0 gateway can be created, the following items must be in place or in the right state:
- Your NSX-T management cluster is stable.
- At least one NSX edge transport node is installed and configured.
- An NSX edge cluster is configured with assigned edge transport nodes.
The Tier-1 and Tier-0 gateways are not automatically connected when you first create them. The management plane cannot determine by default which Tier-1 gateway should connect to which Tier-0 gateway, so you must manually connect them after their creation. After manually connecting these gateways, the management plane injects the configured routes into the gateways to establish the proper connectivity between the two tiers at the NSX-T level.

Tier-1 Gateway
This section explains how to create a Tier-1 gateway, attach some segments to it, and run some verification steps.

Creating a Tier-1 Gateway
To create a Tier-1 gateway (Figures 7-12 and 7-13), choose Networking ➤ Connectivity ➤ Tier-1 Gateways ➤ Add Tier-1 Gateway. The parameters that you need to specify are explained in Table 7-6.

Figure 7-12 Add a Tier-1 gateway (1)

Figure 7-13 Add a Tier-1 gateway (2)

Table 7-6 Tier-1 Gateway Configuration Parameters

Tier-1 Gateway Name: This is the "friendly" name that you use to refer to the Tier-1 gateway.
Linked Tier-0 Gateway (optional): When you have a Tier-0 gateway ready to attach to this Tier-1 gateway, you can select it here. This can also be done later.
Edge Cluster (optional): When you plan stateful network and security services on the Tier-1 gateway, you need to select an edge cluster.

Attach a Segment to a Tier-1 Gateway
To attach a segment to a Tier-1 gateway (Figure 7-14), choose Networking ➤ Connectivity ➤ Segments. Then choose Add Segment or select an existing segment. Use the Connectivity parameter to select a Tier-1 gateway.

Figure 7-14 Attach a segment to a Tier-1 gateway
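The same attachment can be automated. The sketch below patches an existing segment so that its connectivity points at the Tier-1 gateway; the segment ID, gateway path, and the connectivity_path field name are assumptions based on the Policy API conventions.

# Hedged sketch: attach segment WEB-SEGMENT-A to Tier-1 gateway T1-TENANT-A (IDs assumed).
import requests

NSX = "https://nsx-manager.lab.local"     # placeholder
AUTH = ("admin", "VMware1!VMware1!")      # placeholder

patch = {"connectivity_path": "/infra/tier-1s/T1-TENANT-A"}   # assumed field name

resp = requests.patch(f"{NSX}/policy/api/v1/infra/segments/WEB-SEGMENT-A",
                      json=patch, auth=AUTH, verify=False)
resp.raise_for_status()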

East-West Connectivity Test
To verify that east-west connectivity is possible, you can perform ping tests between the virtual machine segments. In the example in Figure 7-15, you see three segments with three corresponding subnets (Table 7-7).

Figure 7-15 Subnets for east-west connectivity

Table 7-7 Subnets for East-West Connectivity
Segment          Subnet           Gateway IP    Virtual Machine IP
WEB-SEGMENT-A    172.16.10.0/24   172.16.10.1   172.16.10.100
APP-SEGMENT-A    172.16.20.0/24   172.16.20.1   172.16.20.100
DB-SEGMENT-A     172.16.30.0/24   172.16.30.1   172.16.30.100

To provide a full connectivity test, you can try to ping between the virtual machine IP addresses and look for a successful reply.
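If you want to repeat this test regularly, it is easy to script. The sketch below runs from any machine that can reach the segments and simply pings the three VM addresses from Table 7-7.

# Simple east-west reachability check using the VM IPs from Table 7-7.
import subprocess

vm_ips = ["172.16.10.100", "172.16.20.100", "172.16.30.100"]

for ip in vm_ips:
    # "-c 2" sends two ICMP echo requests (Linux/macOS ping syntax)
    result = subprocess.run(["ping", "-c", "2", ip],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    print(f"{ip}: {'reply received' if result.returncode == 0 else 'no reply'}")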

Tier-0 Gateway
This section explains how to create a Tier-0 gateway, attach it to a Tier-1 gateway, enable route advertisement and route redistribution, and then run some verification steps.

Creating a Tier-0 Gateway
To create a Tier-0 gateway (Figures 7-16 and 7-17), choose Networking ➤ Connectivity ➤ Tier-0 Gateways ➤ Add Tier-0 Gateway. The parameters that you need to specify are explained in Table 7-8.

Figure 7-16 Add a Tier-0 gateway (1)

Figure 7-17 Add a Tier-0 gateway (2)

Table 7-8 Tier-0 Gateway Configuration Parameters

Tier-0 Gateway Name: This is the "friendly" name that you use to refer to the Tier-0 gateway.
HA Mode: The high availability mode of the Tier-0 gateway (active/active by default).
Edge Cluster: When you configure the uplink interfaces on the Tier-0 gateway, you need to select an edge cluster.

To configure the interfaces and other configurable items, you first need to click Save, and then you will be presented with the question shown in Figure 7-17 to continue the configuration.

Tier-0 Gateway Uplink Segment Creation
To communicate with the physical network, VLAN-backed uplink interfaces are required. Figure 7-18 uses VLAN 31 and VLAN 32 for the uplink segments (Figure 7-19).

Figure 7-18 Tier-0 gateway 2 VLAN uplinks

Figure 7-19 Segments for Tier-0 uplink configuration

When the uplink segments are created, you can use them when you create the interfaces on the Tier-0 gateway. To create the uplink interfaces on the Tier-0 gateway, choose Networking ➤ Connectivity ➤ Tier-0 Gateways. Then click the three dots next to the existing Tier-0 gateway and select Edit. Finally, click the Interfaces number (in the Interfaces section) to configure the interfaces. Figure 7-20 shows that I have configured four interfaces because I have two edge transport nodes. Each edge transport node needs to have connectivity across both VLANs because the Tier-0 gateway runs in active/active mode in this case.

Figure 7-20 Tier-0 gateway uplink interface configuration summary

Table 7-9 contains the configuration details needed to make this work.
Table 7-9 Tier-0 Gateway Uplink Interface Configuration Details

Uplink Name   IP Address      Edge Transport Node     VLAN
E-01-UP-01    10.11.31.1/24   Edge Transport Node 1   31
E-01-UP-02    10.11.32.1/24   Edge Transport Node 1   32
E-02-UP-01    10.11.31.2/24   Edge Transport Node 2   31
E-02-UP-02    10.11.32.2/24   Edge Transport Node 2   32
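If you prefer to automate this step, the Policy API exposes the Tier-0 uplink interfaces under the gateway's locale services. The sketch below creates interface E-01-UP-01 from Table 7-9; the URL structure, the locale-services ID, the edge_path value, and the payload fields are assumptions that should be checked against the API guide for your NSX-T version.

# Hedged sketch: create uplink interface E-01-UP-01 (paths and field names assumed).
import requests

NSX = "https://nsx-manager.lab.local"     # placeholder
AUTH = ("admin", "VMware1!VMware1!")      # placeholder

interface = {
    "display_name": "E-01-UP-01",
    "type": "EXTERNAL",                                        # assumed interface type
    "segment_path": "/infra/segments/UPLINK-VLAN-31",          # assumed uplink segment ID
    "subnets": [{"ip_addresses": ["10.11.31.1"], "prefix_len": 24}],
    "edge_path": ("/infra/sites/default/enforcement-points/default/"
                  "edge-clusters/EC-01/edge-nodes/0"),         # assumed edge node path
}

url = (f"{NSX}/policy/api/v1/infra/tier-0s/T0-GW-01"
       "/locale-services/default/interfaces/E-01-UP-01")
requests.patch(url, json=interface, auth=AUTH, verify=False).raise_for_status()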

Routing on a Tier-0 Gateway
When configuring the Tier-0 gateway, routing is an important setting that you need to configure so you can route traffic in and out of the NSX-T virtual network. You can configure static routing or dynamic routing using BGP. When you go to the Routing section while configuring the Tier-0 gateway, you can see the settings for configuring static routes. For the BGP configuration, there is a separate (BGP) configuration section (Figure 7-21).

Figure 7-21 Tier-0 gateway routing configuration

Attaching a Tier-1 Gateway to a Tier-0 Gateway
When the Tier-0 gateway is ready, you can attach the Tier-1 gateway to it (Figure 7-22) to allow routing between your segments and the physical network.

Figure 7-22 Tier-1 gateway to Tier-0 gateway attachment

To attach a Tier-1 gateway to a Tier-0 gateway, choose Networking ➤ Connectivity ➤ Tier-1 Gateways. Then click the three dots next to the existing Tier-1 gateway and select Edit. Finally, select a Tier-0 gateway using the Linked Tier-0 Gateway dropdown.

Route Advertisement on the Tier-1 Gateway to a Tier-0 Gateway
To advertise the connected routes/networks (segments) toward the Tier-0 gateway, you need to enable this setting in the Route Advertisement section of the Tier-1 gateway settings. The All Connected Segments & Service Ports switch needs to be enabled (Figure 7-23). Figure 7-24 shows the route advertisement on the Tier-1 gateway.

Figure 7-23 Route advertisement on the Tier-1 gateway configuration

Figure 7-24 Route advertisement on the Tier-1 gateway
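This switch maps to a route advertisement setting on the Tier-1 gateway object, so it can also be enabled through the API. In the sketch below, the route_advertisement_types field and the TIER1_CONNECTED value follow the Policy API conventions but are assumptions to verify for your version.

# Hedged sketch: advertise all connected segments from Tier-1 gateway T1-TENANT-A.
import requests

NSX = "https://nsx-manager.lab.local"     # placeholder
AUTH = ("admin", "VMware1!VMware1!")      # placeholder

patch = {"route_advertisement_types": ["TIER1_CONNECTED"]}    # assumed enum value

requests.patch(f"{NSX}/policy/api/v1/infra/tier-1s/T1-TENANT-A",
               json=patch, auth=AUTH, verify=False).raise_for_status()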

Route Redistribution on the Tier-0 Gateway to the Upstream Routing Device
Now that the Tier-0 gateway receives the routes from the Tier-1 gateway, it is also essential to make sure that the routes are sent toward the physical network. One way to do this is with static routing, whereby you configure a static (default) route (Figure 7-25).

Figure 7-25 Configure a static route for route redistribution toward the physical network

Another way is to use BGP dynamic route redistribution to redistribute the Tier-1 gateway networks. When you want to redistribute the routes received from the Tier-1 gateway into BGP (Figure 7-26), you need to enable route redistribution by checking the Connected Interfaces & Segments box in the Advertised Tier-1 Subnets section. See Figure 7-27.

Figure 7-26 Configure route redistribution on the Tier-0 gateway

Figure 7-27 Route redistribution on the Tier-0 gateway (of the connected Tier-1 gateway segments)

Note Figure 7-26 does not advertise the static default route I configured in Figure 7-25. When you want to advertise this default static route, the Static Routes checkbox needs to be selected in the Tier-0 Subnets section.
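Route redistribution lives on the Tier-0 gateway's locale services in the Policy API. The sketch below enables redistribution of Tier-1 connected segments and Tier-0 static routes into BGP; the route_redistribution_config structure and the enum values are assumptions based on the Policy API conventions and should be verified for your version.

# Hedged sketch: redistribute Tier-1 connected segments and static routes into BGP.
import requests

NSX = "https://nsx-manager.lab.local"     # placeholder
AUTH = ("admin", "VMware1!VMware1!")      # placeholder

patch = {
    "route_redistribution_config": {
        "redistribution_rules": [{
            "name": "redistribute-to-bgp",
            "route_redistribution_types": ["TIER0_STATIC", "TIER1_CONNECTED"],  # assumed values
        }]
    }
}

url = f"{NSX}/policy/api/v1/infra/tier-0s/T0-GW-01/locale-services/default"
requests.patch(url, json=patch, auth=AUTH, verify=False).raise_for_status()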

North-South Connectivity Test
To verify that north-south connectivity is possible, you can perform ping tests between the virtual machine segments and a device that is hosted outside the NSX-T domain. In the example in Figure 7-28, you see three segments with the three subnets described in Table 7-10 and one VLAN subnet defined in Table 7-11.

Figure 7-28 Routing from the virtual NSX-T segments to the physical VLAN subnet

Table 7-10 NSX-T Segment Subnets for North-South Connectivity
Segment          Subnet           Gateway IP    Virtual Machine IP
WEB-SEGMENT-A    172.16.10.0/24   172.16.10.1   172.16.10.100
APP-SEGMENT-A    172.16.20.0/24   172.16.20.1   172.16.20.100
DB-SEGMENT-A     172.16.30.0/24   172.16.30.1   172.16.30.100

Table 7-11 VLAN Subnet for North-South Connectivity
VLAN   Subnet            Gateway IP       Physical Server IP
192    192.168.10.0/24   192.168.10.254   192.168.10.100

To provide a full connectivity test, you can try to ping between one of the virtual machine IP addresses and the physical server and look for a successful reply.

Static and Dynamic Routing
On the Tier-0 gateway, it is possible to configure static and dynamic routing.
If you choose static routing, you should keep in mind the following:
- Static routes need to be configured manually, which adds administrative overhead, and the configuration process requires fine-tuning.
- Route changes are not processed automatically, so manual intervention is required when there is a change; static routes are therefore not very scalable.
- Network administrators must take additional measures to design for failure scenarios, and route redundancy must be configured manually. This can introduce intricate network designs.
If you choose dynamic routing, you should keep in mind the following:
- Dynamic routing enables the Tier-0 gateway and the physical network devices to dynamically exchange routes, lowering the administrative overhead.
- Whenever there is a change to the network, the updates occur dynamically.

Tier-0 Gateway Routing Configurations

The types of Tier-0 gateway routing configuration that are supported are shown in Figure 7-29.

Figure 7-29 Tier-0 gateway routing configuration

For BGP peering from the Tier-0 gateways toward the physical network, both eBGP and iBGP are supported.

Static Routing on a Tier-0 Gateway
Let's assume you have the topology shown in Figure 7-30.

Figure 7-30 Tier-0 gateway static routing

To allow the 172.16.x.x subnets on the NSX-T segments to communicate with the 192.168.x.x subnets, the Tier-0 gateway needs to know how to reach the 192.168.x.x subnet. This can be done using static routing, so a static route needs to be configured. When you want to add a static route to the Tier-0 gateway, you need to browse to the Tier-0 gateway using the instructions given previously. First, you select the gateway and select Edit to add the route (Figure 7-31).

Figure 7-31 Add a static route to a Tier-0 gateway (1)

Then you go to the routing section and click Set next to the Static Routes label (Figure 7-32).

Figure 7-32 Add a static route to a Tier-0 gateway (2)

Click Add Static Route and specify a name and a network (Figure 7-33).

Figure 7-33 Add a static route to a Tier-0 gateway (3)

When this is done, click Set Next Hops to specify the IP address (or next hop) that knows where to reach the 192.168.x.x network (Figure 7-34).

Figure 7-34 Add a static route to a Tier-0 gateway (4)

Once you have added the next hop, click Save (Figure 7-35).

Figure 7-35 Add a static route to a Tier-0 gateway (5)

Now you should be able to communicate between the NSX-T segments and the physical server. Note On the physical router, you also need to have static routes in place for the 172.16.x.x networks pointing to the 10.10.10.1 IP address. This is so that the physical server knows its way back to the NSX-T segments.
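The same static route can be created programmatically. The sketch below uses the assumed static-routes endpoint of the Policy API, with the 192.168.10.0/24 prefix from this example and the physical router (10.10.10.2) as the next hop; verify the field names against the API guide for your version.

# Hedged sketch: add a static route for 192.168.10.0/24 on Tier-0 gateway T0-GW-01.
import requests

NSX = "https://nsx-manager.lab.local"     # placeholder
AUTH = ("admin", "VMware1!VMware1!")      # placeholder

static_route = {
    "display_name": "TO-PHYSICAL-192-168-10",
    "network": "192.168.10.0/24",
    "next_hops": [{"ip_address": "10.10.10.2", "admin_distance": 1}],  # physical router
}

url = f"{NSX}/policy/api/v1/infra/tier-0s/T0-GW-01/static-routes/to-physical-192-168-10"
requests.patch(url, json=static_route, auth=AUTH, verify=False).raise_for_status()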

Dynamic Routing on a Tier-0 Gateway with BGP
Let's assume you have the topology shown in Figure 7-36.

Figure 7-36 Tier-0 gateway dynamic (BGP) routing

Notice that the topology is the same as the one we used for static routing, but now we are using BGP to exchange the network information dynamically. First, select the gateway and choose Edit to add the BGP configuration (Figure 7-37).

Figure 7-37 Configure BGP on a Tier-0 gateway (1)

Then go to the BGP section. Configure the Local AS and enable BGP with the switch (Figure 7-38).

Figure 7-38 Configure BGP on a Tier-0 gateway (2)

When this is done, you can click the BGP Neighbors number to configure the neighbors (Figure 7-39). In this case, 10.10.10.2 (the physical router) will be the Tier-0 gateway's BGP neighbor.

Figure 7-39 Configure BGP on a Tier-0 gateway (3)

When this is configured correctly, you can click Save and then Close. Note On the physical router, you also need the BGP peer configured toward the Tier-0 gateway using the 10.10.10.1 IP address and the 65001 BGP AS number.
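The BGP settings from Figures 7-38 and 7-39 can also be pushed through the Policy API. The sketch below enables BGP with local AS 65001 and adds the physical router 10.10.10.2 as a neighbor; the remote AS 65000 is taken from the CLI verification output shown in the next section, and the endpoint paths and field names are assumptions to verify for your NSX-T version.

# Hedged sketch: enable BGP (local AS 65001) and add neighbor 10.10.10.2 (AS 65000).
import requests

NSX = "https://nsx-manager.lab.local"     # placeholder
AUTH = ("admin", "VMware1!VMware1!")      # placeholder
BGP = f"{NSX}/policy/api/v1/infra/tier-0s/T0-GW-01/locale-services/default/bgp"

requests.patch(BGP, json={"enabled": True, "local_as_num": "65001"},
               auth=AUTH, verify=False).raise_for_status()

neighbor = {"neighbor_address": "10.10.10.2", "remote_as_num": "65000"}
requests.patch(f"{BGP}/neighbors/physical-router-01",
               json=neighbor, auth=AUTH, verify=False).raise_for_status()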

Verifying Dynamic Routing on a Tier-0 Gateway with BGP
When you have configured BGP correctly, you should verify that the BGP neighbor is up using the GUI (Figure 7-40).

Figure 7-40 Tier-0 BGP peer configuration verification

You can also use the CLI by logging in to one of the edge transport nodes hosting the Tier-0 gateways with the following commands: First, log in with SSH or directly into the console and verify which logical routers are configured.

Pod-110-T0-EdgeVM-01> get logical-router
Wed Nov 18 2020 UTC 13:39:47.805
Logical Router
UUID                                   VRF   LR-ID   Name    Type                   Ports   Neighbors
736a80e3-23f6-5a2d-81d6-bbefb2786666   0     0               TUNNEL
bfbbe637-719c-47cf-9ef3-fc5dbf26cdad   3     1037    SR-T0   SERVICE_ROUTER_TIER0   6       0/50000

Then hop into the VRF that is responsible for the Tier-0 gateway.

Pod-110-T0-EdgeVM-01> vrf 3
Pod-110-T0-EdgeVM-01(tier0_sr)>

When you are in the VRF, verify that the peer is established.

Pod-110-T0-EdgeVM-01(tier0_sr)> get bgp neighbor summary
BFD States: NC - Not configured, DC - Disconnected
AD - Admin down, DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: 10.10.10.2 Local AS: 65001

Neighbor     AS      State   Up/DownTime   BFD   InMsgs   OutMsgs   InPfx   OutPfx
10.10.10.1   65000   Activ   never         NC    0        0         0       0

Wed Nov 18 2020 UTC 13:41:34.099
Pod-110-T0-EdgeVM-01(tier0_sr)>

Advanced Dynamic Routing on a Tier-0 Gateway
The NSX-T BGP configuration also offers advanced configuration items (Figure 7-41), such as:
- Inter-SR Routing
- Bidirectional Forwarding Detection (BFD)
- Allowas-in
- Multipath Relax
- Route Filtering using:
  - IP prefix lists
  - Community lists
  - Route maps
- Route Aggregation

Figure 7-41 Advanced (BGP) routing settings

Inter-SR Routing with iBGP
When inter-SR iBGP is enabled, routing is possible using iBGP between different SR instances of the same Tier-0 gateway. Inter-SR routing increases resiliency by avoiding traffic blackholes when one single uplink is not working. The SR components synchronize eBGP and static routes automatically.
Note Inter-SR routing is only applicable with active/active Tier-0 gateways.
Inter-SR routing (Figure 7-42) uses automatically generated internal interfaces in the 169.254.0.x IP address range by default, and there is no Time to Live (TTL) decrement for traffic going over this interface.

Figure 7-42 Inter-SR routing

The auto-plumbed iBGP peer cannot be changed, and only one iBGP neighbor is automatically created for carrying IPv4 and IPv6 routes. If required, connected and user-defined static routes can be redistributed in BGP.
Note Inter-SR routing is not supported when you are running iBGP toward your physical network northbound.

Bidirectional Forwarding Detection (BFD)
BFD (Figure 7-43) is a protocol that detects forwarding path failures.

Figure 7-43 Bidirectional forwarding detection (BFD)

BFD detects whether an edge transport node, a physical router, or an uplink fails. BFD protects both static routes and BGP peers by establishing multiple BFD sessions across multiple links between virtual and physical devices. BFD can be enabled per BGP peer/neighbor or on a per-gateway basis.

Turning BFD On
You can enable BFD (Figure 7-44) per BGP peer/neighbor and/or globally per gateway.

Figure 7-44 Enable bidirectional forwarding detection (BFD) (1)

The protocol timer for BFD can be fine-tuned based on your environmental needs. These timers are set in milliseconds, and for a virtual machine edge transport node, the minimum value is 1000 milliseconds. For a bare metal edge transport node, the minimum value is 300 milliseconds. To enable BFD for the entire Tier-0 gateway, switch to Manager mode. Then choose Networking ➤ Tier-0 Gateways. Then select a Tier-0 gateway and choose Routing ➤ BFD ➤ Edit (Figure 7-45).

Figure 7-45 Enable bidirectional forwarding detection (BFD) (2)

BGP Allowas-in
Each Tier-0 gateway has a local BGP AS number. By default, BGP drops received routes that contain that same local AS number in the AS path, to avoid routing loops. When a customer has two sites using the same Internet Service Provider (ISP), it can happen that routes received from one of its peers contain the customer's own AS number. When you enable BGP Allowas-in on the Tier-0 gateway, you override this behavior and allow routes containing the same (local) AS number in. Figure 7-46 shows two sites, Site A and Site B; both sites belong to AS number 65100 and are connected to the same ISP with AS number 65200.

Figure 7-46 Allowas-in

Gateway 1 advertises the 10.10.10.0/24 network prefix to the ISP. The ISP gateway advertises 10.10.10.0/24 to Gateway 2. The AS number is recorded in the AS path in the advertised prefix as the prefix traverses each AS number. In this case, the 10.10.10.0/24 prefix will get the AS path 65100 => 65200. By default, Gateway 2 will not accept the 10.10.10.0/24 prefix because Gateway 2 has the AS number 65100, and the 10.10.10.0/24 prefix has the 65100 AS number in the path. 10.10.10.0/24 is a valid network residing at another site (Site A). So in this case, you can enable the Allowas-in feature so that Gateway 2 will accept the route advertisement 10.10.10.0/24 with its own ASN 65100 in the AS path.

Turning Allowas-in On
You can enable Allowas-in under the BGP neighbor configuration (Figure 7-47).

Figure 7-47 Enable the allowas-in switch

Multipath Relax
When an application needs to be reachable across multiple paths, the same prefix should be advertised from multiple gateways. From the perspective of the receiving gateway, the AS paths may differ in terms of AS numbers, but the AS path length is the same. In this case, BGP supports load sharing across the paths. This feature is called multipath relax, or multipath multiple-AS. It enables equal-cost multipath (ECMP) routing across different neighboring AS numbers if all other attributes (like weight, local preference, and so on) are equal for both paths. Figure 7-48 shows that Gateway 1 advertises the 10.10.10.0/24 network prefix in AS 65100 to its BGP routing peers. Gateway 2 has its local AS set to 65200, and Gateway 3 has its local AS set to 65300.

Figure 7-48 Multipath relax

Gateways 2 and 3 advertise the 10.10.10.0/24 prefix to Gateway 4 in AS 65400. Gateway 4 has two paths that it can use to reach the 10.10.10.0/24 network:
- Through AS 65200 using Gateway 2 (65400, 65200, 65100)
- Through AS 65300 using Gateway 3 (65400, 65300, 65100)
Both paths have equal cost and path length. Without multipath relax, Gateway 4's BGP process chooses only one path to reach the 10.10.10.0/24 network. With ECMP and multipath relax enabled, Gateway 4 will use both paths, resulting in a more balanced traffic load.

IP Prefix Lists
An IP prefix list is typically a collection of subnets and IP addresses. By using these prefix lists, you can control those subnets (and/or IP addresses): you either allow them or deny them as part of the advertising and acceptance process on a gateway. You can use IP prefix lists for various purposes, such as BGP filtering. Using the le (less than or equal to) or ge (greater than or equal to) prefix modifiers, you can limit or extend the matched subnet or IP range. For example, you can add the 192.168.10.3/24 entry to an IP prefix list and deny the route from being redistributed to the physical network. This will advertise all subnets except for the 192.168.10.3/24 entry.
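To make the le/ge semantics concrete, the small helper below (plain Python, no NSX dependency) checks whether a route falls within a prefix-list entry such as "10.0.0.0/8 ge 16 le 24", meaning any subnet of 10.0.0.0/8 with a prefix length between 16 and 24.

# Illustration of prefix-list le/ge matching semantics (conceptual, not an NSX API).
import ipaddress

def matches(route, entry, ge=None, le=None):
    r = ipaddress.ip_network(route)
    e = ipaddress.ip_network(entry)
    if not r.subnet_of(e):
        return False
    lower = ge if ge is not None else e.prefixlen          # exact match if no ge/le given
    upper = le if le is not None else (e.prefixlen if ge is None else r.max_prefixlen)
    return lower <= r.prefixlen <= upper

print(matches("10.10.20.0/24", "10.0.0.0/8", ge=16, le=24))   # True
print(matches("10.0.0.0/12",   "10.0.0.0/8", ge=16, le=24))   # False: prefix too short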

IP Prefix List Configuration
You can add IP prefix lists under the Routing section by clicking the number next to the IP Prefix Lists label (Figure 7-49).

Figure 7-49 Configure the IP prefix list (1)

When you click the number, you will be presented with a list of the prefix lists that are already available. The default prefix list should be listed (Figure 7-50).

Figure 7-50 Configure IP prefix list (2)

Click Add to add a new IP prefix list (Figure 7-51).

Figure 7-51 Configure IP prefix list (3)

Click Set to define the prefixes (Figure 7-52).

Figure 7-52 Configure IP prefix list (4)

Once the prefix is created, you can use it later when you are configuring a route map.

Communities and Community Lists
We can use communities to "tag" a set of selected BGP routes that share the same properties. These shared properties are not dependent on the routes, the network, the AS number, or physical boundaries. The community tag is included as a path attribute inside the BGP update message. With these community attributes, gateways can perform specific actions on these "tagged" routes, like filtering them. Filtering based on community tags is faster than using long lists of individual permit or deny statements (prefix lists with route maps) and eliminates the need for iterating. A specific route (entry) can belong to multiple communities. The community is identified by a four-byte value: the first two bytes are the AS number where the community-tagged routes originate from, and the last two bytes are a unique number. When you don't want to define your own four-byte community, you can use the predefined communities (Table 7-12).
Table 7-12 NSX-T Predefined BGP Communities

INTERNET: Advertise the prefix to all BGP neighbors. A route belongs to this community if not specified otherwise.
NO_ADVERTISE: Do not advertise the route to any iBGP or eBGP peers.
NO_EXPORT: Do not advertise the route to external (eBGP) peers outside the BGP confederation.
NO_EXPORT_SUBCONFED: Do not advertise the route to any eBGP peers.

A confederation consists of multiple AS numbers. To the external world, a confederation appears as a single AS number. A community list is a group of these BGP communities that can influence routes that are part of these subsets of communities. In Figure 7-53, Gateway 2 receives a route for the 10.10.10.0/24 network from Gateway 1 with the NO_ADVERTISE community tag. Gateway 2 will not advertise this 10.10.10.0/24 route to Gateway 3.

Figure 7-53 Routing topology with BGP communities

Community List Configuration
You can add community lists under the Routing section by clicking Set next to the Community Lists label (Figure 7-54).

Figure 7-54 Configure a community list (1)

You can add a community or multiple communities to form the list (Figure 7-55).

Figure 7-55 Configure a community list (2)

Once the community list is created, you can use it later when configuring a route map. You can also add custom communities, where the first part of the community is always the BGP AS number (Figure 7-56).

Figure 7-56 Configure a community list (3)

Figure 7-57 AS path prepend

Route Maps
A route map is an advanced access list with matching criteria and actions. You can use route maps for other purposes, but they are primarily used for advertising and receiving routes; a route map controls how routes are advertised or received. With the help of IP prefix lists and/or community lists, specific routes can be permitted or denied. Route maps can also use the BGP path attributes described in Table 7-13 (AS Path Prepend, MED, Weight, Community, and Local Preference) to influence specific routes.
Table 7-13 BGP Path Attributes

AS Path Prepend (Figure 7-57): AS path prepending is a method to influence the AS-Path attribute of a BGP route. It allows prepending multiple entries of AS numbers to a BGP route. By adding AS numbers to the path, you make a path less attractive when the same route is received from another peer with fewer AS numbers.

MED (Figure 7-58): The Multi Exit Discriminator (MED) attribute is responsible for influencing routes from another AS if there are multiple entrance points toward the AS number. BGP follows a specific order to choose the best path to a particular route received by BGP. Before MED is considered, BGP evaluates weight, local preference, originated route, and AS path (prepending). If the best path is already decided by one of these earlier criteria, the MED attribute is not considered. When all other attributes are equal, the exit point with the lowest MED is preferred. The BGP MED attribute tells the external neighbors "how to enter the AS."

Weight (Figure 7-59): The weight attribute is locally significant to the routing peer in the routing topology. Weight is typically configured per BGP peer. The higher the value, the more preferred the route.

Community (Figure 7-53): A BGP community is an additional piece of information that can be added to one or more prefixes advertised to specific BGP neighbors. The configured BGP community lists are used to define this additional information.

Local Preference (Figure 7-60): A local preference can be configured in your autonomous system to select or prefer a specific exit point. Instead of configuring weight on each router and peer, you can use local preference because it is exchanged between all (internal) BGP routers. The BGP Local Preference attribute tells the internal neighbors "how to exit the AS."

Figure 7-58 Multi-exit discriminator (MED)

A route map includes a sequence of IP prefix lists or community strings, BGP path attributes, and an associated action that is either PERMIT or DENY. The gateway scans the sequence for an IP address or IP subnet to match. When there is a hit, the gateway acts and the scanning stops (top-down). When a route map is configured, it can be referenced where you configure the BGP neighbors for route redistribution. When IP prefix lists are used in route maps and the route map action is set to DENY or PERMIT, the action specified in the route map sequence overrides the action defined in the IP prefix list.

Route Map Configuration
You can add route maps under the Routing section by clicking Set next to the Route Maps label (Figure 7-61).

Figure 7-59 Weight

Then you add the route map’s name and click Set to specify the criteria (Figure 7-62).

Figure 7-60 Local preference

Figure 7-61 Route map configuration (1)

Figure 7-62 Route map configuration (2)

Figure 7-63 Route map configuration (3)

Figure 7-64 Use a route map for BGP advertisement (1)

Figure 7-65 Use a route map for BGP advertisement (2)

Figure 7-66 Use a route map for BGP advertisement (3)

Figure 7-67 Route aggregation

Figure 7-68 Configure route aggregation (1)

Figure 7-69 Configure route aggregation (2)

Figure 7-70 Equal-cost multipath routing (ECMP)

Figure 7-71 Equal-cost multipath routing (ECMP) configuration

Figure 7-72 Active/active mode

Figure 7-73 Active/standby mode

Figure 7-74 Failure detection methods

Figure 7-75 BFD failover scenario (1)

Figure 7-76 BFD failover scenario (2)

Figure 7-77 BGP failover scenario (1)

Figure 7-78 BGP failover scenario (2)

Figure 7-79 Single-tier routing | packet walk

Figure 7-80 Single-tier routing | packet walk | return traffic

Figure 7-81 Multi-tier routing | packet walk

Figure 7-82 Multi-tier routing | packet walk | return traffic

Choose an IP prefix (list) or a community list to act on. You can then specify the BGP path attributes to apply (Figure 7-63).

Using Route Maps for BGP Route Advertisements
Once you have created the route map, you can apply it to a specific BGP peer by browsing to the BGP peer and clicking the number under the Route Filter label (Figure 7-64). You can apply the route map to incoming (in filter) or outgoing (out filter) traffic (Figure 7-65). When you have selected a direction, you can choose the route map (Figure 7-66).

Using Route Aggregation
With route aggregation, you can aggregate routes from different AS numbers into a single route and advertise this "summary route" to another AS. A prerequisite for this is that the subnets must be adjacent. The benefit is that you reduce the size of the routing tables and the number of advertised routes. Figure 7-67 shows that Gateway 1 advertises the 10.10.10.0/25 prefix, and Gateway 3 advertises the 10.10.10.128/25 prefix. Both gateways advertise these routes to Gateway 2. When route aggregation is enabled on Gateway 2, it aggregates the 10.10.10.0/25 and 10.10.10.128/25 prefixes into one single 10.10.10.0/24 prefix and advertises that to Gateway 4.
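The summarization used in this example is easy to verify with plain Python (no NSX dependency): collapsing the two adjacent /25 prefixes yields the single /24 aggregate.

# Verify the aggregation from Figure 7-67: two adjacent /25s collapse into one /24.
import ipaddress

prefixes = [ipaddress.ip_network("10.10.10.0/25"),
            ipaddress.ip_network("10.10.10.128/25")]

print(list(ipaddress.collapse_addresses(prefixes)))   # [IPv4Network('10.10.10.0/24')]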

Route Aggregation Configuration
You can configure route aggregation under the BGP section by clicking Set next to the Route Aggregation label (Figure 7-68). You can add a custom prefix that you want to readvertise (Figure 7-69).

Equal-Cost Multi-Path Routing
Equal-cost multipath (ECMP) provides more bandwidth toward your northbound physical network by combining multiple uplinks and offering traffic load balancing across the available links. ECMP uses all available uplinks for data transfer. If one link is not available, it automatically detects that and only uses the remaining links. The main benefits are:
- Load sharing (more bandwidth)
- Faster convergence (redundancy)
By default, if a router receives multiple routes to a prefix, all with the same cost, it installs only one of the routes in the routing table. By enabling ECMP, all equal-cost routes (in our case up to eight) are installed in the routing and forwarding tables. ECMP supports a maximum of eight paths, and the hashing is based on the source and destination IP address of the packet. ECMP is only available on the Tier-0 gateway and not on the Tier-1 gateway. When you enable ECMP, all the routes provided by the NSX or physical network side will be present in the routing tables, so no network routing reconvergence is necessary in order to detect new routing paths. Figure 7-70 shows that the 10.10.10.0/24 network is advertised from two separate physical routers toward the Tier-0 gateway. With ECMP enabled, the Tier-0 gateway will have two route entries for the 10.10.10.0/24 network in the routing table. If ECMP is not enabled, BGP selects the best route and places only one entry in the routing table; the other entry stays in the BGP table.
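As a conceptual illustration of that hashing behavior (not NSX-T's actual data path code), the sketch below hashes the source and destination IP addresses onto the list of available next hops, so a given flow always uses the same path while different flows are spread across all paths.

# Conceptual ECMP path selection: hash (src, dst) onto the available next hops.
import hashlib

next_hops = ["10.11.31.254", "10.11.32.254"]    # example upstream next hops

def pick_next_hop(src_ip, dst_ip):
    digest = hashlib.sha256(f"{src_ip}-{dst_ip}".encode()).digest()
    return next_hops[digest[0] % len(next_hops)]   # deterministic per flow

print(pick_next_hop("172.16.10.100", "192.168.10.100"))
print(pick_next_hop("172.16.20.100", "192.168.10.100"))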

Equal-Cost Multi-Path Configuration
You can configure ECMP under the BGP section by enabling the ECMP switch (Figure 7-71).

High Availability
When you group edge transport nodes into an edge cluster, you use high availability for the edge node services.

High Availability Configuration
The service router (SR) runs on an edge node and has two modes of operation: active/active or active/standby (Table 7-14).
Table 7-14 High Availability Modes

Active/Active: The edge transport nodes are both active, and the gateway services are hosted on each gateway simultaneously. The load is distributed across the available edge transport nodes in the cluster to prevent one of the nodes from using too many resources. Logical routing is active on more than one NSX edge node at a time, and this mode is only supported with Tier-0 gateways.

Active/Standby: Only one edge transport node is active at a time, and the other one inside the edge cluster remains standby. The standby edge transport node only becomes active when the active edge transport node fails. Logical routing is active on only one NSX edge node at a time, and this mode is supported with Tier-0 and Tier-1 gateways.

Active/Active Mode
With active/active mode, the gateway is hosted on multiple edge transport nodes at the same time. In active/active mode, traffic is load-balanced across all edge transport node members. When one edge transport node fails, the traffic is not disrupted, but the bandwidth is constrained. Active/active mode is used on a Tier-0 gateway by default. When you run active/active mode, ECMP can be enabled. When you use stateful services such as NAT and firewalling, you cannot use active/active mode; it does support stateless services such as reflexive NAT. A gateway can span across up to eight edge transport nodes. Figure 7-72 shows that the SR is active on both edge transport nodes and that traffic from the virtual machines on the host transport node (hypervisor) can travel across all available paths.

Active/Standby Mode
With active/standby mode, the gateway is hosted on one edge transport node at a time. Tier-1 gateways can only be deployed in this mode (when they run in an edge cluster). This mode is required when the following stateful services are enabled:
- Edge (gateway) firewall
- Source NAT (SNAT) and destination NAT (DNAT)
- VPN (L2 and L3)
- Load balancer (only available on the Tier-1 gateway)
- DNS relay
- DHCP server
Stateful services require tracking of the connection states. For this reason, traffic for a given session must go through the same edge node. The active/standby mode is supported on both Tier-1 and Tier-0 service routers (SRs).
For Tier-0 gateways, the following rules apply:
- The active/standby service routers (SRs) have different IP addresses on the northbound-facing network and establish eBGP sessions over both links.
- The gateway state is synchronized between the edge transport nodes, but the standby does not actively forward traffic.
- Both service routers (SRs) maintain BGP peering with the physical network/gateway.
For Tier-1 gateways, the following rule applies:
- The active/standby SRs have the same IP addresses on the northbound-facing network.
Figure 7-73 shows that the SR is active on one edge transport node and that traffic from the virtual machines on the host transport node (hypervisor) travels across one path.

Failover Detection Methods
NSX uses two types of failover detection methods to detect when an edge transport node has failed (Figure 7-74):
- Bidirectional Forwarding Detection (BFD) on the overlay and management networks
- BGP session status checks on the BGP uplink networks

BFD Failover Scenario
Bidirectional Forwarding Detection (BFD) is a network protocol used to detect failures between two forwarding engines connected by a link. Failures are detected per gateway (logical router). The rules used to declare an edge transport node as down are the same in active/active and active/standby HA modes. BFD enables the engine to send keepalive packets (control messages) to its peers at certain intervals. BFD provides low-overhead detection of failures even on physical network media that does not support failure detection, like Ethernet. To ensure the edge transport nodes can continuously send traffic, the edge transport nodes send keepalives (BFD sessions) between the edge cluster members. The networks used for these BFD sessions are the overlay network (TEP network) and the management network configured on the edge transport nodes (Figure 7-75). When the standby Tier-0 gateway fails to receive keepalives on both the management and TEP interfaces, it announces itself as active. BFD provides fast detection of failure for forwarding paths or forwarding engines, which improves convergence (Figure 7-76). Virtual machine edge transport nodes support BFD with a minimum BFD timer of one second and three retries, which implies a three-second failure detection time. Bare metal edge transport nodes support BFD with a minimum BFD timer of 300 milliseconds and three retries, which implies a 900-millisecond failure detection time.

BGP Failover Scenario
The active service router (SR) on an edge transport node is declared down when all eBGP sessions are down; this is applicable only when the Tier-0 gateway is configured with dynamic routing (Figure 7-77). In the case of dynamic routing, eBGP (or iBGP) is configured on the uplinks between each NSX edge transport node and the external physical routing gateways. The eBGP (or iBGP) status is also monitored during the keepalive exchanges, where the default keepalive interval is 60 seconds, and the minimum time between advertisements is 30 seconds. If an active gateway loses all its BGP neighbor sessions and a standby gateway is configured, a failover will occur (Figure 7-78).

Failover Modes
During a failover scenario, the standby edge transport node becomes active. In a fallback scenario, two modes can be used, as described in Table 7-15.
Table 7-15 Failover Modes

Preemptive:
1. Edge Transport Node 1 is active and fails.
2. Edge Transport Node 2 becomes active.
3. Edge Transport Node 1 recovers from its failure.
4. Edge Transport Node 1 becomes active again (with preemptive mode).
5. Edge Transport Node 2 becomes standby.

Non-preemptive:
1. Edge Transport Node 1 is active and fails.
2. Edge Transport Node 2 becomes active.
3. Edge Transport Node 1 recovers from its failure.
4. Edge Transport Node 2 remains active (with non-preemptive mode).
5. Edge Transport Node 1 remains on standby.

Fallback happens when the node that failed becomes available again.

Logical Routing Packet Walks
Let's now do a packet walk that involves logical routing to the physical network, using both a single-tier NSX-T routing topology and a multi-tier NSX-T routing topology.

Single-Tier Routing: Inside NSX-T Egress to the Physical Network
Let's assume we have the network topology shown in Figure 7-79. We have a virtual machine with the IP address 10.10.10.100/24 that is networked by an NSX-T segment and connected to a Tier-0 gateway. The gateway is responsible for the routing toward the destination, which is a physical server hosted on the physical network with IP address 192.168.10.100/24. Figure 7-79 shows how the virtual machine sends traffic to the physical server.
1. The network packets that are generated by the virtual machine (10.10.10.100) are sent to its default gateway (10.10.10.1).
2. The (default) gateway (DR-T0) consults its forwarding table.
   a. There is no route in the table for the 192.168.10.0/24 network, and therefore the packet is sent to the default route next hop 169.254.0.2 (the SR-T0 on the edge transport node).
3. The DR-T0 (host transport node) sends the encapsulated packet with the GENEVE header to the edge transport node.
   a. The encapsulated packet is sent to the edge transport node across the GENEVE overlay tunnel network.
4. The edge transport node then decapsulates the packet, and the SR-T0 component consults its forwarding table to verify whether the 192.168.10.0/24 route is in there.
   a. The network route is in the table.
5. The edge transport node forwards the packet to the physical network device.
6. This physical router delivers the packet to its final destination, which is the physical server with IP address 192.168.10.100.
Here are the steps for the return traffic between the virtual machine and the physical server (Figure 7-80).
1. To send return traffic to the virtual machine, the physical server sends the traffic to its default gateway (192.168.10.1).
2. The physical network knows that the 10.10.10.0/24 network is reachable via the IP address 172.16.10.2 by consulting its routing table.
3. The service router (SR) and distributed router (DR) components share the same routing table, so the packet knows that it can reach the 10.10.10.0/24 network via 169.254.0.1 (the DR-T0 on the host transport node).
4. The packet is encapsulated on the edge transport node and sent to the host transport node across the GENEVE overlay tunnel network.
5. The host transport node receives the packet and decapsulates it.
6. This host transport node delivers the packet to its final destination, which is the virtual machine with IP address 10.10.10.100.

Multi-Tier Routing: Inside NSX-T Egress to the Physical Network Let's assume we have the network topology shown in Figure 7-81. We have a virtual machine with the IP address 10.10.10.100, which is networked by an NSX-T segment and connected to a Tier-1 gateway. The Tier-1 gateway is responsible for the routing toward a Tier-0 gateway, and that Tier-0 gateway is responsible for routing toward the destination, which is a physical server hosted on the physical network with IP address 192.168.10.100. The following steps explain how the virtual machine sends traffic to the physical server.
1. The network packets generated by the virtual machine (10.10.10.100) are sent to its default gateway (the DR-T1 on the host transport node, 10.10.10.1).
2. The (default) gateway (T1-DR) consults its forwarding table.
   a. There is no route in the table for the 192.168.10.0/24 network, so the packet is sent to the default route gateway 10.64.16.0 (the DR-T0 on the same host transport node).
3. The packets are sent to the DR-T0 instance on the same host transport node using the T0-T1-TRANSIT network.
4. The DR-T0 consults its forwarding table, and the packets are sent to the default route 169.254.0.2, which is the SR-T0 component on the edge transport node.
5. The DR-T0 (host transport node) sends the encapsulated packet with the GENEVE header to the edge transport node.
6. The edge transport node then decapsulates the packet, and the SR-T0 component consults its forwarding table to verify whether the 192.168.10.0/24 route is in there.
   a. The network route is in the table.
7. The edge transport node forwards the packet to the physical network device.
8. The physical router delivers the packet to its final destination, which is the physical server with IP address 192.168.10.100.

Figure 7-82 shows the return traffic steps between the virtual machine and the physical server.
1. To send return traffic to the virtual machine, the physical server sends the traffic to its default gateway (192.168.10.1).
2. The physical network knows that the 10.10.10.0/24 network is reachable via the IP address 172.16.10.2 by consulting its routing table.
3. The service router (SR-T0) and distributed router (DR-T0) components share the same routing table, so the packet knows that it can reach the 10.10.10.0/24 network via 10.64.16.1 (the DR-T1 on the same edge transport node).
   a. The packets are sent to the DR-T1 instance on the same edge transport node through the T0-T1-TRANSIT network.
4. The gateway (DR-T1) consults its forwarding table and sees that a route is directly connected to the 10.10.10.0/24 network over the NSX segment.
   a. The packet is sent to the remote host.
5. The packet is encapsulated on the edge transport node and sent to the host transport node across the GENEVE overlay tunnel network.
6. The host transport node receives the packet and decapsulates it.
7. The host transport node delivers the packet to its final destination, which is the virtual machine with IP address 10.10.10.100.
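If you want to see the DR and SR forwarding tables that these packet walks refer to, one quick way is to open an SSH session to the edge transport node and inspect them from the NSX-T CLI. This is only a minimal sketch; the hostname is illustrative, and the VRF numbers must be taken from the get logical-routers output of your own edge node (the same commands are used in the VRF Lite validation section of the next chapter).

nsx-edge-01> get logical-routers
nsx-edge-01> vrf 1
nsx-edge-01(vrf)> get forwarding
nsx-edge-01(vrf)> get route

In the SR-T0 VRF, the forwarding table should contain the 192.168.10.0/24 route learned from the physical router, and in the DR VRFs the default route should point at the SR, which matches the lookup sequence described in the steps above.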

Summary In this chapter, you learned how NSX-T leverages Tier-0 and Tier-1 gateways to perform logical routing across different segments inside NSX-T and toward the physical network world. You also learned more about how high availability works, how static and dynamic routing works, and how BGP route-path decisions can be influenced using BGP path attributes and route maps. The next chapter explains how VRF Lite and EVPN can be added to your routing architecture.

© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021 I. Hoogendoorn, Getting Started with NSX-T: Logical Routing and Switching https://doi.org/10.1007/978-1-4842-6708-0_8

8. VRF Lite and EVPN Iwan Hoogendoorn1 (1) Rotterdam, The Netherlands

This chapter describes VRF, explains the use cases, benefits, and limitations, and shows how to configure and validate VRF Lite. You will also learn about EVPN architecture and learn about the use cases and benefits related to EVPN.

VRF Lite Use Cases When you require separate routing instances for different tenants, you need to deploy multiple edge transport nodes and multiple Tier-0 gateways (per tenant, at least one Tier-0 gateway). This does not scale very well because you can only deploy one T0 per edge node, so multiple T0 gateways mean multiple edge nodes, especially when you are using bare metal edge transport nodes. By using VRF Lite, you will be able to run multiple routing instances on the same Tier-0 gateway, thereby allowing you to save resources. VRF Lite helps you overlap network ranges in the same routing domain when this is required. It also allows you to extend existing VRF Lite deployments in the physical network infrastructure to your NSX-T infrastructure. Figure 8-1 illustrates three tenants with and without the use of VRFs.

Figure 8-1 Tenant routing designs with and without VRFs

VRF Lite Terminology To understand VRF Lite, you need to be familiar with the networking concepts described in Table 8-1.
Table 8-1 VRF-Related Networking Concepts
Virtual Routing and Forwarding (VRF): The VRF technology allows you to have multiple isolated routing instances on a routing device. Each routing instance handles routing and forwarding independently and in isolation from the other instances.
Multiprotocol Border Gateway Protocol (MP-BGP): Multiprotocol extensions for BGP (MBGP or MP-BGP), sometimes referred to as multiprotocol BGP or multicast BGP. This is an extension to the Border Gateway Protocol (BGP) that allows different types of addresses (known as address families) to be distributed in parallel.

VRF Lite VRF Lite provides logical routing isolation in NSX and spans it to external peer devices that support this technology.

When you want to allow segmentation (or separation) between different tenants and applications, you will no longer need separate Tier0 gateways and edge transport nodes with VRF Lite. With the VRF Lite feature, you will be able to configure multiple routing instances inside a single Tier-0 gateway without the need to deploy separate Tier-0 gateways (and edge transport nodes) per routing instance. Logical routing is fully isolated within and toward external peers that are compatible with VRF Lite. VRF Lite is different from other VRF implementations because it does not rely on MPLS and MP-BGP protocols running in the physical network, so MPLS and MP-BGP protocols are not used for VRF Lite by default unless it is used in combination with EVPN.

VRF Lite Requirements To deploy VRF Lite, you need to deploy at least one Tier-0 gateway, and this Tier-0 gateway needs to have an external connection with a Layer 3 peer that supports the 802.1Q protocol (the tagging of VLANs). The Tier-0 gateway is known as the default Tier-0 gateway and is the parent gateway of the VRF gateways you create. The Tier-0 gateway used as the default can be an existing Tier-0 gateway with Tier-1 gateways connected to it. You can have more than one Tier-0 gateway with VRF gateways. VLAN tagging (802.1Q) is used to separate the VRFs in the uplink segment that connects to the external devices.

VRF Lite Limitations The VRF gateway (VRF Lite) is not compatible with VPNs, load balancers, and stretched Tier-0 gateways used together with NSX-T Federation. When you use VRFs, the bandwidth is also shared across all VRFs because they are all contained by one parent Tier-0 gateway. The VRFs will inherit the active/active or active/standby state of the parent Tier-0 gateway. Also, inter-SR routing is not possible when you use VRFs, and inter-VRF static routing does not work together with NAT, so you need to route through the physical network if you want to perform routing between VRFs with NAT enabled.

VRF Lite Topologies The single-tier and multi-tier gateway topologies (without VRFs enabled) are discussed in Chapter 6. Figures 8-2 and 8-3 show you the single and multi-tier topologies with VRF-enabled. The following topology requirements must be satisfied: VRF gateways can only be deployed on Tier-0 (parent) gateways. A trunk (802.1q) network connection needs to interconnect the different VRFs with the physical routers. The physical routers and the underlying virtual infrastructure, like vSphere distributed port groups, have to support trunking (802.1q). Virtual machines of tenants can be connected to Tier-0 segments and Tier-1 segments.

Figure 8-2 Single-tier VRF-enabled network topology

Figure 8-3 Multi-tier VRF-enabled network topology

VRF Lite Gateway Interfaces When we deal with Tier-1 and Tier-0 gateways with VRFs and the corresponding DR and SR components, the different interfaces have specific names, which are used as a reference throughout this chapter. This terminology is described in Table 8-2 and Figure 8-4.
Table 8-2 VRF Lite Interface Terminology
LR Trunk Port: The logical router (LR) trunk port provides connectivity between the parent Tier-0 gateway and the upstream physical routers.
Uplink: The VRF uplink interface provides internal connectivity to the LR trunk port of the parent Tier-0 gateway.
Intra-tier Transit Link: The intra-tier transit link provides internal connectivity between the service router (SR) and distributed router (DR) of a VRF gateway.
Downlink: The downlink interface provides connectivity between VRF gateways and segments with attached workloads.
Routerlink: The routerlink port provides connectivity between VRF gateways and Tier-1 gateways.

The LR trunk port is a network interface, while the VRF uplink port can be seen as a sub-interface with a specific VLAN ID. The LR trunk port is internally created in the parent Tier-0 gateway and is the only port connected to the uplink trunk segment. The other interfaces are the same type as the interfaces used in the standard Tier-0 and Tier-1 gateways.

Figure 8-4 VRF Lite interface terminology

VRF Lite Control and Data Planes VLAN tagging with 802.1Q provides isolation for each VRF in the uplink trunk segment. VLANs are the channels for the data plane. A BGP instance that runs inside each separate VLAN per tenant will provide the control plane. A dedicated BGP instance will run in every tenant VRF network. You do not need to use the extensions in the MP-BGP protocol to exchange the VRF routing information. BGP is the control plane because it dynamically propagates and updates routing information to all VRF

peers. Each VLAN is mapped to a VRF and only transports traffic for that particular VRF (Figure 8-5).

Figure 8-5 VLAN and VRF mapping

Each and every VRF instance has to peer with the upstream routers. Therefore, unique VLANs and subnets are required per VRF.
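As an illustration (using the values configured later in this chapter): the Tenant 1 VRF could be carried in VLANs 28 and 29 on the uplink trunk segment, with its own BGP sessions to 10.203.28.1 and 10.203.29.1, while the Tenant 2 VRF is carried in VLANs 58 and 59 with separate BGP sessions to 10.203.58.1 and 10.203.59.1. Each BGP session only carries the routes of its own tenant, and no routes are exchanged between the two VRFs.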

VRF Lite Configuration The configuration of VRF Lite consists of the steps described in Table 8-3.
Table 8-3 VRF Lite Configuration Steps
1. Create a parent Tier-0 gateway with its own uplinks to perform BGP peering across (if required) and/or to use as the uplink(s) for north-south connectivity.
2. Create segments responsible for the north-south connectivity and BGP peering of your VRF-enabled Tier-0 gateway.
3. Create the VRF Tier-0 gateway and attach it to the parent Tier-0 gateway you created in Step 1.
4. Create uplink interfaces and assign the segments you created in Step 2.
5. Configure BGP peering across the new uplinks for the VRF-enabled Tier-0 gateway.
6. Create a Tier-1 gateway and attach this to the VRF-enabled Tier-0 gateway.
7. Create a segment for your (tenant) workloads that are part of the new VRF.
8. Configure route advertisement on the Tier-1 gateway and route redistribution on the VRF-enabled Tier-0 gateway.
9. Verify that the connected networks of the Tier-1 gateway that are part of the new (tenant) VRF are advertised up to your northbound physical routers.
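These steps can also be scripted against the NSX-T Policy API. The following is only a rough sketch of Step 3: the VRF gateway name matches the one used in this chapter, while the parent gateway ID, manager address, and credentials are placeholders, and the payload should be checked against the Policy API guide for your NSX-T version.

curl -k -u 'admin:<password>' \
  -X PATCH https://nsx-manager.lab.local/policy/api/v1/infra/tier-0s/VRF-TENANT1 \
  -H 'Content-Type: application/json' \
  -d '{
        "display_name": "VRF-TENANT1",
        "vrf_config": {
          "tier0_path": "/infra/tier-0s/<parent-tier-0-id>"
        }
      }'

The vrf_config block is what makes this Tier-0 a VRF gateway; tier0_path points to the parent Tier-0 gateway created in Step 1.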

In Figure 8-6, you see two physical routers, and this is how you should typically design and deploy this in a production environment. In

my test environment, I only used one physical test router. In Figure 8-6, only Step 1 is shown.

Figure 8-6 Parent Tier-0 with two BGP uplinks

Tier-0 Gateway Deployment First, we deploy our parent Tier-0 gateway and assign it to an edge cluster (Figure 8-7). In this case, I will be doing an active/active deployment.

Figure 8-7 Parent Tier-0 gateway configuration

Tier-0 Gateway Uplink Interfaces When the (parent) Tier-0 gateway is deployed, we configure the uplink interfaces. In my example, I used VLAN 16 and VLAN 17 (Figure 8-8).

Figure 8-8 Parent Tier-0 gateway uplink configuration

Tier-0 Gateway BGP When the uplinks are created, you can start configuring BGP and the BGP peers (Figure 8-9).

Figure 8-9 Parent Tier-0 gateway BGP configuration

Figure 8-10 shows the BGP peers for the parent Tier-0 gateway.

Figure 8-10 Parent Tier-0 gateway BGP configuration status

Figure 8-11 Parent Tier-0 with two VRF-enabled Tier-0 gateways

Tier-0 Gateway Uplink Interface for VRF Gateway Now that the parent Tier-0 gateway is ready, we can start configuring the VRF feature shown in Figure 8-11. I will start by creating two segments per VRF (Table 8-4).
Table 8-4 VRF Uplink (VLAN) Configuration Details
Segment Name       Transport Zone     VLANs
SEG-TENANT1-01     TZ Edge | VLAN     1-30
SEG-TENANT1-02     TZ Edge | VLAN     1-30
SEG-TENANT2-01     TZ Edge | VLAN     31-60
SEG-TENANT2-02     TZ Edge | VLAN     31-60

I am creating two segments per VRF (Figures 8-12 through 8-15) to make sure I have redundant paths toward the north in case a failure occurs with one of the paths.

Figure 8-12 SEG-TENANT1-01 segment creation

Figure 8-13 SEG-TENANT1-02 segment creation

Figure 8-14 SEG-TENANT2-01 segment creation

Figure 8-15 SEG-TENANT2-02 segment creation

VRF Gateway When the segments are created, I create two new VRFs representing Tenant 1 (Figure 8-16) and Tenant 2 (Figure 8-17).

Figure 8-16 Create VRF-enabled Tier-0 gateway | VRF-TENANT1

Figure 8-17 Create VRF-enabled Tier-0 gateway | VRF-TENANT2

When the parent Tier-0 and VRF-enabled Tier-0 gateways are configured, you should be able to review this in the Tier-0 gateway overview (Figure 8-18).

Figure 8-18 Tier-0 (parent and VRF) gateway overview

VRF Gateway Uplink Interfaces The following interfaces are created for Tenant 1 (Table 8-5 and Figure 8-19).
Table 8-5 Tenant 1 Uplink Configuration Details
Uplink Name     IP Address        Segment           VLAN   Edge Transport Node
en01-uplink1    10.203.28.2/24    SEG-TENANT1-01    28     Edge Node 1
en01-uplink2    10.203.29.2/24    SEG-TENANT1-02    29     Edge Node 1
en02-uplink1    10.203.28.3/24    SEG-TENANT1-01    28     Edge Node 2
en02-uplink2    10.203.29.3/24    SEG-TENANT1-02    29     Edge Node 2

Figure 8-19 Tenant 1 uplink configuration overview

The following interfaces are created for Tenant 2 (Table 8-6 and Figure 8-20).
Table 8-6 Tenant 2 Uplink Configuration Details
Uplink Name     IP Address        Segment           VLAN   Edge Transport Node
en01-uplink1    10.203.58.2/24    SEG-TENANT2-01    58     Edge Node 1
en01-uplink2    10.203.59.2/24    SEG-TENANT2-02    59     Edge Node 1
en02-uplink1    10.203.58.3/24    SEG-TENANT2-01    58     Edge Node 2
en02-uplink2    10.203.59.3/24    SEG-TENANT2-02    59     Edge Node 2

Figure 8-20 Tenant 2 uplink configuration overview

VRF Gateway BGP Once the uplinks are configured, we can configure BGP and the BGP peers for the Tenant 1 VRF. You only need to turn BGP on here, as the other settings are inherited from the parent Tier-0 gateway (Figure 8-21).

Figure 8-21 Tenant 1 VRF-enabled Tier-0 gateway BGP configuration

The BGP peers for Tenant 1 are configured with the details in Table 8-7 and shown in Figures 8-22 and 8-23.
Table 8-7 Tenant 1 BGP Configuration Details
Remote Peer IP Address   Remote AS
10.203.28.1              65000
10.203.29.1              65000

Figure 8-22 Tenant 1 VRF-enabled Tier-0 gateway BGP configuration status

Figure 8-23 Tenant 2 VRF-enabled Tier-0 gateway BGP configuration

The BGP peers for Tenant 2 are configured with the details in Table 8-8 and shown in Figure 8-24.
Table 8-8 Tenant 2 BGP Configuration Details
Remote Peer IP Address   Remote AS
10.203.58.1              65000
10.203.59.1              65000

Figure 8-24 Tenant 2 VRF-enabled Tier-0 gateway BGP configuration status

When both sides of the peering are configured correctly, you will see a Success message once the BGP peers are established on both of the VRFs.
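Besides the Success status in the UI, you can also check the BGP sessions of a VRF gateway from the edge node CLI. This is a minimal sketch; the VRF number 5 is only an example, so first look up the SR-VRF component of your tenant in the get logical-routers output (the validation section later in this chapter shows that output in full).

Pod-010-T0-EdgeVM-01> get logical-routers
Pod-010-T0-EdgeVM-01> vrf 5
Pod-010-T0-EdgeVM-01(vrf)> get bgp neighbor summary

The summary should list the uplink peers of that tenant in the Established state.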

Add and Connect a Tier-1 Gateway to a VRF Gateway You can choose to optionally configure a Tier-1 gateway and attach it to the VRF-enabled gateway to create a multi-tier routing topology using VRFs. I have created a Tier-1 gateway for Tenant 1. In Figure 8-25, you can see that this Tier-1 gateway is attached to the VRF-enabled gateway dedicated to Tenant 1.

Figure 8-25 Tenant 1 Tier-1 gateway to VRF-enabled Tier-0 attachment

In Figure 8-26, you can see that this Tier-1 gateway is attached to the VRF-enabled gateway dedicated to Tenant 2.

Figure 8-26 Tenant 2 Tier-1 gateway to VRF enabled Tier-0 attachment

When you have created two Tier-1 gateways, the summary should look like Figure 8-27.

Figure 8-27 Tier-1 gateway overview

Enable Route Advertisement and Redistribution To advertise the routes toward the physical network, you need to enable advertisement on the Tier-1 gateway. You do this by enabling the All Connected Segments & Service Ports switch in the Route Advertisement

section of the Tier-1 gateways (Figure 8-28).

Figure 8-28 Enable route advertisement on the tenant Tier-1 gateway

On the Tier-0 VRF-enabled gateways, you need to redistribute the connected routes coming from the Tier-1 gateway. This is done in the Route Redistribution section of the Tier-0 VRF-enabled gateways (Figures 8-29 and 8-30).

Figure 8-29 Enable route redistribution on the tenant VRF-enabled Tier-0 gateway (1)

Figure 8-30 Enable route redistribution on the tenant VRF-enabled Tier-0 gateway (2)

You need to make sure you create a new Route Redistribution entry and check the Connected Interfaces & Segments checkbox in the

Advertised Tier-1 subnets section (Figure 8-31).

Figure 8-31 Enable route redistribution on the tenant VRF-enabled Tier-0 gateway (3)
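If you prefer to configure route advertisement by API instead of the UI, it is a single property on the Tier-1 gateway. This is only a hedged sketch; the Tier-1 ID, manager address, and credentials are placeholders, and the accepted values should be verified against the Policy API guide for your NSX-T version.

curl -k -u 'admin:<password>' \
  -X PATCH https://nsx-manager.lab.local/policy/api/v1/infra/tier-1s/<tenant-tier-1-id> \
  -H 'Content-Type: application/json' \
  -d '{ "route_advertisement_types": [ "TIER1_CONNECTED" ] }'

The matching redistribution of these Tier-1 connected routes into BGP is configured on the VRF-enabled Tier-0 gateway (under its locale services), which is what the Route Redistribution screens in Figures 8-29 through 8-31 do.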

Network Topology Map Verification When all the VRF routing topology is created, you can perform a quick verification by looking at the network topology map (Figure 8-32 and Figure 8-33).

Figure 8-32 Tenant 1 VRF topology

Figure 8-33 Tenant 2 VRF topology

VRF Lite (Routing Table) Validation You can verify the routing tables by logging in to the logical routers hosted on the edge transport node. Note The vrfs that you see in the get logical-routers output is not the same VRF as the ones you just configured to represent the tenants. These vrf instances define the DR and SR components when you create a Tier-0/1 gateway. I used edge transport node 1 as an example to execute the commands. Still, the commands will be the same for the other edge transport nodes

when you need this for verification or troubleshooting purposes.

Edge Transport Node 1 First, we set up an SSH session into an edge transport node and verify the configured logical routers. I highlighted the DR components for the tenant VRFs we configured.

Pod-010-T0-EdgeVM-01> get logical-routers
Logical Router
UUID                                   VRF   LR-ID   Name                   Type
736a80e3-23f6-5a2d-81d6-bbefb2786666   0     0                              TUNNEL
3810b075-996f-49fd-8348-55da7cb9a9cf   1     4       SR-T0-Ga...01          SERVICE_ROUTER_TIER0
adff2e0a-3fa2-4506-b678-8d2067a21753   3     1       DR-T0-Ga...01          DISTRIBUTED_ROUTER_TIER0
34dacb67-bf37-4502-ad55-65c4997d8fd9   4     3       DR-T1-Ga...01          DISTRIBUTED_ROUTER_TIER1
b797acbd-ba92-48d4-9ae6-dabdcb5e2447   5     9       SR-VRF-VRF-TENANT1     VRF_SERVICE_ROUTER_TIER0
99abb0b2-e5b6-4ac4-b26e-a6c6585b2afe   6     12      SR-VRF-VRF-TENANT2     VRF_SERVICE_ROUTER_TIER0
f21797fa-80a1-4198-8f6c-039d63596e56   7     17      DR-T1-...TENANT1       DISTRIBUTED_ROUTER_TIER1
4bed844a-6e75-4933-8bbf-285b11bbd883   8     7       DR-VRF-VRF-TENANT1     VRF_DISTRIBUTED_ROUTER_TIER0
8b67c46c-45ec-4264-bf7b-ee5c04571e1a   9     10      DR-VRF-VRF-TENANT2     VRF_DISTRIBUTED_ROUTER_TIER0
4e890dcb-2fab-44a0-a313-58084ccadeea   10    18      DR-T1-...TENANT2       DISTRIBUTED_ROUTER_TIER1

We can log in to this DR component by using the vrf command with the VRF number. I highlighted the 172.16.10.0/24 route that was received from the Tier-1 gateway that is connected to the VRF-enabled Tier-0 gateway dedicated to Tenant 1.

Pod-010-T0-EdgeVM-01> vrf 8
Pod-010-T0-EdgeVM-01(vrf)> get forwarding
Logical Router
UUID: 4bed844a-6e75-4933-8bbf-285b11bbd883   VRF: 8   LR-ID: 7   Name: DR-VRF-VRF-TENANT1   Type: VRF_DISTRIBUTED_ROUTER_TIER0

IPv4 Forwarding Table
IP Prefix          Gateway IP      Type
0.0.0.0/0          10.203.29.1     route
10.203.28.0/24                     route
10.203.28.2/32                     route
10.203.29.0/24                     route
10.203.29.2/32                     route
100.64.240.2/32                    route
100.64.240.2/31                    route
127.0.0.1/32                       route
169.254.0.0/24                     route
169.254.0.1/32                     route
169.254.0.3/32                     route
172.16.10.0/24     100.64.240.3    route

IPv6 Forwarding Table
IP Prefix                      Gateway IP   Type
::1/128                                     route
fc53:77e1:2211:5801::/64                    route
fc53:77e1:2211:5801::1/128                  route
ff00::/8                                    route

(The interface UUID and MAC columns of the output are omitted here for readability.)

Pod-010-T0-EdgeVM-01(vrf)>

We can also log in to the other VRF. I highlighted the 172.16.20.0/24 route that was received from the Tier-1 gateway that is connected to the VRF-enabled Tier-0 gateway dedicated to Tenant 2.

Pod-010-T0-EdgeVM-01> vrf 9
Pod-010-T0-EdgeVM-01(vrf)> get forwarding
Logical Router
UUID: 8b67c46c-45ec-4264-bf7b-ee5c04571e1a   VRF: 9   LR-ID: 10   Name: DR-VRF-VRF-TENANT2   Type: VRF_DISTRIBUTED_ROUTER_TIER0

IPv4 Forwarding Table
IP Prefix          Gateway IP      Type
0.0.0.0/0          10.203.58.1     route
10.203.58.0/24                     route
10.203.58.2/32                     route
10.203.59.0/24                     route
10.203.59.2/32                     route
100.64.240.4/32                    route
100.64.240.4/31                    route
127.0.0.1/32                       route
169.254.0.0/24                     route
169.254.0.1/32                     route
169.254.0.3/32                     route
172.16.20.0/24     100.64.240.5    route

IPv6 Forwarding Table
IP Prefix                      Gateway IP   Type
::1/128                                     route
fc53:77e1:2211:5802::/64                    route
fc53:77e1:2211:5802::1/128                  route
ff00::/8                                    route

(The interface UUID and MAC columns of the output are again omitted for readability.)

Pod-010-T0-EdgeVM-01(vrf)>

When I log in to the physical router, you can see the BGP peers that are up. Notice that only two of the four BGP peers per tenant are established (the others are in the Active state). I highlighted them.

vyos@Pod-010-Router:~$ show ip bgp summary
BGP router identifier 10.203.10.1, local AS number 65000
IPv4 Unicast - max multipaths: ebgp 1 ibgp 1
RIB entries 3, using 288 bytes of memory
Peers 12, using 53 KiB of memory

Neighbor      V    AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
10.203.16.2   4  65001       47       52      0    0     0  01:36:13             1
10.203.16.3   4  65001       48       59      0    0     0  01:35:13             1
10.203.17.2   4  65001       47       52      0    0     0  01:35:55             1
10.203.17.3   4  65001       47       57      0    0     0  01:35:53             1
10.203.28.2   4  65001        0        1      0    0     0  never           Active
10.203.28.3   4  65001       90      100      0    0     0  01:29:07             1
10.203.29.2   4  65001       90      101      0    0     0  01:29:07             1
10.203.29.3   4  65001        0        0      0    0     0  never           Active
10.203.58.2   4  65001       87       98      0    0     0  01:26:45             1
10.203.58.3   4  65001        0        0      0    0     0  never           Active
10.203.59.2   4  65001        0        0      0    0     0  never           Active
10.203.59.3   4  65001       87       96      0    0     0  01:26:30             1

Total number of neighbors 12
vyos@Pod-010-Router:~$

To see if you received the routes from NSX-T, you can verify this on the physical router.

vyos@Pod-010-Router:~$ show ip route bgp

Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF,
       I - ISIS, B - BGP, > - selected route, * - FIB route

B>* 172.16.10.0/24 [20/0] via 10.203.28.3, eth1.28, 00:01:03
B>* 172.16.20.0/24 [20/0] via 10.203.59.3, eth1.59, 00:00:06

When you want to verify communication from an NSX-T perspective, you can use Traceflow to perform an end-to-end connectivity test. In Figure 8-34, I have done this, and you can see the full path presented visually. Note that the packet sent by the Tenant 1 virtual machine is delivered to the Tenant 2 virtual machine.

Figure 8-34 Traceflow diagram between Tenant 1 and Tenant 2

A full step-by-step explanation of how VRF Lite is configured can be found at https://nsx.ninja/index.php/Configure_NSXT_VRF_Lite.

Ethernet VPN (EVPN) Ethernet VPN (EVPN) is a standard developed by the Internet Engineering Task Force (IETF). EVPN expands Layer 2 and Layer 3 networks over IP/MPLS core networks with the separation of control and data plane traffic. EVPN supports encapsulation types such as VXLAN, Multiprotocol Label Switching (MPLS), and Network Virtualization using Generic Routing Encapsulation (NVGRE). EVPN uses MP-BGP VPN capabilities to transport tenant networks with Layer 2 and Layer 3 routing information (control plane traffic).

Challenges that Telco Companies Have with NFV Currently, telco companies have investments in IP/MPLS networks using hardware for routing and switching. They need an easy transition to NFV clouds using Network Functions Virtualization (NFV). This means that they need support for the transport and routing protocols used by the telco virtual routers, and they need to be able to isolate tenants from each other. Telcos are also looking for full integration between the virtual networks and the traditional physical telco networks.

The Telco NFV Solution The prominent reasons a telco will go with NFV clouds are scale and throughput. To extend traditional telco networks to NFV clouds, NSX-T can be used. This works because telco virtual routers use the BGP routing protocol to exchange routes of tenants with the NSX-T gateways dynamically, and the NSX-T gateways implement VRFs to provide isolation between the tenants. These VRF-enabled NSX gateways support the MP-BGP protocol to interconnect the telco’s traditional networks with the NSX-T virtual network segments. VXLAN is the encapsulation protocol used to exchange traffic between NSX-T and the remote gateways.

Telco Virtual Router Telcos use their own virtual router (vRouter) appliances to provide Virtual Network Functions (VNFs) that previously were handled with

dedicated hardware. Telco providers can deploy any vRouter from any vendor of choice. Some vendors support regular BGP, while others also support MP-BGP. Telco administrators will manage the vRouters. vRouters are capable of advanced routing and forwarding, and they are multi-tenant. They have gateway functions and they provide connectivity to the NSX-T VRFenabled gateways (Figure 8-35).

Figure 8-35 Telco virtual routers connected to VRF-enabled Tier-0 gateways

EVPN Topology In Figure 8-36, you can see the edge inline EVPN topology model. In this model, the NSX-T edge transport nodes are involved in the routing plane and data path.

Figure 8-36 EVPN topology

EVPN will extend existing VRFs (on the vRouter) in traditional IP/MPLS core networks to the virtual networks (segments) in NSX-T.

EVPN Data Plane The EVPN data plane will span across NSX-T data center nodes and physical remote gateways. VXLAN is used to do this and to exchange traffic with the remote gateways. The GENEVE encapsulation transports only the overlay traffic between the VRF gateways and the telco’s virtual routers, where VLANs are used to provide isolation between the VRFs. In Figure 8-37, you see two isolated tenants (blue and green) separated with VLANs inside the GENEVE tunnels and between the telco vRouter and the Tier-0 VRF-enabled gateways. Between the Tier-0 VRFenabled gateways and the remote gateway, VXLAN VNIs are used for isolation purposes.

Figure 8-37 EVPN data plane

EVPN Control Plane To transport all the tenant’s VRF routing information in a single session, MP-BGP L2VPN capabilities are used. A dedicated BGP instance runs between the Tier-0-enabled VRF gateway on the NSX-T side and the vRouter in each VRF. One single MPBGP instance is configured between the parent Tier-0 gateway and the remote gateway. In Figure 8-38, you will see two isolated tenants (blue and green) with separate BGP sessions between the telco vRouter and the Tier-0 VRF-enabled gateways (one set of BGP peers per tenant; this can be one or two BGP peers per tenant). Between the Tier-0 VRF-enabled gateways and the remote gateway, there is a single MP-BGP session.

Figure 8-38 EVPN control plane

EVPN Route Advertisement EVPN route advertisement (Figure 8-39) is done using the following steps:
1. The vRouters use the BGP routing protocol to advertise the networks to the VRF-enabled Tier-0 gateways.
   a) Orange = Network A, B, C
   b) Blue = Network D, E, F
   c) Green = Network G, H, I
2. A Route Distinguisher (RD) is associated with each VRF-enabled Tier-0 gateway to differentiate the networks between the VRFs.
3. The routes that are announced using the MP-BGP protocol between the parent Tier-0 and the data center gateway are tagged using route targets (RTs), and these RTs are used to import and export routes into the VRFs.
4. The VRF routes are redistributed into the MP-BGP protocol as EVPN Type-5 (Layer 3) routes.
5. The VXLAN TEP IP addresses are advertised using the IPv4 address family in the same MP-BGP session used in Step 4.
6. The routes of all the networks A, B, C, D, E, F, G, H, and I are now sent to the IP/MPLS core cloud.

Figure 8-39 EVPN route advertisement

Each type of protocol that MP-BGP can transport is a separate address family, such as IPv4 and IPv6, L2VPN for Layer 2 VPNs, and VPNv4 and VPNv6 for Layer 3 VPNs. EVPN Type-5 routes are used to advertise the IP prefixes of the Orange, Blue, and Green networks. EVPN route types for Layer 2 networks are not used in EVPN for NSX-T as of the time of this writing (NSX-T 3.0.x).

Route Distinguishers (RDs) We just mentioned a new term, the RD. An RD attribute is prepended to a VRF network to be distinguished from the other VRFs and it is unique. This is useful when you have networks in the same subnet as another VRF/vRouter and they need to be advertised. When the RD admin address is configured, an RD is generated automatically in each VRF. It has the RD admin address:number format, where the RD admin address is configured in the parent Tier-0 gateway, and the number that is generated automatically is unique for each VRF.

When the RD admin address is not provided, RDs must be manually configured in each VRF with the AS:number or IP:number format. The numbers in Figure 8-40 explain what is happening with the RDs.

Figure 8-40 Route distinguishers (RDs)

Manual RD configurations can have one of the following formats: AS:number, where the AS is two bytes and the number is four bytes, or IP-address:number, where the IP address is four bytes and the number is two bytes. The RD is included in the Type-5 EVPN (L3) routes.

Route Targets (RT) We also mentioned a new term, the route target (RT) . RT attributes are

BGP-extended communities used to import and export routes into different VRFs. These extended communities provide an extended format to carry the VPN information. Each EVPN route can be tagged with multiple RTs. The AS component in an RT is two bytes; when four-byte autonomous system numbers are used, the lower two bytes are taken. RTs are automatically generated by default and have the AS:VNI format. The AS number is inherited from the parent Tier-0 gateway, and the VNI value is taken from each VRF. The RTs imported and exported in each VRF can also be configured manually if you want. The numbers in Figure 8-41 explain what is happening with the RTs.

Figure 8-41 Route targets (RT)
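To make the formats concrete, here is a purely illustrative example: with an RD admin address of 192.168.100.1 configured on the parent Tier-0, the VRFs automatically get RDs such as 192.168.100.1:1 and 192.168.100.1:2; configured manually, an RD could look like 65001:100 (AS:number) or 192.168.100.1:100 (IP-address:number). Similarly, with the parent Tier-0 in AS 65001 and a VRF whose VNI is 75001, the auto-generated route target would be 65001:75001 (AS:VNI).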

EVPN Requirements and Limitations To deploy EVPN, you need to deploy a parent Tier-0 gateway. The remote (data center) gateway needs to support the MP-BGP protocol and the VXLAN overlay tunneling protocol. The virtual router (vRouter) that is part of the telco network needs to support the BGP protocol and support VLAN tagging (802.1q).

Note VXLAN requires an MTU of 1600 or higher. EVPN is not compatible when you have enabled load balancing of VPN services on the parent Tier-0 gateway.

Comparing EVPN and VRF As VRF Lite is required to configure EVPN, it's a good idea to understand their differences. Table 8-9 summarizes them.
Table 8-9 Comparing EVPN and VRF
Control Plane: With VRF Lite, the BGP protocol is used to exchange routes with the remote gateways, using a dedicated VLAN per tenant, per BGP peer. With EVPN, one single MP-BGP peering is used to exchange routes with the data center gateway, and the BGP protocol is used to exchange routes with the vRouters, using a dedicated VLAN per tenant, per BGP peer.
Data Plane: With VRF Lite, VLAN tagging is done using 802.1Q with the remote gateways. With EVPN, the VXLAN protocol is used between the parent Tier-0 gateway and the data center gateway, and VLAN tagging is done using 802.1Q over GENEVE with the vRouters.
VRF Gateways: Supported with both VRF Lite and EVPN.

The different EVPN and VRF use cases are explained in Table 8-10.
Table 8-10 EVPN and VRF Use Cases
VRF Lite use cases: Provides tenant isolation using one Tier-0 gateway, allows overlapping subnets, and requires no MPLS for its implementation.
EVPN use cases: Extends traditional IP/MPLS core networks of telcos to NFV cloud solutions.

A full step-by-step explanation of how EVPN is configured (with Cisco NX-OS) can be found at https://nsx.ninja/index.php/NSX-T_EVPN_North/South_routing_with_NX-OS_EVPNbased_Datacenter_fabric.

Summary In this chapter, you learned about VRFs and EVPNs. You should now be able to describe the benefits, use cases, and operation of VRF Lite. The chapter explained how to configure and validate VRF Lite using NSX-T in a sample network topology. The chapter also explained what EVPN architecture is and what the use cases and benefits are for telcos to use NFV cloud features together with NSX-T. The next chapter discusses NSX-T multicast.

© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021 I. Hoogendoorn, Getting Started with NSX-T: Logical Routing and Switching https://doi.org/10.1007/978-1-4842-6708-0_9

9. Multicast in NSX-T Iwan Hoogendoorn1 (1) Rotterdam, The Netherlands

This chapter covers multicast and the benefits and use cases of the multicast technology in general. You will also learn which multicast capabilities are offered by NSX-T and how to configure them. Note NSX-T 3.0.x and NSX-T 3.1 were used to write this chapter, and the support and working of multicast in future releases may be slightly different.

Multicast Multicast is a forwarding method that sends packets to a group of receivers in a single transmission. When using multicast, a source (S) will forward network packets to a selected group of receivers (D). When using the broadcast method, all the receivers will receive the packets; when using the unicast method, only one receiver will receive the packet (Figure 9-1).

Figure 9-1 Unicast, broadcast, and multicast, compared

Multicast Benefits and Use Cases Multicast lets a source send traffic to only a subset of receivers, preventing other receivers (or destinations) from receiving network traffic that they have not asked for. Broadcast sends traffic to everyone on the network, causing lots of unnecessary network traffic. This can be prevented by using multicast. Unicast sends traffic to a single receiver. With multicast, you can save bandwidth and preserve network resources on your network because a receiver (or destination) subscribes to a particular packet stream. When subscribed, it will receive the data. Destinations that have not subscribed will not receive data for that specific stream from the source. The multicast technology has many use cases. Video streaming, IPTV, and IP CCTV services are some examples.

Multicast Protocols The multicast technology uses different dedicated network protocols. NSX-T supports the following multicast protocols: Internet Group Management Protocol (IGMP): IGMP is a Layer 2 protocol used to establish multicast group memberships between hosts and adjacent routers (Table 9-1).

Protocol-Independent Multicast (PIM): A Layer 3 routing protocol used to route multicast traffic between different networks (Table 9-2).
Table 9-1 Internet Group Management Protocol (IGMP)
IGMPv2: IGMP version 2 enables hosts to send a notification when leaving a multicast group.
IGMP snooping: IGMP snooping is used to maintain tables inside Layer 2 devices for clients (receivers/destinations) that have applied to be part of (join) a specific multicast group.
Table 9-2 Protocol-Independent Multicast (PIM)
PIM Sparse Mode (PIM-SM): This PIM protocol type will not forward multicast traffic until it has received a downstream request from a routing device. PIM-SM uses the rendezvous point (RP) as the gathering point for sources and receivers of multicast data.
PIM Bootstrap: PIM Bootstrap ensures that all routing devices in the PIM domain have the same RP configured without configuring this everywhere manually.

Multicast Topology The NSX-T multicast topology consists of multicast sources and receivers. These can either reside on the physical network (PIM cloud) or the virtual NSX-T network. The NSX-T Tier-0 gateways will be connected to the physical network, and PIM is enabled on all the uplinks. Figure 9-2 shows that: PIM is enabled on the uplinks on the Tier-0 gateways. IGMP is enabled on the downlinks on the Tier-0 gateways.

Figure 9-2 NSX-T multicast topology (only Tier-0)

Figure 9-3 shows that: PIM is enabled on the uplinks on the Tier-0 gateways. IGMP is enabled on the downlinks on the Tier-1 gateways.

Figure 9-3 NSX-T multicast topology (only Tier-1)

Multicast IGMP Snooping and Forwarding Tables To populate the multicast forwarding tables, the host transport nodes perform IGMP snooping of local receivers (Table 9-3). These tables contain multicast group IP and VNI pairs.
Table 9-3 Multicast Group VNI Receivers
Multicast Group   Local Receivers
239.0.0.1         6001, 7001
These tables are used to replicate the multicast packets to all the VNIs associated with a specific multicast group on the local transport node. The host transport nodes also maintain entries with remote receivers to replicate multicast packets to every transport node that has joined a multicast group. This is done with the multicast group and remote receiver pairs from Table 9-4; queries received from edge transport nodes are registered with these pairs.
Table 9-4 Multicast Group Edge Transport Node and TEP Receivers
Multicast Group   Remote Receivers
*                 Edge transport node 1
239.0.0.2         TEP2

Multicast IGMP Queries Figure 9-4 shows the communication flows for IGMP queries: The edge transport node (EN-TN-1) sends periodic IGMP queries across the transit logical switch. The DR instance updates the remote multicast forwarding table on the host transport nodes (H-TN-2 and H-TN-3). The query is then forwarded locally to all the VNIs (6001, 6002 for H-TN-3, and 6003, 6004 for H-TN-2) with receiver VMs.

Figure 9-4 Multicast IGMP query process

The edge transport node performs the query functions and sends the IGMP queries to all the host transport nodes. The host transport nodes add an entry in the remote receiver’s multicast table with the edge transport node that sent the query out.

Multicast IGMP Reports Figure 9-5 shows the communication flows for IGMP reports:
1. The receiver's virtual machines will send the IGMP reports.
2. The DR instance (on the host transport nodes) updates the multicast table with local receivers.
3. The DR instance (on the host transport nodes) forwards the reports to the other transport nodes across the logical transit switch.
4. The remaining transport nodes update the remote receiver entries in the tables.

Figure 9-5 Multicast IGMP reports

Multicast PIM Sparse Mode When using multicast in PIM sparse mode, multicast traffic is not forwarded to an interface until a request is received. Traffic is flooded to all the interfaces to select multiple paths between the multicast server and the multicast client (Figure 9-6).

Figure 9-6 Multicast PIM sparse mode

Multicast PIM Sparse Mode Forwarding Figure 9-7 shows the steps of PIM sparse mode forwarding using a

Rendezvous Point (RP):
1. The router that wants to receive multicast traffic (R3) signals it to the RP (R2) with a PIM join message.
2. The router that receives multicast traffic (R1) from a multicast server sends a PIM register message to the RP (R2).
3. The multicast traffic is forwarded through the RP (R2).

Figure 9-7 Multicast PIM sparse mode forwarding

Multicast PIM Bootstrap Candidate RPs are the routers that want to become rendezvous points

(RPs). RP configuration can be done in multiple ways. Using the PIM Bootstrap protocol is one way. The PIM Bootstrap protocol will configure the RP in PIM sparse mode. The BSR is the bootstrap router that receives the RP announcements from the candidate RPs and advertises them to the network. The remaining multicast routers will receive BSR messages from the BSR with a list of all candidate RPs and will select the best one. Figure 9-8 shows that Router 3 is configured as the BSR. Router 1 is the candidate RP. Router 1 and Router 5 will both send an RP announcement. The BSR sends out a BSR message. It selects Router 1 as the RP and configures Router 1. BSR can be disabled on the Tier-0 gateway.

Figure 9-8 Multicast PIM bootstrap

Multicast North-South Flows Figure 9-9 shows the NSX-T traffic flows for multicast traffic that originates from the physical network.

Figure 9-9 Multicast north-south flow

In case of north-to-south traffic, the Tier-0 service router (SR) receives the multicast traffic from the RP and forwards it to one of the host transport nodes (DR) connected to the logical transit switch. This host transport node takes care of the replication. The host transport nodes will replicate the packets to their local receivers (the virtual machines) and the remote receivers in the other host transport nodes connected to the logical transit switch.

Multicast South-North Flows Figure 9-10 shows you the NSX-T traffic flows for multicast traffic that originated from the NSX network. In the case of south-to-north traffic, the host transport nodes with the source of the traffic send the traffic

upstream to the Tier-0 service router (SR).

Figure 9-10 Multicast south-north flow

The source host replicates the multicast packets to the local receivers and the remote receivers in the other host transport nodes connected to the logical transit switch.

Multicast Source Replication Flows The following steps show how the multicast replication is done at the source (Figure 9-11):
1. A multicast packet is forwarded to all the local ports on the source VNI that has joined the multicast group (VNI 6001).
2. The DR instance sends copies locally to the other VNIs (VNI 6002 and 6003) with receivers in the multicast group.
3. If the packet must be routed outside, the DR instance looks up the multicast forwarding table to send the packet to remote transport nodes (H-TN-5).

Figure 9-11 Multicast source replication example

Multicast Destination Replication Flows The following steps indicate how the multicast replication is done at the destination (Figure 9-12):
1. The DR instance receives the multicast packet originated in a remote host transport node (H-TN-2).
2. The DR instance forwards the packet to the ports in the same VNI that joined the multicast group (VNI 6001).
3. If the packet must be routed, the DR instance replicates the packet locally to the other VNIs (VNI 6002 and 6003) that joined the multicast group.

Figure 9-12 Multicast destination replication example

Multicast Limitations From an NSX-T perspective, there are some limitations you have to deal with, as described in Table 9-5.
Table 9-5 NSX-T Multicast Limitations
Tier-0 gateway:
• Multicast workloads (source and receivers) can be connected anywhere.
• Multicast can be enabled on a Tier-0 gateway that is active/active, but multicast will only work when the Tier-0 gateway is active/standby.
• PIM can be enabled on two uplink interfaces on different service routers (SRs) deployed in active/active mode.
• Tier-0 gateways cannot be configured as RPs or BSRs. This should be done on the physical network.
• BSR can be disabled.
• PIM can only be enabled on one uplink per service router (SR).
• VRF-enabled gateways do not support multicast.
Tier-1 gateway:
• Multicast can be configured on Tier-1 gateways with the same capabilities as the Tier-0 gateway.
• A Tier-1 gateway SR component is created when multicast is enabled on the Tier-1 gateway.
• The Tier-1 gateway SR component is the querier of the Tier-1 gateway routing domain.
• All features and functions are the same for the Tier-1 gateway as they are for the Tier-0 gateway.
• The Tier-1 gateway SR component uses the Tier-0 SR component as the querier of the multicast router.
• Multicast must be enabled on the Tier-0 gateway to work on the attached Tier-1 gateway.
Layer 2 bridges:
• Multicast is not supported on Layer 2 bridges.
KVM host transport nodes:
• Multicast is not supported on KVM host transport nodes.

Configure an IGMP profile under multicast profiles. Configure a PIM profile under multicast profiles. Select the newly created IGMP and PIM profiles in the multicast section of a Tier-0 gateway where you want to enable multicast. Enable PIM on (one of) the uplink interfaces of the Tier-0 gateway where you want to enable multicast. Enable multicast routing on the Tier-0 and/or Tier-1 gateway. Enable multicast routing on the (downlink) segments on the Tier-0 gateway where you want to enable multicast.

Multicast IGMP Profile Configuration To configure an IGMP profile (Figure 9-13 through Figure 9-15), you need to choose Networking ➤ Network Settings ➤ Multicast Profiles. Then choose IGMP Profiles as the Profile Type and click the Add IGMP Profile button.

Figure 9-13 IGMP Profile Configuration | Before Configuration

Figure 9-14 IGMP Profile Configuration | During Configuration

Figure 9-15 IGMP Profile Configuration | After Configuration

When you add a new IGMP profile, the parameters listed in Table 9-6 are required. Table 9-6 IGMP Profile Parameters Parameter Name

Parameter Description

Default Parameter Value

Name

The name of the IGMP profile

N/A

Query Interval

The interval time in which the querier is sending general queries

30 seconds

Query Max Response Time

The time that is inserted into the general queries when they are sent

10 seconds

Higher values make the traffic less bursty, as host responses are spread out over larger interval periods This value must be less than the Query Interval value Last Member Query Interval

The time between group-specific query messages

10 seconds

You can tune this value to modify the “leave latency” of the network When you reduce this value, this will reduce the time to detect when the last member of the group is lost

Robustness Variable

This variable allows tuning for the expected packet loss on a subnet

2

When you expect packet loss on a subnet, this variable can be increased

Multicast PIM Profile Configuration To configure a PIM profile (Figure 9-16 through Figure 9-18), you need to choose Networking ➤ Network Settings ➤ Multicast Profiles. Then choose PIM Profiles as the Profile Type and click the Add PIM Profile button.

Figure 9-16 PIM Profile Configuration | Before Configuration

Figure 9-17 PIM Profile Configuration | During Configuration

Figure 9-18 PIM Profile Configuration | After Configuration

When you add a new PIM profile, the parameters explained in Table 9-7 are required.

Table 9-7 PIM Profile Parameters Parameter Parameter Description Name Name

The name of the IGMP profile.

Static RP Address

The IP address of the rendezvous point (RP) as the gathering point for sources and receivers of multicast data. (On the physical network.) This is only configured when PIM bootstrap is not configured. You can configure multiple static rendezvous points (per configurable multicast range).

Multicast Tier-0/Tier-1 Gateway Configuration The IGMP and PIM profiles you just created now needs to be applied on a Tier-0 gateway. (These profiles can only be applied on the Tier-0 gateways and not on Tier-1.) You need to select a Tier-0/Tier-1 gateway by browsing to the Tier0/Tier-1 gateway. Choose Networking ➤ Connectivity ➤ Tier-0 Gateways/Tier-1 Gateways. Then click the three dots next to the Tier0/Tier-1 Gateway that you want to configure multicast on and click Edit. Finally, go to the multicast section. Select the IGMP and PIM profiles you just configured (Figure 9-19). Also, configure the Replication Multicast Range parameter.

Figure 9-19 IGMP profile application on Tier-0 gateway and enable multicast

The Replication Multicast Range is a range of multicast group addresses. This range is used in the underlay network to replicate tenant multicast group addresses in the overlay network. The Replication Multicast Range and the Tenant Multicast Group Addresses parameters should not overlap in the overlay network. Make sure you enable the Multicast toggle switch to enable multicast on the Tier-0/Tier-1 gateway (Figures 9-19 and 9-20).

Figure 9-20 Enable Multicast on a Tier-1 gateway

Multicast Tier-0 Uplink Configuration You need to enable PIM on the uplink interfaces (Figure 9-21) of the Tier-0 gateway on which you just enabled multicast. Go to the Interfaces section to do this and then click the number to open the Interfaces settings.

Figure 9-21 The external (Uplink) interface on the Tier-0 gateway

Click the three dots of the interface (Figure 9-22).

Figure 9-22 Edit the Tier-0 Uplink interface

Toggle the PIM switch button to Enabled to enable PIM on the uplink interface (Figure 9-23).

Figure 9-23 Enable PIM on Tier-0 gateway

You can now verify that PIM is enabled in the uplink interface’s summary screen (Figure 9-24).

Figure 9-24 PIM enablement verification

Repeat this process on each uplink interface.

Multicast Tier-0/Tier-1 Downlink Configuration You enable multicast routing on the downlink interfaces by selecting the segments. You need to select a segment by browsing to it. Choose Networking ➤ Connectivity ➤ Segments. Click the three dots next to the segment on which you want to configure multicast and then click Edit. Then go to the multicast section.

Toggle the Multicast Routing switch button to Enabled to enable multicast routing on the downlink interface (Figure 9-25).

Figure 9-25 Tier-0/Tier-1 gateway downlink configuration

Note In NSX-T 3.0.x, multicast routing is only possible on the Tier-0 gateway. This means that the Multicast Routing setting will only appear when the segment is attached to a Tier-0 gateway. In NSX-T 3.1, segments attached to the Tier-1 gateways are supported.

Summary This chapter explained the multicast use cases, benefits, capabilities, and limitations of multicast in the NSX-T 3.0 architecture. It also explained how multicast is configured in NSX-T. The next chapter covers logical bridging with NSX-T.

© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021 I. Hoogendoorn, Getting Started with NSX-T: Logical Routing and Switching https://doi.org/10.1007/978-1-4842-6708-0_10

10. Logical Bridging Iwan Hoogendoorn1 (1) Rotterdam, The Netherlands

This chapter describes the use cases and operations of logical bridging. You will learn about the differences between routing and bridging. You will also learn how to configure logical bridging by creating a bridge profile and a bridge-backed segment to bridge a virtual NSX-T Segment with a physical network.

Logical Bridging With logical bridging, you can communicate across Layer 2 between NSXT overlay-backed networks and physical network VLAN-backed networks (Figure 10-1). Logical bridging can help you migrate workloads from the physical (VLAN) network to the virtual (NSX-T Segment) network. The Layer 2 bridging feature is offered by a NSX-T edge transport node (bare metal edges and virtual edges). The Data Plane Development Kit (DPDK) is used to provide highperformance forwarding of the network packets. The incoming and outgoing traffic is bridged inside the NSX-T domain. The traffic is bridged only if the edge firewall is configured to allow access.

Figure 10-1 NSX-T Segment bridging to physical VLAN

Logical Bridging Use Cases Layer 2 bridging is typically used to connect physical (non-virtualized), VLAN-backed servers with virtual machines on NSX-T segments across the same Layer 2 subnet. Layer 2 bridging can also be used to migrate physical servers to virtual machines and ensure that the process is as seamless as possible. Another migration use case is when you want to perform an NSX-v to NSX-T migration, where the double bridge approach can be used (Figure 10-2).

Figure 10-2 NSX-v to NSX-T migration use case

Physical to Virtual Connectivity: Bridging vs. Routing Communication from the virtual to the physical network or the other way around can be achieved using either routing or bridging. In the upcoming sections, I will explain the differences. When connecting your physical workload on traditional physical

networks to a virtualized environment, you can use routers running standard routing protocols to route traffic between workloads in the two environments. If you do not want to use routing and place your physical and virtual devices on a single Layer 2 subnet, you can enable bridging.

Bridging With bridging, the Layer 2 broadcast domain is used for both physical and virtual workloads. This will result in a lack of scalability because the domain size is limited, and when you place all your workloads into a single broadcast domain, you make your failure domain larger (Figure 10-3). With NSX-T bridging, you are limited to one overlay-to-VLAN pair per active bridge (edge transport node).

Figure 10-3 Extending the L2 broadcast domain using L2 bridging

Routing With routing, the Layer 2 broadcast domain is different for the physical and virtual workloads (Figure 10-4). A router is required to send network traffic from one Layer 2 broadcast domain to another Layer 2 broadcast domain. Equal-Cost Multipath (ECMP) can be used to allow multiple routing paths (creating failure isolation) by scaling out your network connections toward the physical network.

Figure 10-4 Routing between two Layer 2 broadcast domains

The rule of thumb is to route when you can, but bridge when you have to.

Physical to Virtual Routing With routing, you typically create a router between one or multiple virtual machines inside the NSX-T domain and a physical server in the physical VLAN domain in different subnets. Note Virtual machines on a VLAN can also be routed using a router. In the example shown in Figure 10-5, the web server is on the NSX-T overlay segment named WEB SEGMENT A, and the subnet that this overlay network is using is 172.16.10.0/24. The physical server is on the VLAN with a different subnet (172.16.70.0/24). The Tier-0 gateway provides routing between the overlay segment (172.16.10.0/24) and the physical VLAN segment (172.16.70.0/24).

Figure 10-5 North/south routing with no bridging

Physical to Virtual Bridging With bridging, you create a bridge between one or multiple virtual machines inside the NSX-T domain and a physical server in the physical VLAN domain in the same subnet.

Note Virtual machines on a VLAN can also be bridged using logical bridging. In the example in Figure 10-6, the webserver is on the NSX-T overlay segment named WEB SEGMENT A, and the subnet that this overlay network is using is 172.16.10.0/24. The physical server is on the VLAN with the same subnet as the NSX-T Segment (172.16.10.0/24).

Figure 10-6 Bypass routing by bridging

Logical Bridging Components To provide Layer 2 connectivity between virtual machines inside the NSX-T domain (overlay) to (physical/virtual) machines outside the NSXT domain, you need to configure a bridge-backed segment. This bridge-backed segment is attached to a bridge profile that will contain the VLAN where you want to bridge (Figure 10-7). Inside this profile, you also specify what physical port you want to use, and this is identified by the VLAN transport you select.

Figure 10-7 Logical bridging components

Note The VLAN transport zone maps to (a) physical port(s).

Multiple Bridge Profiles You can configure multiple bridge profiles on an edge transport node. You can do this with two edge transport nodes and by specifying two bridge profiles on each edge transport node using the active/backup roles. Table 10-1 and Figure 10-8 show an example of this.

Table 10-1 Multiple Bridge Profiles

Bridge Profile | VLAN | Segment | Active Edge Transport Node | Backup Edge Transport Node | Subnet
Blue           | 20   | Blue    | A                          | B                          | 172.16.20.0/24
Red            | 10   | Red     | B                          | A                          | 172.16.10.0/24

Figure 10-8 Multiple bridge profiles
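The two profiles from Table 10-1 could be expressed against the Policy API as sketched below. The edge node paths and credentials are hypothetical, and the key assumption (to be verified in the API guide for your NSX-T release) is that the first entry in edge_paths acts as the active bridge node and the second as the backup.

import requests

NSX_MANAGER = "https://nsx-manager.lab.local"      # hypothetical manager FQDN
AUTH = ("admin", "VMware1!VMware1!")               # hypothetical credentials

# Hypothetical Policy paths for edge transport nodes A and B.
EDGE_A = ("/infra/sites/default/enforcement-points/default/"
          "edge-clusters/edge-cluster-01/edge-nodes/0")
EDGE_B = ("/infra/sites/default/enforcement-points/default/"
          "edge-clusters/edge-cluster-01/edge-nodes/1")

# Assumption: first edge_paths entry = active, second = backup (as in Table 10-1).
bridge_profiles = {
    "bridge-profile-blue": {"display_name": "Blue", "edge_paths": [EDGE_A, EDGE_B]},
    "bridge-profile-red":  {"display_name": "Red",  "edge_paths": [EDGE_B, EDGE_A]},
}

for profile_id, body in bridge_profiles.items():
    url = (f"{NSX_MANAGER}/policy/api/v1/infra/sites/default/"
           f"enforcement-points/default/edge-bridge-profiles/{profile_id}")
    requests.patch(url, json=body, auth=AUTH, verify=False).raise_for_status()

With this arrangement, each edge transport node carries bridged traffic for one VLAN/segment pair during normal operation and takes over the other pair only on failover.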

Configure NSX-T Logical Bridging (Single Profile)

When you want to configure logical bridging using a single bridge profile (that is, bridging between a single NSX-T GENEVE segment and a single VLAN), you follow the high-level steps described in Table 10-2. Step 9 is illustrated in the API sketch at the end of this section.

Table 10-2 Logical Bridging Configuration Steps

Step | Description
1    | Create a vDS bridge port-group (TRUNK).
2    | Create a new NSX-T transport zone (dedicated to the bridge interface).
3    | Add the new transport zone (bridge) to the existing edge transport nodes.
4    | Create a bridge profile.
5    | Configure a VLAN on the physical network with a gateway IP address.
6    | Create virtual test machines on this VLAN to perform some ping tests.
7    | Configure a segment as an overlay network (on NSX-T) with a gateway IP address by attaching it to a Tier-1 gateway.
8    | Create virtual test machines on this segment to perform some ping tests.
9    | Configure the overlay network (segment) as a Layer 2 bridge-backed segment.
10   | Perform some ping and traceroute tests between the VLAN and NSX-T segment virtual machines.
11   | Perform traceflow tests between the VLAN and NSX-T segment virtual machines.
12   | Look at the bridge statistics.

The configuration of logical bridging (single profile) with all of these steps is described in full detail on the following website: https://nsx.ninja/index.php/Configure_NSXT_Logical_Bridging_(single_profile)
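As an illustration of step 9, the following Python sketch attaches an existing overlay segment to a bridge profile via the NSX-T Policy REST API, making it a bridge-backed segment. This is a minimal sketch under assumptions: the bridge_profiles field of the Segment object and the paths shown should be verified against the API guide for your NSX-T version, and the manager FQDN, credentials, segment ID, transport zone path, and VLAN ID are hypothetical or taken from this chapter's examples.

import requests

NSX_MANAGER = "https://nsx-manager.lab.local"      # hypothetical manager FQDN
AUTH = ("admin", "VMware1!VMware1!")               # hypothetical credentials

SEGMENT_ID = "WEB-SEGMENT-A"                       # overlay segment from Figure 10-6

# Assumed Segment fields for the bridge attachment; check your version's schema.
bridge_attachment = {
    "bridge_profiles": [
        {
            "bridge_profile_path": ("/infra/sites/default/enforcement-points/"
                                    "default/edge-bridge-profiles/bridge-profile-blue"),
            "vlan_transport_zone_path": ("/infra/sites/default/enforcement-points/"
                                         "default/transport-zones/bridge-vlan-tz"),
            "vlan_ids": ["10"],
        }
    ]
}

url = f"{NSX_MANAGER}/policy/api/v1/infra/segments/{SEGMENT_ID}"
resp = requests.patch(url, json=bridge_attachment, auth=AUTH, verify=False)
resp.raise_for_status()
print("Segment is now bridge-backed:", resp.status_code)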

Summary

This chapter described the function and purpose of logical bridging and explained the difference between bridging and routing. It showed you how to create a bridge profile and a bridge-backed NSX-T segment to bridge network traffic between an overlay segment and a physical VLAN.

With this, you have reached the end of this book. You learned the basics of NSX-T, and this book bridged the gap between the physical network and the virtual (NSX-T) world. NSX-T has many more possibilities and features, which I will discuss in another publication coming out soon.

Thank you!

Iwan Hoogendoorn

Index A Address resolution protocol (ARP) Appliance Proxy Hub (APH) Autonomous systems (AS)

B Bare metal edge transport node definition deployment prerequisites edge node deployment hardware requirements initial deployment settings verify virtual switches Bidirectional Forwarding Detection (BFD) Allowas-in multipath turn on Bootstrap router (BSR) Border Gateway Protocol (BGP) Broadcast, Unicast, and Multicast (BUM)

C Central Control Plane (CCP) Certificate Signing Request (CSR) Cloud Management platforms (CMPs) Command Line Interface (CLI) Community tags Compute virtualization automation containers traditional virtual machines definition

hypervisor type-1 type-2 OS resources security Create, Read, Update, and Delete (CRUD) Customer experience program (CEIP)

D Data plane development kit (DPDK) Distributed router (DR)

E East-west connectivity Edge cluster Edge node cluster control plane form factors NSX-T GUI route advertisement/redistribution, enable transport vCenter Server GUI VM resource reservation Edge transport node Edge VM form factor deployment prerequisites interface considerations multiple virtual switches sizes Equal-cost multipath (ECMP) ESXi host preparation configuration dashboard explanation kernel modules overview steps

system monitoring vCenter Server virtual switch verification Ethernet Virtual Private Network (EVPN) control plane data plane definition encapsulation types RDs requirement/limitations route advertisement RT telco companies telco NFV solution topology VRF vRouter

F Failover detection methods BFD BGP Fully Qualified Domain Name (FQDN)

G Gateway interface configuration steps references Global Server Load Balancing (GSLB) Gracious Address Resolution Protocol (GARP)

H High availability active/active active/standby configure High level architecture controller role

components failure information exchange sharding data plane components functions information exchange private/public cloud management/control planes internal virtual IP address load balancing management cluster Manager role policy role Host preparation

I, J, K Infrastructure as a Service (IaaS) Internet Group Management Protocol (IGMP) forwarding tables profile configuration parameters queries reports snooping version 2 Internet Service Provider (ISP) Inter-SR routing IP address pool IP Flow Information Export (IPFIX) IP prefix list

L Link Aggregation Control Protocol (LACP) Link Aggregation Group (LAGs) Link Layer Discovery Protocol (LLDP)

Local control plane (LCP) Logical bridging components configuration multiple bridge profiles NSX-T segment/VLAN use cases Logical routing multi-tier north-south/east-west NSX-T single-tier Tier-0/Tier-1 gateways use cases Logical switching header format low-level packer walk segments attach, Hood creation parameters port status profiles virtual machine terminology tunneling use cases

M MAC address-based learning Maximum Transmission Unit (MTU) Multicast benefits configuration IGMP profile PIM profile steps

Tier-0/Tier-1 downlink Tier-0/Tier-1 gateway Tier-0 uplink definition destination replication flows IGMP See Internet Group Management Protocol (IGMP) limitations methods north-south flows PIM See Protocol-Independent Multicast (PIM) protocols source replication flows south-north flows topology Tier-0 Tier-1 use cases Multi-exit discriminator (MED) Multi-Protocol BGP (MP-BGP) Multi-tier topology

N, O Neighbor Advertisement (NA) Neighbor Solicitation (NS) Network Address Translation (NAT) Network Functions Virtualization (NFV) Network interface cards (NICs) Network I/O control (NIOC) bandwidth allocation parameters profile version 3 Network topology map verification edge transport node 1 Tenant 2 validation

Network virtualization (NV) North-south connectivity NSX-T edge features automation-related networking-related NSX-T’s troubleshooting-related operations-related platform-related security/services-related high level architecture See High level architecture topologies multi-tier single-tier use cases automation multi-cloud networking security NSX-T GUI inventory tab networking tab Plan and Troubleshoot tab Policy vs. Manager security tab System tab NSX-T Manager cluster appliance node size, deployment VIP deployment consideration edge transport nodes OVF template API CLI

GUI license vCenter Server deployment method vCenter server registration NSX virtual distributed switch (N-VDS) configuration DPDK ESXi host transport nodes functionalities NFV operation modes software abstraction layer switches transport node vCenter Server

P, Q Physical server architecture modes terminology Physical to virtual connectivity bridging routing Preboot Execution Environment (PXE) Proof of Concept (PoC) Protocol-Independent Multicast (PIM) bootstrap profile configuration parameters Sparse Mode (PIM-SM) Provider edge (PE)

R Red Hat Enterprise Linux (RHEL) Remote procedure call (RPC) Rendezvous points (RPs)

Role-based access control (RBAC) Route aggregation Route distinguishers (RDs) Route map BGP configuration Route target (RT)

S Segment profile applying segment applying segment, ports forwarding ARP table updates MAC table updates TEP table updates IP Discovery MAC Discovery QoS security SpoofGuard types unicast packet forwarding across hosts Segment Security profile Self-signed certificate CA-signed CSR import verification Service routers (SRs) Single-tier topology Software-defined data center (SDDC) clouds components definition VMware See VMware

Software defined networking (SDN) Software-defined storage (SDS) “Source-based” replication Storage virtualization

T Tier-0 gateway advanced dynamic routing attach Tier-1 configuration parameters create dynamic (BGP) routing north-south connectivity route redistribution routing configuration routing Tier-1 static/dynamic routing static routing uplink interfaces Tier-1 gateway attach configuration parameters create east-west connectivity Transport node architecture categories ESXi/KVM network packets physical connectivity Transport node profiles configuration ESXi host profile 1 profile 2 uses vCenter Server

Transport node switch configuration N-VDS See NSX virtual distributed switch (N-VDS) transport zone creation ESXi logical switch/segment NSX Manager NSX-T environment overlay overview VDS See vSphere distributed switch (VDS) Transport zones hypervisor hosts logical network NSX edge nodes types overlay VLAN Trust On Every Use (TOEU) Trust On First Use (TOFU) Tunnel endpoint interface (TEP)

U Uplink profiles properties teaming policies types failover order load balanced source MAC

V, W, X, Y Virtual IP address (VIP) Virtual machines (VMs) Virtual Network Functions (VNFs) virtual NIC (vNIC) virtual router (vRouter)

Virtual Routing Forwarding (VRF) VMware implementation NSX VCF vSAN vSphere ESXi Server vCenter Server VMware Cloud Foundation (VCF) vRealize Automation (vRA) vRealize Orchestrator (vRO) VRF lite configuration BGP deployment gateway steps Tier-0 gateway BGP Tier-0 gateway uplink configuration Tier-0 gateway uplink interface tier-1 gateway uplink interfaces control/data planes definition gateway interfaces limitations requirements terminology topologies use cases vSphere distributed switch (VDS) design DPDK ESXi host transport nodes NFV operation modes

port groups switching functionality transport node vCenter Server vSphere standard switch (VSS)

Z Zero-Trust security model