The biggest news at VMware Explore this year is the announcement of vSphere 8, which is scheduled to be generally available in September. This release comes with a plethora of new features, capabilities and even architectural changes, some of which I’ll be talking about in this and other related posts.

Introducing vSphere 8

With vSphere 8, VMware has focussed on four key areas from a feature perspective:

  1. Seamless manageability and integration across all VMware clouds, while also bringing cloud-like capabilities to private cloud deployments
  2. The use of Data Processing Unit (DPU) capabilities to offload infrastructure services and increase network throughput – meeting the needs of modern workloads
  3. Increased efficiency in operational tasks through reduced downtime, shorter maintenance windows and improved visibility
  4. Enhanced developer-friendly services, increased availability and declarative cluster lifecycle management

In the remainder of this post, I’ll talk about my favourites among the compute-related announcements for vSphere 8.

vSphere Distributed Services Engine

Let’s start with the vSphere infrastructure capability I was most looking forward to. If you remember “Project Monterey”, this is formally it!

Get used to this technology, as VMware plans to move most infrastructure services to DPUs so that the host CPU remains available to serve applications. Doing so also improves security, something that I will talk about in a future article.

With a DPU present in a host, another instance of ESXi is placed on the DPU itself, which allows infrastructure services to be moved seamlessly onto the DPU card, where they can run faster and in isolation from the workloads. Starting with vSphere 8, VMware will support greenfield deployments of NSX running on supported DPUs, such as Pensando and NVIDIA BlueField.

vSphere Lifecycle Manager will also take care of upgrading the copy of ESXi on the DPU, so no extra management is required there. In fact, it will ensure that the ESXi versions running on the main host and on the DPU are kept in sync.

Simple DPU Configuration

Once the hypervisor is aware of the DPU, it is remarkably simple to start making use of it: just select the correct DPU from a drop-down list. That enables the offloading of network functions to the DPU and, in addition, provides enhanced visibility of network traffic along with additional security features.

VMware vSphere with Tanzu

Firstly, I am sure you’ll be glad to hear that with vSphere 8, VMware is consolidating all the various Tanzu editions. There will be a single runtime, “Tanzu Kubernetes Grid” version 2.0, which is the same runtime used when vSphere with Tanzu is deployed, whether on-premises or in public clouds.

Workload Availability Zones

Along with the unified runtime, VMware is also introducing a capability that, I think, is a natural progression: “Workload Availability Zones”.

Tanzu Workload Availability Zones

As you can see from the slide, Workload Availability Zones allow Supervisor and Workload clusters to span vSphere clusters. Naturally, this is a huge improvement in terms of availability and allows Kubernetes clusters to be resilient across availability zones.

For now, three availability zones are required for the availability mechanism to function, and at deployment time one can choose either the pre-existing single-cluster option or the Workload Availability Zones option. There is also a one-to-one mapping between an availability zone and a vSphere cluster for now, but I do expect that to change in the future.
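
To make those constraints concrete, here is a minimal sketch (plain Python, with made-up zone and cluster names) of a zone-to-cluster mapping that respects the current rules: three availability zones, each backed by exactly one vSphere cluster.

```python
# Illustrative only: models today's Workload Availability Zone rules
# (three zones, one vSphere cluster per zone). All names are hypothetical.
zone_to_cluster = {
    "az-1": "vsphere-cluster-a",
    "az-2": "vsphere-cluster-b",
    "az-3": "vsphere-cluster-c",
}

def validate_zones(mapping: dict) -> None:
    if len(mapping) != 3:
        raise ValueError("Three Workload Availability Zones are currently required")
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("Each zone must map to its own vSphere cluster (1:1)")

validate_zones(zone_to_cluster)
print("Zone layout is valid:", zone_to_cluster)
```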

ClusterClass for Declarative Tanzu Deployments

Another welcome addition is ClusterClass (part of the open-source, upstream-conformant Cluster API project), which provides declarative lifecycle management for Kubernetes clusters. Upstream, this is typically handled by a management cluster, but for Tanzu that responsibility is fulfilled by the Supervisor cluster.

ClusterClass for Tanzu Kubernetes Clusters

Not only does ClusterClass allow the basic deployment to be defined, it also defines the initial state of the deployed clusters in terms of infrastructure packages, network connectivity, storage, authentication mechanisms and so on.
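
To give a flavour of what that declarative model looks like, here is a minimal sketch that builds an upstream Cluster API-style Cluster manifest referencing a ClusterClass and prints it as YAML. The class name, namespace and Kubernetes version are placeholders, and the exact fields and variables exposed by Tanzu Kubernetes Grid 2.0 ClusterClasses may differ from plain upstream Cluster API.

```python
# A minimal, upstream Cluster API-style example of a Cluster that consumes a
# ClusterClass. Class/version names are placeholders; Tanzu's actual
# ClusterClass definitions may expose different variables and defaults.
import yaml  # pip install pyyaml

cluster = {
    "apiVersion": "cluster.x-k8s.io/v1beta1",
    "kind": "Cluster",
    "metadata": {"name": "demo-workload-cluster", "namespace": "demo-ns"},
    "spec": {
        "topology": {
            "class": "example-clusterclass",   # the ClusterClass to stamp out
            "version": "v1.23.8",              # Kubernetes version placeholder
            "controlPlane": {"replicas": 3},
            "workers": {
                "machineDeployments": [
                    {"class": "default-worker", "name": "md-0", "replicas": 3}
                ]
            },
        }
    },
}

print(yaml.safe_dump(cluster, sort_keys=False))
```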

Federated Authentication with Pinniped

Authentication for Tanzu workload and Supervisor clusters has traditionally been provided by vSphere SSO. That capability remains, but the integration of Pinniped with VMware vSphere with Tanzu is now also available as an option. This is an extremely useful addition, providing seamless enterprise authentication against multiple external federated identity providers (IdPs).

Pods for Pinniped are automatically deployed on the supervisor and workload clusters as required. Once in place, Pinniped takes over the authentication mechanism for the clusters in question and provides complete independence from the vSphere SSO. Another big tick in the box!
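
For a rough idea of what a federated IdP definition looks like in upstream Pinniped, the sketch below renders an OIDCIdentityProvider resource for the Pinniped Supervisor. In vSphere with Tanzu this is configured through the Supervisor rather than authored by hand, so treat the issuer URL, claim names and secret name purely as illustrative placeholders.

```python
# Rough shape of an upstream Pinniped Supervisor OIDCIdentityProvider pointing
# at an external IdP. vSphere with Tanzu drives this from the Supervisor
# configuration rather than hand-written manifests; all values are placeholders.
import yaml  # pip install pyyaml

oidc_idp = {
    "apiVersion": "idp.supervisor.pinniped.dev/v1alpha1",
    "kind": "OIDCIdentityProvider",
    "metadata": {"name": "corp-idp", "namespace": "pinniped-supervisor"},
    "spec": {
        "issuer": "https://idp.example.com/oauth2/default",
        "authorizationConfig": {"additionalScopes": ["groups", "email"]},
        "claims": {"username": "email", "groups": "groups"},
        "client": {"secretName": "corp-idp-client-credentials"},
    },
}

print(yaml.safe_dump(oidc_idp, sort_keys=False))
```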

Life Cycle Management

To start, note that vSphere 8 will be the last release to support vSphere Update Manager’s baseline-based lifecycle management. It is still supported for now, but it’s time to remove any dependencies on that mechanism, as image-based vSphere Lifecycle Manager will take over from it completely going forward.
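
As a rough illustration of the image-based model that replaces baselines, the sketch below assembles the kind of desired-state image document vSphere Lifecycle Manager works with: a base ESXi image plus an optional vendor add-on and extra components. The version numbers and component names are placeholders, and the real object is managed through vCenter rather than hand-written JSON.

```python
# Conceptual sketch of a vLCM desired image for a cluster: base ESXi image,
# optional vendor add-on, additional components. Versions and names are
# placeholders for illustration only.
import json

desired_image = {
    "base_image": {"version": "8.0.0"},                     # ESXi base image
    "add_on": {"name": "vendor-addon", "version": "1.0"},   # OEM add-on (optional)
    "components": {
        "example-driver-component": "2.3.4",                # extra drivers/agents
    },
}

print(json.dumps(desired_image, indent=2))
```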

Cluster Remediation

Among the many lifecycle enhancements, there are a couple I would like to highlight in this section. The first is the staging of cluster images to speed up remediation.

Cluster Staging for Remediation

Staging takes a fair bit of time during host remediation, and some updates suffer when an image transfer fails partway through for whatever reason. In vSphere 8, one can “Stage All” images before remediation starts, making the update far more dependable. Since staging does not require hosts to be in maintenance mode, separating it from the remediation process also helps shorten the maintenance windows.

That, combined with the new ability to remediate multiple hosts in parallel, reduces the maintenance window even further. To remediate in parallel, the administrator does need to define how many hosts can be done together, as all of them will need to enter maintenance mode at the same time – which affects availability. By default, all hosts already in maintenance mode are remediated together, but the administrator can cap that at a specific number.
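
Purely to illustrate why that cap matters, here is a toy sketch of batching hosts into parallel remediation groups of at most N; the real scheduling is done by vSphere Lifecycle Manager, and the host names are made up.

```python
# Toy illustration of capped parallel remediation: hosts are remediated in
# batches no larger than max_parallel, since every host in a batch must be in
# maintenance mode at the same time. vLCM does the real scheduling; this just
# shows the availability trade-off behind the setting.
from typing import Iterable, Iterator, List

def remediation_batches(hosts: Iterable[str], max_parallel: int) -> Iterator[List[str]]:
    batch: List[str] = []
    for host in hosts:
        batch.append(host)
        if len(batch) == max_parallel:
            yield batch
            batch = []
    if batch:
        yield batch

for batch in remediation_batches([f"esxi-{i:02d}" for i in range(1, 9)], max_parallel=3):
    print("Remediating in parallel:", ", ".join(batch))
```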

Is it really that big a deal, given we’ve had vMotion and DRS for years? I’d say yes, because these capabilities improve uptime metrics and workloads don’t have to be shifted from one host to another as many times.

Enhanced Recovery of vCenter

Now, this one is probably my favourite of the lot when it comes to lifecycle management and availability.

We are all aware of the pain of trying to protect vCenter servers, given they’re the brains of an environment. We back them up religiously, even if we don’t have a reliable way to test restoration, and pray that they’ll come good if, God forbid, we ever need to restore them. While vCenter backups and restores are reliable, they don’t remove a fundamental challenge: restoring vCenter from backup rolls it back to the state it was in when the last backup was taken, which, depending on the situation, could be weeks or even months earlier.

With vSphere 8, vCenter’s cluster state is stored in a key-value store distributed across the hosts in the cluster. In case of a failure, that key-value store becomes the “source of truth” used to reconcile any changes made after the backup from which vCenter was restored. For now, this covers only host cluster membership, but more configuration items are expected to be added over time.
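
Conceptually, the reconciliation works something like the sketch below, which compares the (older) restored inventory against the membership published in the distributed key-value store and treats the latter as authoritative. The data and structure are invented purely to illustrate the idea.

```python
# Conceptual sketch only: after a vCenter restore, the distributed key-value
# store on the ESXi hosts is treated as the source of truth for cluster
# membership, and the restored (older) inventory is reconciled against it.
restored_inventory = {"cluster-01": {"esxi-01", "esxi-02", "esxi-03"}}

# Membership published by the hosts since the backup was taken (a host was added).
kv_store_membership = {"cluster-01": {"esxi-01", "esxi-02", "esxi-03", "esxi-04"}}

for cluster, live_hosts in kv_store_membership.items():
    stale_hosts = restored_inventory.get(cluster, set())
    for host in live_hosts - stale_hosts:
        print(f"Reconcile: re-adding {host} to {cluster} (missing from backup)")
    for host in stale_hosts - live_hosts:
        print(f"Reconcile: {host} no longer belongs to {cluster}")
    restored_inventory[cluster] = set(live_hosts)
```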

vSphere Configuration Profiles

With vSphere 8, VMware is introducing “vSphere Configuration Profiles” in Tech Preview. That’s not surprising: desired-state configuration management is a recurring theme in this release and is already a given in the developer community.

Keep an eye on the development and release of this feature, as it will eventually replace Host Profiles. Rather than extracting the configuration of a previously configured host, tweaking it and then attaching it to a cluster, this mechanism lets you define how you want the cluster to look in terms of settings, storage and networking, and the cluster configures itself to comply with that stated configuration. In true configuration management fashion, it also detects drift and, in case of any change, brings the configuration back to what it is supposed to be.
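
The desired-state idea is easy to picture with a small sketch: declare what the cluster should look like, compare each host against it, and remediate anything that has drifted. The keys and values below are invented for illustration; the real profile is a much richer document managed by vCenter.

```python
# Conceptual sketch of the desired-state model behind vSphere Configuration
# Profiles: declare the intended configuration once, then detect and report
# drift per host. All keys and values are invented for illustration.
desired = {
    "ntp_servers": ["ntp1.example.com"],
    "ssh_enabled": False,
    "syslog_host": "syslog.example.com",
}

actual_per_host = {
    "esxi-01": {"ntp_servers": ["ntp1.example.com"], "ssh_enabled": False, "syslog_host": "syslog.example.com"},
    "esxi-02": {"ntp_servers": ["ntp2.example.com"], "ssh_enabled": True, "syslog_host": "syslog.example.com"},
}

for host, actual in actual_per_host.items():
    drift = {k: (actual.get(k), v) for k, v in desired.items() if actual.get(k) != v}
    if drift:
        print(f"{host}: drift detected -> {drift}; remediating to desired state")
    else:
        print(f"{host}: compliant")
```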

Definitely, something to play with in the lab!

Guest OS and Workload Enhancements

There are quite a few enhancements in this area but let me pick a few.

Virtual Hardware Version 20

For starters, there’s a new virtual hardware version which is 20 – we’ve come a long way, right?

Of course, there’s support for the latest generation of Intel and AMD processors and the latest guest OSes, but there are also increased device limits, such as support for up to 32 DirectPath I/O devices and 8 vGPU devices per VM.

There are also new capabilities like “Device Virtualization Extensions”, which allow vendors to create hardware-backed virtual devices that, on supported hardware, provide dynamic DirectPath I/O with support for vSphere DRS, HA and even vMotion. In addition, VMs containing such devices can be suspended, resumed and snapshotted. These capabilities are vendor-specific, so I will be looking out for the devices that emerge on the back of them.

Virtual TPM Provisioning Policy

Windows 11 has been supported on vSphere for a while, with support going back to vSphere 6.7. If you’ve worked with it, you will have noticed that Windows 11 requires a vTPM device. However, when organisations deploy Windows 11 VMs from templates, a challenge arises: the vTPM device is cloned along with the rest of the virtual machine definition, introducing a potential security risk.

For that reason, vSphere 8 introduces a provisioning policy option that replaces the vTPM device when a Windows 11 template is cloned. Of course, the old behaviour of copying is retained, as one might want an exact copy of the virtual machine, but for larger automated, template-based deployments the replace policy is the safer choice.
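
For those automating deployments with pyVmomi, a clone call along the lines of the sketch below should be able to request the new behaviour, assuming your pyVmomi version exposes the vSphere 8 tpmProvisionPolicy field on the clone spec. The vCenter address, credentials and VM names are placeholders.

```python
# Sketch: cloning a Windows 11 template with pyVmomi while asking vSphere 8 to
# replace the vTPM device instead of copying it. The tpmProvisionPolicy field
# is assumed to be present in vSphere 8-level bindings; hostnames, credentials
# and VM names are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab only; verify certificates in production
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="***", sslContext=ctx)
content = si.RetrieveContent()

# Find the template VM by name (placeholder name).
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
template = next(vm for vm in view.view if vm.name == "win11-template")
view.DestroyView()

clone_spec = vim.vm.CloneSpec(location=vim.vm.RelocateSpec(), powerOn=False, template=False)
clone_spec.tpmProvisionPolicy = "replace"   # vSphere 8: provision a fresh vTPM per clone

task = template.CloneVM_Task(folder=template.parent, name="win11-desktop-01", spec=clone_spec)
Disconnect(si)
```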

Simplified Virtual NUMA Configuration

We commonly come across situations where the precise configuration of virtual NUMA is important for getting the best performance out of a virtual machine. Up until now, however, it has been a tad challenging to configure, requiring advanced settings and/or the CLI.
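
For context, the sketch below shows the sort of advanced-settings approach that has been needed until now, using pyVmomi to push the commonly used numa.vcpu.maxPerVirtualNode option into a VM’s extraConfig. The value is only an example, and vm is assumed to be a vim.VirtualMachine object you have already retrieved.

```python
# The "old" way referenced above: setting a vNUMA-related advanced option via
# extraConfig with pyVmomi. numa.vcpu.maxPerVirtualNode caps how many vCPUs sit
# in each virtual NUMA node; the value is an example only, and 'vm' is assumed
# to be an already-retrieved vim.VirtualMachine object.
from pyVmomi import vim

def set_vnuma_max_per_node(vm, vcpus_per_node: int):
    spec = vim.vm.ConfigSpec()
    spec.extraConfig = [
        vim.option.OptionValue(key="numa.vcpu.maxPerVirtualNode",
                               value=str(vcpus_per_node))
    ]
    return vm.ReconfigVM_Task(spec=spec)   # returns a task to wait on
```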

The good news with vSphere 8 is that if you move to virtual hardware version 20, virtual NUMA-related settings become available in the vSphere Client. Not only that, you also get a whole “CPU Topology” tile from which you can view or edit the vNUMA settings for a VM with ease. Great, isn’t it?

There are other enhancements to talk about too, like migration-aware applications, where an application can be notified of an impending migration event and take recommended steps in advance rather than reacting just before the migration. Also, with virtual hardware version 20, you can now request that an application’s hyperthreads be scheduled on the same physical CPU core, which helps the performance of multi-threaded, latency-sensitive applications.

I know there’s a lot to digest here, and I haven’t covered everything, but do check out these features in detail as they’re my picks for this release. This major version brings a lot of new capabilities and features – along with some significant architectural changes – whose potential makes me quite happy as an architect.

However, this post was all about the compute side of things. For my picks for vSAN 8, see my next post!