DRS vCLS wtf?
vSphere 7 Update 1 introduces the vCLS (vSphere Cluster Services) appliances, which take care of, well, clustering services. They are now an integral part of how DRS (and HA) works, if turned on. The point seems to be decoupling these functions from vCenter, where this functionality was previously integrated.
As smarter people (namely Niels Hagoort here, and Duncan Epping at Yellow-Bricks) have said – this allows for DRS and HA to work properly even if vCenter itself is down.
Both of the links above do a great job of explaining the how and why. I've run into some issues, though, and I thought I'd discuss them here.
My home lab consists of networked storage in the form of an iSCSI/NAS box and two physical VMware hosts (12th-gen Dell servers). On top of that, I run vSphere 7 Update 1a. After installing U1, the vCLS appliances were deployed immediately (I have/had DRS enabled and set to fully automated). All of the instructions say you shouldn't really have to touch the vCLS VMs. So I did as I was told and did some maintenance on one of my hosts. Placing a host in maintenance mode, when DRS is set to fully automated, means your running VMs get migrated to the other hosts in the cluster. This happened, and the host went into maintenance. Both of the vCLS appliances were now running on the other host. All fine and dandy!
As a refresher: according to the documentation, you should have a maximum of three vCLS VMs running in a cluster, or fewer if you are running a smaller cluster, like I am. A single-host cluster should have one, and a two-host cluster should have two. My initial guess was that placing one host in maintenance would result in there temporarily being only one vCLS, but I guess moving it to the other host was another option. Maybe it'd move back once the host comes out of maintenance?
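The sizing rule above is simple enough to write down. A quick sketch of it in Python (the function name is mine, not anything from VMware's tooling):

```python
def expected_vcls_count(num_hosts: int) -> int:
    """vCLS sizing per the vSphere 7 U1 docs: one agent VM per host, capped at three."""
    if num_hosts < 1:
        raise ValueError("a cluster needs at least one host")
    return min(num_hosts, 3)

# A one-host cluster gets one, a two-host cluster (like my lab) gets two,
# and anything with three or more hosts caps out at three.
print(expected_vcls_count(1), expected_vcls_count(2), expected_vcls_count(8))
```

So for my two-host cluster, two vCLS VMs is the steady state – which makes what happened next all the stranger.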
Another thing of note: I have a single datastore serving both hosts. So shared storage.
But lo and behold, after my second host comes out of maintenance mode, a new vCLS gets deployed! I now have three: two on one host, one on the other. At this point I wanted to see what would happen if I deleted one of the "extra" vCLS VMs. Of course, deleting isn't allowed on a running VM, so I power it down and try to delete it. Just as my mouse cursor hovers over the delete from datastore button, the VM powers back on, and I can't delete it. You have to be very quick to do this thing which isn't recommended in the first place!
Okay, be quicker this time. I managed to delete the "extra" VM, but just as it was done, another one was deployed, this one called vCLS (4). I found no way to get back to the two-VM default – not without retreat mode.
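For what it's worth, spotting the surplus agents is easy to script, even if deleting them by hand clearly isn't the way to go (use retreat mode, as described next). A sketch, assuming the "vCLS (n)" naming pattern I saw in 7.0 U1 holds:

```python
import re

# Naming pattern observed on my 7.0 U1 cluster - an assumption, not a contract.
VCLS_NAME = re.compile(r"^vCLS\s*\(\d+\)$")

def surplus_vcls(vm_names, num_hosts):
    """Return the vCLS VM names beyond the documented cap of min(num_hosts, 3)."""
    agents = sorted(n for n in vm_names if VCLS_NAME.match(n))
    allowed = min(num_hosts, 3)
    return agents[allowed:]

# My situation after the host left maintenance: three agents on a two-host cluster.
print(surplus_vcls(["web01", "vCLS (1)", "vCLS (2)", "vCLS (4)"], num_hosts=2))
```

Again: this only tells you something is off. The cleanup itself should go through retreat mode.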
Retreat mode is a way to get the vCLS VMs cleaned up and start all over if you so choose. And I chose. To get the cluster into retreat mode, you need the cluster domain ID. You get this by looking at your vCenter URL when you have the cluster selected:
In my case it's "8". You need this value for the next step. Select the vCenter, go to Configure -> Advanced Settings and add or edit the value:
This value is True for a cluster where vCLS is running, and False when you place it in retreat mode (i.e. clean things up).
When you hit OK, the cluster is placed into retreat mode. In this mode, all vCLS VMs are shut down and removed.
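The setting itself is just a key/value pair built from that cluster domain ID. A small helper to show the shape of it – the key format follows VMware's published retreat-mode guidance, but the function is my own sketch, not part of any vSphere SDK:

```python
def retreat_mode_setting(domain_id: int, enabled: bool):
    """Build the vCenter advanced setting that toggles vCLS for one cluster.

    Key format per VMware's retreat-mode documentation:
    config.vcls.clusters.domain-c<ID>.enabled
    False = retreat mode (vCLS VMs cleaned up), True = normal operation.
    """
    key = f"config.vcls.clusters.domain-c{domain_id}.enabled"
    return key, "true" if enabled else "false"

# My cluster's domain ID was 8, and I wanted retreat mode, i.e. False:
print(retreat_mode_setting(8, enabled=False))
```

Flipping the same key back to True is what brings vCLS out of retreat mode again.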
In any case, this cleans up the VMs, and when I turn vCLS back on, two vCLS VMs are deployed – one on each host, as expected:
I think I even got into a situation where I had four of the damned things at one point but I don’t have a screenshot of that so it’ll have to remain a rumor.
So, summary time. If you end up with more than the documented number of vCLS VMs, go into retreat mode, then back to normal functionality, and the correct number should be deployed. As for how maintenance mode should be done so the vCLS VMs don't multiply, I need to read up more on this and try things out – or find out what makes my lab so darned different as to prompt this behavior.