VSS to DVS – a tale
Recently, a colleague and I were upgrading parts of a VMware infrastructure from a mostly 1 Gbps network configuration to a mostly 10 Gbps one, and adding a 10G vMotion network while we were at it. Cabling was already done at this point, but I’ll list the configuration below for posterity:
- 4 hosts
- Each host running VMware ESXi 6.7 U3
- Each host had 4 x 1 Gbps ports
- …and 2 x 10 Gbps ports
For reasons, there was a split configuration: the two 10 Gbps ports were on a standard switch (VSS) for use with iSCSI, while two of the 1 Gbps ports were mapped to the DVS for VM traffic as Uplink 1 and Uplink 2.
For the purposes of this exercise, the scope was to move the 10 Gbps ports to the DVS as well (thereby also migrating the iSCSI port groups and vmk’s there), and to get rid of the standard switches on each host entirely. The vmk’s were:
- vmk0 – management, already on the DVS, using the 1G uplinks
- vmk1 – iSCSI vlan/subnet 1 on VSS, using 10G uplinks, MTU 9000
- vmk2 – iSCSI vlan/subnet 2 on VSS, using 10G uplinks, MTU 9000
Why do I explicitly mention MTU? I’ll get to that later. MTU usually doesn’t impact performance in a pants-off crazy way (various posts and studies put the difference between MTU 1500 and 9000 at 2-7%), but suffice it to say it’s usually worth it anyway, as long as you are careful and diligent and make sure MTU 9000 is set across the entire data path. This means vmk’s, virtual switches, physical switches and the storage system at the other end.
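To make the "entire data path" point concrete, here is a minimal Python sketch. The hop names and values are made up for illustration and are not read from any real system; the only idea it shows is that the largest frame that gets through end to end is the smallest MTU along the path.

```python
# Hypothetical hops and MTU values, for illustration only.
def path_mtu(hops):
    """Return (effective_mtu, limiting_hops) for a dict of hop name -> MTU."""
    if not hops:
        raise ValueError("at least one hop is required")
    # The smallest MTU along the path caps the frame size end to end.
    effective = min(hops.values())
    limiting = [name for name, mtu in hops.items() if mtu == effective]
    return effective, limiting


data_path = {
    "vmk": 9000,
    "virtual switch": 9000,
    "physical switch port": 1500,  # the one hop somebody forgot
    "storage array port": 9000,
}

effective, limiting = path_mtu(data_path)
print(f"Effective MTU: {effective}, limited by: {', '.join(limiting)}")
```

One forgotten hop is enough to drag the whole path down to 1500, no matter what the other hops are set to.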
The action plan was laid out as follows:
- Evacuate host prior to changes
- Make sure the 10G switch ports have the requisite VLANs
- Make sure iSCSI port groups have been created on the DVS
- Add the vMotion VLAN to the 10G ports as well as two port groups for it (see Multiple NIC vmotion)
- Create two new uplinks (10G Uplink 1 and 2) and increment the DVS uplink count by two
- Make sure only the port groups for iSCSI and vMotion are actually using the 10G uplinks (Teaming and Failover in port groups) – in iSCSI only one uplink is active per port group, the other is set to unused
- Use the DVS “Add or Manage Host Networking” functionality to migrate uplinks and map them to the 10G Uplink 1 and 2 respectively
- In the same wizard, migrate VMK ports from the VSS to the DVS, mapping them to the correct port groups
- No VM networking migrations as host was evacuated, and anyway the VM networks are already on the DVS
- Verify functionality after steps 1-9
- Make it so
Ok so here we go: Steps 1-9 were completed without any significant issues or errors. After you finalize the migration wizard, it’ll remove the uplinks from the VSS, move them to the DVS and then make sure the VMKs are moved as well. If there are issues with cluster connectivity, it has automatic rollback functionality that will revert the changes. No such rollback happened.
However… after a short while we started noticing issues. The web client UI started getting very sluggish. Changing tabs or pages took tens of seconds or minutes. Inspecting the logs (vpxa.log, vmkernel.log, hostd.log), we noticed something awry with storage. The storage system showed no initiators logged in from that host. Datastores were not visible on the host.
Impact was minimal since we had no customer VMs running there. We decided to try restarting the management agents (using services.sh restart on the ESXi host), but this failed as well. The hostd process would not start, or did not stay up. Next step, reboot. The reboot took a bit longer than usual, especially when starting modules related to iSCSI/NFS and networking (I suppose). But it did reboot. Management would respond to ping, but the web UI would not come up, and vCenter could not connect to the host: “Host not available on the network”. Hostd once more was not running, although it seemed to start manually with /etc/init.d/hostd start. Storage issues persisted, and the logs showed things like problems writing the journal and logs (our presumption was that since the datastores were not available, some logging-related functions that referred to the datastores on the SAN were failing and/or looping).
The odd thing was, vmkping -I using the iSCSI interfaces did work, so the host could ping the storage array! Why was storage not working?
…using the default MTU of 1500 (or a payload size of 1472).
Ping using MTU 9000 or 8972 (vmkping -I vmk1 xxx.xxx.xxx.xxx -d -s 8972) did not go through. Why the hell not? Recheck the data path. VMK ports for iSCSI have MTU 9000. Physical switchport, ditto. The storage system had no changes done, so MTU 9000 there as well. Trust but verify – yes, MTU 9000 in the storage array. The DVS surely has to have MTU 9000 as well, because why wouldn’t it? It’s just the maximum packet size that it will allow to pass.
Dear reader, can you guess what MTU was on the DVS? 1500.
At some point before we noticed this, we verified MTU sizes from the command line, since vCenter couldn’t connect and the UI wasn’t working on the host level either. We also changed the MTU on the VMKs to 1500, which started restoring functionality pretty quickly (https://kb.vmware.com/s/article/1038827).
The final fix obviously was to set MTU to 9000 on the DVS, and then raise the VMKs back to that as well.
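A side note on the 1472 and 8972 values used with vmkping: they are the MTU minus the headers that wrap the ping payload. The short Python sketch below assumes a plain IPv4 header without options (20 bytes) and an ICMP echo header (8 bytes); the function name is just something I made up for illustration.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small for an ICMP echo")
    return payload


for mtu in (1500, 9000):
    # -d sets "don't fragment", -s sets the payload size
    print(f"MTU {mtu}: vmkping -d -s {max_ping_payload(mtu)}")
```

With don't-fragment set, a ping at the full payload size only succeeds if every hop on the path accepts frames of that MTU, which is what makes it a useful end-to-end test.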
Now, while this was human error, there is a feature suggestion that I will send to VMware. When performing migrations of uplinks and particularly VMK adapters, there should be a verification for MTU. vCenter has all that information already:
- It can look at the VMK properties in the port group / VSS (the source)
- It can look at the MTU of the DVS, as that’s just one setting at the DVS level
- vCenter has (and has added) various compatibility checks and validations for various things in the past few years (vMotion, Updates and so on), so this would be just another check to prevent human error
If it had told us “Uh-oh! You’re trying to move a VMK with MTU 9000 into a DVS with an MTU of 1500! Are you absolutely sure you want to continue?”, we would have saved multiple hours. Again, completely our fault, but these are the things you neglect to check.
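The check I have in mind boils down to a simple comparison. Here is a rough Python sketch of the idea; it is not how vCenter works internally and it does not call any VMware API. The function name and the inputs (a switch name, its MTU, and a dict of VMK MTUs) are all hypothetical.

```python
def mtu_warnings(dvs_name, dvs_mtu, vmk_mtus):
    """Return one warning per VMK whose MTU exceeds the target switch MTU."""
    return [
        f"{vmk} has MTU {mtu}, but {dvs_name} has MTU {dvs_mtu}"
        for vmk, mtu in sorted(vmk_mtus.items())
        if mtu > dvs_mtu
    ]


# Hypothetical values mirroring the situation in this post.
warnings = mtu_warnings("DVS", 1500, {"vmk1": 9000, "vmk2": 9000})
for line in warnings:
    print("WARNING:", line)
```

An empty list would mean nothing to complain about; anything else is a reason to stop and ask before migrating.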
Other things I learned:
- Using the localcli command when hostd is down works when esxcli commands don’t! If you get the “Connection failure” message when running any esxcli commands, check /etc/init.d/hostd status, and start it if you can, but if you can’t, run esxcli commands with localcli instead.
- Working with MTU values from the command line
A story of failure and human error, surely. But also one of learning. I hope this post helps at least one other person avoid hours-long MTU-related troubleshooting.