Removing trickier VMFS datastores

Ok, maybe tricky isn’t the right word, but at least I couldn’t find anything written on this particular issue. Maybe it’s too simple a solution even for the VMware KB, but anyway.

I was cleaning out some local datastores (Smart Array 420 and 420i controllers) and ran into an issue where I was unable to remove the VMFS datastore because of a "file in use" error. It didn’t give me specifics; it just told me that there were file(s) in use, and/or that the datastore was busy. After a fair amount of googling I started throwing some commands at it over SSH. There’s a vmkfstools command that can break any existing locks, and it warns you that it will do so forcibly. So I tried that, given that there was nothing on the datastore that I couldn’t afford to lose (the point, after all, was to remove it). Despite the grave warnings, vmkfstools was unable to break the lock and didn’t really give me a proper reason.

Looking at the vmkernel logs (/var/log/vmkernel.log by default), I saw the same references to files being in use, but no exact reference as to which files and where. No virtual machines were running anymore, and I had already deleted almost everything I could off the datastore by hand. There was a rather specific error message relating to corruption, and googling that got me exactly diddly. The datastore had had some problems previously and some hardware had been replaced, so there were a lot of variables that could have affected the case.
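For anyone digging through the same log, a small filter can save some scrolling. This is only a sketch: find_busy_messages is a made-up helper, and the search terms are guesses, since I don't have the exact messages at hand.

```shell
# Hypothetical helper: print log lines that mention busy or in-use files.
# The search terms are assumptions, not the exact vmkernel messages.
find_busy_messages() {
    logfile="${1:-/var/log/vmkernel.log}"
    grep -iE 'in use|busy|lock' "$logfile"
}
```

Running `find_busy_messages | tail -n 20` on the host would show the most recent matches from the default log.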

The solution, however, was much simpler. ESXi (5.1 update 1), a standalone server not attached to any cluster, was shoving logfiles onto the datastore I was trying to remove. Obviously, there would be ‘file in use’ errors. D’uh. So, at the host level, I went to the Configuration tab, and from there to Advanced Settings, then Syslog -> Syslog.global.logDir. If it is null (and it can be null), the logs are kept in a non-persistent location and are lost whenever you reboot the host. If there’s a path, in the style of [datastore]/path, it’ll use that instead.

So for this particular case, I set a null path, which raises a warning that logs are being stored in a non-persistent location, but it then allowed me to delete the datastore (and/or detach it first) without issue.

I was probably thrown off by the vmkernel messages about corruption, though they may have played a part in why certain files and folders couldn’t be deleted by hand using datastore browser or the command line.

After everything was done, I redirected the logs back to one of the datastores, which clears the warning (no reboot needed here, or when I set the null path earlier).
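I made both log moves in the vSphere Client, but the same setting should also be reachable from the shell with esxcli. The following is a minimal sketch under that assumption, not something I ran in this case: run and move_syslog_dir are made-up helper names, and DRY_RUN is just a switch for printing the commands instead of executing them.

```shell
# Print the command when DRY_RUN=1, otherwise run it (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Point syslog at a new log directory, reload it so the change takes
# effect without a reboot, then show the resulting configuration.
move_syslog_dir() {
    run esxcli system syslog config set --logdir="$1" &&
    run esxcli system syslog reload &&
    run esxcli system syslog config get
}
```

Before removing the datastore you would call it with a directory that is not on that datastore, for example `move_syslog_dir /scratch/log`, and afterwards with a directory on the datastore that should hold the logs from then on. Both paths are placeholders for whatever fits your host.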

I tried to find the specific error messages but I couldn’t. I may have them somewhere so I’ll shove them in here if I find them.

Some of the commands that helped me along were:

esxcli storage filesystem list ## This lists the filesystems that the server knows about, including their UUID, label and path. These are needed for many vmkfstools commands, so it’s a good place to start

vmkfstools -B /vmfs/devices/disks/naa.unique_disk_or_partition_goes_here ## This tries to ‘forcibly’ break any existing locks on the partition that may prevent you from proceeding. Didn’t work in my case, and also didn’t tell me anything useful.

vmkfstools -V ## re-read and reload vmfs metadata information
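Since several of these commands want the UUID rather than the label, a tiny filter on the filesystem list can pull it out. This is a sketch only: uuid_for_label is a made-up helper, and it assumes whitespace-separated columns in the order mount point, volume name, UUID, plus a label without spaces.

```shell
# Hypothetical helper: read `esxcli storage filesystem list` output on
# stdin and print the UUID of the volume whose label matches $1.
# Assumes the label is the second column and the UUID the third.
uuid_for_label() {
    awk -v label="$1" '$2 == label { print $3 }'
}
```

On the host it would be used as `esxcli storage filesystem list | uuid_for_label datastore1`, where datastore1 stands in for your own label.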

Some of the sites and blogs that helped me along:

VMware KB article 1009570
VMware KB article 2004201
VMware KB article 2032823
VMware KB article 1009565
http://blogs.vmware.com/vsphere/2012/05/vmfs-locking-uncovered.html
http://kb4you.wordpress.com/2012/04/23/unpresenting-a-lun-in-esxi-5/
VMware KB article 2011220
VMware KB article 2004605
http://arritdor.e-wilkin.com/2012/03/removing-vmfs-datastore.html

Thanks to everyone who wrote those.