Patching ScaleIO 2.0 VMware Hosts

Recently I did an ESXi patch run, along with some BIOS and firmware updates, on a ScaleIO 2.x environment (more precisely 2.0.5014.0). The environment consists of Dell PowerEdge servers; some run ESXi 6.0 build 3380124, others are Linux-based, non-virtualized hosts. Luckily this environment was ScaleIO 2.x, because this version has a real maintenance mode (1.3.x did not). This means that while I can only patch one host at a time in this layout, I can do it fairly quickly and in a controlled fashion.

ScaleIO Maintenance Mode vs. ESXi Maintenance Mode

These are, obviously, two different things. With ScaleIO maintenance mode, you can put one SDS host (the one providing storage services) at a time into maintenance mode (at least in this configuration with two MDMs) without an adverse impact on the cluster. The remaining SDS will take care of operations, provided it does not break or go down at the same time. After you are done patching, you exit maintenance mode, which then makes sure all changes are rebuilt and synced across the cluster nodes. This takes some time depending on the amount of data involved.

ESXi maintenance mode, on the other hand, deals with putting the VMware hypervisor layer into maintenance mode so you can patch it and perform other operations with no VMs running. The order for entering maintenance is:

  1. ScaleIO
  2. VMware ESXi

And when coming out of the maintenance break, it’s the reverse.

I left the SVM (the virtual machine on each host that provides the host's ScaleIO functions; technically a SLES appliance) on the host I was patching, but I powered it down gracefully before putting the host into maintenance mode.

So accounting for all these things, my order was (an ESXi-side command-line sketch follows the list):

  1. Migrate all running VMs except the SVM off of the host using vMotion
  2. When the host is empty (bar the SVM), put ScaleIO into maintenance mode
    1. This is done via the ScaleIO GUI application, on the Backend page, by right-clicking the host. I did not have to use the force option, and neither should you…
  3. Shut down the SVM via “Shut Down Guest” in vCenter
  4. Put the host into maintenance mode without moving the SVM off the host (I suppose you could move it, but I didn’t)
  5. Scan and Remediate the host and install other patches (I installed BIOS, iDRAC and various other updates via iDRAC; I had set them to “Install next reboot” so they would be installed during the same reboot that the ESXi remediation triggers)
  6. Once you are satisfied, take the host out of maintenance mode
  7. Start the SVM on that host
  8. Wait for it to boot
  9. Exit ScaleIO maintenance mode (see 2.)
  10. Check that the rebuild goes through (ScaleIO GUI application, either the Dashboard or the Backend page)
  11. Make sure all warnings and errors clear. During host remediation and patching, I had the following alerts:
    1. High – MDM isn’t clustered (this is because you’ve shut down one of the SVMs containing the MDM role)
    2. Medium – SDS is disconnected (for the host being remediated)
    3. Low – SDS is in maintenance mode (for the host being remediated)
  12. After the SVM starts, it should clear all but the last alert, and once you have exited ScaleIO maintenance mode, the final alert should clear as well
Exiting maintenance mode in ScaleIO GUI application
Rebuilding after exiting maintenance mode in ScaleIO
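
Most of the ESXi-side steps (3, 4, 6 and 7) can also be done from an SSH session on the host itself. Here's a minimal sketch using vim-cmd; the SVM name I grep for and the <vmid> are placeholders you'd look up on your own host, and ScaleIO maintenance mode itself I still entered and exited via the GUI as described above:

  # Find the SVM's VM ID on this host (adjust the grep to your SVM's name)
  vim-cmd vmsvc/getallvms | grep -i scaleio

  # Gracefully shut down the SVM guest (requires VMware Tools in the SVM)
  vim-cmd vmsvc/power.shutdown <vmid>

  # Put the host into ESXi maintenance mode once the SVM is powered off
  vim-cmd hostsvc/maintenance_mode_enter

  # ... remediate via VUM, let the BIOS/iDRAC updates install during the reboot ...

  # Afterwards: leave ESXi maintenance mode and power the SVM back on
  vim-cmd hostsvc/maintenance_mode_exit
  vim-cmd vmsvc/power.on <vmid>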

(Expected) Alerts during maintenance

As mentioned, you will have alerts and warnings during this operation. I had the following:

  • First, when putting the SDS into maintenance mode in ScaleIO, one warning about SDS being in maintenance mode:
SDS still on, ESXi not in maintenance
  • After the SVM is shut down and the ESXi host is also placed into maintenance mode, two more:
All three alerts after host is in maintenance and SVM has been shut down
  • Then, once you have remediated, taken the host out of maintenance mode and started the SVM, you're back to one alert, as in the first picture.
  • When you take the SDS out of ScaleIO maintenance mode, the last alert clears.

Note that the highest-rated alert, “MDM isn’t clustered”, is actually noteworthy. It means that the SDS you are taking down for maintenance also carries the MDM role (critical for the management of ScaleIO). Normally you'd have another MDM, and you shouldn't proceed with any of this if you can only find one MDM, or if this (or any other) alert was already present before you started.

EMC has this to say about MDMs (also see the document h14036-emc-scaleio-operation-ensuring-non-disruptive-operation-upgrade.pdf):

Currently, an MDM can manage up to 1024 servers. When several MDMs are present, an SDC may be managed by several MDMs, whereas, an SDS can only belong to one MDM. ScaleIO version 2.0 and later supports five MDMs (with a minimum of three) where we define a Master, Slave and Tie-breaker MDM.

Roles / Elements in ScaleIO

You can see the installed roles in VMware in the notes field, like so:

Roles in the Notes field in VMware

Elements or roles are (this may not be a complete list; a way to query them from the command line follows):

  • MASTER_MDM – Master MDM node, Meta Data Manager, enables monitoring and configuration changes
  • SLAVE_MDM – Secondary MDM node, will take over if Master is unavailable
  • SDS – Storage node, ScaleIO Data Server, provides storage services through HDD, SSD, NVMe etc.
  • SDC – ScaleIO Data Client, consumer of resources (e.g. a virtualization host)
  • RFCACHE – Read-only cache consisting of SSD or Flash
  • RMCACHE – RAM based cache
  • LIA – Light installation agent (on all nodes, creates a trust between node and Installation Manager)
  • TB – Tiebreaker; resolves conflicts inside the cluster. Counted as a type of MDM, and non-critical except in HA/conflict situations
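
If you want to check which node holds which of these roles without the GUI, the scli tool on the MDM can show most of it. I'm writing these from memory, so treat the exact switches as assumptions and verify against scli --help or the user guide:

  # Log in to the MDM first (run on the master MDM's SVM)
  scli --login --username admin

  # Show the MDM cluster state: master, slave and tie-breaker
  scli --query_cluster

  # List all SDS and SDC nodes and their state
  scli --query_all_sds
  scli --query_all_sdc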

ESXi funny business…

While running Remediate on the hosts, every single one failed while installing patches.

Scary Fatal Error 15 during remediation

A very scary looking Fatal Error 15. However, there’s a KB on this here.

So: (warm) reboot the host again, wait for ESXi to load the old pre-update version, and re-remediate without using the Stage option first. I had used Stage, as I'm used to doing; apparently that breaks things. Sometimes.

And to reiterate, I was patching using vCenter Update Manager (VUM) from ESXi 6.0 build 3380124 to build 5050593.
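
After the second, successful remediation pass it's worth double-checking that the host actually ended up on the new build. A quick check over SSH (the profile/VIB listing is just for eyeballing what got installed):

  # Show the running ESXi version and build number
  vmware -vl

  # Show the installed image profile and the VIB inventory
  esxcli software profile get
  esxcli software vib list | more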

Sources

docu82353_ScaleIO-Software-2.0.1.x-Documentation-set.zip from support.emc.com (not actually for the version in use, but similar enough in this case; use at your own risk)

ScaleIO v2.0.x User Guide.pdf, contained in the above-mentioned set

https://community.emc.com/thread/234110?start=0&tstart=0

https://www.emc.com/collateral/white-papers/h14344-emc-scaleio-basic-architecture.pdf

https://www.emc.com/collateral/white-papers/h14036-emc-scaleio-operation-ensuring-non-disruptive-operation-upgrade.pdf

Home Lab Xeon

The current home lab setup consists of an Intel Core i3-2100 with 16GB of DDR3, a USB drive for ESXi (on 6.5 right now) and a 3TB WD drive for the VMs. While the i3 performs perfectly well for my needs, I came across a Xeon E3-1220 (SR00F, Sandy Bridge), which should be even better!

For the specs, we have the following differences:

  Model                  Intel Xeon E3-1220               Intel Core i3-2100
  Released               Q2 2011                          Q1 2011
  Manufacturing process  32 nm                            32 nm
  Original price         189-203 USD (more in euroland)   120 USD
  Cores                  4                                2
  Hyper-Threading        No                               Yes
  Base frequency         3.10 GHz                         3.10 GHz
  Turbo frequency        3.40 GHz                         No turbo
  TDP                    80 W                             65 W
  Max memory             32 GB ECC DDR3                   32 GB non-ECC DDR3
  L1 cache               128 + 128 KB                     64 + 64 KB
  L2 cache               1 MB                             512 KB
  L3 cache               8 MB                             3 MB

So we can see that the Xeon is a 4-core processor without Hyper-Threading, so real cores as opposed to the i3's threads. It's more power-hungry, which is to be expected, but it can also turbo to a higher frequency than the i3. The Xeon also has more cache, which is again to be expected from a server-grade part.

A notable thing is that the Xeon, being a server part, does not include the GPU component, so I'll have to add a graphics card at least for the installation. I run the server headless anyway, but I want to see it POST at least. A plain PCI card is out, since the board has no PCI slots, and the only PCIe x16 slot is already used by the NIC (well, there is an x1 slot, but I have no such cards). The motherboard is an Asrock H61M-DGS R2.0, which has one x16 slot and one x1 slot. Maybe I'll do it all headless and hope it POSTs? Or take out the NIC for the installation?

Some yahoo also tried running an x16 card in an x1 slot here. I might try that, but since I'd have to melt off one end of the x1 slot, probably not.

There are apparently some x1 graphics cards, but I don’t have one as I mentioned. An option could be the Zotac GeForce GT 710, which can be had for 60 euros as of this post.

Preparations

I went to the pharmacy to get some pure isopropyl alcohol. It wasn't on the shelf, so I had to ask for it. I told the lady I needed some isopropyl alcohol, as pure as possible. She looked at me funny and said they had some in stock. I told her I'm using it to clean electronics, so she wouldn't suspect I'm some sort of cringey, soon-to-be-blind (not sure if you go blind from this stuff, but it can't be good for you) wannabe alcoholic, to which she replied that she doesn't know what I'll do with it, or how it will work for that. She got the bottle, which is described as “100 ml Isopropyl Alcohol”. There is a mention of cleaning vinyl records and tape recorder heads on the back, so I was vindicated. There's no indication of purity on the bottle, but the manufacturer lists above 99.8% purity here. Doesn't exactly match the bottle, but it's close.

Why did I get isopropyl alcohol? Well, because people on the internet said it's good for cleaning residual thermal paste off processors and CPU coolers. With common sense 2.0, I can also deduce that anything with a high alcohol content will evaporate and not leave behind anything conductive to mess things up. Oh, and it cost 6,30€ at the local pharmacy. It's not listed on the website (or it says it's no longer part of their selection).

Let’s see how it performs. I’m using cotton swabs, but I suppose I could use a paper towel. If it leaves behind cotton pieces, I’ll switch to something else.

The Xeon originally had a passive CPU block and a bunch of loud, small case fans, but I will use the same cooler as for the i3.

Take out the i3 and the cooler. Clean the cooler off with the isopropyl:

Isopropyl worked wonders

Put in the E3, new thermal paste. I used some trusty Arctic Silver 5.

Thermal paste added, note the artistic pattern

Re-attach the cooler and we're off to the races. I'll note here that I hate the push-through-and-turn type attachment of the stock Intel cooler. Oh well, it'll work.

 

Powering on

Powering the thing on was the exciting part. Will there be blue smoke? Will it boot headless? Will it get stuck in some POST screen and require me to press a button to move on? Maybe even go into the BIOS to save settings for the new CPU?

Strangely enough, after a while, I started getting ping replies from ESXi meaning the box had booted.

There's really nothing left to do. ESXi 6.5 recognized the new CPU, and the VMs started booting shortly after.

Xeon E3 running on ESXi 6.5

MicroATX Home Server Build – Part 4

After a longish break, here's the next installment! The server has been in production since last September and is running very well. After the previous post, this is what's happened:

  • Installed ESXi 6.0 update 1 + some post u1 patches
  • Installed three VMs: an OpenBSD 5.8 PF router/firewall machine, a Windows Server 2016 Technical Preview to run Veeam 9 on, and an Ubuntu PXE server to test out PXE deployment
  • Added a 4 port gigabit NIC that I got second hand

In this post, I’ll be writing mostly about ESXi 6.0 and how I’ve configured various things in there.

For the hypervisor, I bought a super small USB stick, specifically a Verbatim Store n’ Stay (I believe this is the model name) 8GB, which looks like a small Bluetooth dongle. It's about as small as they get. Here's a picture of it plugged in:

The Verbatim Store N Go plugged in

Using another USB stick created with Rufus, which had the ESXi 6u1 installation media on it, I installed ESXi on the Verbatim. Nothing worth mentioning here. Post-installation, I turned on ESXi Shell and SSH, because I like having that local console and SSH access for multiple reasons, one of them I’ll get to shortly (hint: it’s about updating).

Since I didn't want the Realtek NIC on the motherboard to do anything, I used one of the ports on the 4-port card for the VMkernel management port. Another port I configured as internal and a third as external. The external port is hooked up straight to my cable modem, and it is passed through straight to the OpenBSD virtual machine so it can get an address from the service provider. The cable modem is configured as a bridge.

The basic network connections therefore look like this:

Simple graph of my home network

After the installation, multiple ESXi patches have been released. Those can be found on my.vmware.com, using this link: https://my.vmware.com/group/vmware/patch#search. Patches for ESXi can be installed in two ways: either through vCenter Update Manager (VUM) or by hand over SSH/the local ESXi Shell. Since I will not be running vCenter Server, VUM is out of the question. Installing patches manually requires you to have a datastore on the ESXi server where you can store the patch while you are installing. The files are .zip files (you don't decompress them before installation) and are usually a few hundred megabytes in size.

To install a patch, I uploaded the zip file to my datastore (in this case the 2TB internal SATA drive), logged on to the host over SSH, and ran: esxcli software vib install -d /vmfs/volumes/volumename/patchname.zip

Patches most often require a reboot, so prepare for one, but you don't have to do it right away.
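
Putting the whole manual patch cycle together, here's a rough sketch; the host name, datastore name and patch file name are placeholders, and I'm assuming the VMs have been shut down first, since maintenance mode (and the reboot command) want the host empty:

  # Copy the patch bundle to a datastore on the host (run from your workstation)
  scp patchname.zip root@esxihost:/vmfs/volumes/volumename/

  # On the host over SSH: enter maintenance mode once the VMs are down
  vim-cmd hostsvc/maintenance_mode_enter

  # Install the patch bundle (leave the zip compressed)
  esxcli software vib install -d /vmfs/volumes/volumename/patchname.zip

  # Reboot, and take the host out of maintenance mode once it's back up
  esxcli system shutdown reboot -r "ESXi patching"
  vim-cmd hostsvc/maintenance_mode_exit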

Update 2 installed on a standalone ESXi host through SSH

Edit: As I'm writing this, I noticed that Update 2 has been released. I'll have to install that shortly. Here's the KB for Update 2: http://kb.vmware.com/kb/2142184

A one-host environment is hardly a configuration challenge, but some of the stuff that I've set up includes (a shell equivalent follows the list):

  • Don’t display a warning about SSH being on (this is under Configuration -> Advanced Settings -> UserVars -> UserVars.SuppressShellWarning “1”)
  • Set hostname, DNS, etc. under Configuration -> DNS and Routing (also made sure that the ESXi host has a proper DNS A record and PTR; things just work better this way)
  • Set NTP server to something proper under Configuration -> Time Configuration
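
All three of these can be set from the ESXi Shell as well. A sketch with made-up hostname, domain, DNS and NTP values; the ntp.conf part is from memory, so double-check it (the GUI way works just as well):

  # Suppress the "SSH is enabled" warning
  esxcli system settings advanced set -o /UserVars/SuppressShellWarning -i 1

  # Hostname and DNS
  esxcli system hostname set --host=esxi01 --domain=home.lan
  esxcli network ip dns server add --server=192.168.1.1

  # NTP on ESXi 6.0: point ntpd at a server and make sure it starts with the host
  echo "server 0.pool.ntp.org" >> /etc/ntp.conf
  /etc/init.d/ntpd restart
  chkconfig ntpd on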

For the network, nothing complicated was done, as mentioned earlier. The management interface is on vmnic0, vSwitch0, with a VMkernel port that has the management IP address. You can easily share management and virtual machine networking if you want to, though that's not a best practice. In that scenario, you would create a port group under the same vSwitch and call it something like "Virtual Machine port group", for instance. That port group doesn't get an IP; it's just a network label you refer to when assigning networking to your VMs. Whatever settings are on the physical port / vSwitch / port group apply to the VMs assigned to that port group.
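
For reference, creating such a port group on vSwitch0 from the shell looks roughly like this; the port group name and VLAN ID are just examples:

  # Add a port group for VM traffic to the existing standard vSwitch
  esxcli network vswitch standard portgroup add --portgroup-name="VM Network" --vswitch-name=vSwitch0

  # Optionally put the port group on a VLAN
  esxcli network vswitch standard portgroup set --portgroup-name="VM Network" --vlan-id=10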

By the way, after the install of Update 2, I noticed something cool on the ESXi host web page:

VMware Host..client?

Hold on, this looks very similar to the vSphere Web Client, which has been available for vCenter since 5.1?

Very familiar!

Very familiar indeed! This looks awesome! It looks like yet another piece VMware needs in order to kill off the vSphere Client. I'm not sure I'm ready to give it up just yet, but the lack of a tool to configure a standalone host was one of the key pieces missing so far.

Host web client after login

In the next  post I will be looking at my VMs and how I use them in my environment.

Relevant links:

https://rufus.akeo.ie/
http://www.verbatim.com/prod/flash-memory/usb-drives/everyday-usb-drives/netbook-usb-drive-sku-97463/
The Host UI web client was previously a Fling, something you could install but that wasn’t released with ESXi https://labs.vmware.com/flings/esxi-embedded-host-client
But now it’s official: http://pubs.vmware.com/Release_Notes/en/vsphere/60/vsphere-esxi-60u2-release-notes.html

MicroATX Home Server Build – Part 3

Because I am impatient, I went ahead and got a motherboard, processor and memory. The components that I purchased were:

  • Asrock H61M-DGS R2.0 (Model: H61M R2.0/M/ASRK, Part No: 90-MXGSQ0-A0UAYZ)
  • 16 GB (2x8GB) Kingston HyperX Fury memory (DDR3, 1600MHz, HX316C10FBK2/16, individual memories are detected as: KHX1600C10D3/8G)
  • Intel i3-2100 (2 cores, with hyperthreading)

I ended up with this solution because I realized I might not have enough money to upgrade my main workstation and move its parts into this machine. I also didn't have the funds for a server-grade processor, and getting an mATX server motherboard turned out to be difficult on short notice (did I mention I'm an impatient bastard?).

I ended up paying 48€ for the motherboard, 45€ for the processor (used, including the Intel stock cooler) and 102€ for the 16GB memory kit.

The motherboard has the following specs:

  • 2 x DDR3 1600 MHz slots
  • 1 x PCIe 3.0 x16 slot
  • 1 x PCIe 2.0 x1 slot
  • 4 x SATA2
  • 8 USB 2.0 (4 rear, 4 front)
  • VGA and DVI outputs

The factors that led to me choosing this motherboard were mainly: Price, availability, support for 2nd and 3rd generation Intel Core processors (allowing me to use the i3 temporarily, and upgrade to the i5 later if I feel the need), and the availability of two PCIe slots. All other features were secondary or not of importance.

The reductions in spec that I had to accept were: no support for 32GB of memory (as mentioned in the previous post) and no integrated Intel NIC (this board has a crappy Realtek NIC, but I might still use that for something inconsequential like management; probably not though).

These pitfalls may or may not be corrected at a later date, when I have more money to put toward the build and the patience to wait for parts.

The CPU is, as mentioned, an Intel i3-2100. It runs at 3.1 GHz, has two cores and four threads (thanks to HT), 3MB of Intel 'SmartCache', and a 65W TDP. It does support 32GB of memory on a suitable motherboard. I doubt the CPU will become a bottleneck anytime soon, even though it is low-spec (it originally retailed for ~120€ back when it was released in 2011). The applications and testing I intend to do are not CPU-heavy, and since I have four logical processors to work with in ESXi, I can spread the load out some.

Putting it all together

Adding the motherboard was fairly easy. There were some standoffs already in the case, but I had to add a few to accommodate the mATX motherboard. Plenty of space for cabling from the PSU, and I paid literally zero attention to cable management at this point. The motherboard only had two fan headers: One for the CPU fan (obviously mandatory..) and one for a case fan. I opted to hook up the rear fan (included with the case) to blow out hot air from around the CPU. I left the bottom fan in, I may hook it up later, or replace it with the 230mm fan from Bitfenix.

Initially, I did not add any hard drives. ESXi would run off a USB 2.0 memory stick (Kingston Data Traveler 4GB), and the VMs would probably run from a NAS. I ended up changing my mind (more on this in the next post). For now, I wanted to validate the components. I opted to run trusty old MemTest86+ for a day or so. Here’s the build running MemTest:

Build almost complete, running MemTest86+

Looks to be working fine!

Here’s a crappy picture of the insides of the case, only covered by the HDD mounting plate:

Side panel open, showing HDD mounting plate, side of PSU

One thing to note here is that if you want the side panel completely off, you need to disconnect the cables seen at the front left. These are for the power and reset buttons, the USB 2.0 front ports and the HDD LED. They are easy to remove, so no biggie here.

One note on the motherboard: There has only ever been one release of the BIOS, version 1.10. This was installed at the factory (obviously, as there were no other versions released at the time of writing). If you do get this board, make sure you are running the latest BIOS. Check for new versions here: http://www.asrock.com/mb/Intel/H61M-DGS%20R2.0/?cat=Download&os=BIOS

So this is the current state of the build. Next up…

  • Installing ESXi 6.0U1 (just released in time for this build)
  • Deciding on where the VMs would run
  • Adding NIC and possible internal storage
  • Configuring ESXi
  • Installing guest VMs

Stay tuned!

Relevant links:

http://ark.intel.com/products/53422
http://www.asrock.com/mb/Intel/H61M-DGS%20R2.0/

http://www.kingston.com/datasheets/HX316C10FBK2_16.pdf
https://pubs.vmware.com/Release_Notes/en/vsphere/60/vsphere-esxi-60u1-release-notes.html

MicroATX Home Server Build – Part 1

Today I officially started my new home server build by ordering a case. The requirements for building a new home server are the following:

  • It needs to be physically small
  • It needs to be able to operate quietly
  • It needs to utilize some current hardware to reduce cost
  • It needs to be able to run VMware ESXi 6
  • Needs to support 32GB RAM for future requirements
  • Needs to accommodate or contain at least 2 Intel Gigabit NICs

Having run a number of machines at home over the past three decades, some of these have become more or less must-haves. Others are more of a nice-to-have. I've had some real server hardware running at home, but most of the hand-me-down stuff has been large, power-hungry and/or loud to the point where running it has been a less than pleasurable experience.

The last candidate was an HP Proliant 350 G5 (or so?), which was otherwise nice, but too loud.

You will note that power consumption isn't a requirement. I don't care, really. My monthly power bill for a 2.5-person, 100 m^2 household is in the neighborhood of a few dozen euros. I really don't know, or care. I'm finally in a position where I can pick one expense that I don't have to look at so closely. For me, that expense is power. Case closed.

The conditions I've set forth rule out using a classic desktop-machine-cum-server thing. Those are usually not quiet, they use weird form factors for the motherboard, and they seldom support large amounts of RAM, etc. A proper modern server can be very quiet and quite scalable, as most readers will know. A new 3rd or 4th generation Xeon machine in a 2U or tower form factor can be nigh silent when running at lower loads, and support hundreds of gigabytes of RAM. They are, however, outside my price range, and do not observe the "needs to utilize some current hardware to reduce cost" condition.

Astute readers will also pipe up with, "Hey, this probably means you won't use ECC memory! That's bad!" And I'll agree! However, ECC is not a top priority for me, as I am not running data- or time-sensitive applications on this machine. Data will reside elsewhere, and be backed up to yet another "elsewhere", so even if there is a crash with loss of data (which is still unlikely, even *with* non-ECC memory), I'll just roll back a day or so, not losing much of anything. A motherboard supporting ECC would be nice, but it's definitely not a requirement.

Ruling out classic desktop workstations and expensive server builds I am left with two choices:

  1. Get a standard mATX case + motherboard
  2. Get a server grade mATX motherboard and some suitable case

The case would probably end up being the same either way, as the only criterion is that it's small and can accommodate quiet (meaning non-small) fans. The motherboard presents a bigger question, and is one that I have yet to solve.

I could either go with a Supermicro, setting me back between 200-400€, and get a nice server-grade board, possibly with an integrated Intel NIC, out-of-band management etc., or I could go with a desktop motherboard that just happens to support 32GB of memory. There are such motherboards around for less than 100€ (for instance, Intel B85 chipset motherboards from many vendors).

Here's the tricky part: I could utilize my current i5-2500 (socket LGA1155) in this build, along with its memory. This would mean that the motherboard would obviously need to support that socket. Note! The 1155 socket is not the current Intel socket. We're now at generation 6 (Skylake), which uses an altogether different socket (socket 1151) that is not compatible with generations 2 and 3 (which used 1155) or generations 4 and 5 (which used 1150).

Using my current processor would save some money. Granted, I'd have to upgrade the machine currently running that processor (meaning a motherboard, CPU and memory upgrade, probably to Haswell or Broadwell, i.e. socket 1150), so the cost would be transferred there. But then again, I tend to run the most modern hardware in my main workstation, as it's the one I use as my daily driver. The server has usually been re-purposed older hardware.

Case selection

I've basically decided on the form factor, which will be micro ATX (or mATX or µATX or whatever), so I can go ahead and buy a case. Out of the options, I picked something that is fairly roomy inside, somewhat pretty on the outside, and doesn't cost over 100€. The choice I ended up with was the Bitfenix Prodigy mATX Black.

Here’s the case, picture from Bitfenix (all rights belong to them etc.):


Some features include:

  • mATX or mITX form factor
  • 2 internal 3.5″ slots
  • Suitable for a standard PS2 ATX PSU (which I happen to have lying around)
  • Not garish or ugly by my standards

I ordered the case today from CDON, who had it for 78,95€ + shipping (which was 4,90€). Delivery will happen in the next few days.

The current working idea is to get an mATX motherboard which supports my i5-2500 and 32GB of DDR3 memory. I've been looking at some boards from Gigabyte, Asrock and MSI. MSI is pretty much out, just because I've had a lot of bad experiences with their kit in the past. That may be totally unjustified, but that's the way it feels right now.

I still haven't ruled out getting a Supermicro board, something like this one: http://www.supermicro.nl/products/motherboard/Xeon/C202_C204/X9SCM-F.cfm but that would rule out using my current CPU and memory. I'd have to get a new CPU, which, looking at the spec, would either be a Xeon E3 or a 2nd or 3rd generation i3 (as i5s and i7s are for some reason not supported). An i3 would probably do well, but I would take a substantial CPU performance hit going from a Xeon or i5 down to an i3. I'd lose at least two cores, which are nice to have in a virtualized environment such as this.

Getting the board would set me back about 250€, and the CPU, even bought used, would probably be around 100€. Compare this against an 80-100€ desktop motherboard using my existing CPU and (maybe?) existing memory. Then again, I'd have to upgrade my main workstation if I steal the CPU from there. Oh well. More thinking is in order, methinks.

 

Last minute edit:

The hardware I have at my disposal is as follows:

  • Intel NICs in the PCI form factor
  • Some quad-NIC card, non-Intel, PCIe
  • Corsair ATX power supply
  • Various fans
  • If I cannibalize my main rig:
    • i5-2500
    • 16GB DDR3 memory (4x4GB)