While adding a couple of datastores to a newly-built vSphere 6 cluster recently, the following error message came up:
All shared datastores failed on the host <hostname>
Everything seemed normal and I hadn’t done anything differently. Plus, the datastores were actually visible and operational, so it seemed like a false positive. I did a couple of HBA rescans and refreshes to see if it would go away, but it didn’t! The ESXi build in use was 6.0.0, build 3073146.
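For reference, the rescan and refresh can also be driven from the ESXi shell instead of the vSphere client. A rough sketch (the adapter name vmhba33 is just an example; substitute your own):

```shell
# Rescan every HBA for new LUNs and VMFS volumes
# (equivalent to "Rescan Storage" in the vSphere client)
esxcli storage core adapter rescan --all

# Or rescan a single adapter instead
esxcfg-rescan vmhba33

# Confirm that the datastores are mounted and accessible
esxcli storage filesystem list
```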
Annoyingly, three out of the four got fixed by just a reboot. One of them, however, persisted with the error (as seen in the screenshot above). Firstly, I don’t like when false positives come up and a reboot fixes them. Secondly, it’s worse when that is inconsistent too!
With that host, there was nothing obvious that I could spot in the logs. So I tried various things, e.g. rescans, removal of iSCSI port binding, another reboot, disabling the ports on the distributed switch, etc., but nothing worked. Weirdly, it started to seem that nothing I did changed anything in the configuration. Even the “Uptime” field wasn’t updating, despite several reboots!
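If it helps anyone trying the same steps, the iSCSI port-binding removal and re-add can be done from the ESXi shell as well. The vmhba33/vmk1 names below are examples only:

```shell
# List the current iSCSI port bindings on the host
esxcli iscsi networkportal list

# Remove a binding (substitute your own adapter and VMkernel NIC)
esxcli iscsi networkportal remove --adapter=vmhba33 --nic=vmk1

# Re-add the binding afterwards
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
```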
At this point, I thought of removing the host from the cluster/vCenter and adding it back, in the hope that whatever was stuck in the configuration would get reset as a result. Then I discovered that all switch and VMkernel configuration options (just for this host!) were also greyed out. That was a problem because now I couldn’t remove the host from the distributed switch (the removal attempt complained that ports were in use), which in turn prevented removal from the cluster.
Unfortunately, there isn’t a good resolution here. I had to go to the DCUI and reset networking to the defaults. Once done, I gave the host a reboot. vCenter then started complaining that the vDS configuration didn’t match what was known to vCenter, so I cleanly removed the host from the vDS. That satisfied vCenter, so following that, I also cleanly removed the host from the cluster.
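After the DCUI network reset, you can verify from the ESXi shell what networking configuration the host is actually left with, which is useful before cleaning things up in vCenter. A quick sketch:

```shell
# Show any distributed switch configuration still known to the host
esxcli network vswitch dvs vmware list

# Show the standard switches and VMkernel interfaces
# recreated by the network reset
esxcli network vswitch standard list
esxcli network ip interface list
```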
After a reboot, I added the host back into the cluster and the error was finally gone! The rest of the process was to add the host back to the vDS and configure iSCSI etc. on it. Everything from that point forward worked as expected.
This was definitely a weird one and I didn’t like the fact that I had to remove the host and put it back in to get the error fixed. So, I am documenting the story here in case someone else sees it too.
I also briefly saw another false positive, which the same first reboot also fixed (which is also why I couldn’t capture a screenshot of it):
Deprecated VMFS volume(s) found on the host. Please consider upgrading volume(s) to the latest version
However, as I later found out, this one is a known issue. It’s documented here and can also be fixed by simply restarting the management agents.
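For completeness, restarting the management agents can be done from the ESXi shell without a full reboot, along these lines:

```shell
# Restart the host and vCenter management agents (hostd and vpxa)
/etc/init.d/hostd restart
/etc/init.d/vpxa restart

# Alternatively, restart all management agents in one go
services.sh restart
```

Note that restarting the agents briefly disconnects the host from vCenter, so it is best done outside of other management operations.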