While performing resilience tests for a customer, I found that after the failure of a SAN switch the ESXi hosts did not restore the lost storage paths once the switch was brought back into service.
The infrastructure in question was an IBM Flex System containing x240 compute nodes running vSphere ESXi 5.5.0 from an internal USB memory key. The compute nodes were fitted with IBM Flex System FC5054 4-port 16Gb FC Adapters, which are Emulex LPe16000 quad-port Fibre Channel adapters. The Flex System chassis contained two IBM Flex System FC5022 24-port 16Gb ESB SAN Scalable Switches, which are Brocade switches. The storage was presented from an IBM XIV over the Fibre Channel fabric.
When one of the SAN switches was pulled from the rear of the Flex System to simulate a failure, half of the paths to each LUN were lost. Once the SAN switch was brought back into service, the paths through it remained “dead”. Rebooting the ESXi host appeared to be the only way to restore them.
I tracked the problem down to the driver used by the ESXi hosts for the Fibre Channel adapters; this was version 10.0.100.1-1vmw.550.0.0.1331820. I downloaded the Emulex FC driver for these cards from the VMware website; this was version 10.0.725.203-1OEM.550.0.0.1331820 (https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI55-EMULEX-LPFC-10072744&productId=353).
I uploaded the VIB file (lpfc-10.0.725.203-1OEM.550.0.0.1331820.x86_64.vib) from the downloaded package to a datastore named XIV_TEMPLATES and stored it in a folder named Drivers. All of the ESXi hosts could access this datastore, so I could then use the following procedure to update the driver on each host:
- Put the host into maintenance mode first
- Enable SSH
- Log in to the host via SSH as root
- Check the current version with the command
esxcfg-module -i lpfc
There is a lot of output from this command and the version is displayed on the 3rd line
- Update the driver with the command
esxcli software vib update -v /vmfs/volumes/XIV_TEMPLATES_DS_01/Drivers/lpfc-10.0.725.203-1OEM.550.0.0.1331820.x86_64.vib
The output should be:
Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
Reboot Required: true
VIBs Installed: Emulex_bootbank_lpfc_10.0.725.203-1OEM.550.0.0.1331820
VIBs Removed: VMware_bootbank_lpfc_10.0.100.1-1vmw.550.0.0.1331820
- Reboot the host
- Check the version is now 10.0.725.203-1OEM.550.0.0.1331820 by running the esxcfg-module -i lpfc command from the version check step above; you will need to enable SSH again first
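As an alternative to reading through the esxcfg-module output, the installed driver version can also be read from the host's VIB list. The following is a minimal sketch, assuming the usual column layout of esxcli software vib list (VIB name in the first column, version in the second); the function name lpfc_vib_version is hypothetical and only used for illustration.

```shell
#!/bin/sh
# Print the version column for the lpfc VIB.
# Reads "esxcli software vib list" style output on stdin.
# Assumption: the VIB name is the first column and the version the second.
lpfc_vib_version() {
    awk '$1 == "lpfc" { print $2 }'
}

# Only call esxcli when it is available (i.e. when run on an ESXi host).
if command -v esxcli >/dev/null 2>&1; then
    esxcli software vib list | lpfc_vib_version
fi
```

Run before and after the update, this should show the version change without having to search the longer module information output.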
Once the driver had been updated and the hosts rebooted, paths lost during a SAN switch failure were restored automatically when the switch was brought back into service.
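When repeating this kind of resilience test, it can help to count the dead paths from the ESXi shell rather than checking each LUN in the client. The following is a rough sketch, assuming that esxcli storage core path list reports each path with a line of the form "State: active" or "State: dead"; the function name count_dead_paths is hypothetical.

```shell
#!/bin/sh
# Count the paths reported as dead.
# Reads "esxcli storage core path list" style output on stdin,
# where each path is assumed to have a line of the form "State: <state>".
count_dead_paths() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so mask the exit status to keep the function usable under "set -e".
    grep -c 'State: dead' || true
}

# Only call esxcli when it is available (i.e. when run on an ESXi host).
if command -v esxcli >/dev/null 2>&1; then
    echo "Dead paths: $(esxcli storage core path list | count_dead_paths)"
fi
```

With the SAN switch removed, the count should be non-zero; once the switch is back in service and the updated driver is in place, it should return to 0 without a reboot.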