Let’s face it. In today’s commodity-driven private cloud environments we are seeing more and more transient failures as opposed to true hardware failures: lost storage connectivity, switch resets, spanning tree events, packet loss, or latency. They wreak havoc on an environment, and I see it almost weekly when working with customers. In Server 2012, a failure to read from or write to the virtual hard disk could crash the app running on the VM, or the whole VM.
In comes Server 2016 with Virtual Machine Storage Resiliency. New capabilities have been baked in to detect and deal with storage failures more effectively. If a VM has trouble reading from or writing to its VHD/VHDX, the VM is placed into what’s called a critical pause state. It is essentially frozen in time, meaning the system state at the time of the failure is preserved. Anything happening on that server is held until the storage is available again, and the VM picks right up where it left off. This is ideal for short-term transient failures and will most likely be transparent to the end user.
While the VM is in that critical pause state, however, it is not accessible to clients. That poses an issue for longer outages, but the VM’s session state is preserved, making the outage less impactful.
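To see whether any VMs are currently sitting in critical pause, something like the following should work from an elevated PowerShell prompt on the Hyper-V host. This is a minimal sketch that assumes the Hyper-V PowerShell module is installed and that the state is reported as `PausedCritical`.

```powershell
# List any VMs on this host that are currently frozen in critical pause.
# Assumes the Hyper-V module is available and the session is elevated.
Get-VM |
    Where-Object { $_.State -eq 'PausedCritical' } |
    Select-Object Name, State, Status
```

An empty result simply means no VM on that host is in critical pause at the moment.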
The VM will not stay in this critical pause state forever. There is a configurable timeout, set via PowerShell. The default is 30 minutes, and it can be set on a per-VM basis. If storage has not recovered by the end of the timeout period, the VM is powered off and has to cold boot, so session state is lost.
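As a rough sketch of how the timeout might be adjusted for a single VM: the `Set-VM` cmdlet exposes `-AutomaticCriticalErrorAction` and `-AutomaticCriticalErrorActionTimeout` (in minutes). The VM name `SQL01` and the 60-minute value are made up for illustration; substitute your own.

```powershell
# Hypothetical VM name, used only for illustration.
$vmName = 'SQL01'

# Pause the VM on storage errors and wait up to 60 minutes
# (instead of the default 30) before giving up on the storage.
Set-VM -Name $vmName `
       -AutomaticCriticalErrorAction Pause `
       -AutomaticCriticalErrorActionTimeout 60

# Read the settings back to confirm the change.
Get-VM -Name $vmName |
    Select-Object Name, AutomaticCriticalErrorAction, AutomaticCriticalErrorActionTimeout
```

Setting `-AutomaticCriticalErrorAction` to `None` instead of `Pause` should turn the critical pause behavior off for that VM.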
You can decide what “transient” means for your organization and its SLAs, and set the timeout accordingly to get the behavior you want out of Server 2016.
Storage resiliency is supported in the following configurations:
- Gen1 and Gen2 VMs
- VHD, VHDX and Shared VHDX
- Local block storage (SAN)
  - FC, iSCSI, FCoE, SAS with Cluster Shared Volumes
- File-based storage (NAS)
  - File shares using SMB (Server Message Block protocol) with continuous availability, such as a Scale-out File Server (SoFS)
Storage resiliency is not supported with:
- VHD / VHDX on a local hard disk without Cluster Shared Volumes
- Standard file servers
- USB storage
- Hyper-V pass-through disks