(Do you have ever seen the following message while you’re trying to validate your cluster configuration with your availability groups or FCI’s and Windows Server 2012 R2?
Microsoft recommends to add a witness even if you have only two cluster members with dynamic weights. This recommendation may make sense regarding the new witness capabilities. Indeed, Windows 2012 R2 improves the quorum resiliency with the new dynamic witness behavior. However, we need to take care about it and I would like to say at this point that I’m reluctant to recommend to meet this requirement with a minimal cluster configuration with only 2 nodes. In my case, it’s very usual to implement SQL Server AlwaysOn and availability groups or FCI’s architectures with only two cluster nodes at customer places. Let’s talk about the reason in this blog post.
First of all, let’s demonstrate why I don’t advice my customers to implement a witness by following the Microsoft recommendation. In my case it consists in adding a file share witness on my existing lab environment with two cluster nodes that use the dynamic weight behavior:
Now let’s introduce a file share witness (DC2WINCLUST-01) as follows:
We may notice after introducing the FSW that the node weight configuration has changed:
The total number of votes equals 3 here because we are in the situation where we have an even number of cluster members plus the witness. As a reminder, we are supposed to use a dynamic witness feature according to the Microsoft documentation here.
In Windows Server 2012 R2, if the cluster is configured to use dynamic quorum (the default), the witness vote is also dynamically adjusted based on the number of voting nodes in current cluster membership. If there is an odd number of votes, the quorum witness does not have a vote. If there is an even number of votes, the quorum witness has a vote.
The quorum witness vote is also dynamically adjusted based on the state of the witness resource. If the witness resource is offline or failed, the cluster sets the witness vote to “0.”
The last sentence draws my attention and now let’s introduce a failure of the FSW. In my case I will just turn off the share used by my WFSC as follows:
As expected, the file share witness state has changed from online to failed state by the resource control manager:
At this point, according to the Microsoft documentation, we may expect that the WitnessDynamicWeight property will change by the cluster but to my surprise, this was not the case:
In addition, after taking a look at the cluster log I noticed this sample among the entire log records:
The latest line (highlighted in red) is the most important. I guess here that “hybrid configuration” means my environment includes 2 cluster nodes and one witness (regarding its type). An interesting thing to notice is a potential limitation that exists for the dynamic witness behavior that cannot be performed below two cluster nodes. Unfortunately, I didn’t find any documentation from Microsoft about this message. Is it a bug or just a missing entry to the documentation or have I overlook something concerning the cluster behavior? At this point I can’t tell anything and I hope to get soon a response from Microsoft. The only thing I can claim at this point is that if I lose a cluster node, the cluster availability will be compromised. This reproduced issue is not specific on my lab environment and I faced the same behavior several times at my customers.
Let’s demonstrate by issuing a shutdown of one of my cluster node. After a couple of seconds, connection with my Windows failover cluster is lost and here what I found by looking at the Windows event log:
As I said earlier, at this point, with minimal configuration with two cluster nodes, I always recommend to my customers to skip this warming. After all, having only two cluster members with dynamic quorum behavior is sufficient to get a good quorum resiliency. Indeed, according to the Microsoft documentation to allow the system to re-calculate correctly the quorum, a simultaneous failure of a majority of voting members should not occur (in others words, the failure of cluster members must be sequential) and with two cluster nodes we may only lose one node at the same time in all cases.
(Update: In fact, we can still face last man standing issue without implementing a witness in this specific configuration. Please read the second part of this article for more details).
What about more complex environments? Let’s say a FCI with 4 nodes (two cluster nodes on each datacenter) and a file share witness on the first datacenter. In contrast, in this case, if the file share witness fails, the cluster will adjust correctly the overall node weight configuration both on the cluster nodes and on the witness. This is completely consistent with the message found above: “Cannot go below weight count 3″.
The bottom line is that the dynamic witness feature is very useful but you have to take care about its behavior with minimal configurations based on only two cluster nodes which may introduce unexpected results in some cases.
Happy cluster configuration!