
David Barbarin

Using Windows 2012 R2 & dynamic witness feature with minimal configurations

Have you ever seen the following message while validating your cluster configuration for availability groups or FCIs on Windows Server 2012 R2?

blog_32_-_0_-_cluster_validation

Microsoft recommends adding a witness even if you have only two cluster members with dynamic weights. This recommendation makes sense in light of the new witness capabilities: Windows 2012 R2 improves quorum resiliency with the new dynamic witness behavior. However, we need to be careful here, and I’m reluctant to recommend meeting this requirement in a minimal cluster configuration with only two nodes. At customer sites, I very often implement SQL Server AlwaysOn availability groups or FCI architectures with only two cluster nodes. Let’s look at the reason in this blog post.

First of all, let’s demonstrate why I don’t advise my customers to implement a witness as Microsoft recommends. The demonstration consists of adding a file share witness to my existing lab environment, which has two cluster nodes that use the dynamic weight behavior:

blog_32_-_1_-_cluster_2_nodes_configuration_nodeweight

Now let’s introduce a file share witness (DC2WINCLUST-01) as follows:

blog_32_-_2_-_adding_FSW

We may notice after introducing the FSW that the node weight configuration has changed:

blog_32_-_3_-_cluster_new_configuration

blog_32_-_4_-_cluster_fsw_config

The total number of votes equals 3 here because we have an even number of cluster members plus the witness. As a reminder, according to the Microsoft documentation, the dynamic witness feature is supposed to behave as follows:

In Windows Server 2012 R2, if the cluster is configured to use dynamic quorum (the default), the witness vote is also dynamically adjusted based on the number of voting nodes in current cluster membership. If there is an odd number of votes, the quorum witness does not have a vote. If there is an even number of votes, the quorum witness has a vote.
The quorum witness vote is also dynamically adjusted based on the state of the witness resource. If the witness resource is offline or failed, the cluster sets the witness vote to “0.”
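
The documented rule quoted above can be written down as a small sketch. It only models what the documentation says should happen, not what the cluster actually does, and the function name `witness_vote` is hypothetical (it is not part of any Microsoft API):

```python
def witness_vote(voting_nodes: int, witness_online: bool) -> int:
    """Return the witness vote (0 or 1) as described in the quoted documentation."""
    if not witness_online:
        # Offline or failed witness: the documentation says the vote is set to 0.
        return 0
    # Even number of node votes: the witness votes so that the total is odd.
    return 1 if voting_nodes % 2 == 0 else 0


for nodes, online in [(2, True), (3, True), (2, False)]:
    print(f"{nodes} voting nodes, witness online={online}: "
          f"witness vote = {witness_vote(nodes, online)}")
```

For a two-node cluster, this model says the witness vote should be 1 while the witness is online and 0 once it has failed.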

The last sentence caught my attention, so let’s now introduce a failure of the FSW. In my case I will simply turn off the share used by my WSFC as follows:

blog_32_-_5_-_disable_fileshare

As expected, the Resource Control Manager (RCM) has changed the file share witness state from online to failed:

blog_32_-_6_-_fileshare_witness_failed

At this point, according to the Microsoft documentation, we might expect the cluster to change the WitnessDynamicWeight property to 0, but to my surprise this was not the case:

blog_32_-_62_-_fileshare_witness_configuration

In addition, while looking through the cluster log, I noticed the following records:

000014d4.000026a8::2015/02/20-12:45:43.594 ERR   [RCM] Arbitrating resource 'File Share Witness' returned error 67
000014d4.000026a8::2015/02/20-12:45:43.594 INFO [RCM] Res File Share Witness: OnlineCallIssued -> ProcessingFailure( StateUnknown )
000014d4.000026a8::2015/02/20-12:45:43.594 INFO [RCM] TransitionToState(File Share Witness) OnlineCallIssued-->ProcessingFailure.
000014d4.00001ea0::2015/02/20-12:45:43.594 INFO [GEM] Node 1: Sending 1 messages as a batched GEM message
000014d4.000026a8::2015/02/20-12:45:43.594 ERR   [RCM] rcm::RcmResource::HandleFailure: (File Share Witness)
000014d4.000026a8::2015/02/20-12:45:43.594 INFO [QUORUM] Node 1: PostRelease for ac9e0522-c273-4da8-99f5-3800637db4f4
000014d4.000026a8::2015/02/20-12:45:43.594 INFO [GEM] Node 1: Sending 1 messages as a batched GEM message
000014d4.000026a8::2015/02/20-12:45:43.594 INFO [QUORUM] Node 1: quorum is not owned by anyone
000014d4.000026a8::2015/02/20-12:45:43.594 INFO [RCM] resource File Share Witness: failure count: 0, restartAction: 0 persistentState: 1.
000014d4.00001e20::2015/02/20-12:45:43.594 INFO [GUM] Node 1: executing request locally, gumId:281, my action: qm/set-node-weight, # of updates: 1
000014d4.000026a8::2015/02/20-12:45:43.594 INFO [RCM] numDependents is zero, auto-returning true
000014d4.00001e20::2015/02/20-12:45:43.594 WARN [QUORUM] Node 1: weight adjustment not performed. Cannot go below weight count 3 in a hybrid configuration with 2+ nodes

 

The last line is the most important. I assume that “hybrid configuration” here means that my environment includes two cluster nodes and one witness (whatever its type). It points to a potential limitation of the dynamic witness behavior: the weight adjustment does not seem to be performed when the cluster has only two nodes. Unfortunately, I didn’t find any Microsoft documentation about this message. Is it a bug, a missing entry in the documentation, or have I overlooked something about the cluster behavior? At this point I can’t tell, and I hope to get a response from Microsoft soon. The only thing I can claim is that if I lose a cluster node in this state, the cluster availability will be compromised. This issue is not specific to my lab environment: I have faced the same behavior several times at customer sites.
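
To spot this kind of record in a large cluster log, a small filter can help. The following is a minimal sketch that assumes the line layout seen in the excerpt above (process.thread IDs, timestamp, level, bracketed component, message). The regular expression and the `find_entries` helper are my own illustration, not an official log parser:

```python
import re

# Layout assumed from the excerpt: pid.tid::timestamp LEVEL [COMPONENT] message
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-f]+)\.(?P<tid>[0-9a-f]+)::(?P<ts>\S+)\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<component>[A-Z]+)\]\s+(?P<message>.*)$"
)


def find_entries(lines, component, levels=("WARN", "ERR")):
    """Yield (timestamp, level, message) for lines of one component and level set."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match["component"] == component and match["level"] in levels:
            yield match["ts"], match["level"], match["message"]


# Two records copied from the log excerpt in this post.
sample = [
    "000014d4.000026a8::2015/02/20-12:45:43.594 INFO [QUORUM] Node 1: quorum is not owned by anyone",
    "000014d4.00001e20::2015/02/20-12:45:43.594 WARN [QUORUM] Node 1: weight adjustment not performed. "
    "Cannot go below weight count 3 in a hybrid configuration with 2+ nodes",
]

hits = list(find_entries(sample, "QUORUM"))
for ts, level, message in hits:
    print(ts, level, message)
```

On the two sample records, only the WARN line from the QUORUM component is reported.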

Let’s demonstrate by shutting down one of my cluster nodes. After a couple of seconds, the connection to my Windows failover cluster is lost, and here is what I found in the Windows event log:

blog_32_-_7_-_quorum_lost
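
The arithmetic behind this quorum loss can be sketched as follows. This is my reading of the situation, assuming the failed witness still counts for one vote (as the unchanged WitnessDynamicWeight suggests) and that quorum requires a strict majority of the total votes. The `has_quorum` helper is hypothetical:

```python
def has_quorum(live_votes: int, total_votes: int) -> bool:
    """Quorum is kept only with a strict majority of the total votes."""
    return 2 * live_votes > total_votes


# Assumed state: 2 node votes + 1 witness vote that was never removed.
total_votes = 3

# Witness failed, both nodes still up: 2 live votes out of 3.
print(has_quorum(2, total_votes))  # True

# Witness failed and one node shut down: 1 live vote out of 3.
print(has_quorum(1, total_votes))  # False
```

With the witness vote stuck at 1, the surviving node holds only one vote out of three, which is not a majority.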

As I said earlier, with a minimal configuration of two cluster nodes, I always recommend that my customers skip this warning. After all, two cluster members with dynamic quorum behavior are sufficient to get good quorum resiliency. Indeed, according to the Microsoft documentation, for the system to recalculate the quorum correctly, a simultaneous failure of a majority of voting members must not occur (in other words, cluster member failures must be sequential), and with two cluster nodes we can only lose one node at a time in any case.

(Update: In fact, we can still face the last-man-standing issue without a witness in this specific configuration. Please read the second part of this article for more details.)

What about more complex environments? Let’s say an FCI with four nodes (two cluster nodes in each datacenter) and a file share witness in the first datacenter. In this case, by contrast, if the file share witness fails, the cluster correctly adjusts the overall weight configuration of both the cluster nodes and the witness. This is completely consistent with the message found above: “Cannot go below weight count 3”.

blog_32_-_8_-_quorum_adjustement_with_4_nodes
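
To make the contrast between the two-node and four-node cases concrete, here is a toy model. It rests on two assumptions that are not confirmed by any documentation I know of: that the floor of 3 from the log message applies to the total vote count, and that after a witness failure dynamic quorum aims for an odd number of node votes. The constant and function names are hypothetical:

```python
MIN_HYBRID_VOTES = 3  # assumption: floor inferred from the log message only


def votes_after_witness_failure(node_count: int) -> int:
    """Total votes after a witness failure, under the assumed floor of 3."""
    if node_count < 2:
        raise ValueError("this sketch only covers clusters with 2+ nodes")
    witness_had_vote = node_count % 2 == 0
    current_total = node_count + (1 if witness_had_vote else 0)
    # Without the witness, aim for an odd number of node votes (assumption).
    target_total = node_count if node_count % 2 == 1 else node_count - 1
    if target_total < MIN_HYBRID_VOTES:
        return current_total  # adjustment refused, votes stay as they were
    return target_total


for nodes in (2, 4):
    print(nodes, "nodes ->", votes_after_witness_failure(nodes), "votes")
```

Under these assumptions, the two-node cluster stays at 3 votes because the adjustment would go below the floor, while the four-node cluster can drop from 5 votes to 3.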

The bottom line is that the dynamic witness feature is very useful, but you have to be careful about its behavior in minimal configurations with only two cluster nodes, where it may produce unexpected results.

Happy cluster configuration!

David Barbarin

Senior Consultant & Microsoft Technology Leader