
David Barbarin

SQL Server AlwaysOn: when a listener becomes the cluster and vice versa

Let’s talk about a funny story concerning an interesting issue that I faced a couple of months ago at one of my customers. Initially, the problem only concerned the creation of an availability group listener. But when my customer tried to delete the related availability group, he quickly noticed that the deletion failed and, even stranger, that the virtual network name related to the listener corresponded to the virtual computer object of the cluster itself.

To illustrate the issue, I reproduced it in my lab environment. Here is the initial picture of the scenario:

blog 79- 1 - initial context

First, let’s focus on the cluster name. Initially, the cluster name was WIN2012CLUST, and you may notice that it has been changed to LST_SHPT, which corresponds to the listener of my shptgrp availability group. In addition, if we take a look at the cluster core resources section, both the cluster network name and the virtual IP address related to the cluster itself have indeed disappeared, as shown below:

blog 79- 2 - cluster core resources

Next, my customer first tried to drop the availability group from SQL Server Management Studio, but he faced the following error:

blog 79- 22 - drop aag

So, let’s summarize the initial situation:

  • The cluster has been renamed with a name that corresponds to the availability group listener name
  • The cluster core resources (virtual network name + virtual IP address) have disappeared from the cluster core resources group
  • An availability group exists and includes a listener whose name corresponds to the cluster name, with a specific virtual IP address
  • Dropping the availability group fails
  • Dropping the listener doesn’t work because it is a cluster core resource

In addition, note that we were in a production environment with a more complex architecture than my lab environment: it included multiple availability groups and critical applications on the same WSFC. In such a context, attempting dangerous manipulations at the Windows failover cluster layer means potentially compromising the availability of the entire environment. Fortunately, the other availability groups were not impacted by this weird situation, so we decided to wait for a non-business day to fix the issue.

At this point, the first thing we wanted to do was to rename the cluster correctly, which is what Microsoft support advised. Unfortunately, this action failed with the following error:

blog 79- 3 - cluster resource rename

We guessed that we couldn’t rename the cluster because a resource with the same name already existed. We tried to rename the virtual network name related to the listener with a dummy name, but once again we faced an error. It looked like a death spiral …

But after calmly analysing the situation (working in a maintenance window has some advantages: time is not your enemy in this case), we found out how to address this issue, and the solution was in fact pretty simple (I know, a solution is always simple once you have it :-) ). The solution consisted in using PowerShell cmdlets. Why PowerShell? In fact, we figured out that both the cluster network name LST_SHPT and its virtual IP address resource were initially members of the cluster core group. We verified our assumption with the Get-ClusterResource cmdlet.
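The original command output is not reproduced here. A minimal sketch of such a check, assuming the FailoverClusters PowerShell module is installed and the commands are run on a cluster node, could look like this (the filter on resource types is my own addition for readability):

```powershell
# Sketch only: run on a cluster node with the FailoverClusters module available.
Import-Module FailoverClusters

# List every cluster resource together with the group that currently owns it.
Get-ClusterResource |
    Sort-Object OwnerGroup, Name |
    Format-Table Name, State, OwnerGroup, ResourceType -AutoSize

# Narrow the view to network name and IP address resources,
# the two resource types involved in this issue.
Get-ClusterResource |
    Where-Object { $_.ResourceType.Name -in 'Network Name', 'IP Address' } |
    Format-Table Name, OwnerGroup, ResourceType -AutoSize
```

The OwnerGroup column is the interesting one: a core resource listed under the availability group role instead of the core group confirms the assumption.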

Indeed, their names put us on the right track. So we could safely assume that these resources had been accidentally moved from the cluster core resources group to the resource group related to the availability group. However, we quickly saw that the cluster manager console didn’t provide any way to move resources back to the cluster core resource group, and this is precisely where PowerShell can help. Let’s find the right cmdlet for this specific task.
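One way to search for a candidate cmdlet, sketched here under the assumption that the FailoverClusters module is installed on the machine, is to filter the module's commands by verb:

```powershell
# List the cmdlets of the FailoverClusters module whose verb is Move.
Get-Command -Module FailoverClusters -Verb Move

# Display the syntax and parameters of the candidate cmdlet.
Get-Help Move-ClusterResource -Full
```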

Well, the Move-ClusterResource cmdlet was the right solution in our case.

And finally, we returned to a more normal situation. Let’s have a look at the core resources section in the cluster manager console:
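The exact command we ran is not reproduced here. The sketch below shows the general idea with assumed names: "Cluster Group" is the default name of the core resources group, and "Cluster Name" / "Cluster IP Address" are hypothetical resource names, so replace them with what Get-ClusterResource reports in your environment.

```powershell
# Assumed names: adjust to what Get-ClusterResource reports on your cluster.
$coreGroup = 'Cluster Group'                        # default core group name (assumption)
$resources = 'Cluster Name', 'Cluster IP Address'   # hypothetical resource names

foreach ($name in $resources) {
    $resource = Get-ClusterResource -Name $name

    # Skip a resource that already sits in the core group, for example
    # because it followed a dependent resource moved just before.
    if ($resource.OwnerGroup.Name -ne $coreGroup) {
        Move-ClusterResource -Name $name -Group $coreGroup
    }
}

# Verify the result: both resources should now be owned by the core group.
Get-ClusterResource |
    Where-Object { $_.OwnerGroup.Name -eq $coreGroup } |
    Format-Table Name, State, OwnerGroup, ResourceType -AutoSize
```

As with any change at the cluster layer, this kind of move is best done during a maintenance window.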

blog 79- 7 - core resource new situation

What about dropping the availability group and renaming the failover cluster now? Well, we first dropped the availability group from SQL Server Management Studio without encountering any issue. Then we were also able to rename the cluster and change its related virtual IP address after performing some Active Directory and DNS tricks (like enabling the disabled computer object related to the VCO, changing the permissions on the Active Directory and DNS records, etc.).

blog 79- 8 - core resource final situation

It was finally a happy ending, with a highly available environment that works properly and a good drink to celebrate!

Happy clustering!

David Barbarin

Senior Consultant & Microsoft Technology Leader