Infrastructure at your Service

David Barbarin

SQL Server AlwaysOn and availability groups: how to isolate replication traffic from other networks?

A couple of days ago, I was involved in a project implementing availability groups with SQL Server 2014. My customer wanted to isolate the replication traffic from the public network. The idea was to gain better control of the network bandwidth and to ensure that replication can proceed without latency issues when other applications or backup activities generate heavy network traffic.
The first option to meet this requirement consists in redirecting the replication traffic to the private cluster network. However, depending on the data modification workload on the primary replica, the availability group messaging throughput can be non-trivial and can affect the heartbeat and intra-cluster communication. As a reminder, heartbeat communication is sensitive to latency, and if heartbeats are delayed by a saturated NIC, the cluster node could be removed from the cluster membership.
So, the second option is to dedicate a network to the availability group replication traffic. However, those who have already tried to configure a dedicated network for HADR endpoint communication have probably noticed that it cannot be configured completely by using SQL Server Management Studio and the availability group wizard.

Let's demonstrate. In my context, I have already set up an availability group that includes 3 replicas, with endpoint communication on the public network, as shown below:

[Screenshot: availability group endpoints on the public network by default]

In order to move the endpoint communication to a dedicated network, I have to run a set of T-SQL commands against my existing availability group.
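Before making any change, it is worth checking how the HADR endpoints are currently bound. The following query (not part of the original post, but using only standard catalog views) shows the listener port and IP of the mirroring endpoint on each replica; `ip_address` is NULL when the endpoint listens on all addresses, which is the default:

```sql
-- Inspect the current HADR endpoint configuration on this replica
-- (ip_address is NULL when the endpoint listens on ALL addresses)
SELECT  e.name,
        e.state_desc,
        t.port,
        t.ip_address
FROM sys.endpoints AS e
JOIN sys.tcp_endpoints AS t
    ON t.endpoint_id = e.endpoint_id
WHERE e.type_desc = N'DATABASE_MIRRORING';
```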

[Diagram: network topology with a dedicated replication network]

 

The first set of commands recreates the HADR endpoint on each replica with the dedicated network address:

:CONNECT SQL161
-- drop existing endpoint
DROP ENDPOINT [Hadr_endpoint];
GO

-- recreate endpoint with the specific dedicated network address
CREATE ENDPOINT [Hadr_endpoint] 
    AS TCP (LISTENER_PORT = 5022, LISTENER_IP = (192.168.20.61))
    FOR DATA_MIRRORING (ROLE = ALL, ENCRYPTION = REQUIRED ALGORITHM AES);
GO

-- start the endpoint if it is not already started
IF (SELECT state FROM sys.endpoints WHERE name = N'Hadr_endpoint') <> 0
BEGIN
    ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
END
GO

USE [master];
GO
 
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [DBI-SERVICES\sql_service];
GO
 
  
:CONNECT SQL162
DROP ENDPOINT [Hadr_endpoint];
GO

CREATE ENDPOINT [Hadr_endpoint] 
    AS TCP (LISTENER_PORT = 5022, LISTENER_IP = (192.168.20.62))
    FOR DATA_MIRRORING (ROLE = ALL, ENCRYPTION = REQUIRED ALGORITHM AES);
GO

IF (SELECT state FROM sys.endpoints WHERE name = N'Hadr_endpoint') <> 0
BEGIN
    ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
END
GO

USE [master];
GO
 
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [DBI-SERVICES\sql_service];
GO

:CONNECT SQL163
DROP ENDPOINT [Hadr_endpoint];
GO

CREATE ENDPOINT [Hadr_endpoint] 
    AS TCP (LISTENER_PORT = 5022, LISTENER_IP = (192.168.20.63))
    FOR DATA_MIRRORING (ROLE = ALL, ENCRYPTION = REQUIRED ALGORITHM AES);
GO

IF (SELECT state FROM sys.endpoints WHERE name = N'Hadr_endpoint') <> 0
BEGIN
    ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
END
GO

USE [master];
GO
 
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [DBI-SERVICES\sql_service];
GO
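After recreating each endpoint, a quick sanity check helps confirm that it listens on the dedicated IP, is started, and that the service account still holds CONNECT permission. This verification query is my addition, not part of the original scripts:

```sql
-- Verify the recreated endpoint: listener IP, port, state, and CONNECT grantee
SELECT  t.ip_address,
        t.port,
        e.state_desc,
        sp.name AS grantee
FROM sys.tcp_endpoints AS t
JOIN sys.endpoints AS e
    ON e.endpoint_id = t.endpoint_id
LEFT JOIN sys.server_permissions AS perm
    ON  perm.major_id = e.endpoint_id
    AND perm.class_desc = N'ENDPOINT'
    AND perm.permission_name = N'CONNECT'
LEFT JOIN sys.server_principals AS sp
    ON sp.principal_id = perm.grantee_principal_id
WHERE e.name = N'Hadr_endpoint';
```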

 

The second set of commands modifies the endpoint URL of each availability replica concerned:

-- Modify the availability group endpoints configuration
-- on all replicas
ALTER AVAILABILITY GROUP testgrp
MODIFY REPLICA ON 'SQL161'
WITH
(
	ENDPOINT_URL = N'TCP://192.168.20.61:5022'
);
GO

ALTER AVAILABILITY GROUP testgrp
MODIFY REPLICA ON 'SQL162'
WITH
(
	ENDPOINT_URL = N'TCP://192.168.20.62:5022'
);
GO

ALTER AVAILABILITY GROUP testgrp
MODIFY REPLICA ON 'SQL163'
WITH
(
	ENDPOINT_URL = N'TCP://192.168.20.63:5022'
);
GO
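Once the replicas have been modified, the new endpoint URLs can be read back from the availability group metadata. This check is not in the original post; it simply reuses the `testgrp` group name from the scripts above:

```sql
-- Confirm the new endpoint URLs registered in the availability group metadata
SELECT  ar.replica_server_name,
        ar.endpoint_url
FROM sys.availability_replicas AS ar
JOIN sys.availability_groups AS ag
    ON ag.group_id = ar.group_id
WHERE ag.name = N'testgrp';
```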

Voila …

[Screenshot: replica states after reconfiguration]

However, keep in mind that changing both the endpoints and the availability group configuration in this way temporarily disrupts the database replication process. So plan these changes during off-business hours.
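To confirm that replication recovered after the change, the HADR dynamic management views show whether each replica reconnected and whether the databases are synchronizing again. This monitoring query is my own addition, built only on documented DMVs:

```sql
-- Check that every replica reconnected and the databases resumed synchronizing
SELECT  ar.replica_server_name,
        ars.connected_state_desc,
        drs.database_id,
        drs.synchronization_state_desc
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = ars.replica_id
LEFT JOIN sys.dm_hadr_database_replica_states AS drs
    ON drs.replica_id = ars.replica_id;
```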

See you.

 

10 Comments

  • Tammy says:

    Thank you SO MUCH for sharing this! One of our admins was asking how we could send the AG replication traffic on a different network, and I felt like I was trying to cross the Bridge of Death. I was able to adapt your scripts to our environment and successfully made the switch within a few nervous minutes.

    There are a few articles out there along these lines, but yours explains how to change an existing AG.

    Thanks again!

    Tammy

     
  • Rohit says:

    How do we configure the replication network if the 2 nodes are running a SQL failover cluster? I will not be able to use the NIC IP address, since failover will not understand the IP address assigned to the endpoint.

     
    • David Barbarin says:

      Hi Rohit,

      Sorry for the delay. I'm not sure I understand the question, because SQL Server FCIs are quite different from SQL Server AlwaysOn AGs. With SQL Server FCIs, you can run only one instance at a time, on the active node. SQL Server AlwaysOn AGs use a different mechanism that involves replication between replicas on two different nodes.

      Regards

       
  • AlwaysOnDBA says:

    Great article.
    Is the new dedicated private network required to be added to the WSFC cluster?

     
    • David Barbarin says:

      Hi shahasane,

      Sorry for delay. I missed the comment :-(

      Since Windows Server 2008, WSFC no longer requires a dedicated network for internal cluster communication between nodes. However, I continue to advise my customers to implement such a network to ensure isolation between the different kinds of traffic. Heavy traffic on the public network (from applications, backups and so on) may increase network latency. Internal communication between nodes requires low latency, and we want to avoid false-positive failures that may lead to cluster instability. Without separating the networks, there is no way to guarantee that isolation. Finally, we may apply network optimizations to the internal communication network that would not be possible if we mixed them.

      Hope it helps

       
  • CSHS DBA says:

    Thanks a lot for this blog. I was tasked with doing exactly the same thing in my company and it really worked

     
  • Stefan says:

    Very nice article! Even clearer than MSFT TechNet :-) Thanks.
    We build our cluster nodes with hyper-converged NICs (2x 10Gb RDMA), with all SQL nodes in Hyper-V VMs. Do you still recommend, in guest clustering, creating virtual NICs for heartbeat and replication? In the end, all traffic goes through one big 'pipe'. Actually, the installation automatically assigns an APIPA IP per node for intra-cluster traffic – maybe not so elegant.
    Regards

     
    • David Barbarin says:

      Hi Stefan,

      Glad to see the article helped you. You are probably right: with an RDMA link, the isolation is arguably unnecessary in this case. Indeed, the throughput and latency are impressive, from my experience at some customer shops. But I would still recommend considering the workload and its impact on network bandwidth over time. We can avoid the question entirely by isolating the intra-cluster communication from the "public" network :)

      Regards

       


David Barbarin

Senior Consultant & Microsoft Technology Leader