This blog post comes from a very interesting discussion with one of my friends about the read-only capabilities of secondary replicas in the context of distributed availability groups. Initially, distributed availability groups are designed to address D/R scenarios and some migration scenario types as well. I already discussed about of one possible migration scenario here. However, we may also take advantage of using secondary replicas as read-only in Reporting Scenarios (obviously after making an assessment of whether the cost is worth it.). In addition, if you plan to introduce scale-out with secondary replicas (even with asynchronous replication) you may consider to use distributed availability groups and cascading feature which will address network bandwidth overhead especially if your cross-datacenter link is not designed to handle heavily replication workload. Considering this last scenario, my friend’s motivation (Sarah Bessard) was to assess distributed availability groups in the replacement of SQL Server replication.

As a reminder, SQL Server 2016 provides new round-robin feature with secondary read-only replicas and extending it by including additional replicas from another availability group seems to be a good idea. But here things become more complicated because transparent redirection and round-robin features sound promising but in fact let’s see if it works when distributed availability group comes into play.

Let’s have a demo on my lab environment. So for the moment two separate availability groups which run on the top of their own Windows Failover Cluster – respectively AdvGrp and AdvGrpDR

blog 116 - 01 - distributed ag - archi

At this stage, we will focus only on my second availability group AdvDrGrp. Firstly, I configured read-only routes for my 4 replicas and here the result:

SELECT 
	r.replica_server_name,
	r.read_only_routing_url,
	g.name AS group_name
FROM 
	sys.availability_replicas AS r
JOIN 
	sys.availability_groups AS g ON r.group_id = g.group_id
WHERE 
	g.name = N'AdvGrpDR'
ORDER BY 
	r.replica_server_name;

select 
	r.replica_server_name AS primary_replica,
	r.read_only_routing_url,
	rl.routing_priority,
	r2.replica_server_name AS read_only_secondary_replica,
	r2.secondary_role_allow_connections_desc,
	g.name AS availability_group
FROM 
	sys.availability_read_only_routing_lists AS rl
JOIN 
	sys.availability_replicas AS r ON rl.replica_id = r.replica_id
JOIN 
	sys.availability_replicas AS r2 ON rl.read_only_replica_id = r2.replica_id
JOIN 
	sys.availability_groups AS g ON g.group_id =  r.group_id
WHERE 
	g.name = N'AdvGrpDR'
ORDER BY 
	primary_replica, availability_group, routing_priority;
GO

blog 116 - 1 - distributed ag ro - RO config

URL read-only routes and preferred replicas are defined for all the replicas. I defined round-robin configuration for replicas WIN20161SQL16\SQL16 to WIN20163SQL16\SQL16 whereas the last one is configured with a preference order (WIN20163SQL16\SQL16 first and WIN20164SQL16\SQL16 if the previous one is not available).

After configuring read-only routes, I decided to check if round-robin comes into play before implementing my distributed availability group. Before running my test I also implemented a special extended event which includes read-only route events as follows:

CREATE EVENT SESSION [alwayson_ro] 
ON SERVER 
ADD EVENT sqlserver.hadr_evaluate_readonly_routing_info,
ADD EVENT sqlserver.read_only_route_complete,
ADD EVENT sqlserver.read_only_route_fail
ADD TARGET package0.event_file ( SET filename=N'alwayson_ro' ),
ADD TARGET package0.ring_buffer;

My test included a basic command based on SQLCMD and –K READONLY special parameter as follows:

blog 116 - 2 - distributed ag ro - RO test

According to the above output we may claim that my configuration is well configured. We may also double check by looking at the extend event output

blog 116 - 3 - distributed ag ro - xe ro output

But now let’s perform the same test after implementing my distributed availability group. The script I used was as follows:

:CONNECT WIN20161SQL16\SQL16
 
USE [master];
GO
 
-- Primary cluster 
CREATE AVAILABILITY GROUP [AdvDistGrp]  
WITH (DISTRIBUTED)   
AVAILABILITY GROUP ON 
'AdvGrp'
WITH   
(   
    LISTENER_URL = 'tcp://lst-advgrp.dbi-services.test:5022',    
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
    FAILOVER_MODE = MANUAL,   
	SEEDING_MODE = AUTOMATIC   
),   
'AdvGrpDR'
WITH   
(   
    LISTENER_URL = 'tcp://lst-advdrgrp.dbi-services.test:5022',   
	AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
	FAILOVER_MODE = MANUAL,   
	SEEDING_MODE = AUTOMATIC   
);    
GO   
 

:CONNECT WIN20163SQL16\SQL16
 
USE [master];
GO
 
-- secondary cluster
ALTER AVAILABILITY GROUP [AdvDistGrp]   
JOIN  
AVAILABILITY GROUP ON 
'AdvGrp'
WITH   
(   
    LISTENER_URL = 'tcp://lst-advgrp.dbi-services.test:5022',    
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
    FAILOVER_MODE = MANUAL,   
	SEEDING_MODE = AUTOMATIC   
),   
'AdvGrpDR'
WITH   
(   
    LISTENER_URL = 'tcp://lst-advdrgrp.dbi-services.test:5022',   
	AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
	FAILOVER_MODE = MANUAL,   
	SEEDING_MODE = AUTOMATIC   
);    
GO

blog 116 - 0 - distributed ag ro - archi

Performing the previous test after applying the new configuration gives me a different result this time.

blog 116 - 4 - distributed ag ro - RO test 2

It seems that the round-robin capability is not correctly performed although I used the same read-only routes configuration. In the same way, taking a look at the extended event output gave me no results. It seems that transparent redirection and round-robin features from the listener did not come into play this time.

Let’s perform a last test which includes moving AdvDrGrp availability to another replica to confirm transparent redirection does not work as we may expect

:CONNECT WIN20164SQL16\SQL16

ALTER AVAILABILITY GROUP AdvGrpDR FAILOVER;

blog 116 - 5 - distributed ag ro - RO test 3

Same output than previously. The AdvDrGrp availability group has moved from WIN20163SQL16\SQL16 replica to WIN20164SQL16\SQL16 replica and the connection reached out the new defined primary of the second availability group (secondary role from the distributed availability group perspective) meaning we are not redirected on one of defined secondaries.

At this stage, it seems that we will have to implement our own load balancing component – whatever it is – in order to benefit from all the secondary replicas and read-only features on the second availability group. Maybe one feature that Microsoft may consider as improvement for the future.

Happy high availability moment!

By David Barbarin