Infrastructure at your Service

David Barbarin

SQL Server 2019 availability group R/W connection redirection, routing mesh and load balancing

SQL Server 2019 availability group feature will provide secondary to primary replica read/write connection redirection. I wrote about it in a previous blog post here. It consists in redirecting client application connections to the primary replica regardless of the target server specified in the connections string. That’s pretty interesting in some scenarios as read scale-out or specific multi-subnet configurations where creating the traditional AG listener is not an viable option.

The new R/W connection redirection capability does the job but the one-million-dollar question here is what’s happen if one of the replicas specified in my connection string becomes suddenly unavailable? Referring  to the BOL  the connection will fail regardless the role that the replica on the target server plays but we can mitigate the issue by introducing the failover partner parameter in the connection string. As a reminder, the Failover Partner keyword in the connection string works in a database mirror setup and prevent prolonged application downtime. But from my point of view, we could go likely another way and get benefit to all the power of this new availability group feature by introducing a load balancer on the top of this topology as we could do with Docker Swarm or K8s architectures. Indeed, if we take a look more closely, this new provided mechanism by SQL Server 2019 is pretty similar to the routing mesh capabilities of container orchestrators with the same advantages and weaknesses as well. I wrote a blog post about Docker Swarm architectures where we need to implement a proxy to load balance the traffic to avoid getting stuck with the routing mesh capability where a node get unhealthy.

I just applied the same kind of configuration by using an HA Proxy (but you can use your own obviously) with my availability group topology and the behavior was basically the same. Here the intended behavior:

 

Here the configuration of my HAProxy including my 3 AG replicas (WIN20191, WIN20192, WIN20193) and a round robin algorithm at the bottom:

…
backend rserve_backend
    mode tcp
    option tcplog
    option log-health-checks
    option redispatch
    log global
    balance roundrobin
    timeout connect 10s
    timeout server 1m
    server WIN20191 192.168.40.205:1433 check
    server WIN20192 192.168.40.206:1433 check
    server WIN20193 192.168.40.207:1433 check

 

Let’s do a try with connections directly to my HAProxy that is listen on port 81 in my test scenario. Note that for this first test I will connect to the master database to force the local connection getting stick to each replica rather than going through the R/W redirection process. The goal is to check if the round-robin algorithm come into play …

$connectionString = "Server=192.168.40.14,81;uid=sa; pwd=xxxx;Integrated Security=False;Initial Catalog=master;pooling=false”

$connection = New-Object System.Data.SqlClient.SqlConnection
$connection.ConnectionString = $connectionString
$connection.Open()

$sqlCommandText="SELECT 'Current server : ' + @@SERVERNAME AS server_name"
$sqlCommand = New-Object system.Data.sqlclient.SqlCommand($sqlCommandText,$connection)
$sqlCommand.ExecuteScalar()

$connection.Close()
$connection.Dispose()

 

… and that’s the case as show below:

Test connexion initial server nb : 0 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 1 - 192.168.40.14,81 - Current server : WIN20192
Test connexion initial server nb : 2 - 192.168.40.14,81 - Current server : WIN20193
Test connexion initial server nb : 3 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 4 - 192.168.40.14,81 - Current server : WIN20192
Test connexion initial server nb : 5 - 192.168.40.14,81 - Current server : WIN20193

 

Let’s do a try by forcing the R/W redirection now. This time I set up the correct target database name named dummy2 for my availability group AG2019.

$connectionString = "Server=192.168.40.14,81;uid=sa; pwd=xxxx;Integrated Security=False;Initial Catalog=dummy2;pooling=false”

$connection = New-Object System.Data.SqlClient.SqlConnection
$connection.ConnectionString = $connectionString
$connection.Open()

$sqlCommandText="SELECT 'Current server : ' + @@SERVERNAME AS server_name"
$sqlCommand = New-Object system.Data.sqlclient.SqlCommand($sqlCommandText,$connection)
$sqlCommand.ExecuteScalar()

$connection.Close()
$connection.Dispose()

Test connexion initial server nb : 0 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 1 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 2 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 3 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 4 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 5 - 192.168.40.14,81 - Current server : WIN20191

 

This time the R/W redirection is taking effect and each established connection is redirected to my primary replica – WIN20191 this time.

Finally, let’s simulate an outage of one of my replicas, let’s say the WIN20193 replica with a turn off operation and let’s see what’s happen below:

Test connexion initial server nb : 32 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 33 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 34 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 35 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 36 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 37 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 38 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 39 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 40 - 192.168.40.14,81 - Current server : WIN20191

 

Well, from a connection perspective nothing has changed and the HAProxy continues to load balance connections between the remaining healthy replicas. The R/W connection redirection mechanism still continue to come into play as well. A quick look at the HAProxy indicates the WIN20193 replica got unhealthy and the HAProxy has evicted this replica from the game.

[WARNING] 203/063813 (1772) : Health check for server rserve_backend/WIN20193 failed, reason: Layer4 timeout, check durati               on: 2001ms, status: 2/3 UP.
[WARNING] 203/063818 (1772) : Health check for server rserve_backend/WIN20193 failed, reason: Layer4 timeout, check durati               on: 2002ms, status: 1/3 UP.
[WARNING] 203/063822 (1772) : Health check for server rserve_backend/WIN20193 failed, reason: Layer4 timeout, check durati               on: 2001ms, status: 0/2 DOWN.
[WARNING] 203/063822 (1772) : Server rserve_backend/WIN20193 is DOWN. 2 active and 0 backup servers left. 2 sessions activ               e, 0 requeued, 0 remaining in queue.
[WARNING] 203/063848 (1772) : Health check for server rserve_backend/WIN20193 failed, reason: Layer4 connection problem, i               nfo: "No route to host", check duration: 4ms, status: 0/2 DOWN.

 

The new R/W redirection capability provided by Microsoft will extend possible scenarios with availability groups for sure. With previous versions of SQL Server, using a load balancer was limited to R/O workloads but SQL Server 2019 will probably change the game on this topic. Let’s see what’s happen in the future!

 

 

 

 

 

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

David Barbarin
David Barbarin

Principal Consultant & Microsoft Technology Leader