Infrastructure at your Service

David Barbarin

SQL Server 2017 high availability and Resumable Online Indexes

Ensuring data availability is an important part of the high availability design. SQL Server AlwaysOn features including SQL Server FCIs and availability groups address some aspects of the problem but we may also rely on the online operations features to maximize data availability. Indeed, for some high-critical workloads, offline operations during the maintenance phase are not permitted and may contribute to call into question the entire architecture.

But have you ever faced an outage from your primary replica leading to a failover to the secondary replica during the database maintenance timeline? Yes, your architecture is designed to behave in this way for sure but let’s say you were in the middle of a rebuild index operation concerning a big table and everything is rolled back now. In this case, you will probably have to wait the next windows maintenance time to initiate another rebuild operation, but from the start …

Fortunately, the new Resumable Online Indexes feature from SQL Server 2017 will come to the rescue and seems promising to address such situation. Some limitations are still around and well-documented in the Microsoft documentation but I guess Microsoft will work hard to remove them in the future.

Let’s play a little bit with the Resumable Online Indexes on my AG environment that includes two replicas with cluster_type = NONE meaning we don’t rely on an underlying orchestrator to handle automatic failover of the AG resources. In this context, it will be sufficient to demonstrate how Resumable Online Indexes work.

blog 123 - 01 - resumable onlline index - scenario_

My initial configuration is as follows:

blog 123 - 02 - resumable onlline index - ag win

I used the AdventureWorks2016 database with a bigTransactionHistory table generated from the Adam Machanic script.

select 
	o.name as table_name,
	i.name as index_name,
	p.index_id,
	au.type_desc,
	au.used_pages,
	p.rows
from 
	sys.allocation_units as au
join 
	sys.partitions as p on au.container_id = p.hobt_id
join 
	sys.objects as o on o.object_id = p.object_id
left join 
	sys.indexes as i on i.object_id = p.object_id
						and i.index_id = p.index_id
where p.object_id = object_id('bigTransactionHistory');

 

blog 123 - 03 - resumable onlline index - bigTransactionHistory

Let’s then initiate a rebuild of the pk_bigTransactionHistory index by using the new RESUMABLE option as follows:

ALTER INDEX pk_bigTransactionHistory 
ON bigTransactionHistory 
REBUILD
WITH (ONLINE = ON, RESUMABLE = ON);
GO

 

By using this option, I made the rebuild phase “resumable” and I’m now able to perform additional actions like pausing or aborting the index rebuild operation. I may also use MAX_DURATION parameter to setup the duration (in minutes) of the operation before to pause it. We may easily identify resumable sessions which are running on the SQL Server instance by using the sys.dm_exec_requests DMV and new is_resumable column.

blog 123 - 04 - resumable onlline index - sys_dm_exec_requests_is_resumable

Let’s initiate a manual failover while the index is rebuilt by SQL Server on the primary replica. Then during the outage of the first primary replica, I faced the following error message:

blog 123 - 05 - resumable onlline index - error during failover

A typical error message regarding the situation … but the most interesting part comes now. If we go through the new sys.index_resumable_operations DMV, we may notice our rebuild index operation is paused.

SELECT 
	iro.last_max_dop_used,
	iro.sql_text,
	iro.start_time,
	iro.last_pause_time,
	iro.total_execution_time,
	iro.page_count, 
	iro.percent_complete
FROM  
	sys.index_resumable_operations as iro
JOIN
	sys.objects AS o ON o.object_id = iro.object_id

 

blog 123 - 06 - resumable onlline index - resume op dmv

From the above output, we may extract interesting column values:

  • percent_complete : the percent complete value at the moment of pausing the operation
  • page_count = size of the second index structure at the moment of pausing the operation
  • last_pause_time = it is self-explanatory
  • last_max_dop_used = max dop value used during the last rebuild operation. It is interesting to notice we may change this value for the same operation between pausing sessions.

Before resuming the index operation, let’s have a brief look at the transaction log used space. I asked myself some questions about the transaction behavior when the index operation is paused: Does a pause state have an impact on the transaction log? Do the VLFs touched by this operation remain active until the index is completely rebuilt? Let’s answer to this question by the following test.

A first look at the transaction log space used says that the transaction log contains records that must be backed-up.

use AdventureWorks2016;

select 
	total_log_size_in_bytes / 1024 / 1024 / 1024 as total_log_size_GB,
	used_log_space_in_bytes / 1024 / 1024  as used_log_space_MB,
	log_space_in_bytes_since_last_backup / 1024 / 1024 as log_space_MB_since_last_backup,
	used_log_space_in_percent
from sys.dm_db_log_space_usage;
go

 

blog 123 - 07 - resumable onlline index - tlog

According to the previous result, let’s perform a backup log …

backup log AdventureWorks2016
to disk = 'NUL' 
with stats = 10;
go

blog 123 - 08 - resumable onlline index - tlog

Well, pausing an index operation doesn’t rely on a remaining active transaction that may impact the transaction log retention but after a few thoughts, it seems to be obvious it works in this way.

What about storage? The resumable option is available only with online index operations meaning SQL Server must maintain the corresponding hidden structures as long as it is necessary to rebuild the underlying index. According to the BOL:

No extra resources are required for resumable index rebuild except for◦Additional space required to keep the index being built, including the time when index is being paused

Obviously, maintaining such structure may have a huge impact on your workload regarding your context. Out of curiosity, I went through the system view to see those hidden structures as follows:

select 
	o.name as table_name,
	i.name as index_name,
	p.index_id,
	au.type_desc,
	SUM(au.used_pages) as total_used_pages,
	SUM(p.rows) as total_rows,
	COUNT(*) as nb_partitions
from 
	sys.allocation_units as au
join 
	sys.partitions as p on au.container_id = p.hobt_id
join 
	sys.objects as o on o.object_id = p.object_id
left join 
	sys.indexes as i on i.object_id = p.object_id
						and i.index_id = p.index_id
where 
	p.object_id = object_id('bigTransactionHistory')
group by 
	o.name, i.name, p.index_id, au.type_desc
order by 
	o.name, p.index_id;
go

 

blog 123 - 09 - resumable onlline index - hidden index structure

In regard to the second record, a simple math – 9261 * 100. / 190150 – confirms the structure is only 4.8 percent in size of the underlying cluster index. We may retrieve this result from the first output of the new sys.index_resumble_index system view. For the third one, my guess is it corresponds to the temporary mapping index used by SQL Server to determine which records to delete in the new indexes that are being built when rows in the underlying table are updated or deleted.

Here is an execution plan sample of an update query against the bigTransactionHistory table. We may notice DELETE / UPDATE operations to a “hidden” nonclustered index related to this special index highlighted in red.

blog 123 - 10 - resumable onlline index - execution plan

Go ahead and let’s resume the pending index operation by using the new RESUME option. We may also add the MAX_DURATION option to guarantee the rebuild operation will go back to a pause state if we consider that reaching the maintenance windows limit cannot hurt the daily business workload in terms of resource for instance. An option we will definitely consider to add it with the next version of our DMK maintenance tool for SQL Server.

ALTER INDEX pk_bigTransactionHistory 
ON bigTransactionHistory RESUME WITH (MAX_DURATION = 120 MINUTES);
GO

In this blog, we just have surfaced this new maintenance option and its capabilities. Other articles will probably come in the future.

Stay tuned!

 

Leave a Reply

David Barbarin
David Barbarin

Senior Consultant & Microsoft Technology Leader