Infrastructure at your Service

David Barbarin

SQL Server: switch partition and metadata inconsistency issue

In this blog post, I would like to share with you a weird issue I faced when I implemented a sliding Windows partition scenario on SQL Server 2014 SP1 at one of my customer. The idea was to keep SQL Server audit records from several SQL Server instances into a global archiving table during two years in order to meet the customer audit specifications. Data older than year – 2 should be deleted. Based on a first implementation, we estimated the total data size to 800GB per year.

So we decided to implement the sliding Windows scenario as follows:

blog 90 - 01 - partition config

  • Right-based partition function is used in this context
  • Two partitions exist to store current and year – 1 audit data
  • Partition that contains data older than one year will be switch and then drop
  • One additional empty partition exists to avoid data movement that may lead to high resource consumption (see my previous blog post here)

In addition, we also implemented two stored procedures in order to automate sliding partition process based on the Microsoft article here.

And here came the (weird) issue. I always used the following (simplified) script so far to get table information regardless if a table is partitioned or not.

SELECT
	OBJECT_SCHEMA_NAME(p.object_id) AS SchemaName,
	OBJECT_NAME(p.object_id) AS ObjectName,
	i.name as index_name,
	i.index_id,
    p.partition_number,
    g.name AS [filegroup_name],
	au.data_space_id,
    p.rows,
	p.data_compression_desc
FROM sys.partitions AS p
JOIN sys.allocation_units AS au
    ON au.container_id = p.partition_id -- p.hobt_id 
JOIN sys.indexes i
	ON p.object_id = i.object_id
    AND p.index_id = i.index_id
JOIN sys.filegroups AS g
    ON g.data_space_id = au.data_space_id
WHERE p.object_id = object_id('dbo.t_sql_logins')
       AND p.index_id = 1;
GO

Here the output of the initial scenario in my case:

blog 90 - 02 - partition info initial scenario

Let’s say that the partition 1 contains Year-1 data.  Si I included this query to detect dynamically which partition to switch inside the stored procedure that deals with the left side of the partitioned table (that includes switch and merge operations) and this is exactly where the problem occurs.

The new scenario is as follows:

blog 90 - 02 - partition info after switching scenario

After switching the old partition and then merge the existing left boundary, I noticed metadata inconsistency regarding the following error message. In short, this message indicates my query doesn’t detect correctly which partition hosts data to delete because it seems to not be correctly aligned with archive table. They are not on the same filegroup.

ALTER TABLE SWITCH statement failed. index ‘security_audit.dbo.test_archive.PK_test_archive’ is in filegroup ‘FG_AUDIT_SQL_03′ and partition 1 of index ‘security_audit.dbo.test.PK_test’ is in filegroup ‘FG_AUDIT_SQL_02′

Fortunately, Dan Guzman (MVP Data Platform) puts me on the right track with the following query:

SELECT 
	OBJECT_SCHEMA_NAME(p.object_id) AS SchemaName
	, OBJECT_NAME(p.object_id) AS ObjectName
	, i.name AS IndexName
	, p.index_id AS IndexID
	, p.partition_number AS PartitionNumber
	, fg2.name AS FileGroupName
	, fg2.data_space_id
	, p.rows AS Rows
	, p.data_compression_desc
FROM sys.partitions AS p
INNER JOIN sys.indexes AS i 
	ON i.object_id = p.object_id
	AND i.index_id = p.index_id
INNER JOIN sys.data_spaces AS ds 
	ON ds.data_space_id = i.data_space_id
INNER JOIN sys.partition_schemes AS ps 
	ON ps.data_space_id = ds.data_space_id
INNER JOIN sys.partition_functions AS pf 
	ON pf.function_id = ps.function_id
INNER JOIN sys.destination_data_spaces AS dds2 
	ON dds2.partition_scheme_id = ps.data_space_id
	AND dds2.destination_id = p.partition_number
INNER JOIN sys.filegroups AS fg2 
	ON fg2.data_space_id = dds2.data_space_id
WHERE p.object_id in ( object_id('dbo.t_sql_logins') )
	AND P.index_id = 1

 

You may notice that the output is not pretty the same than the previous query.

blog 90 - 03 - partition info after switching scenario correct

This time the partition number 1 seems to be more accurate with the FG_AUDIT_SQL_03 value. So why this difference between the two queries? If we take a further look at the both queries, we may see a noticeable difference with the use of sys.allocation_units DMV in the first query versus sys.destination_data_spaces DMV in the second one.

Let’s divide and simplify the above queries into small pieces and let’s have a look at the following result:

SELECT 
	partition_id,
	partition_number,
	hobt_id
FROM sys.partitions 
WHERE object_id in ( object_id('dbo.t_sql_logins') );

SELECT 
	allocation_unit_id,
	container_id,
	data_space_id
FROM sys.allocation_units 
WHERE container_id IN ( 72057594040811520, 72057594040877056, 72057594041073664);

SELECT 
	name,
	data_space_id
FROM sys.filegroups;

SELECT 
	destination_id,
	data_space_id
FROM sys.destination_data_spaces;

 

blog 90 - 04 - partition info difference DMVs

Red arrows concern the first query and the green arrows concern the second one.  The arrows represent the value used for joining DMVs together. You may notice the filegroup name is different between the both queries which makes me think that the data_space_id value from the sys.allocation_units DMV is inconsistent in this case. In addition, we may notice two same data_space_id column values for each different value of container_id. The point here is that if I rebuild the corresponding index, this inconsistency disappears as shown below:

blog 90 - 05 - partition info correct result DMVs

After further investigations, I found out the following Microsoft KB3095958 that talks about metadata inconsistency after a partition switch and SQL Server 2014 SP1. Applying this KB on my environment fixed this specific issue in my case.

Happy partitioning!

 

 

 

 

 

Leave a Reply


six × = 36

David Barbarin
David Barbarin

Senior Consultant & Microsoft Technology Leader