Performance tunning is becoming a key focus for most of the DBAs. In scope of the “dbi services labs”, I made some tests on functions which can help improving these, but which are not always well-known. Today’s article is going to describe one of these, the SPARE COLUMNS.
The property of Spare columns is not the most used in client environment that we have seen.
This article present a script to find the potential columns that can have the sparse property with the usage restrictions to make a space tuning in your database and especially for Data Warehouse (DWH).

The database example that we used, is AdventureWorksDW2008R2.

Principle

The msdn explain very well the principle. I will just resume it:
“Sparse columns reduce the space requirements for null values at the cost of more overhead to retrieve non null values”
For more information, go to msdn web site

Step 1: Create a temporary table

A temporary table is created to retrieve information about the columns of the tables in the database being audited. This table provides the following information:

  • Name of th column
  • Column ID
  • Table name
  • Table ID
  • Type of the column
  • If column is a sparse column
  • If column is nullable
  • Number of row in this table
  • Number of null row in the column
  • Ratio Nb NULL Rows / Nb Rows
CREATE TABLE #temp (
ColumnName varchar(50),
ColumnID int,
TableName varchar(50),
TableId int,
TypeName varchar(50),
IsParse bit,
IsNullable bit,
NumberOfRow int,
NumberOfRowNULL int,
Ratio int)
SET NOCOUNT ON

 

Note that “SET NOCOUNT ON” option is used. In the Best Practice, setting SET NOCOUNT to ON can provide a significant performance boost, as network traffic is greatly reduced.

1CreateTable

Step 2: Insert the first information and introduction to some usage’s restrictions

We insert in the temporary table the table name, the column name and the column type which can be candidate for SPARSE by using sp_MSForEachTable.

INSERT #temp
EXEC sp_MSForEachTable '
 SELECT DISTINCT
sys.columns.name ColumnName,
sys.columns.column_id ColumnID,
SUBSTRING(''?'', 8, Len(''?'')-8) AS TableName,
sys.columns.object_id TableID,
CASE systypes.name 
WHEN ''sysname'' THEN ''nvarchar''
ELSE systypes.name
END AS TypeName,
sys.columns.is_sparse IsParse,
sys.columns.is_nullable IsNullable,
0,0,0 
 FROM sys.columns (NoLock) 
INNER JOIN systypes (NoLock) ON systypes.xtype = sys.columns.system_type_id
 WHERE sys.columns.object_id = (
SELECT id FROM sysobjects (NoLock)
WHERE name = SUBSTRING(''?'', 8, Len(''?'')-8))'

But, we can optimize this query with a first part of the usage restrictions for the properties of Spare columns:

  • The following data types cannot be specified as SPARSE: geography, geometry, image, ntext, text and timestamp

systypes.name NOT IN (”geography”, ”geometry”, ”image”, ”ntext”, ”text”, ”timestamp”)

  • A sparse column must be nullable and cannot have the ROWGUIDCOL or IDENTITY properties.

sys.columns.is_nullable = 1
sys.columns.is_rowguidcol = 0
sys.columns.is_identity = 0

  • A sparse column cannot have the FILESTREAM attribute

sys.columns.is_filestream = 0

  • A sparse column cannot have a default value

sys.columns.default_object_id = 0

  • A sparse column cannot be bound to a rule

sys.columns.rule_object_id = 0

  • Although a computed column can contain a sparse column, a computed column cannot be marked as SPARSE

sys.columns.is_computed = 0

  • A sparse column cannot be part of a user-defined table type

sys.columns.system_type_id=sys.columns.user_type_id

  • And if the columns have the Spare properties, naturally we don’t count this columns

sys.columns.is_sparse = 0

The new query with the restricted usage is :

INSERT #temp
EXEC sp_MSForEachTable '
 SELECT DISTINCT
sys.columns.name ColumnName,
sys.columns.column_id ColumnID,
SUBSTRING(''?'', 8, Len(''?'')-8) AS TableName,
sys.columns.object_id TableID,
CASE systypes.name 
WHEN ''sysname'' THEN ''nvarchar''
ELSE systypes.name
END AS TypeName,
sys.columns.is_sparse IsParse,
sys.columns.is_nullable IsNullable,
0,0,0 
 FROM sys.columns (NoLock) 
INNER JOIN systypes (NoLock) ON systypes.xtype = sys.columns.system_type_id
 WHERE sys.columns.object_id = (
SELECT id FROM sysobjects (NoLock)
WHERE name = SUBSTRING(''?'', 8, Len(''?'')-8)
and systypes.name NOT IN (''geography'', ''geometry'', ''image'', ''ntext'', ''text'', ''timestamp'')
and sys.columns.is_sparse = 0
and sys.columns.is_nullable = 1
and sys.columns.is_rowguidcol = 0
and sys.columns.is_identity = 0
and sys.columns.is_computed = 0
and sys.columns.is_filestream = 0
and sys.columns.default_object_id = 0
and sys.columns.rule_object_id = 0
and sys.columns.system_type_id=sys.columns.user_type_id
)'


In our example, we have 182 rows for the databases AdventureWorksDW2008R2

Step 3: Usage restrictions on Index, PK and data compression

This query deletes all columns that can be a part of a clustered index or a unique primary key(PK) index :

delete tps from #temp tps where exists (
select DISTINCT 'Exists' from sys.columns
inner join sys.indexes i on i.object_id = tps.TableId
inner join sys.index_columns ic on ic.column_id = tps.ColumnID inner join sys.columns c on c.object_id = tps.TableId and ic.column_id = c.column_id
where i.type =1 or i.is_primary_key = 1)



We reduce the number of row at 95 in this sample.
Sparse columns are incompatible with data compression
To see if data compression is active, you can use this query:

SELECT partition_id,
OBJECT_NAME(object_id) AS objectname,
data_compression_desc,data_compression
FROM sys.partitions where data_compression<>0;



We can see that one table (DimEmployee) use ROW Compression.
To delete columns with data compression, we use this query:

delete tps from #temp tps inner join sys.partitions p 
on p.object_id = tps.TableId where p.data_compression<>0;

We reduce the number of row at 79.

Step 4: Add count information for the selected columns

In this step, we will get the information about the totally number of row, number of row with a “null” value and calculate the ratio in % between these 2 values.

DECLARE @TableName nvarchar(1000)
DECLARE @ColumnName nvarchar(1000) 
DECLARE @vQuery nvarchar(1000) 
DECLARE @result1 INT 
DECLARE @result2 INT
DECLARE tables_cursor CURSOR FAST_FORWARD
 FOR
 SELECT TableName,ColumnName FROM #temp
 OPEN tables_cursor
FETCH NEXT FROM tables_cursor INTO @TableName,@ColumnName
 WHILE @@FETCH_STATUS = 0
 BEGIN
 -- Search the number of row in a table
 SET @vQuery = 'SELECT @result1= COUNT(*) FROM [' + @TableName + ']'
 EXEC SP_EXECUTESQL 
 @Query = @vQuery 
 , @Params = N'@result1 INT OUTPUT' 
 , @result1 = @result1 OUTPUT 
 -- Search the number of row in a table
 SET @vQuery = 'SELECT @result2= COUNT(*) FROM [' + @TableName + '] where [' + @ColumnName + '] is null'
 EXEC SP_EXECUTESQL 
 @Query = @vQuery 
 , @Params = N'@result2 INT OUTPUT' 
 , @result2 = @result2 OUTPUT 
 update #temp set NumberOfRow = @result1,NumberOfRowNULL = @result2,Ratio = (@result2*100/@result1) where 
ColumnName=@ColumnName and TableName=@TableName
 FETCH NEXT FROM tables_cursor INTO @TableName,@ColumnName
 END
 CLOSE tables_cursor
DEALLOCATE tables_cursor

2Step4

For the end, we delete all line that not has a NULL value in NumberOfRowNULL and NumberOfRow columns :

delete tps from #temp tps where NumberOfRowNULL = 0 or NumberOfRow = 0

Conclusion

In this example, you can see that the column MinPaymentAmount from the table DimReseller has 72% of Null value.
In the msdn web site, you can see that for a money type, we win 40% for 52% of null value.

This is clearly a good candidate to be SPARE column.

msdn extract :

2Step42

This code can be optimized but the different steps above help the comprehension of this functionality.
For my example, this is the changed query :

ALTER TABLE DimReseller ALTER COLUMN MinPaymentAmount ADD SPARSE

You can download the detection script here. It’s free ! 🙄