Infrastructure at your Service

Stéphane Haby

SQL Server Tips: How many different datetime are in my column and what the delta?

Few months ago, a customer asks me for finding in a column, how many rows exist with the same date & time and the delta between them. The column default value  is based on the function CURRENT_TIMESTAMP and used as key as well.
This is obviously a very bad idea but let’s go ahead…

This anti pattern may lead to a lot of duplicate keys and the customer wanted to get a picture of the situation.

To perform this task, I used the following example which includes a temporary table with one column with a datetime format:

CREATE TABLE [#tmp_time_count] (dt datetime not null)

Let’s insert a bunch a rows with CURRENT_TIMESTAMP function in the temporary table:

INSERT INTO [#tmp_time_count] SELECT CURRENT_TIMESTAMP
Go 1000

To get distinct datetime values , I used DISCTINCT and COUNT functions as follows:

SELECT COUNT(DISTINCT dt) as [number_of_time_diff] from [#tmp_time_count]

datetime_diff_01
In my test, I find 36 different times for 1000 rows.
The question now is to know how many I have on the same date & time…
To have this information, I try a lot of thing but finally, I write this query with a LEFT JOIN on the same table and a DATEPART on the datetime’s column.

SELECT DISTINCT [current].dt as [Date&Time], DATEPART(MILLISECOND,ISNULL([next].dt,0) –[current].dt) as [time_diff] FROM [#tmp_time_count] as [current] LEFT JOIN [#tmp_time_count] as [next] on [next].dt = (SELECT MIN(dt) FROM [#tmp_time_count] WHERE dt >[current].dt)

datetime_diff_02
Finally, don’t forget to drop the temporary table….

DROP TABLE [#tmp_time_count];

Et voila! I hope this above query will help you in a similar situation… But it’s not finished!

Having discussed this blog post with my colleague David Barbarin, he suggested I continue to dig further with the performance aspect by inserting more rows (let’s say 100000 rows as new exemple for this blog post).
let’s go!
To perform this test, I enabled STATISTICS TIME & IO options to get a picture of query execution statistics for each test.

SET STATISTICS TIME ON 
GO
SET STATISTICS IO ON 
GO

datetime_rw01

As you can see on the screenshot, the CPU time is 95875ms with an number of 1322685 logical reads.
This is the first step to the optimization process. I added then the following non-clustered Index:

CREATE NONCLUSTERED INDEX [dt_idx] ON [dbo].[#tmp_time_count]
(
	[dt] ASC
)
GO

datetime_rw02
After the query execution, the new result is very better, because the CPU time dropped to 624ms with a number of logical reads equals to 205223.
Adding an index was helpful to get a better result but the query execution time is still tied to the number of rows in the table
To get a more consistent result , David proposed one another possible solution including the LEAD function started since SQL Server 2012.
The new query becomes:

SELECT [current].dt, DATEPART(MILLISECOND, LEAD([current].dt, 1, 0) OVER (ORDER BY [current].dt) - [current].dt) AS [time_diff]
FROM  [#tmp_time_count] AS [current]

datetime_rw03

After running the query, CPU Time is 94ms with a number of logical reads equals to 282.
I was very impressed by the result with just the creation of an index and the use of the Lead function as well.
So, if you face a similar situation with a high volume of data, prefer working with this last query.
It was also a good reminder to have a look at the new T-SQL functions shipped with SQL Server to get faster.
Thank you David to challenge me on this topic!

Leave a Reply

Stéphane Haby
Stéphane Haby

Delivery Manager