A few weeks ago, I had the opportunity to give a session about the clustered columnstore index (CCI) feature at our In-Memory event, dedicated to In-Memory technologies for Microsoft SQL Server, Oracle and SAP HANA. During the session, I explained the improvements Microsoft made in SQL Server 2014 with the introduction of the new clustered columnstore index.
The CCI includes a new structure that allows update operations: the delta store. Insert operations go directly into the delta store. Delete operations are logical and are recorded in the deleted bitmap, and update operations are split into two basic operations: a DELETE followed by an INSERT. I was very interested in how SQL Server deals with both structures (delta store and columnstore) and with memory in different scenarios. This blog post is the result of my studies and will probably interest those who like internals. In fact, I discussed the topic with one of my (Oracle) friends and he asked me some interesting questions about the CCI and memory management.
First of all, let’s begin with the storage concept: the delta store is traditional row-based storage, unlike the columnstore index, which is column-oriented storage. The two structures are managed differently by SQL Server and each has its own memory space: CACHESTORE_COLUMNSTOREOBJECTPOOL for the columnstore structure and the traditional buffer pool (BP) for the row store structure. When columnstore data is fetched from disk into memory, it goes first to the BP and then to the columnstore memory pool. We can get information about the columnstore memory pool by using the following query:
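One possible form of such a query is sketched below. It assumes the `sys.dm_os_memory_cache_entries` DMV filtered on the `CACHESTORE_COLUMNSTOREOBJECTPOOL` cache store type; the column selection is an illustrative choice and not necessarily the query used for these tests.

```sql
-- List the entries currently cached in the columnstore object pool.
SELECT
    name,
    type,
    entry_data,     -- textual description of the cached object
    pages_kb,       -- memory used by the entry, in KB
    in_use_count
FROM sys.dm_os_memory_cache_entries
WHERE type = N'CACHESTORE_COLUMNSTOREOBJECTPOOL';
```

An empty result simply means that no columnstore object is cached at that moment.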
Let’s start by seeing how SQL Server behaves when it reads data exclusively from the delta store. We begin with a pretty simple table:
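A minimal sketch of such a table follows. Only the names `dbo.test_CCI`, `id` and `col1` appear in this post; the data types, the row count and the filler value are assumptions made for illustration.

```sql
-- Hypothetical definition: data types are assumed, not taken from the original test.
CREATE TABLE dbo.test_CCI
(
    id   INT         NOT NULL,
    col1 VARCHAR(50) NOT NULL
);
GO

-- Load 1000 rows (id = 1..1000) so that a later index build has rows to compress.
INSERT INTO dbo.test_CCI (id, col1)
SELECT TOP (1000)
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
    'test'
FROM sys.all_columns;
```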
Next, let’s create a CCI that will include 1 compressed row group and 1 delta store (open state):
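One way to end up with one compressed row group plus one open delta store is sketched here. The index name is hypothetical, and the sketch assumes the table already holds rows with id values up to 1000 before the index is built.

```sql
-- Building the index should compress the existing rows into a columnstore row group.
-- The index name CCI_test_CCI is hypothetical.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_test_CCI ON dbo.test_CCI;
GO

-- A single-row insert after the build goes to a delta store (OPEN state),
-- because it is far below the bulk-load threshold.
INSERT INTO dbo.test_CCI (id, col1)
VALUES (1001, 'test');
```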
Let’s have a look at the CCI row group’s information:
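The row group information can be read from the `sys.column_store_row_groups` catalog view. The sketch below uses an illustrative column list.

```sql
-- One row per row group: compressed row groups and delta stores alike.
SELECT
    OBJECT_NAME(object_id) AS table_name,
    row_group_id,
    delta_store_hobt_id,    -- NULL for compressed row groups
    state_description,      -- e.g. OPEN, CLOSED, COMPRESSED
    total_rows,
    deleted_rows,           -- rows logically deleted in a compressed row group
    size_in_bytes
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID(N'dbo.test_CCI')
ORDER BY row_group_id;
```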
Let’s execute the first query, which fetches only the record stored in the delta store:
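A query of the following shape would do. The predicate is an assumption: it presumes that the compressed row group holds id values up to 1000 and that the delta store holds the row with id = 1001.

```sql
-- Only the delta store row qualifies, so the compressed segment
-- can be skipped on the basis of its min/max metadata.
SELECT MAX(id) AS max_id
FROM dbo.test_CCI
WHERE id > 1000;
```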
Let’s have a look at the memory cache entries related to the CCI memory pool:
There are no entries, which is what I expected: the data comes only from the delta store, so the buffer pool is the only memory area involved in this scenario. Another important point: segments are eliminated directly from disk. To prove it, I created an extended event session to capture segment elimination information as follows:
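A sketch of such a session is shown below. It assumes the `sqlserver.column_store_segment_eliminate` event is available on the instance; the session name and the file path are hypothetical.

```sql
-- Session name and target file path are hypothetical.
CREATE EVENT SESSION [cci_segment_elimination] ON SERVER
ADD EVENT sqlserver.column_store_segment_eliminate
ADD TARGET package0.event_file
(
    SET filename = N'C:\Temp\cci_segment_elimination.xel'
)
WITH (STARTUP_STATE = OFF);
GO

-- Start collecting before running the test query.
ALTER EVENT SESSION [cci_segment_elimination] ON SERVER STATE = START;
```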
And after looking at the extended event file, I noticed that the segment was eliminated by SQL Server as expected.
The hobt_id value relates to the compressed segment in the columnstore index:
The next scenario consists of reading data directly from the columnstore segment.
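One way to map a hobt_id back to the segments of the table is to join `sys.column_store_segments` to `sys.partitions`, as in this sketch; the column selection is illustrative.

```sql
-- Segments of the CCI with the hobt_id they belong to.
SELECT
    OBJECT_NAME(p.object_id) AS table_name,
    s.hobt_id,
    s.column_id,
    s.segment_id,
    s.row_count,
    s.min_data_id,   -- minimum value metadata used for segment elimination
    s.max_data_id    -- maximum value metadata used for segment elimination
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p
    ON p.hobt_id = s.hobt_id
WHERE p.object_id = OBJECT_ID(N'dbo.test_CCI')
ORDER BY s.column_id, s.segment_id;
```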
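A query of this kind could look like the sketch below. The predicate is an assumption, chosen so that the qualifying rows live in the compressed row group (id values up to 1000 in this illustration).

```sql
-- The qualifying rows are stored in the compressed segment of the id column.
SELECT MAX(id) AS max_id
FROM dbo.test_CCI
WHERE id <= 1000;
```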
With the previous script that uses the sys.dm_os_memory_cache_entries DMV, we can see two cached entries from the columnstore object pool this time:
I would like to thank Sunil Agarwal (Principal Program Manager in the SQL Server Storage Engine team) for his kindness and for the information he gave me in order to read the entry_data column values above. [Object type] is very useful information here: in the entries discussed below, a value of 1 denotes a columnstore segment and a value of 6 denotes the deleted bitmap.
So let’s correlate this with the retrieved entry_data values. SQL Server fetched the relevant segment (object_type = 1) from disk into the columnstore object pool. However, look at the column_id value (= 2 here). I expected a value of 1, which would correspond to the id column in the dbo.test_CCI table. I performed some other tests and they lead me to think that the column_id in entry_data is offset by one from the column_id in the table (2 instead of 1 here), but I will check this point in the near future.
Moreover, according to Sunil’s information, the deleted bitmap (object_type = 6) was also fetched by SQL Server. I imagine that SQL Server needs to read it to identify deleted records. My feeling is that deletions are fully logical and that SQL Server has no way to identify a deleted record in a segment without reading the deleted bitmap.
We can assume that the columnstore memory pool is the place where columnstore segments are cached, and segments are stored in LOB pages. Does SQL Server read data directly from the columnstore memory pool?
Let’s go back to the previous test. As a reminder, we want to get the max(id) from the columnstore segment related to the id column, so SQL Server needs to read that segment. Let’s see if we can find a corresponding page in the buffer pool by using the following script:
OK, there are plenty of pages in the buffer pool related to the db_test database. Let’s focus first on data pages by using the following script, which retrieves data pages only for the dbo.test_CCI table:
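A first, coarse look at the buffer pool can be taken with `sys.dm_os_buffer_descriptors`. The sketch below simply counts the cached pages of the db_test database by page type; it is an illustration, not necessarily the script used for these tests.

```sql
-- Cached pages of the db_test database, grouped by page type.
SELECT
    page_type,
    COUNT(*)                      AS cached_pages,
    SUM(CAST(is_modified AS INT)) AS dirty_pages
FROM sys.dm_os_buffer_descriptors
WHERE database_id = DB_ID(N'db_test')
GROUP BY page_type
ORDER BY cached_pages DESC;
```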
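The sketch below narrows the buffer pool content down to the table by resolving each page's allocation unit. It rests on an assumption worth stating loudly: that the delta store's allocation units are visible in `sys.allocation_units` with a container_id equal to the `delta_store_hobt_id` reported by `sys.column_store_row_groups`. It must be run in the context of the db_test database.

```sql
-- Data pages cached in the buffer pool for dbo.test_CCI (current database).
SELECT
    bd.file_id,
    bd.page_id,
    bd.page_type,
    bd.row_count,
    bd.is_modified,
    au.type_desc
FROM sys.dm_os_buffer_descriptors AS bd
JOIN sys.allocation_units AS au
    ON au.allocation_unit_id = bd.allocation_unit_id
WHERE bd.database_id = DB_ID()
  AND bd.page_type = N'DATA_PAGE'   -- change the filter to inspect other page types
  AND au.container_id IN
  (
      -- In-row and row-overflow allocation units are keyed by hobt_id
      SELECT p.hobt_id
      FROM sys.partitions AS p
      WHERE p.object_id = OBJECT_ID(N'dbo.test_CCI')
      UNION ALL
      -- LOB allocation units are keyed by partition_id
      SELECT p.partition_id
      FROM sys.partitions AS p
      WHERE p.object_id = OBJECT_ID(N'dbo.test_CCI')
      UNION ALL
      -- Assumption: delta store allocation units are keyed by delta_store_hobt_id
      SELECT rg.delta_store_hobt_id
      FROM sys.column_store_row_groups AS rg
      WHERE rg.object_id = OBJECT_ID(N'dbo.test_CCI')
        AND rg.delta_store_hobt_id IS NOT NULL
  );
```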
In my case, I retrieved only one page with the following detail:
We get a compressed data page and, to be more precise, a data page that comes from the delta store (id = 1001). Remember that segment elimination is not performed for the delta store, which is why I got this page in my case.
Next, let’s have a look at the LOB pages (our segments):
We have one LOB page (TEXT_MIX_PAGE type) but it seems to be empty, and I admit that I have no idea what this page is for. I will update this blog post later if I get an answer.
So, to summarize and according to my tests, it seems that SQL Server reads LOB pages directly from the columnstore object pool and doesn’t need to use the BP in this case.
The next scenario consists of updating data in the columnstore index and understanding how SQL Server behaves in this case.
Next, let’s update the columnstore index by using the following query:
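A sketch of such an update follows. The values id = 1 and col1 = 'toto' are the ones discussed in this post; the exact statement used in the original test is not known.

```sql
-- Updating a row stored in a compressed row group: the old version is
-- flagged as deleted and the new version is inserted into a delta store.
UPDATE dbo.test_CCI
SET col1 = 'toto'
WHERE id = 1;
```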
At this point, a delta store page is created by SQL Server and we have to think differently because the storage has changed from columnar to row store. So let’s have a look at the modified pages related to the columnstore index.
Note that this time I focused only on the modified (dirty) pages in my result, and I noticed that there are two data pages. The second page (1:94) in this result set is pretty obvious because it holds the record with id = 1 and col1 = toto (the modified data). However, I’m not sure exactly what the first page is, but I can again speculate: we performed an update operation and we know that this operation is split into two basic operations, DELETE + INSERT. So my feeling here is that this page belongs to the deleted bitmap. Let’s have a look at the sys.column_store_row_groups view:
As expected, we can see a logically deleted record in the row group, along with a new open delta store (and its deleted bitmap). So let’s perform a checkpoint and clear the buffer pool.
Now, we can wonder how SQL Server will retrieve the data for id = 1. Indeed, we have a deleted record in the row group on one side and the new version of the data in the delta store on the other side. So we can guess that SQL Server will need to fetch both the data pages from the delta store and the deleted bitmap to get the correct record.
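These two steps can be performed with the standard commands below. This is only suitable for a test instance, since it empties the buffer pool for the whole server.

```sql
-- Write dirty pages of the current database to disk...
CHECKPOINT;
GO

-- ...then remove all clean pages from the buffer pool (test systems only).
DBCC DROPCLEANBUFFERS;
```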
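The test itself can be as simple as the following sketch, which reads the row that was just updated:

```sql
-- The old version of this row is flagged in the deleted bitmap;
-- the current version lives in the delta store.
SELECT id, col1
FROM dbo.test_CCI
WHERE id = 1;
```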
Let’s verify by performing this test and taking a look first at the memory cache entries related to the columnstore index.
SQL Server has fetched the corresponding segment (object_type = 1) and the deleted bitmap (object_type = 6) as well. Note that segment elimination is not performed for this segment, because SQL Server is not able to eliminate segments that contain logical deletions.
Finally let’s retrieve the data pages in the buffer pool related to the columnstore index:
OK, we retrieved the same pages, clean this time (is_modified = 0), and performing the same test after rebuilding the CCI yielded an empty result. In the latter case, this is the expected behaviour because rebuilding the columnstore index gets rid of the deleted records inside the segments. Thus, SQL Server doesn’t need the deleted bitmap.
I didn’t cover all the scenarios in this blog post and some questions remain unanswered. My intention was just to introduce some interesting internals of the CCI. This is definitely a very interesting topic that I want to cover further in the near future. Please feel free to share your thoughts about this exciting feature!
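The rebuild and a quick check of its effect can be sketched as follows. The index name is hypothetical and should be replaced with the actual name of the CCI.

```sql
-- Rebuild the clustered columnstore index (hypothetical index name).
ALTER INDEX CCI_test_CCI ON dbo.test_CCI REBUILD;
GO

-- After the rebuild, the row groups should no longer report deleted rows.
SELECT row_group_id, state_description, total_rows, deleted_rows
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID(N'dbo.test_CCI');
```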
By David Barbarin