dbi Blog

By Franck Pachot

Oracle has announced a big feature that should come in the next 12c patch set: the In-Memory option. It will soon be in beta version, so nothing will be published about it until it comes to production. Before that phase, I’m going to explain what we can expect from that feature, besides the ‘In-Memory’,’columnar storage’, and ‘hybrid’ buzzwords.

First, it is a redundant storage that is aimed to improve the query performance.
But this is not new: indexes and materialized views have the same goal.

Second: Redundant data will be stored In-Memory and will be optimized for it.
That is not new either if we consider TimesTen. TimesTen is an In-Memory relational database that can also be used in front of Oracle Database when used as an In-Memory Database Cache.

What is new is that it is a columnar storage. Columnar storage is already used by Oracle for compression (Hybrid Columnar Compression) but here, the goal is different. Very different. HCC uses CPU to save I/O and disk, and is mainly for read-only data. In-Memory option is for actively updated data residing in memory and the goal is to use less CPU. Let’s explain this.

I’ll use a well known relational table: SCOTT.EMP where I highligh the columns and rows that I’ll need for my query:

EMPNO	ENAME	JOB	MGR	HIREDATE	SAL	COMM	DEPTNO
7369	SMITH	CLERK	7902	17-DEC-80	800		20
7499	ALLEN	SALESMAN	7698	20-FEB-81	1600	300	30
7521	WARD	SALESMAN	7698	22-FEB-81	1250	500	30
7566	JONES	MANAGER	7839	02-APR-81	2975		20
7654	MARTIN	SALESMAN	7698	28-SEP-81	1250	1400	30
7698	BLAKE	MANAGER	7839	01-MAY-81	2850		30
7782	CLARK	MANAGER	7839	09-JUN-81	2450		10
7788	SCOTT	ANALYST	7566	19-APR-87	3000		20
7839	KING	PRESIDENT		17-NOV-81	5000		10
7844	TURNER	SALESMAN	7698	08-SEP-81	1500	0	30
7876	ADAMS	CLERK	7788	23-MAY-87	1100		20
7900	JAMES	CLERK	7698	03-DEC-81	950		30
7902	FORD	ANALYST	7566	03-DEC-81	3000		20
7934	MILLER	CLERK	7782	23-JAN-82	1300		10

This is a table. It has two dimensions: rows and columns. How do you read it ? There are two ways:

Read it by column: I have employee numbers 7369, 7499, 7521, … Their names are respectively SMITH, ALLEN, WARD, … The first one is a CLERK, the second and third ones are SALESMEN, …

Read it by row: I have an employee that I identify as number 7369, his name is SMITH, he is a CLERK, his manager is… I have another employee which is number 7499, his name is ALLEN…

Which way do you choose ? You probably read it by row. Because it is easier to think about it when we group the entities we are talking about. When you have a new employee, you fill a form with his name, job, hiredate, etc. When you update some information, you lock the row, saying ‘I’m working on employee 7369’, and update the attributes you want to change. You do that because you prefer to cluster together the information that are related, it’s easier to manage for our mind. Note that when you program, you often have the same logic. You manage objects, you implement CRUD operation to change their attributes. And your computer knows that, so it tries to be more efficient on clustered data, with prefetching or cache.

Relational Database did the same choice when physically manipulating data.
The SQL language let you have both approaches. You update one row, such as:
UPDATE EMP SET … WHERE EMPNO=7369;
or you can update one column, such as
UPDATE EMP SET SAL=SAL*1.1;
But physically, data is stored by row and is manipulated by row, as when you read the table row by row. And it is a very efficient approach when you manipulate several attributes from one row: you do only one I/O to get the block from disk, you store only one block in cache, and the CPU is able to process everything in one block. Great.

But now let’s see what happens when you don’t need all the columns. Let’s say you want to find who has a salary higher than 3000.

SELECT ENAME from EMP where SAL>=3000;

You have to read the whole table (full table scan) and read each block, and in each block read each row, and for each row find the salary, compare it to 3000 and if it is higher, find the name.

Note that I’m not talking about I/O. Even when my table fit in memory I have a lot of information to read, I have to transfert each value from memory to CPU, and use CPU cycles to process each value.

The first problem is that you have read information about 14 employees where you’re interrested only in 3 of them. So you build an index on SAL. From the index you get the rowid for the employees that have SAL>=3000. And then for each of them you go to the row to get its name. Ok, that’s not bad, but when you have a lot of queries like that, you will have to create a lot of indexes. You can create an index for each column but then there is the cost to combine to result. We are on OLTP actively updated data, so bitmap index is not a solution here. Or you can create an index for each kind of filtering you can do, but that’s a lot of indexes to maintain.

But then there is the second problem. Even if you have to process only the 3 employees you are interrested in, you will have to read the whole row, which is stored in variable length, from begining to end, in order to find the only columns that you need. This is: read row directory, follow pointer to row, read EMPNO length, bypass it, read ENAME, get it, … This is in memory, and has to be transferred to CPU to be processed, word by word.

To address that problem, you can add to your index all the columns you’re interrested in. But then you will have too many indexes to maintain. And you know how hard it is to determine which indexes can be useful or not.

Now let’s see how that works with the columnar approach we have seen above. Without any indexes, we will read the SAL column and the result is that item 8,9 and 13 are above 3000. Then we read the ENAME column and just have to get the 8,9 and 13 items. And that’s all!

Without any additional structure we had to read only what we are interrested in. And there is more. On current processors when we have the same operation to do on a several values we can do in in a whole with a vector operation. Just pass a vector of salaries to the CPU and ask it to compute the ‘>=3000’ on all of them. This is known as ‘Single Instruction Multiple Data’. This is less roundtrips between memory and CPU.

The first CPU operation is a ‘find >=3000’ on the following vector:

800

1600

1250

2975

1250

2850

2450

3000

5000

1500

1100

950

3000

1300

The second CPU operation is a ‘find values from previous result index’ on the following vector:

SMITH

ALLEN

WARD

JONES

MARTIN

BLAKE

CLARK

SCOTT

KING

TURNER

ADAMS

JAMES

FORD

MILLER

Compare that with the row approach, navigating within the block, each pointer or each column value being a CPU operation… It’s hundreds of CPU cycles that will be saved. This is something that was possible before only on supercomputers. Today our processors can do Single Instruction Multiple Data vector processing, and this is why in-memory columnar storage is coming everywhere.

And the volume to transfer can be decreased when we compress those vectors. Remember when I’ve read the ENAME in column: the second and third employees are SALESMAN. This is RLE compression where you don’t repeat previous values. You can do that with columns, probably not within a row (or your data model is not that relational I guess).

This is the goal of columnar storage: increase efficiency of memory-cpu processing for queries that don’t need to read the whole row, and without having to create indexes. Oracle has always been able to handle mixed workload quite well (OLTP and reporting in the same database), thanks to the read consistency implementation. So it is a natural evolution to go to the ‘hybrid’ approach that adds analytic capabilities to pure OLTP data models. Of course, this is not the kind of thing that is easy to implement. The columnar storage must be maintained, and must be optimized for an efficient CPU processing. The values must be easy to compare byte per byte in a single CPU operation, even when compressed. And CPU vector operation often requires that the values have the same size, so this may require fixed size storage instead of the usual variable length storage that reduces space.

So behind the feature that is supposed to accelerate our application just by switching it on, we can expect some complex technology that is able to bring In-Memory capabilities while staying in our Oracle OLTP database. In relational databases, the first goal for years was to optimize the access to the storage. And this is done by regrouping all information related to the same object, in order to get them in one I/O. Now, the goal is to improve the transfer between memory and CPU by regrouping similar information from several objects, in order to process them in vectors by the CPU. This is an orthogonal approach: row storage vs. columnar storage. Oracle will implement both. And we, the DBAs, have to understand how it works in order to use it in the right place.

Post Views: 958

Oracle 12c In-Memory option: waiting for 12.1.0.2

By Franck Pachot

Oracle Team

Leave a Reply:

Related blog articles

By Franck Pachot

Oracle Team

Leave a Reply:

Related blog articles

A first look at openSUSE Leap Micro 6.0

Another file system for Linux: bcachefs (3) – Mirroring/Replicas

Another file system for Linux: bcachefs (2) – multi device file systems

Another file system for Linux: bcachefs (1) – basics