Infrastructure at your Service

Daniel Westermann

What that in-memory term is about, and what not

Everybody is talking about in-memory databases these days. And everybody is talking about columnar storage for sets of data, because it can benefit analytic queries. And a lot of people start mixing these terms, not realizing that the two topics are not interchangeable.

Traditionally in-memory means: Not persistent, but fast. And this is what pure in-memory databases are: Not persistent, but fast, but, again: not persistent!!!

They are just caching data for very fast access times. But: As soon as the in-memory database crashes or needs to be restarted, the cache is gone and needs to be re-populated. So, how does this work together with relational databases like Oracle, MSSQL, PostgreSQL and all the others, where persistence is one of the key features they need to guarantee? The answer is: It does not. Yes, it does not.

All of the relational databases are hybrids. Every database, and in fact every piece of software, needs to load data from disk into memory to access, modify and save it. Nothing special about that. When it comes to relational databases, all of them try to cache as much as possible in memory to get the best performance for accessing and modifying the data. As soon as modified data is committed, it needs to be written to disk to guarantee the D in ACID. No way around this: Data has to go to persistent storage. Most of the database vendors implement something like an LRU algorithm, which keeps data that is accessed heavily in memory, while data that is accessed less often is dropped from memory and needs to be re-read from disk if someone wants to read and/or modify it.

The term in-memory as it is currently used is just a marketing term. No more, no less. The “new” thing is just that memory is getting cheaper and cheaper, so more and more data (even whole databases) can be cached today.

Coffee break reading: “In-memory” is not a feature, it’s a bug
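
To make the caching idea more concrete, here is a minimal sketch of an LRU cache sitting in front of persistent storage. It is not how any particular vendor implements its buffer cache: the class name BufferCache, the page ids and the plain dict standing in for the disk are all assumptions made up for this illustration. Real engines usually persist a log record at commit time and write the pages later, so the simple write-through below is a simplification.

```python
from collections import OrderedDict


class BufferCache:
    """Toy LRU page cache in front of a 'disk' (hypothetical, for illustration only)."""

    def __init__(self, disk, capacity):
        self.disk = disk            # a dict standing in for persistent storage
        self.capacity = capacity    # maximum number of pages kept in memory
        self.pages = OrderedDict()  # page_id -> data, least recently used first
        self.disk_reads = 0         # counts how often we had to go to "disk"

    def read(self, page_id):
        if page_id in self.pages:
            # Cache hit: mark the page as most recently used.
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        # Cache miss: the page has to be (re-)read from disk.
        self.disk_reads += 1
        data = self.disk[page_id]
        self._cache(page_id, data)
        return data

    def write(self, page_id, data):
        # Persist first (simplified write-through), then keep a copy in memory.
        self.disk[page_id] = data
        self._cache(page_id, data)

    def _cache(self, page_id, data):
        self.pages[page_id] = data
        self.pages.move_to_end(page_id)
        if len(self.pages) > self.capacity:
            # Drop the least recently used page from memory.
            self.pages.popitem(last=False)


disk = {1: "a", 2: "b", 3: "c"}
cache = BufferCache(disk, capacity=2)

cache.read(1)
cache.read(2)
cache.read(1)  # hit, page 1 becomes the most recently used page
cache.read(3)  # memory is full, page 2 gets dropped
cache.read(2)  # page 2 has to be re-read from disk
print(cache.disk_reads, list(cache.pages))
```

The point of the sketch: memory only ever holds a copy. If the process dies, everything in pages is gone, and only what was written to disk survives.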

Let's come to columnar storage systems. The main goal of these is to avoid reading columns from disk that are not needed to satisfy a query, and therefore to reduce I/O. This is mainly beneficial for column-oriented queries such as data warehouse queries. By storing the data in columnar format, the system can skip reading the columns which are not needed to fulfill the request. In traditional row-oriented storage systems the whole row must be read even if only a few columns are requested. What does this have to do with in-memory? Nothing. Really nothing.
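
The difference between the two layouts can be sketched in a few lines. The table, the column names and the "values read" counter are invented for this example, and counting values is only a rough stand-in for I/O, since real systems read pages or blocks rather than single values.

```python
# A tiny made-up table: three rows, four columns.
rows = [
    {"id": 1, "customer": "alice", "country": "CH", "amount": 120.0},
    {"id": 2, "customer": "bob", "country": "DE", "amount": 80.5},
    {"id": 3, "customer": "carol", "country": "FR", "amount": 42.0},
]

# Row-oriented layout: all values of one row are stored together.
row_store = [tuple(r.values()) for r in rows]

# Column-oriented layout: all values of one column are stored together.
column_store = {name: [r[name] for r in rows] for name in rows[0]}


def sum_amount_row_store(store):
    """Sum the 'amount' column; every row is fetched as a whole."""
    values_read = 0
    total = 0.0
    for row in store:
        values_read += len(row)  # the whole row is read, not just one column
        total += row[3]          # 'amount' is the fourth column
    return total, values_read


def sum_amount_column_store(store):
    """Sum the 'amount' column; only that column is touched."""
    column = store["amount"]
    return sum(column), len(column)


row_total, row_values_read = sum_amount_row_store(row_store)
col_total, col_values_read = sum_amount_column_store(column_store)
print(row_total, row_values_read)
print(col_total, col_values_read)
```

Both functions return the same total, but the row-oriented variant touches every value of every row, while the columnar variant touches only the one column the query asks for. Nothing in this comparison depends on whether the data lives in memory or on disk.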

Both pure in-memory databases and database systems which store data in columnar format have their use cases. And there are more and more open source and commercial products which offer both. But: Try not to follow the marketing stuff, and be aware of what it is really about.

Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure