Infrastructure at your Service

When a database gets shutdown for patching, operating system maintenance or other reasons you completely lose the contents of the buffer cache. No surprise. Memory is not persistent. Wouldn’t it be nice to load the most important tables to the cache when the instance starts automatically? This would reduce the time for the most important queries to return the results as blocks would not be needed to read from disk. PostgreSQL has a solution for that which is called pg_prewarm. This is a contrib module which is available by default.

The extension provided by default can be installed by just issuing the “create extension” statement:

([email protected][local]:5001) [postgres] > \! ls /u01/app/postgres/product/95/db_0/share/extension/*prewarm*
/u01/app/postgres/product/95/db_0/share/extension/pg_prewarm--1.0.sql  /u01/app/postgres/product/95/db_0/share/extension/pg_prewarm.control
([email protected][local]:5001) [postgres] > create extension pg_prewarm;
CREATE EXTENSION
Time: 174.232 ms
([email protected][local]:5001) [postgres] > \dx pg_pre*
             List of installed extensions
    Name    | Version | Schema |      Description      
------------+---------+--------+-----------------------
 pg_prewarm | 1.0     | public | prewarm relation data
(1 row)

So, how does it work? Lets create some sample data in a first step:

([email protected][local]:5001) [postgres] > create table prewarm ( a int, b varchar(100) );
CREATE TABLE
Time: 51.165 ms
([email protected][local]:5001) [postgres] > with generator as ( select generate_series(1,1000000) a )
insert into prewarm ( select a, md5(a::text) from generator );

To get some idea on how long it takes to read the whole table I’ll restart the instance and do a count(*):

[email protected]:/home/postgres/ [PG9] pg_ctl -D /u02/pgdata/PG9 stop -m fast
[email protected]:/home/postgres/ [PG9] sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches'
[email protected]:/home/postgres/ [PG9] pg_ctl -D /u02/pgdata/PG9 start

Note: the sudo command drops the kernel cache (see here for details).

[email protected]:/home/postgres/ [PG9] time psql -c "select count(*) from prewarm" postgres
  count  
---------
 1000000
(1 row)

real	0m0.908s
user	0m0.005s
sys	0m0.000s

Almost a second to scan through the 1’000’000 rows to get the count. Lets do the same test but pre-warming the table before:

[email protected]:/home/postgres/ [PG9] pg_ctl -D /u02/pgdata/PG9 stop -m fast
[email protected]:/home/postgres/ [PG9] sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches'
[email protected]:/home/postgres/ [PG9] pg_ctl -D /u02/pgdata/PG9 start
[email protected]:/home/postgres/ [PG9] psql -c "select pg_prewarm('prewarm')" postgres
 pg_prewarm 
------------
       8334
(1 row)

Note: The number returned by pg_prewarm is the number of pages affected/pre-warmed.
How long does it now take to get the result:

[email protected]:/home/postgres/ [PG9] time psql -c "select count(*) from prewarm" postgres
  count  
---------
 1000000
(1 row)


real	0m0.189s
user	0m0.002s
sys	0m0.004s

Not so bad, isn’t it? There is a second parameter for pg_prewarm which specifies the method to use (copied from the documentation):

  • prefetch: issues asynchronous prefetch requests to the operating system, if this is supported, or throws an error otherwise
  • read: reads the requested range of blocks; unlike prefetch, this is synchronous and supported on all platforms and builds, but may be slower
  • buffer: reads the requested range of blocks into the database buffer cache

The default is “buffer”. If you write a little script which pre-warms the tables you need and put this at the end of your PostgreSQL startup script you can save the time it takes to load the blocks/pages from disk when the first requests come in after a restart of the instance.

There is another extension called pg_hibernater which automatically stores the IDs of the buffers in the buffer cache when the server shuts down and re-reads them again automatically when the server starts.

To install this module: Download, extract, make install as the postgres user:

[email protected]:/var/tmp/ [PG9] unzip pg_hibernator-master.zip 
Archive:  pg_hibernator-master.zip
32b344fab58242552730e142adffa8650c7df4ac
   creating: pg_hibernator-master/
  inflating: pg_hibernator-master/LICENSE  
  inflating: pg_hibernator-master/Makefile  
  inflating: pg_hibernator-master/README.md  
  inflating: pg_hibernator-master/misc.c  
  inflating: pg_hibernator-master/pg_hibernate.c  
  inflating: pg_hibernator-master/pg_hibernate_9.3.c  
  inflating: pg_hibernator-master/pg_hibernator.h  
   creating: pg_hibernator-master/tests/
  inflating: pg_hibernator-master/tests/pg_hibernator_comparison_chart.ods  
  inflating: pg_hibernator-master/tests/test_run.txt  
[email protected]:/var/tmp/ [PG9] cd pg_hibernator-master
[email protected]:/var/tmp/pg_hibernator-master/ [PG9] make install

We’ll need to add the module to the shared_preload_libraries parameter for pg_hibernator to start up:

([email protected][local]:5001) [postgres] > show shared_preload_libraries;
 shared_preload_libraries 
--------------------------
 
(1 row)

Time: 0.283 ms

([email protected][local]:5001) [postgres] > alter system set shared_preload_libraries='pg_hibernator';
ALTER SYSTEM
Time: 59.374 ms

Restart the server:

[email protected]:/home/postgres/ [PG9] pg_ctl -D /u02/pgdata/PG9 restart -m fast

Double-check the setting:

([email protected][local]:5001) [postgres] > show shared_preload_libraries ;
 shared_preload_libraries 
--------------------------
 pg_hibernator
(1 row)

Time: 0.307 ms

There are a few parameters which control the behavior (check the documentation for details):

([email protected][local]:5001) [postgres] > show pg_hibernator.enabled;
 pg_hibernator.enabled 
-----------------------
 on
(1 row)

Time: 0.148 ms
([email protected][local]:5001) [postgres] > show pg_hibernator.parallel;
 pg_hibernator.parallel 
------------------------
 off
(1 row)

Time: 0.140 ms
([email protected][local]:5001) [postgres] > show pg_hibernator.default_database;
 pg_hibernator.default_database 
--------------------------------
 postgres
(1 row)

Time: 0.142 ms

Once the module is running there is a new background process called “Buffer Saver”. This process will save the buffer IDs when the server shuts down.

[email protected]:/home/postgres/ [PG9] ps -ef | grep "Buffer Saver"
postgres 22773 22764  0 16:12 ?        00:00:00 postgres: bgworker: Buffer Saver   
postgres 22810 20779  0 16:17 pts/2    00:00:00 grep --color=auto Buffer Saver

If you check your data directory you should find that pg_hibernater writes a file per database which has buffers in the cache on shutdown of the server:

[email protected]:/home/postgres/ [PG9] ls -la $PGDATA/pg_hibernator/
total 12
drwx------.  2 postgres postgres   32 Mar  9 16:18 .
drwx------. 19 postgres postgres 4096 Mar  9 16:18 ..
-rw-------.  1 postgres postgres   91 Mar  9 16:18 1.save
-rw-------.  1 postgres postgres  619 Mar  9 16:18 2.save

So, lets do the same test again:

[email protected]:/home/postgres/ [PG9] pg_ctl -D /u02/pgdata/PG9 stop -m fast
[email protected]:/home/postgres/ [PG9] sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches'
[email protected]:/home/postgres/ [PG9] pg_ctl -D /u02/pgdata/PG9 start

How fast is it?

[email protected]:/home/postgres/ [PG9] time psql -c "select count(*) from prewarm" postgres
  count  
---------
 1000000
(1 row)


real	0m0.623s
user	0m0.004s
sys	0m0.000s

Not as fast as with the pg_prewarm method but still faster as without any pre-warming. Maybe this is because the implementation is different, I did not test it entirely. At least it automates the save and restore of the buffer cache. For taking this into production some more testing is required.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure