When a database gets shutdown for patching, operating system maintenance or other reasons you completely lose the contents of the buffer cache. No surprise. Memory is not persistent. Wouldn’t it be nice to load the most important tables to the cache when the instance starts automatically? This would reduce the time for the most important queries to return the results as blocks would not be needed to read from disk. PostgreSQL has a solution for that which is called pg_prewarm. This is a contrib module which is available by default.
The extension provided by default can be installed by just issuing the “create extension” statement:
(postgres@[local]:5001) [postgres] > ! ls /u01/app/postgres/product/95/db_0/share/extension/*prewarm* /u01/app/postgres/product/95/db_0/share/extension/pg_prewarm--1.0.sql /u01/app/postgres/product/95/db_0/share/extension/pg_prewarm.control (postgres@[local]:5001) [postgres] > create extension pg_prewarm; CREATE EXTENSION Time: 174.232 ms (postgres@[local]:5001) [postgres] > dx pg_pre* List of installed extensions Name | Version | Schema | Description ------------+---------+--------+----------------------- pg_prewarm | 1.0 | public | prewarm relation data (1 row)
So, how does it work? Lets create some sample data in a first step:
(postgres@[local]:5001) [postgres] > create table prewarm ( a int, b varchar(100) ); CREATE TABLE Time: 51.165 ms (postgres@[local]:5001) [postgres] > with generator as ( select generate_series(1,1000000) a ) insert into prewarm ( select a, md5(a::text) from generator );
To get some idea on how long it takes to read the whole table I’ll restart the instance and do a count(*):
postgres@oel7:/home/postgres/ [PG9] pg_ctl -D /u02/pgdata/PG9 stop -m fast postgres@oel7:/home/postgres/ [PG9] sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches' postgres@oel7:/home/postgres/ [PG9] pg_ctl -D /u02/pgdata/PG9 start
Note: the sudo command drops the kernel cache (see here for details).
postgres@oel7:/home/postgres/ [PG9] time psql -c "select count(*) from prewarm" postgres count --------- 1000000 (1 row) real 0m0.908s user 0m0.005s sys 0m0.000s
Almost a second to scan through the 1’000’000 rows to get the count. Lets do the same test but pre-warming the table before:
postgres@oel7:/home/postgres/ [PG9] pg_ctl -D /u02/pgdata/PG9 stop -m fast postgres@oel7:/home/postgres/ [PG9] sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches' postgres@oel7:/home/postgres/ [PG9] pg_ctl -D /u02/pgdata/PG9 start postgres@oel7:/home/postgres/ [PG9] psql -c "select pg_prewarm('prewarm')" postgres pg_prewarm ------------ 8334 (1 row)
Note: The number returned by pg_prewarm is the number of pages affected/pre-warmed.
How long does it now take to get the result:
postgres@oel7:/home/postgres/ [PG9] time psql -c "select count(*) from prewarm" postgres count --------- 1000000 (1 row) real 0m0.189s user 0m0.002s sys 0m0.004s
Not so bad, isn’t it? There is a second parameter for pg_prewarm which specifies the method to use (copied from the documentation):
- prefetch: issues asynchronous prefetch requests to the operating system, if this is supported, or throws an error otherwise
- read: reads the requested range of blocks; unlike prefetch, this is synchronous and supported on all platforms and builds, but may be slower
- buffer: reads the requested range of blocks into the database buffer cache
The default is “buffer”. If you write a little script which pre-warms the tables you need and put this at the end of your PostgreSQL startup script you can save the time it takes to load the blocks/pages from disk when the first requests come in after a restart of the instance.
There is another extension called pg_hibernater which automatically stores the IDs of the buffers in the buffer cache when the server shuts down and re-reads them again automatically when the server starts.
To install this module: Download, extract, make install as the postgres user:
postgres@oel7:/var/tmp/ [PG9] unzip pg_hibernator-master.zip Archive: pg_hibernator-master.zip 32b344fab58242552730e142adffa8650c7df4ac creating: pg_hibernator-master/ inflating: pg_hibernator-master/LICENSE inflating: pg_hibernator-master/Makefile inflating: pg_hibernator-master/README.md inflating: pg_hibernator-master/misc.c inflating: pg_hibernator-master/pg_hibernate.c inflating: pg_hibernator-master/pg_hibernate_9.3.c inflating: pg_hibernator-master/pg_hibernator.h creating: pg_hibernator-master/tests/ inflating: pg_hibernator-master/tests/pg_hibernator_comparison_chart.ods inflating: pg_hibernator-master/tests/test_run.txt postgres@oel7:/var/tmp/ [PG9] cd pg_hibernator-master postgres@oel7:/var/tmp/pg_hibernator-master/ [PG9] make install
We’ll need to add the module to the shared_preload_libraries parameter for pg_hibernator to start up:
(postgres@[local]:5001) [postgres] > show shared_preload_libraries; shared_preload_libraries -------------------------- (1 row) Time: 0.283 ms (postgres@[local]:5001) [postgres] > alter system set shared_preload_libraries='pg_hibernator'; ALTER SYSTEM Time: 59.374 ms
Restart the server:
postgres@oel7:/home/postgres/ [PG9] pg_ctl -D /u02/pgdata/PG9 restart -m fast
Double-check the setting:
(postgres@[local]:5001) [postgres] > show shared_preload_libraries ; shared_preload_libraries -------------------------- pg_hibernator (1 row) Time: 0.307 ms
There are a few parameters which control the behavior (check the documentation for details):
(postgres@[local]:5001) [postgres] > show pg_hibernator.enabled; pg_hibernator.enabled ----------------------- on (1 row) Time: 0.148 ms (postgres@[local]:5001) [postgres] > show pg_hibernator.parallel; pg_hibernator.parallel ------------------------ off (1 row) Time: 0.140 ms (postgres@[local]:5001) [postgres] > show pg_hibernator.default_database; pg_hibernator.default_database -------------------------------- postgres (1 row) Time: 0.142 ms
Once the module is running there is a new background process called “Buffer Saver”. This process will save the buffer IDs when the server shuts down.
postgres@oel7:/home/postgres/ [PG9] ps -ef | grep "Buffer Saver" postgres 22773 22764 0 16:12 ? 00:00:00 postgres: bgworker: Buffer Saver postgres 22810 20779 0 16:17 pts/2 00:00:00 grep --color=auto Buffer Saver
If you check your data directory you should find that pg_hibernater writes a file per database which has buffers in the cache on shutdown of the server:
postgres@oel7:/home/postgres/ [PG9] ls -la $PGDATA/pg_hibernator/ total 12 drwx------. 2 postgres postgres 32 Mar 9 16:18 . drwx------. 19 postgres postgres 4096 Mar 9 16:18 .. -rw-------. 1 postgres postgres 91 Mar 9 16:18 1.save -rw-------. 1 postgres postgres 619 Mar 9 16:18 2.save
So, lets do the same test again:
postgres@oel7:/home/postgres/ [PG9] pg_ctl -D /u02/pgdata/PG9 stop -m fast postgres@oel7:/home/postgres/ [PG9] sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches' postgres@oel7:/home/postgres/ [PG9] pg_ctl -D /u02/pgdata/PG9 start
How fast is it?
postgres@oel7:/home/postgres/ [PG9] time psql -c "select count(*) from prewarm" postgres count --------- 1000000 (1 row) real 0m0.623s user 0m0.004s sys 0m0.000s
Not as fast as with the pg_prewarm method but still faster as without any pre-warming. Maybe this is because the implementation is different, I did not test it entirely. At least it automates the save and restore of the buffer cache. For taking this into production some more testing is required.