Infrastructure at your Service

In PostgreSQL a row or tuple can not span multiple pages (a page is typically 8kB), but of course you can store larger rows and PostgreSQL brakes and compresses these rows into smaller chunks by using a technique called TOAST. Once your table contains a toast-able data type a so-called toast table is created automatically. Up to PostgreSQL 13 you had no choice how the data is compressed, but a recent commit brings the option to use LZ4 as compression method for TOAST. As you can read in the Wikipedia article, LZ4 is all about compression and decompression speed, so let’s have a look.

If you want to make use of this new feature, you need to compile PostgreSQL with support for it:

[email protected]:/home/postgres/postgresql/ [pgdev] ./configure --help | grep LZ4
  --with-lz4              build with LZ4 support
  LZ4_CFLAGS  C compiler flags for LZ4, overriding pkg-config
  LZ4_LIBS    linker flags for LZ4, overriding pkg-config

For that to work, the operating system needs to have the corresponding libraries installed. For Debian this is:

[email protected]:/home/postgres/postgresql/ [pgdev] apt search liblz4-dev
Sorting... Done
Full Text Search... Done
liblz4-dev/stable,now 1.8.3-1 amd64 [installed]
  Fast LZ compression algorithm library - development files

Having that ready and PostgreSQL compiled and installed, lets create two tables: One with the default compression and one with the new LZ4 compression:

postgres=# create table t1 ( a text );
CREATE TABLE
postgres=# create table t2 ( a text compression LZ4 );
CREATE TABLE
postgres=# \d+ t1
                                          Table "public.t1"
 Column | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
--------+------+-----------+----------+---------+----------+-------------+--------------+-------------
 a      | text |           |          |         | extended | pglz        |              | 
Access method: heap

postgres=# \d+ t2
                                          Table "public.t2"
 Column | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
--------+------+-----------+----------+---------+----------+-------------+--------------+-------------
 a      | text |           |          |         | extended | lz4         |              | 
Access method: heap

Both tables automatically got a toast table attached:

postgres=# select reltoastrelid from pg_class where relname in ('t1','t2');
 reltoastrelid 
---------------
         16387
         16392
(2 rows)

postgres=# select oid,relname from pg_class where oid in (16387,16392 );
  oid  |    relname     
-------+----------------
 16387 | pg_toast_16384
 16392 | pg_toast_16389
(2 rows)

postgres=# \d+ pg_toast.pg_toast_16384
TOAST table "pg_toast.pg_toast_16384"
   Column   |  Type   | Storage 
------------+---------+---------
 chunk_id   | oid     | plain
 chunk_seq  | integer | plain
 chunk_data | bytea   | plain
Owning table: "public.t1"
Indexes:
    "pg_toast_16384_index" PRIMARY KEY, btree (chunk_id, chunk_seq)
Access method: heap

postgres=# \d+ pg_toast.pg_toast_16389
TOAST table "pg_toast.pg_toast_16389"
   Column   |  Type   | Storage 
------------+---------+---------
 chunk_id   | oid     | plain
 chunk_seq  | integer | plain
 chunk_data | bytea   | plain
Owning table: "public.t2"
Indexes:
    "pg_toast_16389_index" PRIMARY KEY, btree (chunk_id, chunk_seq)
Access method: heap

Let’s check if there is a difference if we populate those tables with some dummy data:

postgres=# \timing on
Timing is on.
postgres=# insert into t1(a) select lpad('a',1000000,'a') from generate_series(1,1000);
INSERT 0 1000
Time: 4643.583 ms (00:04.644)
postgres=# insert into t2(a) select lpad('a',1000000,'a') from generate_series(1,1000);
INSERT 0 1000
Time: 314.107 ms
postgres=# truncate table t1,t2;
TRUNCATE TABLE
Time: 4143.579 ms (00:04.144)
postgres=# insert into t1(a) select lpad('a',1000000,'a') from generate_series(1,1000);
INSERT 0 1000
Time: 4759.809 ms (00:04.760)
postgres=# insert into t2(a) select lpad('a',1000000,'a') from generate_series(1,1000);
INSERT 0 1000
Time: 1011.221 ms (00:01.011)
postgres=# truncate table t1,t2;
TRUNCATE TABLE
Time: 41.449 ms
postgres=# insert into t1(a) select lpad('a',1000000,'a') from generate_series(1,1000);
INSERT 0 1000
Time: 4507.416 ms (00:04.507)
postgres=# insert into t2(a) select lpad('a',1000000,'a') from generate_series(1,1000);
INSERT 0 1000
Time: 299.458 ms

This is a huge difference, and I’ve repeated that test several times and got almost the same numbers. That’s really a great improvement regarding speed. But what about the size on disk?

postgres=# select relname,reltoastrelid from pg_class where relname in ('t1','t2');
 relname | reltoastrelid 
---------+---------------
 t1      |         16387
 t2      |         16392
(2 rows)

Time: 3.091 ms
postgres=# select oid,relname from pg_class where oid in (16387,16392);
  oid  |    relname     
-------+----------------
 16387 | pg_toast_16384
 16392 | pg_toast_16389
(2 rows)

Time: 0.836 ms
postgres=# select pg_size_pretty(pg_relation_size('pg_toast.pg_toast_16384'));
 pg_size_pretty 
----------------
 12 MB
(1 row)

Time: 0.506 ms
postgres=# select pg_size_pretty(pg_relation_size('pg_toast.pg_toast_16389'));
 pg_size_pretty 
----------------
 4000 kB
(1 row)

Time: 0.743 ms

Quite impressive. In addition to the speed, we also get a great reduction on disk. Because compression is better with LZ4, we see fewer rows in the toast table for t2:

postgres=# select count(*) from pg_toast.pg_toast_16384;
 count 
-------
  6000
(1 row)

Time: 5.807 ms
postgres=# select count(*) from pg_toast.pg_toast_16389;
 count 
-------
  2000
(1 row)

Of course the string I used here is not very representative, but this new feature really looks promising. There is also a new parameter if you want to change to this behavior globally;

postgres=# show default_toast_compression ;
 default_toast_compression 
---------------------------
 pglz
(1 row)
postgres=# alter system set default_toast_compression = 'lz4';
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure