Daniel Westermann

Parallel pg_dump is slow by default?

Short answer: Yes, it is. At a customer site over the last few days we wanted to dump a 2TB database with a parallel pg_dump. We were surprised at how slow it was, and it was not immediately clear why. The answer is in the documentation: parallel dumps require the directory format, and this comes with: “This format is compressed by default and also supports parallel dumps.” Compression takes time, so the question was whether we could disable it, which was not clear from the statement: does “compressed by default” mean that the output is always compressed and you cannot change that, or does it just mean that compression is the default but you can turn it off?

As always, let's set up a short test case:

postgres=# create table dmp1 as 
           select a,a::varchar b,now() c 
             from generate_series ( 1, 1000000) a;
SELECT 1000000
postgres=# create table dmp2 as select * from dmp1;
SELECT 1000000
postgres=# create table dmp3 as select * from dmp1;
SELECT 1000000
postgres=# create table dmp4 as select * from dmp1;
SELECT 1000000
postgres=# \d dmp*
                        Table "public.dmp1"
 Column |           Type           | Collation | Nullable | Default 
--------+--------------------------+-----------+----------+---------
 a      | integer                  |           |          | 
 b      | character varying        |           |          | 
 c      | timestamp with time zone |           |          | 

                        Table "public.dmp2"
 Column |           Type           | Collation | Nullable | Default 
--------+--------------------------+-----------+----------+---------
 a      | integer                  |           |          | 
 b      | character varying        |           |          | 
 c      | timestamp with time zone |           |          | 

                        Table "public.dmp3"
 Column |           Type           | Collation | Nullable | Default 
--------+--------------------------+-----------+----------+---------
 a      | integer                  |           |          | 
 b      | character varying        |           |          | 
 c      | timestamp with time zone |           |          | 

                        Table "public.dmp4"
 Column |           Type           | Collation | Nullable | Default 
--------+--------------------------+-----------+----------+---------
 a      | integer                  |           |          | 
 b      | character varying        |           |          | 
 c      | timestamp with time zone |           |          | 

We have four tables, each containing 1’000’000 rows. When we use pg_dump in parallel with the default settings it looks like this:

postgres@pgbox:/home/postgres/ [PG10] mkdir /var/tmp/dmp
postgres@pgbox:/home/postgres/ [PG10] time pg_dump --format=d --jobs=4 --file=/var/tmp/dmp/ postgres

real	0m2.788s
user	0m2.459s
sys	0m0.597s
postgres@pgbox:/home/postgres/ [PG10] ls -la /var/tmp/dmp/
total 19528
drwxr-xr-x. 2 postgres postgres    4096 Mar  9 07:16 .
drwxrwxrwt. 4 root     root          51 Mar  9 07:15 ..
-rw-r--r--. 1 postgres postgres      25 Mar  9 07:16 3113.dat.gz
-rw-r--r--. 1 postgres postgres      25 Mar  9 07:16 3114.dat.gz
-rw-r--r--. 1 postgres postgres      25 Mar  9 07:16 3115.dat.gz
-rw-r--r--. 1 postgres postgres 4991138 Mar  9 07:16 3116.dat.gz
-rw-r--r--. 1 postgres postgres 4991138 Mar  9 07:16 3117.dat.gz
-rw-r--r--. 1 postgres postgres 4991138 Mar  9 07:16 3118.dat.gz
-rw-r--r--. 1 postgres postgres 4991138 Mar  9 07:16 3119.dat.gz
-rw-r--r--. 1 postgres postgres    5819 Mar  9 07:16 toc.dat

As stated in the documentation, the result is compressed. When speed is more important than the size on disk you can, however, disable compression:

postgres@pgbox:/home/postgres/ [PG10] rm -rf /var/tmp/dmp/*
postgres@pgbox:/home/postgres/ [PG10] time pg_dump --format=d --jobs=4 --file=/var/tmp/dmp/ --compress=0 postgres

real	0m5.357s
user	0m0.065s
sys	0m0.460s
postgres@pgbox:/home/postgres/ [PG10] ls -la /var/tmp/dmp/
total 171040
drwxr-xr-x. 2 postgres postgres     4096 Mar  9 07:18 .
drwxrwxrwt. 4 root     root           51 Mar  9 07:15 ..
-rw-r--r--. 1 postgres postgres        5 Mar  9 07:18 3113.dat
-rw-r--r--. 1 postgres postgres        5 Mar  9 07:18 3114.dat
-rw-r--r--. 1 postgres postgres        5 Mar  9 07:18 3115.dat
-rw-r--r--. 1 postgres postgres 43777797 Mar  9 07:18 3116.dat
-rw-r--r--. 1 postgres postgres 43777797 Mar  9 07:18 3117.dat
-rw-r--r--. 1 postgres postgres 43777797 Mar  9 07:18 3118.dat
-rw-r--r--. 1 postgres postgres 43777797 Mar  9 07:18 3119.dat
-rw-r--r--. 1 postgres postgres     5819 Mar  9 07:18 toc.dat
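To put the two directory listings side by side, here is a small sketch that computes how much space compression saved for one of the table data files. The byte counts are copied from the listings; the variable names are only illustrative.

```shell
#!/bin/sh
# Sizes in bytes of one table's data file, copied from the two ls listings.
compressed=4991138      # 3116.dat.gz (default compression)
uncompressed=43777797   # 3116.dat    (--compress=0)

# Ratio of uncompressed to compressed size, rounded to one decimal place.
ratio=$(awk -v u="$uncompressed" -v c="$compressed" 'BEGIN { printf "%.1f", u / c }')
echo "uncompressed/compressed ratio: $ratio"
```

In other words, without compression the dump has to write close to nine times as many bytes to disk for this test data.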

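In my case the uncompressed dump was actually slower in wall-clock time (5.4s versus 2.8s), even though the user CPU time dropped from about 2.5s to under 0.1s. This is because I do not really have fast disks on my little VM, and without compression almost nine times as much data has to be written. When you have a good storage solution, disabling compression should bring you more speed.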
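Since the outcome depends on the balance between CPU and storage, it is worth measuring on your own system. The following is only a sketch of how that could be scripted: the output directory, the chosen compression levels, and the DRY_RUN switch are my assumptions, not part of the test above. By default it only prints the pg_dump commands; set DRY_RUN=0 to actually run and time them.

```shell
#!/usr/bin/env bash
# Hypothetical benchmark sketch: paths, levels and variable names are assumptions.
DB="${DB:-postgres}"
OUT="${OUT:-/var/tmp/dmp_bench}"   # parent directory for the test dumps
JOBS="${JOBS:-4}"
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the commands

# Print the pg_dump command line for a given compression level (0 = none).
bench_cmd() {
    printf 'pg_dump --format=d --jobs=%s --compress=%s --file=%s/level_%s %s\n' \
        "$JOBS" "$1" "$OUT" "$1" "$DB"
}

for level in 0 1 6 9; do
    cmd=$(bench_cmd "$level")
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        mkdir -p "$OUT"
        rm -rf "$OUT/level_$level"   # start each run with a fresh target
        echo "== compress=$level =="
        # Relies on word splitting, so paths must not contain spaces.
        time $cmd
        du -sh "$OUT/level_$level"
    fi
done
```

Comparing the "real" times and the directory sizes across the levels shows where the trade-off between dump time and disk space lies on a given machine.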
