Infrastructure at your Service

When parallel execution was introduced with PostgreSQL 9.6 it was quite limited and supported only a few operations to run in parallel. Over time more and more operations were enabled to run in parallel. When PostgreSQL 15 will be released next year it will come with parallel distinct. Here is an easy example:

If you do something like this, e.g. in PostgreSQL 13:

postgres=# select version();
                                              version                                              
---------------------------------------------------------------------------------------------------
 PostgreSQL 13.3 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)
postgres=# create table t1 ( a int );
CREATE TABLE
postgres=# insert into t1 select 1 from generate_series(1,100000);
INSERT 0 100000
postgres=# insert into t1 select 2 from generate_series(1,300000);
INSERT 0 300000
postgres=# 
postgres=# insert into t1 select 4 from generate_series(1,400000);
INSERT 0 400000
postgres=# insert into t1 select 5 from generate_series(1,500000);
INSERT 0 500000
postgres=# analyze t1;
ANALYZE

A distinct over the “a” column will always run serially:

postgres=# explain analyze select distinct a from t1;

                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=22003.00..22003.04 rows=4 width=4) (actual time=1295.168..1295.172 rows=4 loops=1)
   Group Key: a
   Batches: 1  Memory Usage: 24kB
   ->  Seq Scan on t1  (cost=0.00..18753.00 rows=1300000 width=4) (actual time=0.041..608.124 rows=1300000 loops=1)
 Planning Time: 0.249 ms
 Execution Time: 1295.887 ms
(6 rows)

Doing the same test against the current master branch shows that parallel execution is triggered:

postgres=# select version();
                                               version                                                
------------------------------------------------------------------------------------------------------
 PostgreSQL 15devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)

postgres=# create table t1 ( a int );
CREATE TABLE
postgres=# insert into t1 select 1 from generate_series(1,100000);
INSERT 0 100000
postgres=# insert into t1 select 2 from generate_series(1,300000);
INSERT 0 300000
postgres=# insert into t1 select 4 from generate_series(1,400000);
INSERT 0 400000
postgres=# insert into t1 select 5 from generate_series(1,500000);
INSERT 0 500000
postgres=# analyze t1;
ANALYZE
postgres=# explain analyze select distinct a from t1;
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=13524.79..13524.83 rows=4 width=4) (actual time=1382.720..1383.346 rows=4 loops=1)
   ->  Sort  (cost=13524.79..13524.81 rows=8 width=4) (actual time=1382.716..1383.332 rows=12 loops=1)
         Sort Key: a
         Sort Method: quicksort  Memory: 25kB
         ->  Gather  (cost=13523.83..13524.67 rows=8 width=4) (actual time=1382.681..1383.307 rows=12 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  HashAggregate  (cost=12523.83..12523.87 rows=4 width=4) (actual time=1340.241..1340.245 rows=4 loops=3)
                     Group Key: a
                     Batches: 1  Memory Usage: 24kB
                     Worker 0:  Batches: 1  Memory Usage: 24kB
                     Worker 1:  Batches: 1  Memory Usage: 24kB
                     ->  Parallel Seq Scan on t1  (cost=0.00..11169.67 rows=541667 width=4) (actual time=0.028..663.671 rows=433333 loops=3)
 Planning Time: 0.338 ms
 Execution Time: 1383.477 ms
(15 rows)

Nice, details here.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure