Infrastructure at your Service

Daniel Westermann

Parallel sequential scans are coming to PostgreSQL

It didn’t made it into the upcoming PostgreSQL 9.5 release but it will probably be coming in 9.6: Parallel sequential scans. If you take a look at the development documentation some new parameters showed up:

  • max_parallel_degree: The maximum degree of parallelism for an individual parallel operation.
  • parallel_tuple_cost: Sets the planner’s estimate of the cost of transferring a tuple from a parallel worker process to another process. The default is 0.1
  • parallel_setup_cost: Sets the planner’s estimate of the cost of launching parallel worker processes. The default is 1000.

Lets do a quick test. If you want to setup a PostgreSQL instance from the nightly development snapshot you can follow this post.

As you can see below I am running a 9.6 development version:

[email protected]:/home/postgres/ [dummy] install/bin/psql -p 6666 postgres
Null display is "NULL".
Timing is on.
psql (9.6devel)
Type "help" for help.

For testing parallel sequential scans we need to raise the max_parallel_degree parameter which is zero by default:

([email protected][local]:6666) [postgres] > show max_parallel_degree;
(1 row)

Time: 0.302 ms
([email protected][local]:6666) [postgres] > alter system set max_parallel_degree=5;
Time: 5.509 ms

([email protected][local]:6666) [postgres] > select pg_reload_conf();
LOG:  received SIGHUP, reloading configuration files
LOG:  parameter "max_parallel_degree" changed to "5"
(1 row)

Time: 19.252 ms

Note: The log messages displayed when the configuration was reloaded are new, too.

Now lets create some data to test with:

([email protected][local]:6666) [postgres] > create table tt ( a int, b int );

Time: 427.362 ms([email protected][local]:6666) 
[postgres] > insert into tt (a,b) 
                values ( generate_series(1,1000000),generate_series(1,1000000));
INSERT 0 1000000
Time: 4912.641 ms

So, lets see parallel sequential scan in action:

([email protected][local]:6666) [postgres] > explain analyze select count(*) from tt where a between 5 and 50000;
                                                         QUERY PLAN                                                          
 Aggregate  (cost=13800.29..13800.30 rows=1 width=0) (actual time=94.067..94.068 rows=1 loops=1)
   ->  Gather  (cost=1000.00..13677.60 rows=49076 width=0) (actual time=0.655..91.073 rows=49996 loops=1)
         Number of Workers: 2
         ->  Parallel Seq Scan on tt  (cost=0.00..7770.00 rows=49076 width=0) (actual time=8.465..33.548 rows=16665 loops=3)
               Filter: ((a >= 5) AND (a <= 50000))
               Rows Removed by Filter: 316668 ab
 Planning time: 0.280 ms
 Execution time: 94.127 ms
(8 rows)

Note the “Number of Workers” and “Parallel Seq Scan” information in the execution plan. Looks promising … More posts about testing this in more detail to come.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure