Infrastructure at your Service

Daniel Westermann

PostgreSQL partitioning (5): Partition pruning

This is the next post in the PostgreSQL partitioning series. If you missed the previous ones here they are:

  1. PostgreSQL partitioning (1): Preparing the data set
  2. PostgreSQL partitioning (2): Range partitioning
  3. PostgreSQL partitioning (3): List partitioning
  4. PostgreSQL partitioning (4): Hash partitioning

This time we will have a look at partition pruning. Never heard of that? Don’t worry, hopefully it will be clear at the end of this blog post.

When you ask Wikipedia for pruning you get this: “Pruning is a horticultural and silvicultural practice involving the selective removal of certain parts of a plant, such as branches, buds, or roots. Reasons to prune plants include deadwood removal, shaping (by controlling or redirecting growth), improving or sustaining health, reducing risk from falling branches, preparing nursery specimens for transplanting, and both harvesting and increasing the yield or quality of flowers and fruits.”

Although this is about plants it almost exactly describes partition pruning as well: “selective removal of certain parts of a plant”. In our case it is removal of partitions when it is known that the partition(s) can not contain data we are asking for.

Lets come back to our range partitioned table. We partitioned the table by the “time_of_stop” column:

postgres=# select * from pg_partition_tree('traffic_violations_p');
            relid             |     parentrelid      | isleaf | level 
------------------------------+----------------------+--------+-------
 traffic_violations_p         |                      | f      |     0
 traffic_violations_p_default | traffic_violations_p | t      |     1
 traffic_violations_p_2012    | traffic_violations_p | t      |     1
 traffic_violations_p_2013    | traffic_violations_p | t      |     1
 traffic_violations_p_2014    | traffic_violations_p | t      |     1
 traffic_violations_p_2015    | traffic_violations_p | t      |     1
 traffic_violations_p_2016    | traffic_violations_p | t      |     1
 traffic_violations_p_2017    | traffic_violations_p | t      |     1
 traffic_violations_p_2018    | traffic_violations_p | t      |     1
 traffic_violations_p_2019    | traffic_violations_p | t      |     1
 traffic_violations_p_2020    | traffic_violations_p | t      |     1

Each partition contains data from one year. If we ask for data from 2013 PostgreSQL should only read that partition and just ignore the others.

postgres=# explain select count(*) from traffic_violations_p where date_of_stop = date('02-FEB-2013');
                                                QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=11393.96..11393.97 rows=1 width=8)
   ->  Gather  (cost=11393.75..11393.96 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=10393.75..10393.76 rows=1 width=8)
               ->  Parallel Seq Scan on traffic_violations_p_2013  (cost=0.00..10393.29 rows=185 width=0)
                     Filter: (date_of_stop = '2013-02-02'::date)
(6 rows)

Indeed that is happening and only the traffic_violations_p_2013 is considered. All other partitions will just be ignored and that of course is a performance improvement. This is the simple case and it is partition pruning at planning time. Because we have a literal in the where clause PostgreSQL can already decide at planning time which partitions it needs to read and which can be skipped.

Consider this example:

select count(*) 
  from traffic_violations_p 
 where date_of_stop = select date('01-FEB-2013');

In this case PostgreSQL can not know what will be the actual values coming back from the sub-select at planning time. If you are on PostgreSQL 10 there is not much you can do as partition pruning at execution time made it into PostgreSQL 11. As I am on PostgreSQL 12 Beta 1 it should work:

postgres=# postgres=# explain         
select count(*) 
  from traffic_violations_p 
 where date_of_stop = ( select to_date('01.01.2014','DD.MM.YYYY'));
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=85047.50..85047.51 rows=1 width=8)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=4)
   ->  Gather  (cost=85047.28..85047.49 rows=2 width=8)
         Workers Planned: 2
         Params Evaluated: $0
         ->  Partial Aggregate  (cost=84047.28..84047.29 rows=1 width=8)
               ->  Parallel Append  (cost=0.00..84042.52 rows=1901 width=0)
                     ->  Parallel Seq Scan on traffic_violations_p_2015  (cost=0.00..12924.89 rows=269 width=0)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2014  (cost=0.00..12235.54 rows=255 width=0)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2016  (cost=0.00..12097.57 rows=252 width=0)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2018  (cost=0.00..12051.87 rows=249 width=0)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2017  (cost=0.00..10996.34 rows=228 width=0)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2013  (cost=0.00..10393.29 rows=218 width=0)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2012  (cost=0.00..8351.41 rows=172 width=0)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2019  (cost=0.00..4959.83 rows=246 width=0)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_default  (cost=0.00..11.54 rows=15 width=0)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2020  (cost=0.00..10.74 rows=1 width=0)
                           Filter: (date_of_stop = $0)
(28 rows)

The execution plan shows that all partitions will be scanned and that is no surprise. When you want to see partition pruning at execution time you actually have to execute the statement, so explain(analyze):

postgres=# explain(analyze)
select count(*) 
  from traffic_violations_p 
 where date_of_stop = ( select to_date('01.01.2014','DD.MM.YYYY'));
                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=85047.50..85047.51 rows=1 width=8) (actual time=149.747..149.747 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=1)
   ->  Gather  (cost=85047.28..85047.49 rows=2 width=8) (actual time=145.730..150.004 rows=3 loops=1)
         Workers Planned: 2
         Params Evaluated: $0
         Workers Launched: 2
         ->  Partial Aggregate  (cost=84047.28..84047.29 rows=1 width=8) (actual time=119.148..119.149 rows=1 loops=3)
               ->  Parallel Append  (cost=0.00..84042.52 rows=1901 width=0) (actual time=119.052..119.127 rows=189 loops=3)
                     ->  Parallel Seq Scan on traffic_violations_p_2015  (cost=0.00..12924.89 rows=269 width=0) (never executed)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2014  (cost=0.00..12235.54 rows=255 width=0) (actual time=119.024..119.077 rows=189 loops=3)
                           Filter: (date_of_stop = $0)
                           Rows Removed by Filter: 74405
                     ->  Parallel Seq Scan on traffic_violations_p_2016  (cost=0.00..12097.57 rows=252 width=0) (never executed)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2018  (cost=0.00..12051.87 rows=249 width=0) (never executed)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2017  (cost=0.00..10996.34 rows=228 width=0) (never executed)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2013  (cost=0.00..10393.29 rows=218 width=0) (never executed)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2012  (cost=0.00..8351.41 rows=172 width=0) (never executed)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2019  (cost=0.00..4959.83 rows=246 width=0) (never executed)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_default  (cost=0.00..11.54 rows=15 width=0) (never executed)
                           Filter: (date_of_stop = $0)
                     ->  Parallel Seq Scan on traffic_violations_p_2020  (cost=0.00..10.74 rows=1 width=0) (never executed)
                           Filter: (date_of_stop = $0)
 Planning Time: 0.604 ms
 Execution Time: 150.120 ms
(32 rows)

The keywords here are “(never executed)” and this is partition pruning at execution time. There are some limitations with this but this will be covered in the last post of this series when we will look at corner cases when it comes to partitioning in PostgreSQL.

The next post will cover attaching and detaching of partitions.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure