Infrastructure at your Service

Daniel Westermann

What are custom and generic plans in PostgreSQL?

I have already written a post about prepared statements in PostgreSQL some time ago. What I did not mention in that post is the concept of generic and custom plans. So lets have a look at those.

As always, we start with creating a demo table and populate that table with some sample data:

pgbench=# create table demo ( a int, b text );
CREATE TABLE
pgbench=# insert into demo select i, 'aaa' from generate_series (1,100) i;
INSERT 0 100
pgbench=# insert into demo select i, 'bbb' from generate_series (101,200) i;
INSERT 0 100
pgbench=# insert into demo select i, 'ccc' from generate_series (201,300) i;
INSERT 0 100
pgbench=# analyze demo;
ANALYZE

Now that we have some data we can prepare a statement we would like to execute with various values:

pgbench=# prepare my_stmt as select * from demo where b = $1;
PREPARE

Btw: You can check for the currently available prepared statements in your session by querying the pg_prepared_statements catalog view:

pgbench=# select * from pg_prepared_statements;
  name   |                      statement                      |         prepare_time          | parameter_types | from_sql 
---------+-----------------------------------------------------+-------------------------------+-----------------+----------
 my_stmt | prepare my_stmt as select * from demo where b = $1; | 2019-02-05 13:15:39.232521+01 | {text}          | t

When we explain(analyze) that statement what do we see?

pgbench=# explain (analyze) execute my_stmt ( 'aaa' );
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.111..0.230 rows=100 loops=1)
   Filter: (b = 'aaa'::text)
   Rows Removed by Filter: 200
 Planning time: 0.344 ms
 Execution time: 0.285 ms
(5 rows)

In the “Filter” line of the execution plan we can see the real value (‘aaa’) we passed to our prepared statement. When you see that, it is a so called custom plan. When PostgreSQL goes for a custom plan that means the statement will be re-planned for the provided set of parameters. When you execute that a few times more:

pgbench=# explain (analyze) execute my_stmt ( 'aaa' );
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.045..0.158 rows=100 loops=1)
   Filter: (b = 'aaa'::text)
   Rows Removed by Filter: 200
 Planning time: 0.243 ms
 Execution time: 0.225 ms
(5 rows)

pgbench=# explain (analyze) execute my_stmt ( 'aaa' );
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.035..0.123 rows=100 loops=1)
   Filter: (b = 'aaa'::text)
   Rows Removed by Filter: 200
 Planning time: 0.416 ms
 Execution time: 0.173 ms
(5 rows)

pgbench=# explain (analyze) execute my_stmt ( 'aaa' );
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.036..0.124 rows=100 loops=1)
   Filter: (b = 'aaa'::text)
   Rows Removed by Filter: 200
 Planning time: 0.195 ms
 Execution time: 0.178 ms
(5 rows)

pgbench=# explain (analyze) execute my_stmt ( 'aaa' );
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.035..0.126 rows=100 loops=1)
   Filter: (b = 'aaa'::text)
   Rows Removed by Filter: 200
 Planning time: 0.192 ms
 Execution time: 0.224 ms
(5 rows)

pgbench=# explain (analyze) execute my_stmt ( 'aaa' );
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.038..0.130 rows=100 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 200
 Planning time: 0.191 ms
 Execution time: 0.183 ms
(5 rows)

… you will see that the “Filter” line changes from displaying the actual parameter to a place holder. Now we have a generic plan. This generic plan will not change anymore for the lifetime of the prepared statement no matter which value you pass into the prepared statement:

pgbench=# explain (analyze) execute my_stmt ( 'bbb' );
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.096..0.219 rows=100 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 200
 Planning time: 0.275 ms
 Execution time: 0.352 ms
(5 rows)

pgbench=# explain (analyze) execute my_stmt ( 'ccc' );
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.090..0.132 rows=100 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 200
 Planning time: 0.084 ms
 Execution time: 0.204 ms
(5 rows)

pgbench=# explain (analyze) execute my_stmt ( null );
                                           QUERY PLAN                                           
------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.033..0.033 rows=0 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 300
 Planning time: 0.018 ms
 Execution time: 0.051 ms
(5 rows)

When you take a look at the source code (src/backend/utils/cache/plancache.c) you will see why it changes after 5 executions:

/*
 * choose_custom_plan: choose whether to use custom or generic plan
 *
 * This defines the policy followed by GetCachedPlan.
 */
static bool
choose_custom_plan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
        double          avg_custom_cost;

        /* One-shot plans will always be considered custom */
        if (plansource->is_oneshot)
                return true;

        /* Otherwise, never any point in a custom plan if there's no parameters */
        if (boundParams == NULL)
                return false;
        /* ... nor for transaction control statements */
        if (IsTransactionStmtPlan(plansource))
                return false;

        /* See if caller wants to force the decision */
        if (plansource->cursor_options & CURSOR_OPT_GENERIC_PLAN)
                return false;
        if (plansource->cursor_options & CURSOR_OPT_CUSTOM_PLAN)
                return true;

        /* Generate custom plans until we have done at least 5 (arbitrary) */
        if (plansource->num_custom_plans < 5)
                return true;

Even if we change the data and analyze the table again we will still get a generic plan once PostgreSQL went for it:

pgbench=# insert into demo select i, 'ddd' from generate_series (201,210) i;
INSERT 0 10
pgbench=# insert into demo select i, 'ee' from generate_series (211,211) i;
INSERT 0 1
pgbench=# analyze demo;
ANALYZE
pgbench=# select b,count(*) from demo group by b order by b;
  b  | count 
-----+-------
 aaa |   100
 bbb |   100
 ccc |   100
 ddd |    10
 ee  |     1
(5 rows)

pgbench=# explain (analyze) execute my_stmt ( 'ddd' );
                                           QUERY PLAN                                           
------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.88 rows=78 width=8) (actual time=0.147..0.151 rows=10 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 300
 Planning time: 0.293 ms
 Execution time: 0.190 ms
(5 rows)

The situation changes when we have much more data, data is not uniformly distributed and we have an index on the column “b”:

pgbench=# truncate demo;
TRUNCATE TABLE
pgbench=# insert into demo select i, 'aaa' from generate_series (1,1000000) i;
INSERT 0 1000000
pgbench=# insert into demo select i, 'bbb' from generate_series (1000001,2000000) i;
INSERT 0 1000000
pgbench=# insert into demo select i, 'ccc' from generate_series (2000001,3000000) i;
INSERT 0 1000000
pgbench=# insert into demo select i, 'eee' from generate_series (3000001,3000010) i;
INSERT 0 10
pgbench=# create index i1 on demo (b);
CREATE INDEX
pgbench=# select b,count(*) from demo group by b order by b;
  b  |  count  
-----+---------
 aaa | 1000000
 bbb | 1000000
 ccc | 1000000
 eee |      10
(4 rows)

pgbench=# prepare my_stmt as select * from demo where b = $1;
PREPARE

No matter how often we execute the following statement (which asks for ‘eee’), we never get a generic plan:

pgbench=# explain (analyze) execute my_stmt ('eee');
                                                QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
 Index Scan using i1 on demo  (cost=0.43..4.45 rows=1 width=8) (actual time=0.054..0.061 rows=10 loops=1)
   Index Cond: (b = 'eee'::text)
 Planning time: 0.249 ms
 Execution time: 0.106 ms
(4 rows)

-----> REPEAT THAT HOW OFTEN YOU WANT BUT AT LEAST 10 TIMES

pgbench=# explain (analyze) execute my_stmt ('eee');
                                                QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
 Index Scan using i1 on demo  (cost=0.43..4.45 rows=1 width=8) (actual time=0.054..0.061 rows=10 loops=1)
   Index Cond: (b = 'eee'::text)
 Planning time: 0.249 ms
 Execution time: 0.106 ms

This is because the custom plan (which includes the costs for re-planning) is always cheaper than the generic plan (which does not include the costs for re-planning) when we have a data distribution like that. The documentation is very clear about that: “Using EXECUTE values which are rare in columns with many duplicates can generate custom plans that are so much cheaper than the generic plan, even after adding planning overhead, that the generic plan might never be used”.

Hope that helps.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure