Infrastructure at your Service

Daniel Westermann

Can I do it with PostgreSQL? – 14 – optimizer hints

This is a question that comes up quite often: How can I use optimizer hints in PostgreSQL as I can do it in Oracle? Well, you cant, and the reasons are this:

  • Poor application code maintainability: hints in queries require massive refactoring.
  • Interference with upgrades: today’s helpful hints become anti-performance after an upgrade.
  • Encouraging bad DBA habits slap a hint on instead of figuring out the real issue.
  • Does not scale with data size: the hint that’s right when a table is small is likely to be wrong when it gets larger.
  • Failure to actually improve query performance: most of the time, the optimizer is actually right.
  • Interfering with improving the query planner: people who use hints seldom report the query problem to the project.

But this does not mean that you cant influence the optimizer (or “planner” in PostgreSQL wording), it is just not working in the same way. Lets have a look.

On of the reasons that the planner does not choose an index over a sequential scan is that the parameter effective_cache_size is not set properly. To understand what it does you have to know that PostgreSQL works together with the operating system file cache/disk cache very well. It is not required, as you do it in Oracle, to give most of the available memory of the server to the database. Usually you start with 25% of the total available memory and give that to PostgreSQL by setting the parameter shared_buffers to that value. When pages fall out of that region it is still likely that they are available in the disk cache and can be retrieved from there without going down to disk. And this is what effective_cache_size is about: Setting this parameter does not consume more memory but is telling PostgreSQL how big the total cache of the system really is, so shared_buffers plus disk cache. This gets taken into consideration by the planner. A good starting point is 50 to 75% of the available memory. Lets do a quick test to show how this behaves. Lets generate some data:

postgres=# \! cat a.sql
drop table if exists t1;
create table t1 ( a int );
with generator as 
 ( select a.*
     from generate_series ( 1, 5000000 ) a
    order by random()
insert into t1 ( a ) 
     select a
       from generator;
create index i1 on t1(a);
analyze verbose t1;
select * from pg_size_pretty ( pg_relation_size ('t1' ));
select * from pg_size_pretty ( pg_total_relation_size('t1'));
postgres=# \i a.sql
INSERT 0 5000000
psql:a.sql:12: INFO:  analyzing "public.t1"
psql:a.sql:12: INFO:  "t1": scanned 22124 of 22124 pages, containing 5000000 live rows and 0 dead rows; 30000 rows in sample, 5000000 estimated total rows
 173 MB
(1 row)
 280 MB
(1 row)
postgres=# show shared_buffers ;
(1 row)

The table without the index is big enough to not fit into shared_buffers (173MB) and even bigger of course including the index (280MB). When we set effective_cache_size to a very low value we get costs of 40.55 for the statement below (almost no disk cache):

postgres=# SET effective_cache_size TO '1 MB';
postgres=# explain SELECT * FROM t1 ORDER BY  a limit 10;
                                     QUERY PLAN                                      
 Limit  (cost=0.43..40.55 rows=10 width=4)
   ->  Index Only Scan using i1 on t1  (cost=0.43..20057243.41 rows=5000000 width=4)
(2 rows)

Setting this to a more realistic value decreases the costs because it is expected to find the index in the disk cache:

postgres=# SET effective_cache_size TO '5 GB';
postgres=# explain SELECT * FROM t1 ORDER BY  a limit 10;
                                    QUERY PLAN                                     
 Limit  (cost=0.43..0.87 rows=10 width=4)
   ->  Index Only Scan using i1 on t1  (cost=0.43..218347.46 rows=5000000 width=4)
(2 rows)

This is the first “hint” you can set to influence the optimizer/planner. But there are many others. What PostgreSQL allows you to do is to enable or disable features of the planner:

postgres=# select name from pg_settings where name like 'enable%';

Using the same data from above we could disable the index only scan:

postgres=# set enable_indexonlyscan=false;
postgres=# explain (analyze,buffers) SELECT * FROM t1 ORDER BY  a limit 10;
                                                       QUERY PLAN                                                        
 Limit  (cost=0.43..0.87 rows=10 width=4) (actual time=0.019..0.058 rows=10 loops=1)
   Buffers: shared hit=13
   ->  Index Scan using i1 on t1  (cost=0.43..218347.46 rows=5000000 width=4) (actual time=0.017..0.036 rows=10 loops=1)
         Buffers: shared hit=13
 Planning time: 0.057 ms
 Execution time: 0.084 ms
(6 rows)

postgres=# set enable_indexonlyscan=true;
postgres=# explain (analyze,buffers) SELECT * FROM t1 ORDER BY  a limit 10;
                                                          QUERY PLAN                                                          
 Limit  (cost=0.43..0.87 rows=10 width=4) (actual time=0.025..0.072 rows=10 loops=1)
   Buffers: shared hit=13
   ->  Index Only Scan using i1 on t1  (cost=0.43..218347.46 rows=5000000 width=4) (actual time=0.023..0.048 rows=10 loops=1)
         Heap Fetches: 10
         Buffers: shared hit=13
 Planning time: 0.068 ms
 Execution time: 0.105 ms
(7 rows)

But the documentation clearly states: “If the default plan chosen by the optimizer for a particular query is not optimal, a temporary solution is to use one of these configuration parameters to force the optimizer to choose a different plan”. For testing and troubleshooting this can be handy.

Another way to influence the optimizer/planner is to set the planner cost constants:

 postgres=# select name from pg_settings where name like '%cost%' and name not like '%vacuum%';
(7 rows)

What they mean is pretty well documented and how you need to set them (if you need to change them at all) depends on your hardware and application. There are others as well, such as the *collapse_limit* parameters and the parameters for the Genetic Query Optimizer.

Conclusion: There are several ways you can influence the optimizer/planner in PostgreSQL it is just not by using hints.


  • Houri says:


    it seems that even when you cancel for example the hash join and the merge join options (set enable_hashjoin=false;) and you submit a query that can’t be executed via a nested loop, the query planner will use the hash join even though it is disabled.

    Just try select * from t1 full outer join t2 on t1.n1=t2.n1;

    Best regards

    • Daniel Westermann says:

      Hi Houri,

      yes, I know. These enable* parameters do not completely switch the functionality off. When there is no other way PostgreSQL will do it anyway.


  • Les Evens says:

    OK. So all the other databases are wrong by allowing hints but PostGres is right. Stupid. If you need a hint you need a hint. Having to play all these other games because it isn’t allowed is crazy.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure