Infrastructure at your Service

Daniel Westermann

Telling the PostgreSQL optimizer more about your functions

When you reference/call functions in PostgreSQL the optimizer does not really know much about the cost nor the amount of rows that a function returns. This is not really surprising as it is hard to predict what the functions is doing and how many rows will be returned for a given set of input parameters. What you might not know is, that indeed you can tell the optimizer a bit more about your functions.

As usual let’s start with a little test setup:

postgres=# create table t1 ( a int, b text, c date );
CREATE TABLE
postgres=# insert into t1 select a,a::text,now() from generate_series(1,1000000) a;
INSERT 0 1000000
postgres=# create unique index i1 on t1(a);
CREATE INDEX
postgres=# analyze t1;
ANALYZE

A simple table containing 1’000’000 rows and one unique index. In addition let’s create a simple function that will return exactly one row from that table:

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql;

What is the optimizer doing when you call that function?

postgres=# explain (analyze) select f_tmp (1);
                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..5.27 rows=1000 width=32) (actual time=0.654..0.657 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.004 rows=1 loops=1)
 Planning Time: 0.047 ms
 Execution Time: 0.696 ms
(4 rows)

We know that only one row will be returned but the optimizer is assuming that 1000 rows will be returned. This is the default and documented. So, no matter how many rows will really be returned, PostgreSQL will always estimate 1000. But you have some control and can tell the optimizer that the function will return one row only:

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql
   rows 1;

Looking again at the execution plan again:

postgres=# explain (analyze) select f_tmp (1);
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..0.27 rows=1 width=32) (actual time=0.451..0.454 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.004 rows=1 loops=1)
 Planning Time: 0.068 ms
 Execution Time: 0.503 ms
(4 rows)

Instead of 1000 rows we now do see that only 1 row was estimated which is what we specified when we created the function. Of course this is a very simple example and in reality you often might not be able to tell exactly how many rows will be returned from a function. But at least you can provide a better estimate as the default of 1000. In addition you can also specify a cost for your function (based on cpu_operator_cost):

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql
   rows 1
   cost 1;

If you use functions remember that you can give the optimizer more information and that there is a default of 1000.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure