Infrastructure at your Service

Daniel Westermann

Can I do it with PostgreSQL? – 15 – invisible indexes

It has been quite a while since the last post in this series. Today we’ll look at what you know from Oracle as: Invisible indexes. In case you wonder what they might be useful for: Imagine you want to test if an index would benefit one or more queries without affecting the production workload. In other words: Wouldn’t it be cool to create an index but somehow tell the optimizer not to use it for the ongoing queries? This is what invisible indexes are about: Create an index which you believe should improve performance for one or more queries but at the same step make sure that it is not taken into account when the query plan is generated and then executed. The bad news is: This is not possible in PostgreSQL core. The good news is: There is an extension which does exactly this.

The extension is called hypopg and is available via github. The readme states that it works on all PostgreSQL versions starting with 9.2, so lets try it with PostgreSQL 10 Beta1.

postgres@pgbox:/home/postgres/ [PG10B] psql -X postgres
psql (10beta1 dbi services build)
Type "help" for help.

postgres=# select version();
                                                            version                                                            
-------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 10beta1 dbi services build on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit
(1 row)

postgres=# 

Getting the extension downloaded, compiled and installed is straight forward:

postgres@pgbox:/home/postgres/ [PG10B] wget https://github.com/dalibo/hypopg/archive/master.zip
postgres@pgbox:/home/postgres/ [PG10B] unzip master.zip 
postgres@pgbox:/home/postgres/ [PG10B] cd hypopg-master/
postgres@pgbox:/home/postgres/hypopg-master/ [PG10B] make install
postgres@pgbox:/home/postgres/hypopg-master/ [PG10B] psql -X -c "create extension hypopg" postgres
CREATE EXTENSION
postgres@pgbox:/home/postgres/hypopg-master/ [PG10B] psql -X -c "\dx" postgres
                     List of installed extensions
  Name   | Version  |   Schema   |             Description             
---------+----------+------------+-------------------------------------
 hypopg  | 1.1.0dev | public     | Hypothetical indexes for PostgreSQL
 plpgsql | 1.0      | pg_catalog | PL/pgSQL procedural language
(2 rows)

Here we go, all fine until now and we should be ready to use it. Obviously we need a table and some data to test with:

postgres@pgbox:/home/postgres/ [PG10B] psql -X postgres
psql (10beta1 dbi services build)
Type "help" for help.

postgres=# \! cat a.sql
drop table if exists t1;
create table t1 ( a int );
with generator as 
 ( select a.*
     from generate_series ( 1, 5000000 ) a
    order by random()
 )
insert into t1 ( a ) 
     select a
       from generator;
postgres=# \i a.sql
DROP TABLE
CREATE TABLE
INSERT 0 5000000
postgres=# analyze t1;
ANALYZE
postgres=# select * from pg_size_pretty ( pg_total_relation_size ('t1') );
 pg_size_pretty 
----------------
 173 MB
(1 row)

So now we have a table containing some data. The only choice PostgreSQL has to fetch one or more rows is to use a sequential scan (which is a full table scan in Oracle):

postgres=# explain select * from t1 where a = 5;
                             QUERY PLAN                              
---------------------------------------------------------------------
 Gather  (cost=1000.00..49165.77 rows=1 width=4)
   Workers Planned: 2
   ->  Parallel Seq Scan on t1  (cost=0.00..48165.67 rows=1 width=4)
         Filter: (a = 5)
(4 rows)

Although PostgreSQL already knows that only one row needs to be returned (rows=1) it still needs to read the whole table. Lets look at how that looks like when we really execute the query by using “explain (analyze)”:

postgres=# explain (analyze) select * from t1 where a = 5;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..49165.77 rows=1 width=4) (actual time=133.292..133.839 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on t1  (cost=0.00..48165.67 rows=1 width=4) (actual time=110.446..124.888 rows=0 loops=3)
         Filter: (a = 5)
         Rows Removed by Filter: 1666666
 Planning time: 0.055 ms
 Execution time: 135.465 ms
(8 rows)

What kicked in here is parallel query which is available since PostgreSQL 9.6 but this is not really important for the scope of this post. Coming back to the invisible or hypothetical indexes: Having the extension installed we can now do something like this:

postgres=# SELECT * FROM hypopg_create_index('CREATE INDEX ON t1 (a)');
 indexrelid |     indexname     
------------+-------------------
      16399 | btree_t1_a
(1 row)

postgres=# select * from pg_size_pretty ( pg_total_relation_size ('t1') );
 pg_size_pretty 
----------------
 173 MB
(1 row)

What this did is to create a hypothetical index but without consuming any space (pg_total_relation_size counts the indexes as well), so it is pretty fast. What happens to our query now?

postgres=# explain select * from t1 where a = 5;
                                   QUERY PLAN                                    
---------------------------------------------------------------------------------
 Index Only Scan using btree_t1_a on t1  (cost=0.06..8.07 rows=1 width=4)
   Index Cond: (a = 5)
(2 rows)

Quite cool, the index is really getting used and we did not consume any resources for the index itself. Could be a good index to implement. What you need to know is, that this does not work for “explain analyze” as this really executes the query (and we do not really have an index on disk):

postgres=# explain (analyze) select * from t1 where a = 5;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..49165.77 rows=1 width=4) (actual time=76.247..130.235 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on t1  (cost=0.00..48165.67 rows=1 width=4) (actual time=106.861..124.252 rows=0 loops=3)
         Filter: (a = 5)
         Rows Removed by Filter: 1666666
 Planning time: 0.043 ms
 Execution time: 131.866 ms
(8 rows)

If you want to list all the hypothetical indexes you can do this as well:

postgres=# select * from hypopg_list_indexes();
 indexrelid |     indexname     | nspname | relname | amname 
------------+-------------------+---------+---------+--------
      16399 | btree_t1_a | public  | t1      | btree
(1 row)

Of course you can drop them when not anymore required:

postgres=# select * from  hypopg_drop_index(16399);
 hypopg_drop_index 
-------------------
 t
(1 row)

postgres=# SELECT * FROM hypopg_list_indexes();
 indexrelid | indexname | nspname | relname | amname 
------------+-----------+---------+---------+--------
(0 rows)

Hope this helps …

2 Comments

Leave a Reply

Daniel Westermann
Daniel Westermann

Senior Consultant and Technology Leader Open Infrastructure