PostgreSQL supports index only scans since version 9.2 which was released in September 2013. The purpose of an index only scan is to fetch all the required values entirely from the index without visiting the table (the heap) at all. Of course that can speed up a query because avoiding to touch the heap, means reading less data and reading less data is obviously faster than reading more data. So index only scans are a good thing but unfortunately it does not always mean that the heap is not touched.

As always, lets start by creating a sample table and populate it with some data:

postgres=# create table t1 ( a int, b int, c int );
CREATE TABLE
postgres=# insert into t1 select a.*,a.*,a.* from generate_series(1,1000000) a;
INSERT 0 1000000
postgres=# d+ t1
                                    Table "public.t1"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
--------+---------+-----------+----------+---------+---------+--------------+-------------
 a      | integer |           |          |         | plain   |              | 
 b      | integer |           |          |         | plain   |              | 
 c      | integer |           |          |         | plain   |              | 

Without any index a query like the following one needs to read the whole table for getting the result:

postgres=# explain (analyze,buffers,costs off) select a from t1 where b = 5;
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Gather (actual time=2.187..158.023 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=5406
   ->  Parallel Seq Scan on t1 (actual time=68.645..119.828 rows=0 loops=3)
         Filter: (b = 5)
         Rows Removed by Filter: 333333
         Buffers: shared hit=5406
 Planning time: 0.209 ms
 Execution time: 158.079 ms
(10 rows)

In this case PostgreSQL decides to do a parallel sequential scan which is fine. The only other option would be to do a serial sequential scan as we do not have any indexes on that table. What people usually do in such cases is to create an index like this one:

postgres=# create index i1 on t1(b);
CREATE INDEX
postgres=# d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | integer |           |          | 
 c      | integer |           |          | 
Indexes:
    "i1" btree (b)

Having that index in place PostgreSQL can use it to return the results faster:

postgres=# explain (analyze,buffers,costs off) select a from t1 where b = 5;
                             QUERY PLAN                              
---------------------------------------------------------------------
 Index Scan using i1 on t1 (actual time=0.035..0.037 rows=1 loops=1)
   Index Cond: (b = 5)
   Buffers: shared hit=4
 Planning time: 0.174 ms
 Execution time: 0.081 ms
(5 rows)

As you can see above the index is used but PostgreSQL will still have to visit the heap for getting the value of “a”. We can improve that even further by creating an index that contains all the information we need to satisfy the query:

postgres=# create index i2 on t1 (b,a);
CREATE INDEX
postgres=# d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | integer |           |          | 
 c      | integer |           |          | 
Indexes:
    "i1" btree (b)
    "i2" btree (b, a)

What will happen now is, that PostgreSQL will switch to an index only scan:

postgres=# explain (analyze,buffers,costs off) select a from t1 where b = 5;
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Index Only Scan using i2 on t1 (actual time=0.111..0.113 rows=1 loops=1)
   Index Cond: (b = 5)
   Heap Fetches: 1
   Buffers: shared hit=1 read=3
 Planning time: 0.515 ms
 Execution time: 0.161 ms
(6 rows)

But: There is still a fetch from the heap. Why that? For answering that, lets list the files on disk for that table:

postgres=# select pg_relation_filepath('t1');
 pg_relation_filepath 
----------------------
 base/34013/34082
(1 row)

postgres=# ! ls -l $PGDATA/base/34013/34082*
-rw-------. 1 postgres postgres 44285952 Nov  8 04:51 /u02/pgdata/10/PG103/base/34013/34082
-rw-------. 1 postgres postgres    32768 Nov  8 04:46 /u02/pgdata/10/PG103/base/34013/34082_fsm

… and here we go: The table has a free space map but the visibility map is not yet there. Without the visibility map PostgreSQL can not know if all the rows in that page are visible to all current transactions and therefore has to visit the heap to get that information. As soon as we create the visibility map:

postgres=# vacuum t1;
VACUUM
postgres=# ! ls -l $PGDATA/base/34013/34082*
-rw-------. 1 postgres postgres 44285952 Nov  8 04:51 /u02/pgdata/10/PG103/base/34013/34082
-rw-------. 1 postgres postgres    32768 Nov  8 04:46 /u02/pgdata/10/PG103/base/34013/34082_fsm
-rw-------. 1 postgres postgres     8192 Nov  8 07:18 /u02/pgdata/10/PG103/base/34013/34082_vm
postgres=# explain (analyze,buffers,costs off) select a from t1 where b = 5;
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Index Only Scan using i2 on t1 (actual time=0.052..0.054 rows=1 loops=1)
   Index Cond: (b = 5)
   Heap Fetches: 0
   Buffers: shared hit=4
 Planning time: 0.446 ms
 Execution time: 0.106 ms
(6 rows)

… the fetch from the heap is gone and we have a real index only scan (although the visibility map is always scanned). To demonstrate that in more detail lets get the physical location of the row we want to read:

>
postgres=# select ctid,* from t1 where b=5;
 ctid  | a | b | c 
-------+---+---+---
 (0,5) | 5 | 5 | 5
(1 row)

Now we know that the row is in block 0 and it is the 5th row in that block. Let’s check, for that block, if all rows are visible to all current transactions:

postgres=# create extension pg_visibility;
CREATE EXTENSION
postgres=# select pg_visibility_map('t1'::regclass, 0);
 pg_visibility_map 
-------------------
 (t,f)
(1 row)

Yes, they are (the first “t”, which is true, means all visible). What happens when we update the row in a second session?

postgres=# update t1 set a=8 where b=5;
UPDATE 1

Do we still get a “true” when we ask if all rows in that block are visible to all transactions?

postgres=# select pg_visibility_map('t1'::regclass, 0);
 pg_visibility_map 
-------------------
 (f,f)
(1 row)

No, and that means two things: First of all a modification to a page clears the bit in the visibility map. The second consequence is, that our index only scan will need to visit the heap again:

postgres=# explain (analyze,buffers,costs off) select a from t1 where b = 5;
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Index Only Scan using i2 on t1 (actual time=0.263..0.267 rows=1 loops=1)
   Index Cond: (b = 5)
   Heap Fetches: 2
   Buffers: shared hit=6 dirtied=3
 Planning time: 0.205 ms
 Execution time: 0.328 ms
(6 rows)

The question now is: Why two heap fetches? First of all every update in PostgreSQL creates a new row:

postgres=# select ctid,* from t1 where b=5;
   ctid    | a | b | c 
-----------+---+---+---
 (5405,76) | 8 | 5 | 5
(1 row)

Our row is now in a new block (and even if if would be in the same block it would be at another location in the block) and that of course also affects the index entry which points to that row. The index still points to the old version of the row and there is the pointer to the current version which means two heap fetches (when you update a column that is not part of the index, that is called a hot update, more on that in another post). For the next execution we see one heap fetch again:

postgres=# explain (analyze,buffers,costs off) select a from t1 where b = 5;
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Index Only Scan using i2 on t1 (actual time=0.022..0.023 rows=1 loops=1)
   Index Cond: (b = 5)
   Heap Fetches: 1
   Buffers: shared hit=5
 Planning time: 0.093 ms
 Execution time: 0.047 ms
(6 rows)

Not sure why only one, at the moment, but I’ll update this blog once I have more information.

What you need to remember is, that an index only scan is not always index only. Depending on how many modifications are happening on that table, it might well be that PostgreSQL must visit the heap quite a lot of times which of course slows down things. For tables where most of the blocks are static an index only scan is great.