Infrastructure at your Service

Daniel Westermann

Dealing with corrupted system indexes in PostgreSQL

This is something you do not want to see on an important PostgreSQL system:

[email protected]:/home/postgres/ [pgdev] psql
psql: error: could not connect to server: FATAL:  index "pg_class_oid_index" contains unexpected zero page at block 0
HINT:  Please REINDEX it.

The hint is pretty clear on how you should fix this, but anyway, lets do a short demo on how to do it.

There are various cases that could lead to corrupt indexes or tables in PostgreSQL: It could be PostgreSQL bug, of course. It could be an issue on the file system. It could also be caused by a power loss and your storage system somehow was not able to write the file(s) on disk anymore.

For this little demo we’ll have a look at what indexes are available on the pg_class catalog table:

postgres=# select * from pg_indexes where tablename = 'pg_class';
 schemaname | tablename |             indexname             | tablespace |                                                    indexdef                                                    
------------+-----------+-----------------------------------+------------+----------------------------------------------------------------------------------------------------------------
 pg_catalog | pg_class  | pg_class_oid_index                |            | CREATE UNIQUE INDEX pg_class_oid_index ON pg_catalog.pg_class USING btree (oid)
 pg_catalog | pg_class  | pg_class_relname_nsp_index        |            | CREATE UNIQUE INDEX pg_class_relname_nsp_index ON pg_catalog.pg_class USING btree (relname, relnamespace)
 pg_catalog | pg_class  | pg_class_tblspc_relfilenode_index |            | CREATE INDEX pg_class_tblspc_relfilenode_index ON pg_catalog.pg_class USING btree (reltablespace, relfilenode)

Taking the first one as an example lets corrupt the index by writing zeros to the beginning of the file on disk:

postgres=# select pg_relation_filepath('pg_class_oid_index');
 pg_relation_filepath 
----------------------
 base/12710/16386
(1 row)
postgres=# \! echo $PGDATA
/u02/pgdata/DEV
postgres=# \! dd if=/dev/zero of=/u02/pgdata/DEV/base/12710/16386 bs=8k count=2
2+0 records in
2+0 records out
16384 bytes (16 kB, 16 KiB) copied, 0.000226611 s, 72.3 MB/s

What happens when we ask for something that should come from the index we just destroyed?

postgres=# select oid from pg_class where oid = 123344;
ERROR:  could not read block 2 in file "base/12710/16386": read only 0 of 8192 bytes

This can’t obviously work so lets apply the fix that PostgreSQL recommended above and re-index the index:

postgres=# reindex index pg_class_oid_index;
REINDEX
postgres=# select oid from pg_class where oid = 123344;
 oid 
-----
(0 rows)

After re-indexing (which completely re-writes the index) all is fine again. One issue with re-index is, that it blocks concurrent activity against the table. You might think that you can do this concurrently (which is not blocking) but this is not the case for system indexes:

postgres=# reindex index concurrently pg_class_oid_index;
ERROR:  cannot reindex system catalogs concurrently

So now we know how we can fix a corrupted index on a running system when we are already connected. A much worse situation would be this:

postgres=# select pg_relation_filepath('pg_class_oid_index');
 pg_relation_filepath 
----------------------
 base/12710/16389
(1 row)

postgres=# \! dd if=/dev/zero of=/u02/pgdata/DEV/base/12710/16389 bs=8k count=2
2+0 records in
2+0 records out
16384 bytes (16 kB, 16 KiB) copied, 0.00029232 s, 56.0 MB/s
postgres=# \q
21:57:03 [email protected]:/home/postgres/ [pgdev] pg_ctl restart -m fast

This is the same test as before (note that the file name changed as re-index has created a brand new index) but in addition PostgreSQL was restarted. Until now all seems fine as we did not see any issues when PostgreSQL started up, but:

[email protected]:/home/postgres/ [pgdev] psql postgres
psql: error: could not connect to server: FATAL:  index "pg_class_oid_index" contains unexpected zero page at block 0
HINT:  Please REINDEX it.

You are not able to connect anymore so how can you re-index the corrupted index then? There is a parameter called “ignore_system_indexes” that you can use for this case. It tells PostgreSQL to ignore any indexes on system catalog tables.

[email protected]:/home/postgres/ [pgdev] echo "ignore_system_indexes='true'" >> $PGDATA/postgresql.auto.conf
[email protected]:/home/postgres/ [pgdev] pg_ctl restart -m fast

Once this is set connections are possible again and you can apply the same fix:

[email protected]:/home/postgres/ [pgdev] psql -X postgres
psql (13devel)
Type "help" for help.

postgres=# reindex index pg_class_oid_index;
REINDEX
postgres=# alter system set ignore_system_indexes = 'false';
ALTER SYSTEM
postgres=# \q
[email protected]:/home/postgres/ [pgdev] pg_ctl restart -m fast
...
[email protected]:/home/postgres/ [pgdev] psql -X postgres
psql (13devel)
Type "help" for help.

postgres=# 

Hopefully you’ll never see index corruption on system indexes in real life, but if you do, it is good to know the options for fixing it.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure