Infrastructure at your Service

Daniel Westermann

Can there be orphaned data files in PostgreSQL?

PostgreSQL, as all other relational database systems, needs to persist data on disk, either by writing the writing the write ahead log or by synchronization the data files on disk whenever there is a checkpoint. When it comes to the data files PostgreSQL creates a new one once a relation reaches segment_site, which is 1GB by default. Because of this a relation can consist of many files if the relation is growing. The question I want to look at in this post is, if there can be orphaned files. A file would be orphaned if it is not part/referenced by any relation but the file is still there on disk. That would be a waste of disk space on the one hand but it would also be inconsistent with what is stored in PostgreSQL catalog.

To start with, lets create an empty table:

postgres=# create table t1 ( a int );

This already created a file on disk and we can ask PostgreSQL which file that actually is:

postgres=# select pg_relation_filepath('t1');
(1 row)

Of course this file is also there on the file system and standard operating system utilities can be used to verify this:

[email protected]:/u02/pgdata/12/ [121] ls -la $PGDATA/base/12724/24577
-rw-------. 1 postgres postgres 0 Nov 13 16:53 /u02/pgdata/12/base/12724/24577

Because there is noting in the table yet, the file is empty. PostgreSQL comes with a small utility called oid2name which also can be used to check the location on disk for a given relation:

[email protected]:/u02/pgdata/12/ [121] oid2name -t t1 -x
From database "postgres":
  Filenode  Table Name    Oid  Schema  Tablespace
     24577          t1  24577  public  pg_default

Once we populate the table with data the file will be growing and once it reaches the segment_site PostgreSQL will add another file to the relation (please notice that my segment_size is 2GB, instead of the default of 1GB):

postgres=# insert into t1 select * from generate_series(1,1000000);
INSERT 0 1000000
postgres=# \! ls -lha $PGDATA/base/12724/24577
-rw-------. 1 postgres postgres 35M Nov 13 17:03 /u02/pgdata/12/base/12724/24577
postgres=# insert into t1 select * from generate_series(1,100000000);
INSERT 0 100000000
postgres=# \! ls -la $PGDATA/base/12724/24577*
-rw-------. 1 postgres postgres 2147483648 Nov 13 17:07 /u02/pgdata/12/base/12724/24577
-rw-------. 1 postgres postgres 1513545728 Nov 13 17:08 /u02/pgdata/12/base/12724/24577.1
-rw-------. 1 postgres postgres     917504 Nov 13 17:07 /u02/pgdata/12/base/12724/24577_fsm

The “*_fsm” file is the free space map which will not be discussed here (it is used to track the available space in the data files).

Now that it is clear what happens on disk we’ll come back to the initial question of this post: Can it happen that files on disk do not belong to any relation? Consider this example: In one session we start a new transaction and create an empty table without committing the transaction. Additionally we get the PID of the session and the location on disk:

postgres=# begin;
postgres=# create table t2 ( a int );
postgres=# select pg_relation_filepath('t2');
(1 row)

postgres=# select * from pg_backend_pid();
(1 row)

Without committing we can already check the file on disk, as PostgreSQL already created it:

postgres=# \! ls -la $PGDATA/base/12724/24580
-rw-------. 1 postgres postgres 0 Nov 13 17:17 /u02/pgdata/12/base/12724/24580

If the server is crashing now, or the out of memory killer kicks in and kills the session: what happens then? We can simulate that quite easy by killing the session without giving it any chance to cleanup (which is signal 9):

[email protected]:/home/postgres/ [121] kill -9 7170

Going back to the psql session it lost the connection but then immediately reconnects. :

postgres-# select 1;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# select 1;
(1 row)

To summarize: The session was killed before the transaction was able to complete and the transaction already created a table. What must happen is, that the transaction is rolled back and the table must not be there:

postgres=# select * from t2;
ERROR:  relation "t2" does not exist
LINE 1: select * from t2;

This is fine and what is expected. But do we still see the file on disk?

[email protected]:/home/postgres/ [121] ls -lha $PGDATA/base/12724/24580
-rw-------. 1 postgres postgres 0 Nov 13 17:17 /u02/pgdata/12/base/12724/24580

Yes, now we really have an orphaned file which does not belong to any relation PostgreSQL knows of:

postgres=# select relname from pg_class where oid = '24580';
(0 rows)

So there might be cases when you are required to cleanup the files on disk. Image you did huge load like this and just before the load completes the session is killed:

postgres=# begin;
postgres=# create table t3 ( a int );
postgres=# select pg_relation_filepath('t3');
(1 row)

postgres=# select * from pg_backend_pid();
(1 row)

postgres=# insert into t3 select * from generate_series(1,10000000);
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

Now the file is not empty anymore and disk space is consumed that is for nothing:

[email protected]:/home/postgres/ [121] ls -lha $PGDATA/base/12724/32769
-rw-------. 1 postgres postgres 235M Nov 13 17:42 /u02/pgdata/12/base/12724/32769

In the worst case that could be gigabytes or terabytes of wasted space. Is there at least a way to detect such orphaned files? You would need to compare what PostgreSQL has recorded in the catalog against the file system and then delete everything PostgreSQL is not aware of and this needs to be done very, very carefully.

First of all you’d need to get the OID of the database you want to check:

postgres=# select oid from pg_database where datname = 'postgres';
(1 row)

Once you have that you know the location on disk, which is $PGDATA/base/[OID_OF_THE_DATABASE] (I am not considering tablespaces here). From here on you can list all the files that should belong to a relation (the regular expression is testing for numbers as only those are important here):

postgres=# select * 
             from pg_ls_dir ( '/u02/pgdata/12/base/12724' ) as file 
            where file  ~ '^[0-9]*$';


All of these should have entry in pg_class (otherwise PostgreSQL is not aware of them).

Finally, getting the list of orphaned files:

postgres=# select * from pg_ls_dir ( '/u02/pgdata/12/base/12724' ) as file where file ~ '^[0-9]*$' and file::text not in (select oid::text from pg_class );
(3 rows)

That would be the files you need to check carefully. If you are sure it is an orphaned file you may delete it (after you make a backup, of course).


  • Vaish says:

    Such a great post! I really enjoyed reading it. The examples at every point are wonderful.
    Whenever I notice this scenario, I usually just do pg_dump and restore the data from that. Because the orphaned data files don’t have an entry in pg_class, they will not be present in the resulting pg_dump script as well.
    Your method of finding out the orphaned file is very helpful. Thank you! 🙂

  • Hannu Krosing says:

    Great post with a very clear explanation.

    Just one small correction – the last query should use `pg_class.relfilenode` and not `pg_class.oid`

    While the two are the same initially, there are commands which change it, like `CLUSTER`, `VACUUM FULL`, various froms of `ALTER TABLE` etc. So if you start deleting based on oid, you may end up deleting live data.

    Another dangerous corner case to remember is ongoing data loads, CLUSTER commands and anything else similar – make sure that you are not deleting files that are currently being created as part of these. Maybe create a list of potentially orphaned files and re-check after all currently running transactions have finished.

    T make matters worse, the PostgreSQL documentation [*] also has a worrying comment saying “zero means this is a “mapped” relation whose disk file name is determined by low-level state” which makes me even more cautious to just delete anything not matching.



  • Hannu Krosing says:

    After a little more investigating I found out that the correct way to do it is to use `pg_relation_filenode(c.oid)` for finding the node name base oid, else some important files like those belonging to `pg_type` and some other system catalogs will show up as orphaned files.

    You definitely don’t want to delete these !

  • Chris Hamel says:

    I have been looking for this information for such a long time… fantastic.

  • kvaikla says:

    What about

    # for FILE in `ls -1 | grep ‘^[0-9]*$’`; do oid2name -d mydb -f $FILE; done

    Filenode Table Name
    is empty, then file is orphan

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure