Infrastructure at your Service

Daniel Westermann

Why is the default postgres database slightly larger than template1?

You probably know that a fresh initialized PostgreSQL cluster comes with three databases by default: template0, template1, and postgres. If you want to know why they are there, and what their purpose is, check this post. In this post we’ll look at something you may never have noticed: Why is the postgres database, even after a fresh initdb, slightly larger than template1 and template0?

Let’s start from scratch and initialize a new cluster:

[email protected]:/home/postgres/ [pg14] initdb -D /var/tmp/dummy
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /var/tmp/dummy ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Europe/Zurich
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /var/tmp/dummy -l logfile start

Once we start start that up and ask for the size of the databases, you’ll notice the difference:

[email protected]:/home/postgres/ [pg14] export PGPORT=8888
[email protected]:/home/postgres/ [pg14] pg_ctl -D /var/tmp/dummy start
waiting for server to start....2021-05-20 11:04:49.527 CEST [2708] LOG:  starting PostgreSQL 14devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2021-05-20 11:04:49.528 CEST [2708] LOG:  listening on IPv6 address "::1", port 8888
2021-05-20 11:04:49.528 CEST [2708] LOG:  listening on IPv4 address "127.0.0.1", port 8888
2021-05-20 11:04:49.541 CEST [2708] LOG:  listening on Unix socket "/tmp/.s.PGSQL.8888"
2021-05-20 11:04:49.557 CEST [2709] LOG:  database system was shut down at 2021-05-20 11:03:23 CEST
2021-05-20 11:04:49.566 CEST [2708] LOG:  database system is ready to accept connections
 done
server started

[email protected]:/home/postgres/ [pg14] psql -c "\l+"
                                                                    List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   |  Size   | Tablespace |                Description                 
-----------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 8265 kB | pg_default | default administrative connection database
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 8113 kB | pg_default | unmodifiable empty database
           |          |          |             |             | postgres=CTc/postgres |         |            | 
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 8113 kB | pg_default | default template for new databases
           |          |          |             |             | postgres=CTc/postgres |         |            | 
(3 rows)

The results may differ in your environment, depending on what version of PostgreSQL you are using. This is 14devel, but it doesn’t really matter: You will see that the “postgres” database is a bit larger than the other two. But how can this be? The first database which is created by initdb, is template1. The other two are just copies of it. But then they should all have the same size, shouldn’t they?

To understand what’s going on here, we first need to know how the database size is calculated, when you use the “\l+” shortcut. This is easy to check:

postgres=# \set ECHO_HIDDEN on
postgres=# \l+
********* QUERY **********
SELECT d.datname as "Name",
       pg_catalog.pg_get_userbyid(d.datdba) as "Owner",
       pg_catalog.pg_encoding_to_char(d.encoding) as "Encoding",
       d.datcollate as "Collate",
       d.datctype as "Ctype",
       pg_catalog.array_to_string(d.datacl, E'\n') AS "Access privileges",
       CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
            THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
            ELSE 'No Access'
       END as "Size",
       t.spcname as "Tablespace",
       pg_catalog.shobj_description(d.oid, 'pg_database') as "Description"
FROM pg_catalog.pg_database d
  JOIN pg_catalog.pg_tablespace t on d.dattablespace = t.oid
ORDER BY 1;
**************************

                                                                    List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   |  Size   | Tablespace |                Description                 
-----------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 8265 kB | pg_default | default administrative connection database
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 8113 kB | pg_default | unmodifiable empty database
           |          |          |             |             | postgres=CTc/postgres |         |            | 
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 8113 kB | pg_default | default template for new databases
           |          |          |             |             | postgres=CTc/postgres |         |            | 
(3 rows)

The statement that is used in the background uses the pg_database_size administrative function to calculate the size for each database. But what is pg_database_size actually doing? Time to check the source. pg_database_size is defined in “src/backend/utils/adt/dbsize.c”:

Datum
pg_database_size_oid(PG_FUNCTION_ARGS)
{
        Oid                     dbOid = PG_GETARG_OID(0);
        int64           size;

        size = calculate_database_size(dbOid);

        if (size == 0)
                PG_RETURN_NULL();

        PG_RETURN_INT64(size);
}

This in fact calls “calculate_database_size”, which is defined in the same file. If you check this one you’ll see this:

...
        /* Shared storage in pg_global is not counted */

        /* Include pg_default storage */
        snprintf(pathname, sizeof(pathname), "base/%u", dbOid);
        totalsize = db_dir_size(pathname);

        /* Scan the non-default tablespaces */
        snprintf(dirpath, MAXPGPATH, "pg_tblspc");
        dirdesc = AllocateDir(dirpath);
...

All it does is to calculate the size of the directory in db_dir_size (and any tablespaces if there are any):

/* Return physical size of directory contents, or 0 if dir doesn't exist */
static int64
db_dir_size(const char *path)
...
        while ((direntry = ReadDir(dirdesc, path)) != NULL)
        {

It is looping over all the files and then sums up the size of the files it finds there. Nothing is excluded as you can easily verify:

[email protected]:/home/postgres/ [pg14] cd /var/tmp/dummy/base/
[email protected]:/var/tmp/dummy/base/ [pg14] oid2name 
All databases:
    Oid  Database Name  Tablespace
----------------------------------
  13974       postgres  pg_default
  13973      template0  pg_default
      1      template1  pg_default
[email protected]:/var/tmp/dummy/base/ [pg14] cd 13974
[email protected]:/var/tmp/dummy/base/13974/ [pg14] dd if=/dev/zero of=dummy bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0924525 s, 1.1 GB/s

[email protected]:/var/tmp/dummy/base/13974/ [pg14] psql -c "\l+"
                                                                    List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   |  Size   | Tablespace |                Description                 
-----------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 108 MB  | pg_default | default administrative connection database
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 8113 kB | pg_default | unmodifiable empty database
           |          |          |             |             | postgres=CTc/postgres |         |            | 
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 8113 kB | pg_default | default template for new databases
           |          |          |             |             | postgres=CTc/postgres |         |            | 
(3 rows)

[email protected]:/var/tmp/dummy/base/13974/ [pg14] rm dummy 

All I did here, is to create a file a bit larger than 100MB in the directory of the “postgres” database, and asked for the database size once more. As you can see above, the size of this file is counted.

What information can we get out of this? The difference between the “postgres” and the “template0” and “template1” databases must come from the files on disk. Let’s look at the directories:

[email protected]:/home/postgres/ [pg14] du -s /var/tmp/dummy/base/13974/
8280    /var/tmp/dummy/base/13974/   ## postgres
[email protected]:/home/postgres/ [pg14] du -s /var/tmp/dummy/base/1/
8124    /var/tmp/dummy/base/1/       ## template1

Same picture here: “postgres” is larger than “template1” but what is the difference. Actually the difference can easily be spotted:

[email protected]:/home/postgres/ [pg14] ls -ltr /var/tmp/dummy/base/13974/ | tail -3    # postgres
-rw------- 1 postgres postgres   8192 May 20 11:03 1247_vm
-rw------- 1 postgres postgres 114688 May 20 11:03 1247
-rw------- 1 postgres postgres 156486 May 20 11:06 pg_internal.init
[email protected]:/home/postgres/ [pg14] ls -ltr /var/tmp/dummy/base/1/ | tail -3        # template1
-rw------- 1 postgres postgres  24576 May 20 11:03 13792_fsm
-rw------- 1 postgres postgres  65536 May 20 11:03 13792
-rw------- 1 postgres postgres 106496 May 20 11:03 1259

To verify that this file really is the reason: Stop the instance, delete the file and compare again:

[email protected]:/home/postgres/ [pg14] pg_ctl -D /var/tmp/dummy/ stop
waiting for server to shut down.... done
server stopped
[email protected]:/home/postgres/ [pg14] rm /var/tmp/dummy/base/13974/pg_internal.init
[email protected]:/home/postgres/ [pg14] du -s /var/tmp/dummy/base/13974/
8124    /var/tmp/dummy/base/13974/
[email protected]:/home/postgres/ [pg14] du -s /var/tmp/dummy/base/1/
8124    /var/tmp/dummy/base/1/

Exactly the same now. Once you start the instance again, the file will be re-created:

[email protected]:/home/postgres/ [pg14] pg_ctl -D /var/tmp/dummy/ start
[email protected]:/home/postgres/ [pg14] ls -ltr /var/tmp/dummy/base/13974/ | tail -3
-rw------- 1 postgres postgres 770048 May 20 11:03 1255
-rw------- 1 postgres postgres   8192 May 20 11:03 1247_vm
-rw------- 1 postgres postgres 114688 May 20 11:03 1247
[email protected]:/home/postgres/ [pg14] psql -c "\l+"
                                                                    List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   |  Size   | Tablespace |                Description                 
-----------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 8265 kB | pg_default | default administrative connection database
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 8113 kB | pg_default | unmodifiable empty database
           |          |          |             |             | postgres=CTc/postgres |         |            | 
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 8113 kB | pg_default | default template for new databases
           |          |          |             |             | postgres=CTc/postgres |         |            | 
(3 rows)

[email protected]:/home/postgres/ [pg14] ls -ltr /var/tmp/dummy/base/13974/ | tail -3
-rw------- 1 postgres postgres   8192 May 20 11:03 1247_vm
-rw------- 1 postgres postgres 114688 May 20 11:03 1247
-rw------- 1 postgres postgres 156486 May 20 11:45 pg_internal.init

Last question for today: What is that file for? The answer is, once more, in the source code (src/include/utils/relcache.h):

/*
 * Name of relcache init file(s), used to speed up backend startup
 */
#define RELCACHE_INIT_FILENAME  "pg_internal.init"

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure