A while ago I posted this tweet on Twitter:
#Postgresql friends: 8000 databases. One cluster, 8000 clusters, Kubernetes, what would be your choice? Let's assume 10gb per database. I know backup/restore, inodes etc, no discussions about that, just thoughts and ideas
— Daniel Westermann (@westermanndanie) January 27, 2021
While this might sound unusual, it actually was under consideration by one of our customers. Each of their customers usually gets a dedicated database and is free to do whatever they want in that database. Because customers can also create multiple schemas, a schema-only solution was not the way to go. To be clear right up front: we did not implement it like this, and the reasons are in this post. The actual implementation will be described in a follow-up post.
To see what it means to work with that many databases, let's initialize the cluster; nothing special here:
postgres@pg13:/home/postgres/ [pg131] initdb -D /u02/pgdata/13
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /u02/pgdata/13 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /u02/pgdata/13 -l logfile start
Starting up:
postgres@pg13:/home/postgres/ [pg131] pg_ctl -D /u02/pgdata/13/ start
waiting for server to start....2021-02-10 20:51:19.878 UTC [15930] LOG:  starting PostgreSQL 13.1 dbi services build on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2021-02-10 20:51:19.879 UTC [15930] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2021-02-10 20:51:19.883 UTC [15930] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-02-10 20:51:19.889 UTC [15931] LOG:  database system was shut down at 2021-02-10 20:49:54 UTC
2021-02-10 20:51:19.895 UTC [15930] LOG:  database system is ready to accept connections
 done
server started
postgres@pg13:/home/postgres/ [pg131] psql -c "select version()" postgres
                                         version
-----------------------------------------------------------------------------------------------------------
 PostgreSQL 13.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
(1 row)
Generating 8000 databases in PostgreSQL is quite easy; all you need to do is something like this:
postgres@pg13:/home/postgres/ [pg131] for i in {1..8000}; do psql -c "create database db_${i}" postgres; done
CREATE DATABASE
CREATE DATABASE
CREATE DATABASE
...
postgres@pg13:/home/postgres/ [pg131] psql -c "select count(*) from pg_database" postgres
 count
-------
  8003
(1 row)
This will take some time, but PostgreSQL quite happily creates the databases. You could actually create many more databases if you really wanted, up to 4,294,950,911. Now that we have 8000 databases, let's do some math. A single empty PostgreSQL database consumes around 8MB of disk space. As we have 8000 databases, that is around 64000MB of disk space, for just the empty databases:
postgres@pg13:/home/postgres/ [pg131] du -sh /u02/pgdata/13/
60G     /u02/pgdata/13/
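As a quick sanity check, the numbers roughly add up. A back-of-the-envelope calculation in plain shell (the 8MB per empty database is an approximation; the exact size depends on the template database):

```shell
# Rough estimate: 8000 empty databases at ~8MB each, converted to GB
databases=8000
mb_per_db=8
total_gb=$(( databases * mb_per_db / 1024 ))
echo "~${total_gb}GB expected for the empty databases"   # ~62GB, close to the 60G reported by du
```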
This is quite some space that could be used more efficiently. As each of these databases should be owned by a separate user, that makes 8000 users in addition:
postgres@pg13:/home/postgres/ [pg131] for i in {1..8000}; do psql -c "create user u_${i}" postgres; done
postgres@pg13:/home/postgres/ [pg131] for i in {1..8000}; do psql -c "alter database db_${i} owner to u_${i}" postgres; done
Now let's assume each of those databases contains 100 tables, 100 indexes and 100 sequences. Initially I wanted to do it like this, but this is actually a very bad idea:
postgres@pg13:/home/postgres/ [pg131] for i in {1..8000}
> do
>   for e in {1..100}
>   do
>     psql -c "create table tab${e}(a int primary key generated always as identity, b text, c date)" db_${i}
>   done
> done
CREATE TABLE
CREATE TABLE
CREATE TABLE
...
Creating 800,000 connections is catastrophic for performance, so a much better way of doing it is the following (this is still 8000 connections, and it will take some time to complete):
postgres@pg13:/home/postgres/ [pg131] for e in {1..100}
> do
>   echo "create table tab${e}(a int primary key generated always as identity, b text, c date);" >> /var/tmp/load.sql
> done
postgres@pg13:/home/postgres/ [pg131] for i in {1..8000}
> do
>   psql -f /var/tmp/load.sql db_${i}
> done
CREATE TABLE
CREATE TABLE
CREATE TABLE
...
postgres@pg13:/home/postgres/ [pg131] rm -f /var/tmp/load.sql
Once that has completed, let's check the space usage again:
postgres@pg13:/home/postgres/ [pg131] du -sh /u02/pgdata/13/
88G     /u02/pgdata/13/
We’ve added 28GB without having a single user-generated row in any of these databases, which is quite huge. From this point of view alone, it does not seem a good idea to combine as many databases as in this example into a single cluster. But there is more to consider than space usage:
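To put the 28GB into perspective, here is a rough per-database breakdown (a sketch assuming the overhead is spread evenly across the databases; exact per-object sizes depend on block size and relation forks):

```shell
# 88GB total minus 60GB for the empty cluster = 28GB for the schema objects
overhead_gb=$(( 88 - 60 ))
databases=8000
# Overhead per database, in KB (each database holds 100 tables,
# 100 identity sequences and 100 primary key indexes)
kb_per_db=$(( overhead_gb * 1024 * 1024 / databases ))
echo "~${kb_per_db}KB of object overhead per database"   # ~3.6MB per database
```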
- When you have 8000 databases you need to expect up to 8000 concurrent sessions. How do you handle that, especially when each of those connections goes to a different database?
- Backup and restore: Physical backup and restore only works for the whole cluster. Of course, you could restore the whole cluster to another system and then pick the specific databases you are interested in, but this will take some time as well.
- Autovacuum potentially has to process many databases in parallel, so how many worker processes can you afford to configure?
- 8000 databases with 100 tables each are at least 800'000 files on disk, not counting FSM, VM and index files. Depending on your file system and your segment size you might need to tweak the file system as well.
- Access permissions: Managing access permissions for at least 8000 users might become tricky. In addition, pg_hba.conf can get large.
- I guess it all comes down to: are you willing to invest in a monster machine with as many cores as are currently available on the market, with lots and lots of memory and the fastest disks you can get? Then you probably want at least two of them so you can fail over in case of a disaster, or even three if you want automated failover. And then you need at least one test system that is as close to production as possible.
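The autovacuum point above deserves a number: the autovacuum launcher tries to visit every database once per autovacuum_naptime (60 seconds by default), so with 8000 databases it would need to start a worker roughly every 7 milliseconds (a back-of-the-envelope calculation; actual scheduling is also bounded by autovacuum_max_workers, which defaults to 3):

```shell
# How often the autovacuum launcher would have to start a worker
# to visit each database once per naptime
databases=8000
naptime_ms=$(( 60 * 1000 ))              # default autovacuum_naptime of 60s, in ms
interval_ms=$(( naptime_ms / databases ))
echo "one autovacuum worker every ~${interval_ms}ms"
```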