Infrastructure at your Service

There often is a discussion if tablespaces should be used in PostgreSQL. People used to work with Oracle are often a bit surprised by this discussion as there is no way around using tablespaces in that technology. PostgreSQL does not force you to use tablespaces at all and I am pretty sure that there are plenty of PostgreSQL installations out there that do not use them and never even considered using them and that is absolutely fine. In our PostgreSQL basics workshop we cover tablespaces but recommend not to use them if there are no good reasons. Implementing tablespaces in PostgreSQL just because you are forced to use them in other databases is not the way to go. In this post I’d like to talk about one reason where tablespaces can make sense, more precisely: temp_tablespaces.

As the name implies, temporary tablespaces are there for temporary objects. The question is then, what exactly is considered as a temporary object? Well, the obvious case is temporary tables. Without any temporary tablespaces, when you create a temporary table, the temporary files get created in the default tablespace of the current database. In a standard PostgreSQL setup it looks like this:

postgres=# \l+
                                                                    List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   |  Size   | Tablespace |                Description                 
-----------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 872 MB  | pg_default | default administrative connection database
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 7393 kB | pg_default | unmodifiable empty database
           |          |          |             |             | postgres=CTc/postgres |         |            | 
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 7393 kB | pg_default | default template for new databases
           |          |          |             |             | postgres=CTc/postgres |         |            | 

All the databases have a default tablespace called “pg_default” which is a kind of pseudo tablespace as it does not really exist. Asking the catalog about the location of that tablespace will show an empty location:

postgres=# select spcname AS "Name"
postgres-#      , pg_catalog.pg_get_userbyid(spcowner) AS "Owner"
postgres-#      , pg_catalog.pg_tablespace_location(oid) AS "Location"
postgres-#   from pg_catalog.pg_tablespace
postgres-#  where pg_catalog.pg_tablespace.spcname = 'pg_default'
postgres-#  order by 1;
    Name    |  Owner   | Location 
------------+----------+----------
 pg_default | postgres | 
(1 row)

If we create temporary objects, where will the files be created then?

postgres=# create temporary table tmp1 ( a int, b text, c date );
CREATE TABLE
postgres=# select pg_relation_filepath('tmp1');
 pg_relation_filepath 
----------------------
 base/12732/t3_16436
(1 row)

This is the standard directory of my “postgres” database:

[email protected]:/home/postgres/ [pgdev] cd $PGDATA
[email protected]:/u02/pgdata/DEV/ [pgdev] oid2name 
All databases:
    Oid  Database Name  Tablespace
----------------------------------
  12732       postgres  pg_default
  12731      template0  pg_default
      1      template1  pg_default
[email protected]:/u02/pgdata/DEV/ [pgdev] ls -l base/12732/t3_16436
-rw-------. 1 postgres postgres 0 Mar 12 18:17 base/12732/t3_16436

So, by default, files required for temporary tables go to the same location as all the other files that make up the specific database. If we populate the temporary table the files will grow, of course:

postgres=# insert into tmp1 (a,b,c) select i,i::text,now() from generate_series(1,100) i;
INSERT 0 100
postgres=# \! ls -la $PGDATA/base/12732/t3_16436
-rw-------. 1 postgres postgres 8192 Mar 12 18:41 /u02/pgdata/DEV/base/12732/t3_16436
postgres=# insert into tmp1 (a,b,c) select i,i::text,now() from generate_series(1,1000) i;
INSERT 0 1000
postgres=# \! ls -la $PGDATA/base/12732/t3_16436
-rw-------. 1 postgres postgres 49152 Mar 12 18:42 /u02/pgdata/DEV/base/12732/t3_16436
postgres=# 

The consequence of that is, that I/O for temporary tables will compete with I/O for all the other objects in this PostgreSQL cluster. Another consequence is, that temporary tables can potentially fill your file system until it is full and no one will be be able to work from then on. This is the first reason for creating one or more dedicated temporary tablespaces: By doing this you can avoid that temporary tables going crazy impact your whole cluster as long as the temporary tablespace is on it’s own file system.

Creating a temporary tablespace is not different from creating a normal tablespace as it is actually exactly the same:

postgres=# \! mkdir /var/tmp/tbstmp
postgres=# create tablespace tbstmp location '/var/tmp/tbstmp';
CREATE TABLESPACE
postgres=# \db+
                                      List of tablespaces
    Name    |  Owner   |    Location     | Access privileges | Options |  Size   | Description 
------------+----------+-----------------+-------------------+---------+---------+-------------
 pg_default | postgres |                 |                   |         | 886 MB  | 
 pg_global  | postgres |                 |                   |         | 575 kB  | 
 tbstmp     | postgres | /var/tmp/tbstmp |                   |         | 0 bytes | 
(3 rows)

Once we have the new tablespace we can tell PostgreSQL to use it as the default for temporary objects:

postgres=# alter system set temp_tablespaces = 'tbstmp';
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

postgres=# show temp_tablespaces ;
 temp_tablespaces 
------------------
 tbstmp
(1 row)

Creating another temporary table will result in the files being created in the new location:

postgres=# create temporary table tmp2 ( a int, b text, c date );
CREATE TABLE
postgres=# select pg_relation_filepath('tmp2');
              pg_relation_filepath              
------------------------------------------------
 pg_tblspc/16442/PG_13_202003051/12732/t3_16443
(1 row)

postgres=# \! ls -la $PGDATA/pg_tblspc/
total 4
drwx------.  2 postgres postgres   19 Mar 12 18:50 .
drwx------. 20 postgres postgres 4096 Mar 12 18:54 ..
lrwxrwxrwx.  1 postgres postgres   15 Mar 12 18:50 16442 -> /var/tmp/tbstmp
postgres=# \! ls -la $PGDATA/pg_tblspc/16442/
total 0
drwx------. 3 postgres postgres  29 Mar 12 18:50 .
drwxrwxrwt. 7 root     root     163 Mar 12 18:49 ..
drwx------. 3 postgres postgres  19 Mar 12 18:53 PG_13_202003051
postgres=# \! ls -la $PGDATA/pg_tblspc/16442/PG_13_202003051/
total 0
drwx------. 3 postgres postgres 19 Mar 12 18:53 .
drwx------. 3 postgres postgres 29 Mar 12 18:50 ..
drwx------. 2 postgres postgres 54 Mar 12 18:53 12732
postgres=# \! ls -la $PGDATA/pg_tblspc/16442/PG_13_202003051/12732/
total 8
drwx------. 2 postgres postgres   54 Mar 12 18:53 .
drwx------. 3 postgres postgres   19 Mar 12 18:53 ..
-rw-------. 1 postgres postgres    0 Mar 12 18:53 t3_16443
-rw-------. 1 postgres postgres    0 Mar 12 18:53 t3_16446
-rw-------. 1 postgres postgres 8192 Mar 12 18:53 t3_16448

If you are wondering why there are three files, here is the answer:

postgres=# select relname from pg_class where oid in (16443,16446,16448);
       relname        
----------------------
 pg_toast_16443
 pg_toast_16443_index
 tmp2
(3 rows)

The toast objects get created as well as I have a “text” column in my temporary table. Creating a temporary table with data types that do not require toast objects will result in one file only:

postgres=# create temporary table tmp3 ( a int, b date );
CREATE TABLE
postgres=# select pg_relation_filepath('tmp3');
              pg_relation_filepath              
------------------------------------------------
 pg_tblspc/16442/PG_13_202003051/12732/t3_16449
(1 row)
postgres=# \! ls -la $PGDATA/pg_tblspc/16442/PG_13_202003051/12732/
total 8
drwx------. 2 postgres postgres   70 Mar 12 19:07 .
drwx------. 3 postgres postgres   19 Mar 12 18:53 ..
-rw-------. 1 postgres postgres    0 Mar 12 18:53 t3_16443
-rw-------. 1 postgres postgres    0 Mar 12 18:53 t3_16446
-rw-------. 1 postgres postgres 8192 Mar 12 18:53 t3_16448
-rw-------. 1 postgres postgres    0 Mar 12 19:07 t3_16449

So for now we know that all temporary tables will go to the new temporary table space. What else will go there from now on. There is a parameter log_temp_files which can be used to report temp file usage into the PostgreSQL log file and this comes quite handy if you want to know what goes there. The default setting is “-1” which means of, “0” means log everything, all other values greater than “1” specify the minimum size of the temp files for being logged. Setting it to “0” will, as said, log all temp files being created in the background so let’s do that:

postgres=# alter system set log_temp_files = 0;
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

Creating another temporary table and then checking the log file will confirm that this is working and we’ll get the information we want:

postgres=# create temporary table tmp4 as select * from generate_series(1,1000000);
SELECT 1000000

The entry in the log file will look like this and it confirms that the temporary files have been written to the temporary tablespsace we created above:

2020-03-13 02:33:35.658 CET - 1 - 10535 - [local] - [email protected] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10535.0", size 14000000
2020-03-13 02:33:35.658 CET - 2 - 10535 - [local] - [email protected] STATEMENT:  create temporary table tmp4 as select * from generate_series(1,1000000);

But: If you create, with the default configuration of PostgreSQL, this temporary table:

postgres=# create temporary table tmp4 as select * from generate_series(1,100000);
SELECT 10000

… you will not see any lines in the log file for this. Why? Because there is temp_buffers and temporary files will be only be reported in the log file if they exceed the value of this parameter. In the default configuration this is ‘8MB’ and that is not enough for the smaller temporary table to be logged. Decreasing the parameter will log the temporary files for the smaller table as well:

postgres=# set temp_buffers = '1024kB';
SET
postgres=# create temporary table tmp5 as select * from generate_series(1,100000);
SELECT 100000

So now we know how to log the creation of temporary files to the PostgreSQL log file. What other operations will cause temporary file to be created? Sorts?

postgres=# set work_mem = '64kB';
SET
postgres=# select * from generate_series(1,1000000) order by random();

Yes, definitely:

2020-03-13 02:47:14.297 CET - 19 - 10609 - [local] - [email protected] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.2", size 26083328
2020-03-13 02:47:14.297 CET - 20 - 10609 - [local] - [email protected] STATEMENT:  select * from generate_series(1,1000000) order by random();
2020-03-13 02:47:14.298 CET - 21 - 10609 - [local] - [email protected] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.1", size 14000000
2020-03-13 02:47:14.298 CET - 22 - 10609 - [local] - [email protected] STATEMENT:  select * from generate_series(1,1000000) order by random();
2020-03-13 02:47:14.298 CET - 23 - 10609 - [local] - [email protected] LOG:  duration: 2994.386 ms  statement: select * from generate_series(1,1000000) order by random();

What about creating indexes?

postgres=# create table tt1 as select * from generate_series(1,1000000);
SELECT 1000000
postgres=# create index ii1 on tt1(generate_series );
CREATE INDEX

Yes, that creates temporary files as well:

2020-03-13 02:54:00.933 CET - 33 - 10609 - [local] - [email protected] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.0.sharedfileset/0.0", size 12394496
2020-03-13 02:54:00.933 CET - 34 - 10609 - [local] - [email protected] STATEMENT:  create index ii1 on tt1(generate_series );
2020-03-13 02:54:00.934 CET - 35 - 10609 - [local] - [email protected] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.0.sharedfileset/1.0", size 7659520
2020-03-13 02:54:00.934 CET - 36 - 10609 - [local] - [email protected] STATEMENT:  create index ii1 on tt1(generate_series );
2020-03-13 02:54:00.948 CET - 37 - 10609 - [local] - [email protected] LOG:  duration: 1149.625 ms  statement: create index ii1 on tt1(generate_series );

Foreign keys?

postgres=# create table ttt1 as select * from generate_series(1,1000000) a;
SELECT 1000000
postgres=# create unique index iii1 on ttt1(a);
CREATE INDEX
postgres=# insert into ttt2 select a,a from generate_series(1,1000000) a;
INSERT 0 1000000
postgres=# alter table ttt2 add constraint fk_t foreign key (b) references ttt1(a);
ALTER TABLE

Yes, that as well:

2020-03-13 03:01:07.127 CET - 65 - 10609 - [local] - [email protected] LOG:  duration: 1127.768 ms  statement: alter table ttt2 add constraint fk_t foreign key (b) references ttt1(a);
2020-03-13 03:01:15.375 CET - 66 - 10609 - [local] - [email protected] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.6", size 67374

So quite a few operations that generate temporary files in background. Separating this on a separate mount point actually can make a lot of sense. From a performance perspective (if I/O spread on the storage layer as well) but also from a security perspective as huge operations that require temporary files will not affect “normal” operations on the instance.

There is another case generating temporary files which is not maybe not clear to everybody. Consider this:

postgres=# create materialized view mv1 as select ttt1.a, ttt2.b from ttt1, ttt2 where ttt1.a = ttt2.b;
SELECT 1000000

This will create many temporary files in the background as well:

...
2020-03-13 03:11:03.721 CET - 4146 - 10609 - [local] - [email protected] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.1289", size 22704
2020-03-13 03:11:03.721 CET - 4147 - 10609 - [local] - [email protected] STATEMENT:  create materialized view mv1 as select ttt1.a, ttt2.b from ttt1, ttt2 where ttt1.a = ttt2.b;
2020-03-13 03:11:03.722 CET - 4148 - 10609 - [local] - [email protected] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.143", size 23136
2020-03-13 03:11:03.722 CET - 4149 - 10609 - [local] - [email protected] STATEMENT:  create materialized view mv1 as select ttt1.a, ttt2.b from ttt1, ttt2 where ttt1.a = ttt2.b;
2020-03-13 03:11:03.723 CET - 4150 - 10609 - [local] - [email protected] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.1166", size 23136
2020-03-13 03:11:03.723 CET - 4151 - 10609 - [local] - [email protected] STATEMENT:  create materialized view mv1 as select ttt1.a, ttt2.b from ttt1, ttt2 where ttt1.a = ttt2.b;
2020-03-13 03:11:03.723 CET - 4152 - 10609 - [local] - [email protected] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.792", size 23640
...

And even refreshes consume temporary files:

postgres=# create unique index mv_i1 on mv1(a);
CREATE INDEX
postgres=# refresh materialized view concurrently mv1;
REFRESH MATERIALIZED VIEW
postgres=# 

From the log file:

...
2020-03-13 03:14:05.866 CET - 20543 - 10609 - [local] - [email protected] STATEMENT:  refresh materialized view concurrently mv1;
2020-03-13 03:14:05.866 CET - 20544 - 10609 - [local] - [email protected] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.8193", size 26768
2020-03-13 03:14:05.866 CET - 20545 - 10609 - [local] - [email protected] CONTEXT:  SQL statement "CREATE TEMP TABLE pg_temp_3.pg_temp_16513_2 AS SELECT mv.ctid AS tid, newdata FROM public.mv1 mv FULL JOIN pg_temp_3.pg_temp_16513 newdata ON (newdata.a OPERATOR(pg_catalog.=) mv.a AND newdata OPERATOR(pg_catalog.*=) mv) WHERE newdata IS NULL OR mv IS NULL ORDER BY tid"
2020-03-13 03:14:05.866 CET - 20546 - 10609 - [local] - [email protected] STATEMENT:  refresh materialized view concurrently mv1;
2020-03-13 03:14:05.866 CET - 20547 - 10609 - [local] - [email protected] LOG:  temporary file: path "pg_tblspc/16442/PG_13_202003051/pgsql_tmp/pgsql_tmp10609.6147", size 28487
2020-03-13 03:14:05.866 CET - 20548 - 10609 - [local] - [email protected] CONTEXT:  SQL statement "CREATE TEMP TABLE pg_temp_3.pg_temp_16513_2 AS SELECT mv.ctid AS tid, newdata FROM public.mv1 mv FULL JOIN pg_temp_3.pg_temp_16513 newdata ON (newdata.a OPERATOR(pg_catalog.=) mv.a AND newdata OPERATOR(pg_catalog.*=) mv) WHERE newdata IS NULL OR mv IS NULL ORDER BY tid"
...

There are more operations that require temporary files in the background but for the scope of this post we stop here. One last thing: The amount of temporary files generated can also be limited by temp_file_limit:

postgres=# set temp_file_limit='1MB';
SET
postgres=# refresh materialized view concurrently mv1;
ERROR:  temporary file size exceeds temp_file_limit (1024kB)
postgres=# 

One Comment

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure