Infrastructure at your Service

Daniel Westermann

Can I do it with PostgreSQL? – 8 – Transportable tablespaces

My colleague Franck posted a comment to one of my last “Can I do it with PostgreSQL” blog posts. The comment he posted is: “Here is an idea for a future “Can I do it with PostgreSQL?”. My favorite Oracle feature: transportable tablespaces.” When you’ve read my post about how PostgreSQL implements tablespaces then you probably already know the answer: No, you can not do this in PostgreSQL. Having thought about this some time I decided to check the Oracle documentation for what transportable tablespaces are good for. I know the basics and how you can do it in Oracle, but better check twice 🙂

According to the documentation the main goal of transportable tablespaces is to transport data. This means you can transport one or more (which then becomes a transportable tablespace set) tablespaces from one host to another and then “plug” that set into an existing database (as long as the tablespaces are self containing). One, depending on the use case, great thing you can do with it is that the target database does not need to have the same standard block size as the source database. This means you can transport a tablespace space with a 16k block size to a database with a default block size of 8k. This is another thing you can not do in PostgreSQL: In PostgreSQL the block size is configured at configure/compile time. Once you have the compiled binaries you can not change that afterwards.

Probably the greatest benefit of transportable tablespaces is that it saves you time in moving your data around. You just copy the data files that make up your tablespace(s) and then use expdp/impdp for the meta data, that’s it. When you go for the multi-tenant architecture you can use transportable tablespaces to make a non-CDB a pluggable database on the target, too. For a more detailed introduction you can check the documentation (linked above).

Back to PostgreSQL: What options do you have for transporting your data from one database to another?

The tool of choice for most cases probably is pg_dump. The big advantage of pg_dump is that you can use it over the network and directly write everything you want to export into the target database using a pipe:

pg_dump -C -h [SOURCE_HOST] -U [USER] [DATABASE] | psql -h [TARGETHOST] -U [TARGETUSER] [TARGETDATABASE]

You can even do that using parallel processes when you combine pg_dump with pg_restore although using a pipe is not supported in that case. To demonstrate this we’ll need some sample data (I am using PostgreSQL 9.5.4 as the source):

[email protected]:/home/postgres/ [PG954] psql
psql (9.5.4 dbi services build)
Type "help" for help.

([email protected][local]:5438) [postgres] > create database test;
CREATE DATABASE
([email protected][local]:5438) [postgres] > \c test
You are now connected to database "test" as user "postgres".
([email protected][local]:5438) [test] > create table t1 ( a int, b int );
CREATE TABLE
([email protected][local]:5438) [test] > create table t2 ( a int, b int );
CREATE TABLE
([email protected][local]:5438) [test] > insert into t1 values (generate_series(1,1000000), generate_series(1,1000000));
INSERT 0 1000000
([email protected][local]:5438) [test] > insert into t2 select * from t1;
INSERT 0 1000000

Lets say I want to copy the data to a PostgreSQL 9.6.1 instance on the same host. How can I do that? Quite easy:

[email protected]:/home/postgres/ [PG961] pg_dump -V
pg_dump (PostgreSQL) 9.6.1 dbi services build
[email protected]:/home/postgres/ [PG961] mkdir /var/tmp/exp
[email protected]:/home/postgres/ [PG961] pg_dump -h localhost -p 5438 -C -c -F d -j 2 -f /var/tmp/exp/ test
[email protected]:/home/postgres/ [PG961] pg_restore -h localhost -p 5439 -d postgres -F d -C /var/tmp/exp/
[email protected]:/home/postgres/ [PG961] psql -c "\l" postgres
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 test      | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
(4 rows)

[email protected]:/home/postgres/ [PG961] psql -c "\d" test
        List of relations
 Schema | Name | Type  |  Owner   
--------+------+-------+----------
 public | t1   | table | postgres
 public | t2   | table | postgres
(2 rows)

Not a big deal and possible over the network, too.

Another option would be to use pg_basebackup to create a complete new instance from the source (source and target need to be of the same PostgreSQL release in this case). Again this is possible over the network.

You can even use rsync to copy the whole cluster to a new host and then do a second resync while the source is down for a short time. This will copy only the files that changed since the first rsync and will probably be very fast, but you will need a downtime of the source for the second rsync.

There are other methods for moving your data around in PostgreSQL but the above are the most popular.

Hope this helps …

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure