Infrastructure at your Service

Daniel Westermann

PostgreSQL 11: Instant add column with a non null default value

As I am currently preparing my session for the Swiss PGDay which is about some of the new features for PostgreSQL 11, I though this one is worth a blog post as well. Up to PostgreSQL 10 when you add a column to table which has a non null default value the whole table needed to be rewritten. With PostgreSQL 11 this is not anymore the case and adding a column in such a way is almost instant. Lets check.

We start by creating a test table in PostgreSQL 10:

postgres=# select version();
                                       version                                                           
--------------------------------------------------------------------------------------------------------
 PostgreSQL 10.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit
(1 row)

postgres=# create table test ( a int, b text, c timestamp );
CREATE TABLE
postgres=# insert into test (a,b,c) 
           select aa.*, md5(aa::text), now() 
             from generate_series ( 1, 1000000 ) aa;
INSERT 0 1000000

This gave us 1’000’000 rows and what I want to do is to check the amount of sequential scans against the table before and after the alter table.

postgres=# select seq_scan from pg_stat_user_tables where relid = 'test'::regclass;
 seq_scan 
----------
        0
postgres=# alter table test add column d text default 'a';
ALTER TABLE
Time: 1252.188 ms (00:01.252)
postgres=# select seq_scan from pg_stat_user_tables where relid = 'test'::regclass;
 seq_scan 
----------
        1

As you can see a sequential scan happened when the alter table was performed and it took more than a second for the alter table to complete. Lets do the same in PostgreSQL 11.

Creating the table:

postgres=# select version();
                                                            version                                                            
-------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11devel dbi services build on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit
(1 row)
postgres=# create table test ( a int, b text, c timestamp );
CREATE TABLE
postgres=# insert into test (a,b,c) select aa.*, md5(aa::text), now() from generate_series ( 1, 1000000 ) aa ;
INSERT 0 1000000

Doing the same test again:

postgres=# select seq_scan from pg_stat_user_tables where relid = 'test'::regclass;
 seq_scan 
----------
        0
(1 row)

postgres=# alter table test add column d text default 'a';
ALTER TABLE
Time: 5.064 ms
postgres=# select seq_scan from pg_stat_user_tables where relid = 'test'::regclass;
 seq_scan 
----------
        0
(1 row)

No sequential scan at all and it only took 5 ms for the alter table to complete. This is quite a huge improvement. The question is how does that work in the background? Actually the idea is quite simple. The catalog table pg_attribute got two new columns called “attmissingval” and “atthasmissing”:

postgres=# \d pg_attribute
              Table "pg_catalog.pg_attribute"
    Column     |   Type    | Collation | Nullable | Default 
---------------+-----------+-----------+----------+---------
 attrelid      | oid       |           | not null | 
 attname       | name      |           | not null | 
 atttypid      | oid       |           | not null | 
 attstattarget | integer   |           | not null | 
 attlen        | smallint  |           | not null | 
 attnum        | smallint  |           | not null | 
 attndims      | integer   |           | not null | 
 attcacheoff   | integer   |           | not null | 
 atttypmod     | integer   |           | not null | 
 attbyval      | boolean   |           | not null | 
 attstorage    | "char"    |           | not null | 
 attalign      | "char"    |           | not null | 
 attnotnull    | boolean   |           | not null | 
 atthasdef     | boolean   |           | not null | 
 atthasmissing | boolean   |           | not null | 
 attidentity   | "char"    |           | not null | 
 attisdropped  | boolean   |           | not null | 
 attislocal    | boolean   |           | not null | 
 attinhcount   | integer   |           | not null | 
 attcollation  | oid       |           | not null | 
 attacl        | aclitem[] |           |          | 
 attoptions    | text[]    |           |          | 
 attfdwoptions | text[]    |           |          | 
 attmissingval | anyarray  |           |          | 
Indexes:
    "pg_attribute_relid_attnam_index" UNIQUE, btree (attrelid, attname)
    "pg_attribute_relid_attnum_index" UNIQUE, btree (attrelid, attnum)

As soon as a new column with a non null default value is added to a table these columns get populated. We can see that when we check for our current table. The column we added has that set in pg_attribute:

postgres=# select attmissingval
                , atthasmissing 
             from pg_attribute 
            where attrelid = 'test'::regclass 
              and attname = 'd';

 attmissingval | atthasmissing 
---------------+---------------
 {a}           | t

(1 row)

We know that all the rows in that table should have the new default value but we know also that the table was not rewritten. So whenever you select from the table and a row is missing the default it will be populated from pg_attribute:

postgres=# select d from test where a = 1;
 d 
---
 a

For new rows the default will be there anyway and as soon as the table is rewritten the flag is cleared:

postgres=# vacuum full test;
VACUUM
postgres=# select attmissingval
                , atthasmissing 
             from pg_attribute 
            where attrelid = 'test'::regclass 
              and attname = 'd';

 attmissingval | atthasmissing 
---------------+---------------
               | f
(1 row)

Nice feature.

Leave a Reply

Daniel Westermann
Daniel Westermann

Senior Consultant and Technology Leader Open Infrastructure