Infrastructure at your Service

Daniel Westermann

Converting a column from one data type to another in PostgreSQL

Last week at a customer the following question came up: We have a varchar2 column in Oracle that contains a date but actually is stored as a string. When we want to convert this column to be a real date data type in Oracle we can use dbms_redefinition. How can we do that in PostgreSQL? The first answer that came to me mind is: Add a new column (of data type date) and populate it with the converted string from the source column, drop the source column and rename the new column to the name of the dropped column. This for sure will work but it is not the most convenient way: What happens if the application is somehow dependent on the order of the columns? Shouldn’t be the case if the application is written well, but you never know.

To start with lets generate some test data:

drop table if exists t1;
create table t1 ( a varchar(20) );
insert into t1 (a) values ('01012017');
insert into t1 (a) values ('02012017');
insert into t1 (a) values ('03012017');
insert into t1 (a) values ('04012017');
insert into t1 (a) values ('05012017');
insert into t1 (a) values ('06012017');
insert into t1 (a) values ('07012017');
insert into t1 (a) values ('08012017');
insert into t1 (a) values ('09012017');
insert into t1 (a) values ('10012017');
insert into t1 (a) values ('11012017');
([email protected][local]:5440) [postgres] > select * from t1;
    a     
----------
 01012017
 02012017
 03012017
 04012017
 05012017
 06012017
 07012017
 08012017
 09012017
 10012017
 11012017
(11 rows)

We now have a varchar column holding the date values as a string. When you look at the “alter table” command in PostgtreSQL you’ll notice something like this:

where action is one of:

    ADD [ COLUMN ] [ IF NOT EXISTS ] column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
    DROP [ COLUMN ] [ IF EXISTS ] column_name [ RESTRICT | CASCADE ]
    ALTER [ COLUMN ] column_name [ SET DATA ] TYPE data_type [ COLLATE collation ] [ USING expression ]

What will really help here is the “using” keyword because you can do things like this:

([email protected][local]:5440) [postgres] > alter table t1 alter column a type date using to_date(a,'DDMMYYYY');
ALTER TABLE
([email protected][local]:5440) [postgres] > \d t1
     Table "public.t1"
 Column | Type | Modifiers 
--------+------+-----------
 a      | date | 

That’s really cool: You can pass a function (this mean a self written function as well) to the alter command to specify on how you want to do the conversion.

When there is an index on the column, what happens to the index?

([email protected][local]:5440) [postgres] > \d t1
             Table "public.t1"
 Column |         Type          | Modifiers 
--------+-----------------------+-----------
 a      | character varying(20) | 
Indexes:
    "i1" btree (a)
([email protected][local]:5440) [postgres] > alter table t1 alter column a type date using to_date(a,'DDMMYYYY');
ALTER TABLE
Time: 5.931 ms
([email protected][local]:5440) [postgres] > \d t1
     Table "public.t1"
 Column | Type | Modifiers 
--------+------+-----------
 a      | date | 
Indexes:
    "i1" btree (a)
([email protected][local]:5440) [postgres] > select indisvalid,indisready,indislive from pg_index where indexrelid = 'i1'::regclass; 
 indisvalid | indisready | indislive 
------------+------------+-----------
 t          | t          | t
(1 row)

Looks fine as well, lets do a quick test if the index is really usable:

Time: 0.453 ms
([email protected][local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 11
Time: 2.373 ms
([email protected][local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 22
Time: 39.653 ms
([email protected][local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 44
Time: 1.110 ms
([email protected][local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 88
Time: 1.072 ms
([email protected][local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 176
Time: 1.455 ms
([email protected][local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 352
Time: 1.432 ms
([email protected][local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 704
Time: 3.344 ms
([email protected][local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 1408
Time: 20.972 ms
([email protected][local]:5440) [postgres] > explain analyze select a from t1 where a = to_date('01012017','dd.mm.yyyy');
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Index Only Scan using i1 on t1  (cost=0.27..8.29 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=1)
   Index Cond: (a = to_date('01012017'::text, 'dd.mm.yyyy'::text))
   Heap Fetches: 0
 Planning time: 0.062 ms
 Execution time: 0.029 ms
(5 rows)

Perfect. The remaining question is: Does this operation block others from reading the table while it is being executed? Lets generate some more date to make the select operation a bit longer and then “watch” the statement in a separate session while we execute the conversion:

Time: 0.453 ms
drop table if exists t1;
create table t1 ( a varchar(20) );
insert into t1 (a) values ('01012017');
insert into t1 (a) values ('02012017');
insert into t1 (a) values ('03012017');
insert into t1 (a) values ('04012017');
insert into t1 (a) values ('05012017');
insert into t1 (a) values ('06012017');
insert into t1 (a) values ('07012017');
insert into t1 (a) values ('08012017');
insert into t1 (a) values ('09012017');
insert into t1 (a) values ('10012017');
insert into t1 (a) values ('11012017');
([email protected][local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 11
-- and so on and so on ...
([email protected][local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 360448

Then, in session 1 I am doing this:

Time: 0.453 ms
([email protected][local]:5440) [postgres] > select count(*) from t1 where a = '01012017';
 count  
--------
 131072
(1 row)
([email protected][local]:5440) [postgres] > \watch 0.1

In session 2 I am doing the conversion:

Time: 0.453 ms
([email protected][local]:5440) [postgres] > alter table t1 alter column a type date using to_date(a,'DDMMYYYY');
ALTER TABLE

In session 1 you’ll notice that the statement is blocked and finally you get this:

Time: 0.453 ms
ERROR:  date/time field value out of range: "01012017"
LINE 1: select count(*) from t1 where a = '01012017';
                                          ^
HINT:  Perhaps you need a different "datestyle" setting.

Conclusion: Converting a column from one data type to another is nothing you want to do when you system is live. When you can afford some downtime the “using” syntax is what you want to do and it is working quite fast. Hope this helps …

2 Comments

  • Mikhail Velikikh says:

    Hi Daniel,

    @ When you can afford some downtime the “using” syntax is what you want to do and it is working quite fast.

    Can we execute the “ALTER TABLE ALTER COLUMN USING” command in parallel? I am asking this because there are some huge tables for which the serial execution leads to an unacceptable downtime (and the definition of “huge” varies from one environment to another, it comes down to the available resources and hardware).
    A DBMS_REDEFINITION process can be executed in parallel that could greatly decrease the overall duration of it.
    Here is an example: Oracle Applications Labs Best Practices: Implementing Large-Scale Demantra Table Rebuilds To Improve Performance with Zero Downtime (Doc ID 1587179.1)

    Regarding the Oracle Database, the most recent version 12.2 now allows us to partition existing tables: http://docs.oracle.com/database/122/SQLRF/ALTER-TABLE.htm#GUID-552E7373-BF93-477D-9DA3-B2C9386F2877__MODIFY_TO_PARTITIONED-3134CF5C
    I have tinkered with it for a while and it resembles a traditional DBMS_REDEFINITION process very well. So, technically, Oracle might have implemented the datatype conversion functionality like it is implemented in PostgreSQL.
    I hope Oracle will provide this functionality soon. Now Oracle is pretty much strict on this matter: http://docs.oracle.com/database/122/SQLRF/ALTER-TABLE.htm#d261581e8573

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure