Last week at a customer the following question came up: We have a varchar2 column in Oracle that contains a date but is actually stored as a string. When we want to convert this column to a real date data type in Oracle we can use dbms_redefinition. How can we do that in PostgreSQL? The first answer that came to my mind was: Add a new column (of data type date), populate it with the converted string from the source column, drop the source column and rename the new column to the name of the dropped column. This will certainly work, but it is not the most convenient way: What happens if the application somehow depends on the order of the columns? That shouldn’t be the case if the application is written well, but you never know.
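
For illustration, the add/populate/drop/rename approach could look roughly like the following. This is only a sketch: the table t_demo, its second column b and the temporary column name a_new are made up for this example, and the format mask 'DDMMYYYY' assumes the strings look like the test data used in this post.

```sql
-- Hypothetical demo table; t_demo, b and a_new are names invented for this sketch.
drop table if exists t_demo;
create table t_demo ( a varchar(20), b int );
insert into t_demo (a, b) values ('01012017', 1), ('02012017', 2);

begin;
-- 1. add the new column with the target data type
alter table t_demo add column a_new date;
-- 2. populate it from the string column
update t_demo set a_new = to_date(a, 'DDMMYYYY');
-- 3. drop the source column
alter table t_demo drop column a;
-- 4. give the new column the old name
alter table t_demo rename column a_new to a;
commit;

-- The converted column now comes after b, which is the column order issue mentioned above.
select * from t_demo;
```

Because DDL is transactional in PostgreSQL, all four steps can be wrapped in one transaction so that other sessions never see the intermediate state.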

To start with, let’s generate some test data:

drop table if exists t1;
create table t1 ( a varchar(20) );
insert into t1 (a) values ('01012017');
insert into t1 (a) values ('02012017');
insert into t1 (a) values ('03012017');
insert into t1 (a) values ('04012017');
insert into t1 (a) values ('05012017');
insert into t1 (a) values ('06012017');
insert into t1 (a) values ('07012017');
insert into t1 (a) values ('08012017');
insert into t1 (a) values ('09012017');
insert into t1 (a) values ('10012017');
insert into t1 (a) values ('11012017');
(postgres@[local]:5440) [postgres] > select * from t1;
    a     
----------
 01012017
 02012017
 03012017
 04012017
 05012017
 06012017
 07012017
 08012017
 09012017
 10012017
 11012017
(11 rows)

We now have a varchar column holding the date values as strings. When you look at the “alter table” command in PostgreSQL you’ll notice something like this:

where action is one of:

    ADD [ COLUMN ] [ IF NOT EXISTS ] column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
    DROP [ COLUMN ] [ IF EXISTS ] column_name [ RESTRICT | CASCADE ]
    ALTER [ COLUMN ] column_name [ SET DATA ] TYPE data_type [ COLLATE collation ] [ USING expression ]

What will really help here is the “using” keyword because you can do things like this:

(postgres@[local]:5440) [postgres] > alter table t1 alter column a type date using to_date(a,'DDMMYYYY');
ALTER TABLE
(postgres@[local]:5440) [postgres] > \d t1
     Table "public.t1"
 Column | Type | Modifiers 
--------+------+-----------
 a      | date | 

That’s really cool: You can pass a function (this means a self-written function as well) to the alter command to specify how you want the conversion to be done.
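
As a sketch of the self-written function case: the function my_to_date and the table t_func below are hypothetical names, and the rule "treat empty strings as NULL" is just an assumption about what a real conversion might need.

```sql
-- Hypothetical demo table containing values that plain to_date would not handle nicely.
drop table if exists t_func;
create table t_func ( a varchar(20) );
insert into t_func (a) values ('01012017'), (''), (null);

-- Hypothetical helper: empty or whitespace-only strings become NULL,
-- everything else is converted with the DDMMYYYY format mask.
create or replace function my_to_date(p_text varchar) returns date as $$
begin
  if p_text is null or btrim(p_text) = '' then
    return null;
  end if;
  return to_date(p_text, 'DDMMYYYY');
end;
$$ language plpgsql;

-- The self-written function is used in the USING clause like any built-in one.
alter table t_func alter column a type date using my_to_date(a);
select a from t_func;
```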

When there is an index on the column, what happens to the index?
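
The session output that follows does not show how the table and the index were set up. Presumably the table was recreated with the varchar column and an index named i1 was created on it, along these lines (a reconstruction, not the original commands):

```sql
-- Assumed setup: recreate t1 as varchar with the same 11 values and index column a.
drop table if exists t1;
create table t1 ( a varchar(20) );
insert into t1 (a) values
  ('01012017'), ('02012017'), ('03012017'), ('04012017'),
  ('05012017'), ('06012017'), ('07012017'), ('08012017'),
  ('09012017'), ('10012017'), ('11012017');
create index i1 on t1 (a);
```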

(postgres@[local]:5440) [postgres] > \d t1
             Table "public.t1"
 Column |         Type          | Modifiers 
--------+-----------------------+-----------
 a      | character varying(20) | 
Indexes:
    "i1" btree (a)
(postgres@[local]:5440) [postgres] > alter table t1 alter column a type date using to_date(a,'DDMMYYYY');
ALTER TABLE
Time: 5.931 ms
(postgres@[local]:5440) [postgres] > \d t1
     Table "public.t1"
 Column | Type | Modifiers 
--------+------+-----------
 a      | date | 
Indexes:
    "i1" btree (a)
(postgres@[local]:5440) [postgres] > select indisvalid,indisready,indislive from pg_index where indexrelid = 'i1'::regclass; 
 indisvalid | indisready | indislive 
------------+------------+-----------
 t          | t          | t
(1 row)

Looks fine as well. Let’s do a quick test to see if the index is really usable:

(postgres@[local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 11
Time: 2.373 ms
(postgres@[local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 22
Time: 39.653 ms
(postgres@[local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 44
Time: 1.110 ms
(postgres@[local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 88
Time: 1.072 ms
(postgres@[local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 176
Time: 1.455 ms
(postgres@[local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 352
Time: 1.432 ms
(postgres@[local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 704
Time: 3.344 ms
(postgres@[local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 1408
Time: 20.972 ms
(postgres@[local]:5440) [postgres] > explain analyze select a from t1 where a = to_date('01012017','dd.mm.yyyy');
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Index Only Scan using i1 on t1  (cost=0.27..8.29 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=1)
   Index Cond: (a = to_date('01012017'::text, 'dd.mm.yyyy'::text))
   Heap Fetches: 0
 Planning time: 0.062 ms
 Execution time: 0.029 ms
(5 rows)

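Perfect, the index is used. (The query returns no rows, most likely because the format mask 'dd.mm.yyyy' does not match the literal '01012017'; what matters here is only that the planner picks the index.) The remaining question is: Does this operation block others from reading the table while it is being executed? Let’s generate some more data to make the select operation take a bit longer and then “watch” the statement in a separate session while we execute the conversion: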

drop table if exists t1;
create table t1 ( a varchar(20) );
insert into t1 (a) values ('01012017');
insert into t1 (a) values ('02012017');
insert into t1 (a) values ('03012017');
insert into t1 (a) values ('04012017');
insert into t1 (a) values ('05012017');
insert into t1 (a) values ('06012017');
insert into t1 (a) values ('07012017');
insert into t1 (a) values ('08012017');
insert into t1 (a) values ('09012017');
insert into t1 (a) values ('10012017');
insert into t1 (a) values ('11012017');
(postgres@[local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 11
-- and so on and so on ...
(postgres@[local]:5440) [postgres] > insert into t1 select * from t1;
INSERT 0 360448

Then, in session 1 I am doing this:

(postgres@[local]:5440) [postgres] > select count(*) from t1 where a = '01012017';
 count  
--------
 131072
(1 row)
(postgres@[local]:5440) [postgres] > \watch 0.1

In session 2 I am doing the conversion:

(postgres@[local]:5440) [postgres] > alter table t1 alter column a type date using to_date(a,'DDMMYYYY');
ALTER TABLE
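
To see why other sessions have to wait, you can look at the lock the conversion takes. The following is a minimal sketch on a throwaway table (t_lock is a made-up name) that runs the conversion inside a transaction, inspects pg_locks from the same session, and then rolls everything back:

```sql
-- Hypothetical throwaway table for looking at the lock taken by the conversion.
drop table if exists t_lock;
create table t_lock ( a varchar(20) );
insert into t_lock (a) values ('01012017');

begin;
alter table t_lock alter column a type date using to_date(a,'DDMMYYYY');
-- Locks held on the table by this session; the list should include an
-- AccessExclusiveLock, which conflicts with every other lock mode, plain selects included.
select mode, granted
  from pg_locks
 where relation = 't_lock'::regclass
   and pid = pg_backend_pid();
rollback;
```

The lock is held until the transaction ends, so any select against the table issued in the meantime has to wait.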

In session 1 you’ll notice that the statement is blocked and finally you get this:

ERROR:  date/time field value out of range: "01012017"
LINE 1: select count(*) from t1 where a = '01012017';
                                          ^
HINT:  Perhaps you need a different "datestyle" setting.
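
The error itself comes from the fact that the column is now a date, so the string '01012017' is interpreted as a date literal and cannot be parsed. Queries that still use the old string format would need to be adapted, for example like this (a sketch, assuming t1 has already been converted as shown above):

```sql
-- Compare against a proper date literal ...
select count(*) from t1 where a = date '2017-01-01';
-- ... or convert the old string format explicitly.
select count(*) from t1 where a = to_date('01012017','DDMMYYYY');
```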

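Conclusion: Converting a column from one data type to another is not something you want to do while your system is live: the alter table takes an exclusive lock that blocks even readers until it completes, and queries that still compare the column against the old string format fail afterwards. When you can afford some downtime, the “using” syntax is the way to go, and it works quite fast. Hope this helps …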