Infrastructure at your Service

Daniel Westermann

External tables in PostgreSQL?

In Oracle you can use the external table feature to load data from files into the database. The traditional way to do this in PostgreSQL is to use the copy command. But there is another option which makes use of foreign data wrappers. The foreign data wrapper for doing this is file_fdw.

For creating some data here is a little script which generates 1’000 lines of the form: ID, TEXT:

#!/bin/bash

OUTPUT="/var/tmp/data.csv"
`rm -rf ${OUTPUT}`

for i in {1..1000}
do
  STRING=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1)
  echo "${i} , ${STRING}" >> /var/tmp/data.csv
done

For loading that into PostgreSQL the procedure is simple. First we need to create the extension and then define a server and a foreign table:

(postgres@[local]:5000) [postgres] > create extension file_fdw;
CREATE EXTENSION
Time: 752.715 ms
(postgres@[local]:5000) [postgres] > create server srv_file_fdw foreign data wrapper file_fdw;
CREATE SERVER
Time: 23.317 ms
(postgres@[local]:5000) [postgres] > create foreign table t_csv ( a int, b varchar(50) )
                                            server srv_file_fdw 
                                            options ( filename '/var/tmp/data.csv', format 'csv' );
CREATE FOREIGN TABLE
Time: 74.843 ms

From now we can access the table and load the data:

(postgres@[local]:5000) [postgres] > select count(*) from t_csv;
 count 
-------
  1000
(1 row)

Time: 0.707 ms
(postgres@[local]:5000) [postgres] > create table t_csv2 as select * from t_csv;
SELECT 1000
Time: 147.859 ms

Currently, as outlined in the documentation, you can not write to a foreign table defined with file_fdw:

(postgres@[local]:5000) [postgres] > insert into t_csv values (-1,'a');
ERROR:  cannot insert into foreign table "t_csv"
Time: 18.113 ms

Using explain you can get some details of the file:

(postgres@[local]:5000) [postgres] > explain select count(*) from t_csv;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Aggregate  (cost=171.15..171.16 rows=1 width=0)
   ->  Foreign Scan on t_csv  (cost=0.00..167.10 rows=1621 width=0)
         Foreign File: /var/tmp/data.csv
         Foreign File Size: 38893
(4 rows)

Time: 0.521 ms

If you wonder why the estimate for the rows is wrong:

(postgres@[local]:5000) [postgres] > analyze t_csv;
ANALYZE
Time: 79.925 ms
(postgres@[local]:5000) [postgres] > explain select count(*) from t_csv;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Aggregate  (cost=107.50..107.51 rows=1 width=0)
   ->  Foreign Scan on t_csv  (cost=0.00..105.00 rows=1000 width=0)
         Foreign File: /var/tmp/data.csv
         Foreign File Size: 38893
(4 rows)

Time: 0.296 ms

here you go …

Leave a Reply

Daniel Westermann
Daniel Westermann

Senior Consultant and Technology Leader Open Infrastructure