Infrastructure at your Service

In Oracle you can use the external table feature to load data from files into the database. The traditional way to do this in PostgreSQL is to use the copy command. But there is another option which makes use of foreign data wrappers. The foreign data wrapper for doing this is file_fdw.

For creating some data here is a little script which generates 1’000 lines of the form: ID, TEXT:

#!/bin/bash

OUTPUT="/var/tmp/data.csv"
`rm -rf ${OUTPUT}`

for i in {1..1000}
do
  STRING=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1)
  echo "${i} , ${STRING}" >> /var/tmp/data.csv
done

For loading that into PostgreSQL the procedure is simple. First we need to create the extension and then define a server and a foreign table:

([email protected][local]:5000) [postgres] > create extension file_fdw;
CREATE EXTENSION
Time: 752.715 ms
([email protected][local]:5000) [postgres] > create server srv_file_fdw foreign data wrapper file_fdw;
CREATE SERVER
Time: 23.317 ms
([email protected][local]:5000) [postgres] > create foreign table t_csv ( a int, b varchar(50) )
                                            server srv_file_fdw 
                                            options ( filename '/var/tmp/data.csv', format 'csv' );
CREATE FOREIGN TABLE
Time: 74.843 ms

From now we can access the table and load the data:

([email protected][local]:5000) [postgres] > select count(*) from t_csv;
 count 
-------
  1000
(1 row)

Time: 0.707 ms
([email protected][local]:5000) [postgres] > create table t_csv2 as select * from t_csv;
SELECT 1000
Time: 147.859 ms

Currently, as outlined in the documentation, you can not write to a foreign table defined with file_fdw:

([email protected][local]:5000) [postgres] > insert into t_csv values (-1,'a');
ERROR:  cannot insert into foreign table "t_csv"
Time: 18.113 ms

Using explain you can get some details of the file:

([email protected][local]:5000) [postgres] > explain select count(*) from t_csv;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Aggregate  (cost=171.15..171.16 rows=1 width=0)
   ->  Foreign Scan on t_csv  (cost=0.00..167.10 rows=1621 width=0)
         Foreign File: /var/tmp/data.csv
         Foreign File Size: 38893
(4 rows)

Time: 0.521 ms

If you wonder why the estimate for the rows is wrong:

([email protected][local]:5000) [postgres] > analyze t_csv;
ANALYZE
Time: 79.925 ms
([email protected][local]:5000) [postgres] > explain select count(*) from t_csv;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Aggregate  (cost=107.50..107.51 rows=1 width=0)
   ->  Foreign Scan on t_csv  (cost=0.00..105.00 rows=1000 width=0)
         Foreign File: /var/tmp/data.csv
         Foreign File Size: 38893
(4 rows)

Time: 0.296 ms

here you go …

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure