Infrastructure at your Service

Daniel Westermann

Can I do it with PostgreSQL? – 4 – External tables

In the last posts of this series we talked about restore points, how you can do things that would require the dual table in Oracle, and how you can make use of tablespaces in PostgreSQL. In this post we’ll look at what my colleague Clemens thinks is one of the greatest features in Oracle: can you do external tables in PostgreSQL?

The easy answer is: yes, of course you can, and in various ways. To start we’ll need a sample file we can load data from. For the test here we’ll use this one. Note that this file uses Windows line endings, which you’ll need to convert to Unix style if you are working on Linux like me. You can use vi to do this.
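As a quick sketch of the conversion (in vi you would run ':set fileformat=unix' and save; from the shell, sed achieves the same — demonstrated here on a small stand-in file, since the commands below don't assume the sample file is already present):

```shell
# Create a demo file with Windows (CRLF) line endings
printf 'policyID,statecode\r\n119736,FL\r\n' > demo.csv
# Strip the trailing carriage returns in place (GNU sed on Linux)
sed -i 's/\r$//' demo.csv
```

The same `sed -i 's/\r$//'` invocation works on FL_insurance_sample.csv itself.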

Once you extracted the file the content looks like this:

[email protected]:/home/postgres/ [PG961] head -2 FL_insurance_sample.csv
policyID,statecode,county,eq_site_limit,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
119736,FL,CLAY COUNTY,498960,498960,498960,498960,498960,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1

So, we have a total of 18 columns and 36634 rows to test with. Should be fine 🙂

How can we bring that into PostgreSQL? Clemens talked about SQL*Loader in his post. There is a similar project for PostgreSQL called pg_bulkload, which we’ll not cover here. Instead we’ll look at two options for loading data from files into PostgreSQL that are available by default:

  1. copy
  2. file_fdw

No matter which option we go with, the first thing we need is the table definition. These are the columns we need:

[email protected]:/home/postgres/ [PG961] head -1 FL_insurance_sample.csv | sed 's/,/,\n/g'
policyID,
statecode,
county,
eq_site_limit,
hu_site_limit,
fl_site_limit,
fr_site_limit,
tiv_2011,
tiv_2012,
eq_site_deductible,
hu_site_deductible,
fl_site_deductible,
fr_site_deductible,
point_latitude,
point_longitude,
line,
construction,
point_granularity

So the create table statement will look something like this:

([email protected][local]:5439) [postgres] > create table exttab ( policyID int,
                                                           statecode varchar(2),
                                                           county varchar(50),
                                                           eq_site_limit numeric,
                                                           hu_site_limit numeric,
                                                           fl_site_limit numeric,
                                                           fr_site_limit numeric,
                                                           tiv_2011 numeric,
                                                           tiv_2012 numeric,
                                                           eq_site_deductible numeric,
                                                           hu_site_deductible numeric,
                                                           fl_site_deductible numeric,
                                                           fr_site_deductible numeric,
                                                           point_latitude numeric,
                                                           point_longitude numeric,
                                                           line varchar(50),
                                                           construction varchar(50),
                                                           point_granularity int);
CREATE TABLE

Now that we have the table we can use copy to load the data:

([email protected][local]:5439) [postgres] > copy exttab from '/home/postgres/FL_insurance_sample.csv' with csv header;
COPY 36634
([email protected][local]:5439) [postgres] > select count(*) from exttab;
 count 
-------
 36634
(1 row)
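One thing to be aware of (a general point about copy, not shown in the output above): copy reads the file on the database server, so it must be readable by the server process and, in this PostgreSQL version, requires superuser privileges. If the file only exists on the client, psql’s \copy meta-command streams it through the connection instead. A minimal sketch:

```sql
-- server-side copy: the file path is resolved on the database server
copy exttab from '/home/postgres/FL_insurance_sample.csv' with csv header;

-- client-side alternative: psql reads the file locally and streams it
-- (psql meta-command, so no trailing semicolon required)
\copy exttab from 'FL_insurance_sample.csv' with csv header
```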

Quite fast. But there is a downside to this approach. As Clemens mentions in his post, one of the benefits of external tables in Oracle is that you can access the file via standard SQL and do transformations before the data arrives in the database. Can you do the same with PostgreSQL? Yes, if you use the file_fdw foreign data wrapper.

The file_fdw extension ships with PostgreSQL by default:

([email protected][local]:5439) [postgres] > create extension file_fdw;
CREATE EXTENSION
Time: 442.777 ms
([email protected][local]:5439) [postgres] > \dx
                        List of installed extensions
   Name   | Version |   Schema   |                Description                
----------+---------+------------+-------------------------------------------
 file_fdw | 1.0     | public     | foreign-data wrapper for flat file access
 plpgsql  | 1.0     | pg_catalog | PL/pgSQL procedural language

([email protected][local]:5439) [postgres] > create server srv_file_fdw foreign data wrapper file_fdw;
CREATE SERVER
([email protected][local]:5439) [postgres] > create foreign table exttab2  ( policyID int,
                                statecode varchar(2),
                                county varchar(50),
                                eq_site_limit numeric,     
                                hu_site_limit numeric,     
                                fl_site_limit numeric,     
                                fr_site_limit numeric,     
                                tiv_2011 numeric,          
                                tiv_2012 numeric,          
                                eq_site_deductible numeric,
                                hu_site_deductible numeric,
                                fl_site_deductible numeric,
                                fr_site_deductible numeric,
                                point_latitude numeric,    
                                point_longitude numeric,   
                                line varchar(50),          
                                construction varchar(50),  
                                point_granularity int)     
server srv_file_fdw options ( filename '/home/postgres/FL_insurance_sample.csv', format 'csv', header 'true' );
CREATE FOREIGN TABLE

([email protected][local]:5439) [postgres] > select count(*) from exttab2;
 count 
-------
 36634
(1 row)

From now on you can work with the file by accessing it using standard SQL, and all the options you have with SQL are available. Very much the same as Clemens states in his post: “Because external tables can be accessed through SQL. You have all possibilities SQL-queries offer. Parallelism, difficult joins with internal or other external tables and of course all complex operations SQL allows. ETL became much easier using external tables, because it allowed to process data through SQL joins and filters already before it was loaded in the database.”
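For illustration (a sketch using the columns defined above; the filter on the line column is just an example value from the sample data), you could transform and filter while loading, or aggregate directly against the file without loading it at all:

```sql
-- load only a subset of the file, title-casing the county on the way in
insert into exttab
select policyID, statecode, initcap(county), eq_site_limit, hu_site_limit,
       fl_site_limit, fr_site_limit, tiv_2011, tiv_2012,
       eq_site_deductible, hu_site_deductible, fl_site_deductible,
       fr_site_deductible, point_latitude, point_longitude,
       line, construction, point_granularity
  from exttab2
 where line = 'Residential';

-- or aggregate directly against the file, no load needed
select county, count(*), round(avg(tiv_2012), 2) as avg_tiv_2012
  from exttab2
 group by county
 order by 2 desc;
```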


Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure