Infrastructure at your Service

Daniel Westermann

PostgreSQL as a central reporting hub? – Yes, of course

For every client we do projects for there usually are database systems from various vendors. In most places you at least find some Oracle and MS SQL Server databases. Probably somewhere there are even some MySQL/MariaDB or PostgreSQL instance running. The Oracle and MS SQL Server stuff almost always is there because vendors require it. For MySQL/MariaDB the common use case are web applications, ticketing systems and home grown developments. At some point in time there might be a requirement to centralize important data of all these databases into a central reporting hub. The more data you have the more there is a requirement to analyze it and to create reports that drive some sorts of decisions. PostgreSQL is very well prepared to assist in this. If you have read the last posts about connecting your PostgreSQL instance to either Oracle, MS SQL Server, MariaDB/MySQL or even other PostgreSQL instances you might already know into what direction this post will go.

This is a sample use case: There is a requirement to centralize all the sales figures of a company into one big reporting database. The (simplified) database landscape in this company looks exactly as described above:

pg_reporting_1

There is one Oracle database, one MS SQL Server instance, one MariaDB/MySQL instance and one PostgreSQL instance. Each of them holds some sales figures which shall be integrated into the reporting database. For the demo setup I use the Dell DVD Store Database sample schema. You can read how to set this up for:

As the goal is to have a PostgreSQL instance as a central reporting hub the desired database landscape will approximately look like this:

pg_reporting_2

Lets assume all the foreign data wrappers are already setup in the central PostgreSQL instance:

([email protected][local]:4445) [postgres] > \dx
                                                 List of installed extensions
     Name     |    Version    |   Schema   |                                    Description                                    
--------------+---------------+------------+-----------------------------------------------------------------------------------
 mysql_fdw    | 1.0           | public     | Foreign data wrapper for querying a MySQL server
 oracle_fdw   | 1.1           | public     | foreign data wrapper for Oracle access
 plpgsql      | 1.0           | pg_catalog | PL/pgSQL procedural language
 postgres_fdw | 1.0           | ds2_oracle | foreign-data wrapper for remote PostgreSQL servers
 tds_fdw      | 2.0.0-alpha.1 | public     | Foreign data wrapper for querying a TDS database (Sybase or Microsoft SQL Server)

In addition lets assume all the foreign servers and all the user mappings are there and are working:

([email protected][local]:4445) [postgres] > select srvname,srvoptions from pg_foreign_server;
   srvname    |                                      srvoptions                                       
--------------+---------------------------------------------------------------------------------------
 mysql_server | {host=192.168.22.41,port=3306}
 mssql_svr    | {servername=192.168.22.102,port=1433,database=ds2,tds_version=7.3,msg_handler=notice}
 oracle_srv   | {dbserver=//192.168.22.42/DELLDVD}
 postgres_srv | {host=192.168.22.40,port=5432,dbname=ds2}
(4 rows)

([email protected][local]:4445) [postgres] > select * from pg_user_mappings;
 umid  | srvid |   srvname    | umuser | usename  |               umoptions                
-------+-------+--------------+--------+----------+----------------------------------------
 65547 | 65546 | mysql_server |     10 | postgres | {username=web,password=web}
 65587 | 65586 | mssql_svr    |     10 | postgres | {username=ds2user,password=ds2}
 65615 | 65614 | oracle_srv   |     10 | postgres | {user=DS2,password=ds2}
 65676 | 65675 | postgres_srv |     10 | postgres | {user=ds2,password=ds2}

As the central reporting database is already connected to all the other database systems how could you organize the fetching of the data? One approach is to create a separate schema for each of the foreign databases and one additional schema that combines the data:

([email protected][local]:4445) [postgres] > create schema ds2_mssql;
([email protected][local]:4445) [postgres] > create schema ds2_oracle;
([email protected][local]:4445) [postgres] > create schema ds2_mysql;
([email protected][local]:4445) [postgres] > create schema ds2_postgresql;
([email protected][local]:4445) [postgres] > create schema ds2_combined;
([email protected][local]:4445) [postgres] > \dn
      List of schemas
      Name      |  Owner   
----------------+----------
 ds2_combined   | postgres
 ds2_mssql      | postgres
 ds2_mysql      | postgres
 ds2_oracle     | postgres
 ds2_postgresql | postgres
 public         | postgres
(6 rows)

The next step is to create the foreign tables we want to work with.

For Oracle, MySQL/MariaDB and PostgreSQL we can use the import foreign schema command:

postgres=# import foreign schema "DS2" from server oracle_srv into ds2_oracle;
IMPORT FOREIGN SCHEMA
postgres= import foreign schema "DS2" from server mysql_srv into ds2_mysql;
IMPORT FOREIGN SCHEMA
postgres= import foreign schema "public" from server postgres_srv into ds2_postgresql;
IMPORT FOREIGN SCHEMA

For MS SQL Server we need to specify the foreign tables on our own (there is a feature request to implement import foreign schema):

create foreign table ds2_mssql.orders 
  ( orderid numeric not null
  , orderdate timestamp(0) without time zone not null
  , customerid numeric                                
  , netamount numeric(12,2) not null 
  , tax numeric(12,2) not null 
  , totalamount numeric(12,2) not null 
  )
SERVER mssql_svr
OPTIONS (table 'dbo.orders', row_estimate_method 'showplan_all');

create foreign table ds2_mssql.orderlines
  ( orderlineid numeric not null 
  , orderid numeric not null 
  , prod_id numeric not null 
  , quantity numeric not null 
  , orderdate timestamp(0) without time zone not null 
  )
SERVER mssql_svr
OPTIONS (table 'dbo.orderlines', row_estimate_method 'showplan_all');

Having all the foreign tables available we can start to work with the data either by creating materialized views:

create materialized view ds2_combined.mv_orders as
  select * from ds2_mssql.orders
  union all
  select * from ds2_mysql."ORDERS"
  union all
  select * from ds2_oracle.orders
  union all
  select * from ds2_postgresql.orders
  with no data;
refresh materialized view ds2_combined.mv_orders with data;

… or by importing the data into PostgreSQL and then build reports on top of that:

BEGIN;
  insert into ds2_combined.orders select * from ds2_mssql.orders;
  insert into ds2_combined.orders select * from ds2_mysql."ORDERS";
  insert into ds2_combined.orders select * from ds2_oracle.orders;
  insert into ds2_combined.orders select * from ds2_postgresql.orders;
END;
BEGIN;
  insert into ds2_combined.orderlines 
         select * from ds2_mssql.orderlines;
  insert into ds2_combined.orderlines 
         select * from ds2_mysql."ORDERLINES";
  insert into ds2_combined.orderlines 
         select * from ds2_oracle.orderlines;
  insert into ds2_combined.orderlines 
         select * from ds2_postgresql.orderlines;
END;

Having all the data locally available you can do whatever you want with it. Have fun with reporting on your data …

Btw: The source for this was a session at the Swiss PGDAY 2016. You can download the slides there.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure