Infrastructure at your Service

I was searching a tool for anonymizing data in a PostgreSQL database and I have tested the extension pg_anonymizer.
PostgreSQL_anonymizer is a set of SQL functions that remove personally identifiable values from a PostgreSQL table and replace them with random-but-plausible values. The goal is to avoid any identification from the data record while remaining suitable for testing, data analysis and data processing.
In this blog I am showing how this extension can be used. I am using a PostgreSQL 10 database.
The first step is to install the extension pg_anonymizer. In my case I did it with with pgxn client

[postgres@pgserver2 ~]$ pgxn install postgresql_anonymizer --pg_config /u01/app/postgres/product/10/db_1/bin/pg_config
INFO: best version: postgresql_anonymizer 0.0.3
INFO: saving /tmp/tmpVf3psT/postgresql_anonymizer-0.0.3.zip
INFO: unpacking: /tmp/tmpVf3psT/postgresql_anonymizer-0.0.3.zip
INFO: building extension
gmake: Nothing to be done for `all'.
INFO: installing extension
/usr/bin/mkdir -p '/u01/app/postgres/product/10/db_1/share/extension'
/usr/bin/mkdir -p '/u01/app/postgres/product/10/db_1/share/extension/anon'
/usr/bin/install -c -m 644 .//anon.control '/u01/app/postgres/product/10/db_1/share/extension/'
/usr/bin/install -c -m 644 .//anon/anon--0.0.3.sql  '/u01/app/postgres/product/10/db_1/share/extension/anon/'
[postgres@pgserver2 ~]$

We can then verify that under /u01/app/postgres/product/10/db_1/share/extension we have a file anon.control and a directory named anon

[postgres@pgserver2 extension]$ ls -ltra anon*
-rw-r--r--. 1 postgres postgres 167 Sep 13 10:54 anon.control

anon:
total 18552
drwxrwxr-x. 3 postgres postgres    12288 Sep 13 10:54 ..
drwxrwxr-x. 2 postgres postgres       28 Sep 13 10:54 .
-rw-r--r--. 1 postgres postgres 18980156 Sep 13 10:54 anon--0.0.3.sql

Let’s create a database named prod and let’s create the required extensions. tsm_system_rows should delivered by the contrib.

prod=# \c prod
You are now connected to database "prod" as user "postgres".
prod=#
prod=# CREATE EXTENSION tsm_system_rows;;
CREATE EXTENSION
prod=#

prod=# CREATE EXTENSION anon;
CREATE EXTENSION
prod=#


prod=# \dx
                                    List of installed extensions
      Name       | Version |   Schema   |                        Description

-----------------+---------+------------+----------------------------------------------------
--------
 anon            | 0.0.3   | anon       | Data anonymization tools
 plpgsql         | 1.0     | pg_catalog | PL/pgSQL procedural language
 tsm_system_rows | 1.0     | public     | TABLESAMPLE method which accepts number of rows as
a limit
(3 rows)

prod=#

The extension will create following functions in the schema anon. These functions can be used to mask some data.

prod=# set search_path=anon;
SET
prod=# \df
                                                               List of functions
 Schema |           Name           |     Result data type     |                          Argu
ment data types                           |  Type
--------+--------------------------+--------------------------+------------------------------
------------------------------------------+--------
 anon   | random_city              | text                     |
                                          | normal
 anon   | random_city_in_country   | text                     | country_name text
                                          | normal
 anon   | random_company           | text                     |
                                          | normal
 anon   | random_country           | text                     |
                                          | normal
 anon   | random_date              | timestamp with time zone |
                                          | normal
 anon   | random_date_between      | timestamp with time zone | date_start timestamp with tim
e zone, date_end timestamp with time zone | normal
 anon   | random_email             | text                     |
                                          | normal
 anon   | random_first_name        | text                     |
                                          | normal
 anon   | random_iban              | text                     |
                                          | normal
 anon   | random_int_between       | integer                  | int_start integer, int_stop integer
                            | normal
 anon   | random_last_name         | text                     |
                            | normal
 anon   | random_phone             | text                     | phone_prefix text DEFAULT '0'::text
                            | normal
 anon   | random_region            | text                     |
                            | normal
 anon   | random_region_in_country | text                     | country_name text
                            | normal
 anon   | random_siren             | text                     |
                            | normal
 anon   | random_siret             | text                     |
                            | normal
 anon   | random_string            | text                     | l integer
                            | normal
 anon   | random_zip               | text                     |
                            | normal
(18 rows)

prod=#

Now in the database prod let’s create a table with some data.

prod=# \d customers
                      Table "public.customers"
   Column   |         Type          | Collation | Nullable | Default
------------+-----------------------+-----------+----------+---------
 first_name | character varying(30) |           |          |
 last_name  | character varying(30) |           |          |
 email_add  | character varying(30) |           |          |
 country    | character varying(60) |           |          |
 iban       | character varying(60) |           |          |
 amount     | integer               |           |          |

prod=#

prod=# table customers;
 first_name | last_name |        email_add        |   country    |            iban            |   amount
------------+-----------+-------------------------+--------------+----------------------------+------------
 Michel     | Delco     | michel.delco@yaaa.fr    | FRANCE       | FR763000600001123456890189 |    5000000
 Denise     | Blanchot  | denise.blanchot@yaaa.de | GERMANY      | DE91100000000123456789     | 1000000000
 Farid      | Dim       | farid.dim@yaaa.sa       | Saudi Arabia | SA4420000001234567891234   |    2500000
(3 rows)

prod=#

Let’s say that I want some people to access to all data for this table, but I don’t want them to see the real email, the real country and the real iban of the customers.
One solution should be to create a view with anonymous data for these columns. This will replace them with random-but-plausible values for these columns

prod=# create view Customers_anon as select first_name as Firstname ,last_name  as Lastnmame,anon.random_email() as Email ,anon.random_country() as Country, anon.random_iban() as Iban ,amount as Amount from customers;
CREATE VIEW

And then grant the access privilege to concerned people

prod=# select * from customers_anon ;
 firstname | lastnmame |             email             | country |            iban            |   amount
-----------+-----------+-------------------------------+---------+----------------------------+------------
 Michel    | Delco     | wlothean0@springer.com        | Spain   |  AD1111112222C3C3C3C3C3C3  |    5000000
 Denise    | Blanchot  | emoraloy@dropbox.com          | Myanmar |  AD1111112222C3C3C3C3C3C3  | 1000000000
 Farid     | Dim       | vbritlandkt@deliciousdays.com | India   |  AD1111112222C3C3C3C3C3C3  |    2500000
(3 rows)

prod=#

Thanks to Damien for this project. Just not that this project is in its early development and should be used with care (see Damien comment below).

Mouhamadou Diaw
Mouhamadou Diaw

Consultant