Infrastructure at your Service

Daniel Westermann

Variations on 1M rows insert (1): bulk insert – PostgreSQL

When I read Franck’s post about Variations on 1M rows insert (1): bulk insert I thought doing quite the same in PostgreSQL might be interesting. Lets start by using the same test tables, one using a primary key and the other one without a primary key:create table DEMO (“id” int , “text” varchar(15), “number” int);

create table DEMO_PK ("id" int , "text" varchar(15), "number" int,
 constraint demo_pk_pk primary key (id) ) ;
postgres=# \d demo*
 Table "public.demo"
 Column | Type | Modifiers
 --------+-----------------------+-----------
 id | integer |
 text | character varying(15) |
 number | integer |
Table "public.demo_pk"
 Column | Type | Modifiers
 --------+-----------------------+-----------
 id | integer | not null
 text | character varying(15) |
 number | integer |
 Indexes:
 "demo_pk_pk" PRIMARY KEY, btree (id)
Index "public.demo_pk_pk"
 Column | Type | Definition
 --------+---------+------------
 id | integer | id
 primary key, btree, for table "public.demo_pk"

 

I am using a Virtual Box VM with Oracle Linux 7.1 (64bit) using 512mb of memory and one vCPU for the Tests:

cat /etc/oracle-release 
Oracle Linux Server release 7.1
cat /proc/meminfo | head -1
MemTotal:         502612 kB
cat /proc/cpuinfo | grep proc
processor	: 0

Similar to Franck’s plsql block I’ll use a plpgsql for the first tests:

\timing on
show autocommit;
truncate table DEMO;
DO $$DECLARE
  l_people_array varchar(15)[12] := '{"Marc", "Bill", "George", "Eliot", "Matt", "Trey", "Tracy","Greg", "Steve", "Kristina", "Katie", "Jeff"}';
  n int;
BEGIN
  for i in 0..1e6 loop
     n:=trunc(random()*1000+1);
     insert into DEMO values( i , l_people_array[floor((random()*11))+1::int] , n );
  end loop;
END$$;

(The code for the demo_pk table is exactly the same, except for the table name). It does not exactly the same as Frank’s plsql does but it does comparable things, I believe 🙂
The PostgreSQL version I’ll use is 9.4.1:

postgres=# select version();
                                                         version                                                          
--------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.4.1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
(1 row)

Lets start by executing the test without a primary key and thus no index:

postgres=# \i 1mio_rows_no_pk.sql
Timing is on.
 autocommit 
------------
 on
(1 row)

Time: 0.183 ms
TRUNCATE TABLE
Time: 71.339 ms
DO
Time: 6861.547 ms

Not so bad. Doing the same with a primary key and thus an index to maintain:

postgres-# \i 1mio_rows_pk.sql
Timing is on.
 autocommit 
------------
 on
(1 row)

Time: 0.494 ms
TRUNCATE TABLE
Time: 37.314 ms
DO
Time: 10900.631 ms

Around 3 seconds more, but again: Not so bad. As there is no “bulk collect” or “bulk insert” in PostgreSQL I can not do the same tests as Franck. But PostgreSQL knows a copy command, lets look at that. First I’ll unload the table to a flat file so I have something to load:

postgres=# copy demo to '/home/postgres/demo.txt';
COPY 1000001
Time: 239.161 ms

Believe it or not: it took 240 ms to unload 1000001 rows from a table to a flat file. Now lets do the opposite and load the data back to the table using the copy command. The script used for doing this looks like this for both variations (with and without primary key):

\timing on
show autocommit;
truncate table demo;
copy DEMO from '/home/postgres/demo.txt';

First one without the primary key:

postgres=# \i 1mio_rows_no_pk_copy.sql
Timing is on.
 autocommit 
------------
 on
(1 row)

Time: 0.663 ms
TRUNCATE TABLE
Time: 32.026 ms
COPY 1000001
Time: 1877.480 ms

The second one with the primary key:

postgres=# \i 1mio_rows_pk_copy.sql
Timing is on.
TRUNCATE TABLE
Time: 33.370 ms
COPY 1000001
Time: 6227.844 ms

This makes a big difference: 1,8 seconds compared to 6,2 seconds. But still amazingly fast. For those who do know PostgreSQL a little bit and might wonder about fsync:

postgres=# show fsync;
 fsync 
-------
 on
(1 row)

If I turn this off, without the primary key:

postgres=# alter system set fsync='off';
ALTER SYSTEM
Time: 62.066 ms
postgres=# select pg_reload_conf();
LOG:  received SIGHUP, reloading configuration files
LOG:  parameter "fsync" changed to "off"
 pg_reload_conf 
----------------
 t
(1 row)

Time: 7.253 ms
postgres=# show fsync;
 fsync 
-------
 off
(1 row)

Time: 0.601 ms
postgres=# \i 1mio_rows_no_pk_copy.sql
Timing is on.
 autocommit 
------------
 on
(1 row)

Time: 0.277 ms
TRUNCATE TABLE
Time: 10.064 ms
COPY 1000001
Time: 611.455 ms
postgres=# \i 1mio_rows_pk_copy.sql
Timing is on.
TRUNCATE TABLE
Time: 11.768 ms
COPY 1000001
Time: 4674.273 ms

Around 0,6 seconds without the primary key and around 4 seconds with the primary key. But remember that you might loose data if fsync is off and the server crashed during the load.

Lets wait what Frank will do in the next post of his series.

One Comment

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure