Infrastructure at your Service

Oracle Team

Variations on 1M rows insert (1): bulk insert

By Franck Pachot

.
This blog is the first one of a series about inserting 1 million rows. Do you have an idea about the time it takes? The idea came from another project, about NoSQL databases, but I’ll focus on Oracle in those posts. There are a lot of different ways to insert 1 million rows in Oracle: row-by-row insert or bulk, row-by-row commit or not, different ways to commit, In-Memory Database, Database In-Memory Option, etc.

Test case

Here is the table I create:

create table DEMO ("id" number , "text" varchar2(15), "number" number) ;

Here is the first PL/SQL I’ll run:

declare
 type people_array is varray(12) of varchar(15);
 people people_array := people_array( 'Marc', 'Bill', 'George', 'Eliot', 'Matt', 'Trey', 'Tracy','Greg', 'Steve', 'Kristina', 'Katie', 'Jeff' );
 people_count number :=people.COUNT;
 n number;
begin
 for i in 0..1e6 loop
  n:=trunc(dbms_random.value(0,10000));
  insert into DEMO values( i , people( dbms_random.value(1,people_count) ) , n );
 end loop;
 commit;
end;
/

You may wonder why I chose this testcase. As I said, this comes from another project and you can google to find where the the test case comes from. The performance will be mainly about inserts but the idea is also to implement some procedural processing to show that it’s possible in different database system, here in PL/SQL on Oracle and in a future post in TimesTen.

Test environment

All the test will be done on a VM with only one CPU activated. This is important because some activity will involve background processes.

$ grep "model name" /proc/cpuinfo
model name      : Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz

Oracle 11g

My first tests are done on the first Oracle Developer Day VM that was available for download. It’s a 11g version. We will see 12c later.

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production

And it’s 32-bit:

SQL> select addr,4*length(addr) from v$process where rownum=1;

ADDR     4*LENGTH(ADDR)
-------- --------------
3B3C9F80             32

With and without index

As you have seen above, I’ve created the table with a primary key, so I have an index on it. How long does it take to run the statement above to insert 1 million rows?

PL/SQL procedure successfully completed.
Elapsed: 00:00:46.80

On one CPU it takes 46 seconds to insert one million rows, row-by-row, from a simple PL/SQL loop. We will take that as the base for future comparisons. Of course I’ve done several runs and I keep them as long as there is less than 5% difference.

Just to show that the index has an important cost when inserting, let’s run the same testcase without the PRIMARY KEY:

PL/SQL procedure successfully completed.
Elapsed: 00:00:35.10

I’ll keep the testcase with the primary key index as a point of comparison. There is no point to insert one million rows if we can’t retreive one easily. And the table with primary key pattern is the most relevant one when comparing different databases – SQL or NoSQL.

Bulk insert

The previous inserts are quite optimal: directly done from PL/SQL in the database, so no roundtrips. Commit only at the end and we know that in Oracle it’s better not to commit too often. I will cover the need of row-by-row commits in a future post. But this is still row-by-row insert and this is not optimal. When we have all rows to insert from one session, then we can insert in bulk (one execution of the insert statement inserts several rows).

There are several ways to do that depending on the client language. Here we are in PL/SQL, so it is easy: put all rows into a collection and call the insert with that collection:

declare
 type people_array is varray(12) of varchar(15);
 people people_array := people_array( 'Marc', 'Bill', 'George', 'Eliot', 'Matt', 'Trey', 'Tracy','Greg', 'Steve', 'Kristina', 'Katie', 'Jeff' );
 people_count number :=people.COUNT;
 -- declare the collection
 type DEMO_record_type is record ("user_id" number , "name" varchar2(15), "number" number) ;
 type DEMO_collection_type is table of DEMO_record_type index by binary_integer;
 DEMO_collection DEMO_collection_type;
begin
 -- fill the collection
 for i in 1..1e6 loop
  DEMO_collection(i)."user_id":=i;
  DEMO_collection(i)."name":=people( dbms_random.value(1,people.count) );
  DEMO_collection(i)."number":=trunc(dbms_random.value(0,10000));
  null;
 end loop;
 -- call the insert
 forall i in 1..DEMO_collection.count insert into DEMO values DEMO_collection(i);
 commit;
end;
/

The result is obvious:

PL/SQL procedure successfully completed.
Elapsed: 00:00:06.55

And for the testcase without the primary key index:

PL/SQL procedure successfully completed.
Elapsed: 00:00:03.80

So bulk inserts here are 8x to 10x faster.

Conclusion

The immediate conclusion is that when you have lot of rows to insert and you don’t need to commit at each row, then you should do it in bulk. Here I made it from PL/SQL which run directly in the database. If you are inserting from an application server or a remote client, then the elimination of roundtrips will be a lot more important.

In future posts we will see what happens when you need to commit for each row. And I will also run it with In-Memory database and with Database In-Memory. Don’t forget that if you are in Switzerland in June, our experts from Oracle, Microsoft, and SAP technologies will talk about In-Memory as implemented by SQL Server, Oracle and SAP HANA. All the same day. It’s free and you can register now: Event In-Memory: boost your IT performance!

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Oracle Team
Oracle Team