By Franck Pachot
In the previous blog post I measured the time it takes to insert 1 million rows: about 45 seconds when done row by row, which goes down to 7 seconds when inserted in bulk. In both cases, the commit was done only at the end. But in some cases you need to commit at each row, in order to make it visible to other sessions as soon as it is inserted (think of several sessions inserting concurrently). A commit also makes the change durable – available even after an instance crash – and that supposes that the change is written to disk. What's the cost of that?
Of course, I cannot do a bulk insert here because I want to commit for each row. I take the same test as in the previous post: a table with a primary key (so only one index):
create table DEMO ("id" number , "text" varchar2(15), "number" number) ;
And the same PL/SQL except that I’ve added a COMMIT in the loop:
declare
  type people_array is varray(12) of varchar2(15);
  people people_array := people_array( 'Marc', 'Bill', 'George', 'Eliot', 'Matt', 'Trey',
                                       'Tracy', 'Greg', 'Steve', 'Kristina', 'Katie', 'Jeff' );
  people_count number := people.COUNT;
  n number;
begin
  for i in 0..1e6 loop
    n := trunc(dbms_random.value(0,10000));
    insert into DEMO values( i , people( dbms_random.value(1,people_count) ) , n );
    commit;
  end loop;
  commit;
end;
/
My test environment is still the Oracle Developer Day 11g VM, with 1 CPU allocated by VirtualBox.
simple commit in PL/SQL
Yes, this is a special case. I'm running this in a PL/SQL block, and PL/SQL has a special optimization here: the 'commit successful' message is given to the user only at the end, so the intermediate commits are lighter than commits that return to the client. Even so, look at the execution time of 1 million inserts, each followed by a commit:
PL/SQL procedure successfully completed. Elapsed: 00:02:16.34
We are far from the 45 seconds of row-by-row inserts that were committed only at the end. Commit has a big overhead.
commit write wait immediate
If you run the inserts from a client, doing a user call for each row, that special optimization is not there. In that case the commit waits for the log writer and returns only when the log writer has acknowledged the write to disk (and to the SYNC standby database if you are in Data Guard). We can get the same behavior from PL/SQL by issuing a COMMIT WRITE WAIT IMMEDIATE instead of a simple COMMIT. That's a good way to compare the times without introducing additional roundtrip latency.
for i in 0..1e6 loop
  n := trunc(dbms_random.value(0,10000));
  insert into DEMO values( i , people( dbms_random.value(1,people_count) ) , n );
  commit write wait immediate;
end loop;
When executing my procedure with the loop above, here is the time it takes:
PL/SQL procedure successfully completed. Elapsed: 00:05:06.57
The time has doubled. This is the most expensive insert you can do. But it's also the only way if you want to:
- Give commit feedback to the user at each commit.
- Ensure that the commit is done as quickly as possible (this is the IMMEDIATE write), favoring response time over throughput.
- Ensure that the commit is durable (the D in ACID), meaning that we wait until the changes are on persistent storage (this is the WAIT write).
As we can see here, those requirements have a cost.
commit write wait batch
We can favor throughput with BATCH commit logging instead of IMMEDIATE.
for i in 0..1e6 loop
  n := trunc(dbms_random.value(0,10000));
  insert into DEMO values( i , people( dbms_random.value(1,people_count) ) , n );
  commit write wait batch;
end loop;
The gain is not very high here, where only one session is doing all those inserts:
PL/SQL procedure successfully completed. Elapsed: 00:04:28.97
but the redo size is reduced. I plan to show more statistics and wait events about it in a future post. The time spent in the WAIT commits is mostly in those orange 'log file sync' waits we don't like to see on the Enterprise Manager screen. It is the only point where the session process waits for work that is normally done asynchronously in the background: at commit, the session waits for the redo to be persisted.
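You can check both effects yourself from the session that runs the loop. Here is a sketch of the queries I use (it assumes SELECT privilege on the V$ views); run them before and after the loop and compare the deltas:

```sql
-- redo generated by my session so far
select name, value
from   v$mystat join v$statname using (statistic#)
where  name = 'redo size';

-- time my session spent waiting on commit persistence
select event, total_waits, time_waited
from   v$session_event
where  sid = sys_context('userenv','sid')
and    event = 'log file sync';
```

With WAIT BATCH you should see a smaller 'redo size' delta than with WAIT IMMEDIATE, while 'log file sync' remains the dominant wait.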
commit write nowait immediate
If you are ok to lose a transaction even when the recent commit was successful, then you don't have to wait. That may seem heretical to ACID aficionados, but it is totally acceptable for supporters of 'eventual consistency'. I prefer to think about it according to business requirements.
When something – external to the database – occurs once the commit is successful, then you must trust in ACID and you have to wait. When your customer withdraws money at the ATM, then once they have their money you must be sure that the operation is committed in the database. A crash of the database should not credit the money back to their account, because they will not give back the bills.
However, when you insert events coming from sensors (think Big Data), you probably want to keep them for a while in the database, but you can accept losing a few of them in the rare case of a server crash – especially if that greatly improves the performance and scalability of the process. Maybe you can accept losing some transactions. Or you are able to load them back again after a crash.
Think of it like NOLOGGING operations. It's not what you want by default, but if you are sure about the business requirements, and the recovery procedure, then you have new ways to improve performance and scalability.
for i in 0..1e6 loop
  n := trunc(dbms_random.value(0,10000));
  insert into DEMO values( i , people( dbms_random.value(1,people_count) ) , n );
  commit write nowait immediate;
end loop;
And the improvement is obvious:
PL/SQL procedure successfully completed. Elapsed: 00:02:08.75
When you don’t wait, obviously, you improve the response time and the scalability.
commit write nowait batch
And we can batch the redo record, as we did above, in addition to nowait:
for i in 0..1e6 loop
  n := trunc(dbms_random.value(0,10000));
  insert into DEMO values( i , people( dbms_random.value(1,people_count) ) , n );
  commit write nowait batch;
end loop;
And the improvement is even better:
PL/SQL procedure successfully completed. Elapsed: 00:01:45.33
Let the log writer do its job in batch, without waiting for its acknowledgement, and the row-by-row commit overhead is minimized. And it’s ok as long as you know what you are doing:
- You can manage the loss of transactions in case of a server crash, either by re-executing them or because the missing changes have no consequence.
- You do a WAIT commit when you return a commit status to the user, because they may act on that status in ways you don't manage.
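The two points above combine naturally: fast commits inside the loop, and one durable commit before anything is reported back. A sketch of that pattern (the values inserted here are just placeholders):

```sql
begin
  for i in 1..1000 loop
    insert into DEMO values( i, 'Franck', i );
    commit write nowait batch;   -- fast: visible, but not yet guaranteed on disk
  end loop;
  -- before returning a status to the user, make sure everything is persisted
  commit write wait immediate;
end;
/
```

The final WAIT commit also covers all the preceding NOWAIT commits, because the log writer persists the redo stream in order.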
statement level, session level, instance level
I showed only the statement-level way to manage commit wait and commit logging, because it's a decision that must be made by developers. You can choose the same behavior with the COMMIT_WAIT and COMMIT_LOGGING session parameters, but that's probably not the right option. As I said before, the commit behavior probably depends on where the commit is in the code: the final commit before returning to the user is probably managed differently.
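For completeness, this is what the session-level setting looks like – after this, a plain COMMIT in the session behaves like COMMIT WRITE NOWAIT BATCH:

```sql
alter session set commit_wait    = 'NOWAIT';
alter session set commit_logging = 'BATCH';
```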
For the same reason, you probably don't change those parameters at PDB or instance level. But there are always exceptions. If you are doing an application migration that uses a row-by-row API to insert data into the new system, then maybe you can do all that in NOWAIT BATCH by setting the instance parameters. If it fails, you restart from the beginning (flashback database helps here).
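For such a one-off migration, the instance-wide setting could be a sketch like this – SCOPE=MEMORY keeps the change out of the spfile, so it does not survive a restart, and you set the defaults back once the load is done:

```sql
alter system set commit_wait    = 'NOWAIT' scope=memory;
alter system set commit_logging = 'BATCH'  scope=memory;
-- ... run the migration load ...
alter system set commit_wait    = 'WAIT'      scope=memory;
alter system set commit_logging = 'IMMEDIATE' scope=memory;
```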
This kind of optimization also exists in SQL Server: David Barbarin has previously blogged about MSSQL delayed durability. Want to meet our experts from all technologies in one day? Come to our event In-Memory: boost your IT performance!, where we talk about SQL Server, Oracle and SAP HANA.
And as we are talking about cross technology, stay tuned to see 1M row insert variations on other database systems…