

Oracle tuning silver bullet: add an order by to make your query faster

By Franck Pachot

You have read all of Cary Millsap's work about Oracle database performance tuning. You know that there are no silver bullets: reducing response time requires a methodical approach, analyzing where the time is spent with the goal of eliminating all unnecessary work.
But I'll show you something completely opposite here. A performance tuning silver bullet. Do more work in order to run faster: just add an ORDER BY to your query and it's faster.
I've made a very simple test case so that you can try it yourself.
I set up my sqlplus environment and connect remotely to my database:

SQL> set pagesize 1000 linesize 100 echo on timing on
SQL> connect xxx/[email protected]//xxx/xxx
Connected.

I create a very simple DEMO table: 10000 rows with one small column.

SQL> create table DEMO pctfree 0 as select mod(rownum,2) x from dual connect by 10000>=level;
Table created.
Elapsed: 00:00:00.03

And I run a very simple query that does a full scan of that small table and pads the output to give it a significant size:

SQL> set autotrace trace
SQL> select lpad(x,2000,x) from DEMO;
10000 rows selected.

Elapsed: 00:00:21.52

It took 22 seconds to execute that. I ran it with 'autotrace trace' so that I don't have the overhead of sqlplus displaying all the rows: rows are fetched but not displayed, and I measure only the time to get the result. Let's run it again to be sure there is no cache effect:

SQL> select lpad(x,2000,x) from DEMO;
10000 rows selected.

Elapsed: 00:00:25.61

It still takes more than 20 seconds to retrieve the 10000 rows.
Since I'm using autotrace, I get the execution plan and basic statistics:

Execution Plan
----------------------------------------------------------
Plan hash value: 4000794843

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 |   126K|     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| DEMO | 10000 |   126K|     4   (0)| 00:00:01 |
--------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        683  consistent gets
          0  physical reads
          0  redo size
   20163693  bytes sent via SQL*Net to client
       7818  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

It can't get any simpler: a full table scan retrieving all the rows.

order by

And now, let’s apply my silver bullet: add an ORDER BY at the end and run it again:

SQL> a  order by x
  1* select lpad(x,2000,x) from DEMO order by x
SQL> /
10000 rows selected.

Elapsed: 00:00:07.06

Here it is. The response time is now less than 10 seconds, about 3x faster. If you don't believe it, just test it yourself.
And thanks to autotrace I can check that I’m doing the same full table scan, but with an additional sort:

Execution Plan
----------------------------------------------------------
Plan hash value: 903288357

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   126K|     5  (20)| 00:00:01 |
|   1 |  SORT ORDER BY     |      | 10000 |   126K|     5  (20)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEMO | 10000 |   126K|     4   (0)| 00:00:01 |
---------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
         35  consistent gets
          0  physical reads
          0  redo size
     118324  bytes sent via SQL*Net to client
       7818  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      10000  rows processed

The plan clearly shows that I'm doing additional work: Oracle has to sort the rows before sending them. And the statistics show that I'm still returning 10000 rows, with one in-memory sort done.
Adding an order by made this query faster…

Ok, at this point you have probably realized that this post is the exact opposite of the tuning method I usually recommend, and you have checked today's date. Please don't hesitate to comment if you have an idea about the reasons (plural, because there are at least two of them) why the second query is much faster. If you attended my tuning workshop you should already know the main reason. If you are a long-time reader of Jonathan Lewis' blog then you should also get the second one.

Update 2nd April 2015

All the good answers are in the comments. Of course something was wrong in the introduction: we did introduce additional work with the SORT operation, but it is minimal. And there was unnecessary work before (the roundtrips) that had a bigger overhead. Beyond the April Fool's joke, the goal of this post was to show that it's not enough to make a query faster. You must know why you made it faster, or you may just be hiding a problem that will come back later. And that's not too difficult if you use the right tools. Here, a simple autotrace showing only 11 session statistics had all the necessary information.
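To put a number on the roundtrip overhead: autotrace reported 668 SQL*Net roundtrips for 10000 rows, which lines up with SQL*Plus's default arraysize of 15. A minimal sketch of that arithmetic (assuming, as SQL*Plus does by default, that the first fetch returns a single row):

```python
import math

def expected_roundtrips(rows, arraysize=15):
    """Estimate SQL*Net roundtrips for fetching `rows` rows in SQL*Plus.

    Assumes the first fetch returns a single row (SQL*Plus behavior)
    and every subsequent fetch returns `arraysize` rows.
    """
    return 1 + math.ceil((rows - 1) / arraysize)

print(expected_roundtrips(10000))  # 668, matching the autotrace statistic
```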

11 Comments

  • Jim Dickson (dev DBA) says:

    almost had me 🙂
    One reason: fewer bytes transferred because of compression at the SQL*Net level? I think Tom and Jonathan have blogged about this.
    Not sure why there are fewer logical I/Os.

    Excellent site.

  • Hi Jim,
    You quickly identified the point that is probably the least known, along with the symptom of the second point (LIO). Congratulations. I'm sure the root cause of the second point will come very quickly.
    Regards,
    Franck.

  • Martin Preiss says:

    for the sake of completeness the Link to Jonathan’s article: https://jonathanlewis.wordpress.com/2010/05/07/sqlnet-compression/

  • Thanks Martin for the link.
    This is why I selected lpad(x,2000,x): to have some volume. But there is still the fact that the number of LIOs is higher without the order by. You can compare:
    select rownum from DEMO;
    with:
    select rownum from DEMO order by 1;
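To put numbers on that volume effect: the two autotrace runs above reported 20,163,693 bytes sent via SQL*Net without the ORDER BY and 118,324 bytes with it, for the same 10000 rows. A quick back-of-the-envelope comparison:

```python
# Bytes sent via SQL*Net, copied from the two autotrace outputs above
bytes_without_order_by = 20_163_693
bytes_with_order_by = 118_324

ratio = bytes_without_order_by / bytes_with_order_by
print(f"~{ratio:.0f}x fewer bytes when the result is sorted")  # ~170x
```

The sorted result puts identical values next to each other, which is exactly what SQL*Net deduplication can exploit.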

  • Martin Preiss says:

    I think your arraysize is 15 (10000 rows, 668 roundtrips; not a surprise in sqlplus). For the query without the order by, in each fetch the buffer that was used in the previous fetch has to be re-accessed, resulting in a consistent read. For the ordered query, the complete table data was read in a single pass and passed to the PGA, where the sort operation took place; the sorted result was then sent to the client (and that's not a CR).

  • Jim Dickson says:

    Well, no one else is biting, so here goes …
    Small sqlplus arraysize + lots of rows per block => each fetch can revisit the block from the previous fetch.
    Not 100% sure why sorting the data first minimizes the effect of the small arraysize, though. I might guess at block pinning or result set caching, but I suspect I will just embarrass myself on April Fool's.

  • Hi Martin and Jim, you're right. The default fetch size is too small on most clients, and because a buffer cannot stay pinned across user calls it has to be revisited, at the cost of a logical read. I created the table with very short rows to amplify the effect: many sets of 15 rows can fit in one block.
    Sorting buffers the rows in the PGA, which is not shared and doesn't need all the logical-read overhead we have on a shared structure.
    As far as I know there is no way to get the same buffering without a SORT, which could help when we cannot change the application fetch size.
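This explanation can be sanity-checked against the autotrace output with some quick arithmetic. A sketch with hypothetical figures: the table block count of 16 is an assumption chosen for illustration, and the fetch pattern assumes SQL*Plus's default arraysize of 15:

```python
import math

def estimated_consistent_gets(rows, table_blocks, arraysize=15):
    """Model: every table block is read once, and each fetch after the
    first re-reads the block it stopped in, because a buffer cannot
    stay pinned across user calls.
    """
    fetches = 1 + math.ceil((rows - 1) / arraysize)  # SQL*Plus fetch pattern
    revisits = fetches - 1  # one extra consistent get per later fetch
    return table_blocks + revisits

# 683 consistent gets were reported for the query without the ORDER BY
print(estimated_consistent_gets(10000, table_blocks=16))
```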

  • Neil Johnson says:

    My guess: the order by forces Oracle to ignore arraysize and fetch the entire set in order to sort it. Without the order by the SQL*Plus default arraysize will have an impact.

  • Hi Neil,
    Arraysize is not ignored; it's still the fetch size. But the fetches read from a buffer in the PGA rather than from shared buffers in the SGA, so there are no additional logical reads.

  • Jim Dickson says:

    I have not worked with Oracle for a while, but this highlights a gap in my knowledge/understanding. Can anyone enlighten me?
    If a session's PGA is too small and data spills to disk (temp), will the number of [direct] I/Os to/from TEMP show up in any autotrace/dbms_xplan metrics [other than the sorts (disk) counter] or in an AWR report?
    Real-Time SQL Monitoring displays both memory and temp usage, so the metrics are available; I'm just not sure whether they are measured at the sqlid, session, or parallel-slave level.

  • Hi Jim,
    Those reads show up in the session statistics as 'physical reads direct temporary tablespace', which are included in 'physical reads direct', which are in turn included in 'physical reads', and they will match the related writes ('physical writes direct temporary tablespace'). You will also see them as DISK_READS in V$SQL. All of these are counted in blocks.
    Regards,
    Franck.

