Infrastructure at your Service

Franck Pachot

Oracle non-linguistic varchar2 columns to order by without sorting

By Franck Pachot

.
Sorting data is an expensive operation and many queries declare an ORDER BY. To avoid the sort operation you can build an index as it maintains a sorted structure. This helps with Top-N queries as you don’t have to read all rows but only those from a range of index entries. However, indexes are sorted by binary values. For NUMBER or DATE datatypes, the internal storage ensures that the order is preserved in the binary format. For character strings, the binary format is ASCII, which follows the English alphabet. That’s fine when your session language, NLS_LANGUAGE, defines an NLS_SORT that follows this BINARY order. But as soon as you set a language that has some specific alphabetical order, having an index on a VARCHAR2 or CHAR column does not help to avoid a SORT operation. However, in Oracle 12.2 we can define the sort order at column level with the SQL Standard COLLATE. One use case is for alpha-numeric columns that have nothing to do with any language. Like some natural keys combining letters and numbers. The user expects them to be listed in alphabetical order but, storing only 7-bits ASCII characters, you don’t care about linguistic collation.

I am running this on the Oracle 20c preview in the Oracle Cloud.

VARCHAR2

It can happen that a primary key is not a NUMBER but a CHAR or VARCHAR2, like this:


SQL> create table demo (ID constraint demp_pk primary key) as
  2  select cast(dbms_random.string('U',1)||to_char(rownum,'FM0999') as varchar2(5)) ID
  3  from xmltable('1 to 10');

Table created.

SQL> select * from demo order by ID;

      ID
________
K0003
K0009
L0007
L0010
M0008
O0002
S0001
W0005
Y0006
Z0004

10 rows selected.

I query with ORDER BY because sorting can make sense on a natural key.

Index

I have an index on this column, which is sorted, and then the execution plan is optimized:


SQL> select * from dbms_xplan.display_cursor(format=>'basic');

                      PLAN_TABLE_OUTPUT
_______________________________________
EXPLAINED SQL STATEMENT:
------------------------
select * from demo order by ID

Plan hash value: 1955576728

------------------------------------
| Id  | Operation        | Name    |
------------------------------------
|   0 | SELECT STATEMENT |         |
|   1 |  INDEX FULL SCAN | DEMP_PK |
------------------------------------

13 rows selected.

There’s no SORT operation because the INDEX FULL SCAN follows the index entries in order.

NLS_LANGUAGE

However, there are many countries where we don’t speak English:


SQL> alter session set nls_language='French';

Session altered.

In French, like in many languages, we have accentuated characters and other specificities so that the language-alphabetical order does not always follow the ASCII order.

I’m running exactly the same query:


SQL> select * from demo order by ID;

      ID
________
K0003
K0009
L0007
L0010
M0008
O0002
S0001
W0005
Y0006
Z0004

10 rows selected.

SQL> select * from dbms_xplan.display_cursor(format=>'basic');

                      PLAN_TABLE_OUTPUT
_______________________________________
EXPLAINED SQL STATEMENT:
------------------------
select * from demo order by ID

Plan hash value: 2698718808

------------------------------------
| Id  | Operation        | Name    |
------------------------------------
|   0 | SELECT STATEMENT |         |
|   1 |  SORT ORDER BY   |         |
|   2 |   INDEX FULL SCAN| DEMP_PK |
------------------------------------

14 rows selected.

This time, there’s a SORT operation. even if I’m still reading with INDEX FULL SCAN.

NLS_SORT

The reason is that, by setting the ‘French’ language, I’ve also set the French sort collating sequence.


SQL> select * from nls_session_parameters;
                 PARAMETER                           VALUE
__________________________ _______________________________
NLS_LANGUAGE               FRENCH
NLS_SORT                   FRENCH

And this is different from the BINARY one that I had when my language was ‘American’.

Actually, only a few languages follow the BINARY order of the ASCII table:


SQL>
  declare
   val varchar2(64);
  begin
    for i in (select VALUE from V$NLS_VALID_VALUES where PARAMETER='LANGUAGE') loop
    execute immediate 'alter session set nls_language='''||i.value||'''';
    select value into val from NLS_SESSION_PARAMETERS where PARAMETER='NLS_SORT';
    if val='BINARY' then dbms_output.put(i.value||' '); end if;
    end loop;
    dbms_output.put_line('');
  end;
/

AMERICAN JAPANESE KOREAN SIMPLIFIED CHINESE TRADITIONAL CHINESE ENGLISH HINDI TAMIL KANNADA TELUGU ORIYA MALAYALAM ASSAMESE GUJARATI MARATHI PUNJABI BANGLA MACEDONIAN LATIN SERBIAN IRISH

PL/SQL procedure successfully completed.

This is ok for real text but not for my primary key where ASCII order is ok. I can set the NLS_SORT=BINARY for my session, but that’s too wide as my problem is only with a column.

Or I can create an index for the French collation. Actually, this is what is used internally:


SQL> explain plan for select * from demo order by ID;
Explained.

SQL> select * from dbms_xplan.display(format=>'basic +projection');
                                                      PLAN_TABLE_OUTPUT
_______________________________________________________________________
Plan hash value: 2698718808

------------------------------------
| Id  | Operation        | Name    |
------------------------------------
|   0 | SELECT STATEMENT |         |
|   1 |  SORT ORDER BY   |         |
|   2 |   INDEX FULL SCAN| DEMP_PK |
------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=1) NLSSORT("DEMO"."ID",'nls_sort=''GENERIC_M''')[50],
       "DEMO"."ID"[VARCHAR2,5]
   2 - "DEMO"."ID"[VARCHAR2,5]

GENERIC_M is the sort collation for many European languages.

But that again, does not fit the scope of my problem as I don’t want to create an index for any possible NLS_SORT setting.

COLLATE

The good solution is to define the collation for my table column: this ID is a character string, but it is an ASCII character string which has nothing to do with my language. In 18c I can do that:


SQL> alter table demo modify ID collate binary;

Table altered.

The COLLATE is a SQL Standard syntax that exists in other databases, and it came to Oracle in 12cR2.

And that’s all:


SQL> explain plan for select * from demo order by ID;

Explained.

SQL> select * from dbms_xplan.display(format=>'basic +projection');

                                             PLAN_TABLE_OUTPUT
______________________________________________________________
Plan hash value: 1955576728

------------------------------------
| Id  | Operation        | Name    |
------------------------------------
|   0 | SELECT STATEMENT |         |
|   1 |  INDEX FULL SCAN | DEMP_PK |
------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "DEMO"."ID"[VARCHAR2,5]

No SORT operation needed, whatever the language I set for my session.

Here is the DDL for my table:


SQL> ddl demo

  CREATE TABLE "SYS"."DEMO"
   (    "ID" VARCHAR2(5) COLLATE "BINARY",
         CONSTRAINT "DEMP_PK" PRIMARY KEY ("ID")
  USING INDEX  ENABLE
   )  DEFAULT COLLATION "USING_NLS_COMP" ;

My column explicitly follows the BINARY collation.

Extended Data Types

Now, all seems easy, but there’s a prerequisite:


SQL> show parameter max_string_size

NAME            TYPE   VALUE
--------------- ------ --------
max_string_size string EXTENDED

I have set my PDB to EXTENDED string size.

If I try the same in a PDB with the ‘old’ limit of 4000 bytes:


SQL> alter session set container=PDB1;

Session altered.

SQL> show parameter max_string_size

NAME            TYPE   VALUE
--------------- ------ --------
max_string_size string STANDARD

SQL> drop table demo;

Table dropped.

SQL> create table demo (ID varchar2(5) collate binary constraint demp_pk primary key);

create table demo (ID varchar2(5) collate binary constraint demp_pk primary key)
 *
ERROR at line 1:
ORA-43929: Collation cannot be specified if parameter MAX_STRING_SIZE=STANDARD is set.

This new feature is allowed only with the Extended Data Types introduced in 12c release 2.

ORDER BY COLLATE

Ok, let’s create the table with the default collation:


SQL> create table demo (ID constraint demp_pk primary key) as
  2  select cast(dbms_random.string('U',1)||to_char(rownum,'FM0999') as varchar2(5)) ID
  3  from xmltable('1 to 10');

Table created.

SQL> select * from dbms_xplan.display_cursor(format=>'basic +projection');

                                                   PLAN_TABLE_OUTPUT
____________________________________________________________________
EXPLAINED SQL STATEMENT:
------------------------
select * from demo order by ID

Plan hash value: 2698718808

------------------------------------
| Id  | Operation        | Name    |
------------------------------------
|   0 | SELECT STATEMENT |         |
|   1 |  SORT ORDER BY   |         |
|   2 |   INDEX FULL SCAN| DEMP_PK |
------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=1) NLSSORT("DEMO"."ID",'nls_sort=''FRENCH''')[50],
       "DEMO"."ID"[VARCHAR2,5]
   2 - "DEMO"."ID"[VARCHAR2,5]

As my NLS_SORT is ‘French’ there is a SORT operation.

But I can explicitly request a BINARY sort for this:


SQL> select * from demo order by ID collate binary;

      ID
________
D0003
H0002
L0009
N0008
P0010
Q0005
R0004
W0007
Y0001
Z0006

10 rows selected.

SQL> select * from dbms_xplan.display_cursor(format=>'basic +projection');

                                             PLAN_TABLE_OUTPUT
______________________________________________________________
EXPLAINED SQL STATEMENT:
------------------------
select * from demo order by ID collate binary

Plan hash value: 2698718808

------------------------------------
| Id  | Operation        | Name    |
------------------------------------
|   0 | SELECT STATEMENT |         |
|   1 |  SORT ORDER BY   |         |
|   2 |   INDEX FULL SCAN| DEMP_PK |
------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=1) "DEMO"."ID" COLLATE "BINARY"[5],
       "DEMO"."ID"[VARCHAR2,5]
   2 - "DEMO"."ID"[VARCHAR2,5]

I have no idea why there is still a sort operation. I think that the INDEX FULL SCAN returns already the rows in binary order. And that should require additional sorting for the ORDER BY … COLLATE BINARY.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Franck Pachot
Franck Pachot

Principal Consultant / Database Evangelist