By Franck Pachot

.
In europe we have accents and non US7ASCII characters. We need special characterset. I’m not talking about Unicode here that solves all the problems. If you have a Java application, you have no problem: it’s Unicode. You can store all characters in one multi-byte characterset. But for other applications, on Windows, you have 2 possible charactersets for Western Europe WE8MSWIN1252 and WE8PC850. WE8MSWIN1252 is the one that is set by default in the registry, but is it the right one?

Windows graphical applications use WE8MSWIN1252, It’s the Western European (WE) 8 bit (8) Microsoft Windows (MSWIN) codepage number 1252. Most of your applications are graphical, and this is what is set in the registry NLS_LANG entry, so no problem. Accents are well displayed. Euro symbol (€) is well displayed, etc.

But when you run sqlplus, you have strange characters:

SQL*Plus: Release 11.2.0.4.0 Production on Mar. Janv. 20 13:10:57 2015

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

ConnectÚ Ó :
Oracle Database 11g Release 11.2.0.4.0 - 64bit Production

… instead of: ‘Connecté à‘ for example in French.

What’s the reason? sqlplus.exe is a DOS command line (sqlplusw.exe doesn’t exist in current versions). And DOS command line uses a different codepage:

>help chcp
Displays or sets the active code page number.

CHCP [nnn]

  nnn   Specifies a code page number.

Type CHCP without a parameter to display the active code page number.

>chcp
Active code page: 850


Oracle supports the DOS codepage 850 with WE8PC850 and you can set it with:

set NLS_LANG=FRENCH_FRANCE.WE8PC850

But do you want to do that? It depends. If you only use sqlplus interactively, input from your keyboard and output to your screen. Then WE8PC850 is the right one.

But if you spool to a file that you will open with a Windows application (notepad.exe for example) then you may prefer to use WE8MSWIN1252. It’s the same if your input is a script file written with a Windows editor.

Let’s have a look:

From the DOS command line, I set MSWIN1252 (or just unset NLS_LANG as it is my default from registry):

Capturenls001.PNG

Bad characters.

Where the ‘ConnectÚ‘ comes from? It’s the conversion from the Windows codepage 1252 to the DOS codepage 850 that occured because sqlplus is a DOS program but NLS_LANG is set to Windows.

The database returns the ascii 233 because that’s how is stored the ‘é’ character in the 1252 codepage, and the database returns in what is specified in NLS_LANG. But that ascii 233 is displayed as ‘Ú’ in the DOS console because that’s it’s definition in codepage 850.

You can check that from the codepage definitions:

But is it bad to do that conversion? Maybe not.

I’ve spooled the result to a file. Now I open the file with notepad.exe (a Windows program):

Capturenls002.PNG

That’s correct.

If you run sqlplus (or rman) to spool to a log file that you want to open with a Windows editor, then it’s right to set NLS_LANG to WE8MSWIN1252 and do the conversion.

But if you want the right output on your screen, then it’s better to match the DOS codepage WE8PC850:

Capturenls003.PNG

That’s correct. Except when you read the log file on Windows:

Capturenls004.PNG

The file is ok. It is written in codepage 850 where the ‘é’ is stored as the binary 130. But 130 is displayed as the curved quote ‘‚’ in codepage 1252

This is only a simple exemple where it concern only the output. But if you have a script that insert data into the database, you should match the NLS_LANG to the codepage of the script file, or you will have bad conversion…

If you don’t want to go to the Wikipedia codepage table, you can use the CONVERT function. And in order to be able to display any character, the best way is to interact with a Unicode application. You can use SQLDeveloper but I like to use it’s command line ‘sdsql’ which is still in beta:

sdsql: Release 4.1.0 Beta on Tue Jan 20 15:07:26 2015

Copyright (c) 1982, 2015, Oracle. All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 – 64bit Production

Then I want to check how the ‘connecté à’ string is converted from WE8MSWIN1252 to WE8PC850. Of course, in order to display the result properly to my Unicode client, I have to convert the final result to AL32UTF8. I also select the dump in order to show that ‘é’ is the ascii 233:

SQL> select dump("win 1252"),convert("win 1252",'AL32UTF8','WE8PC850') "WE8MSWIN1252 to WE8PC850"
 5 from ( select convert('connecté à','WE8MSWIN1252') "win 1252" from dual)
 6 ;
DUMP("WIN1252")                                    WE8MSWIN1252 to WE8PC850
-------------------------------------------------- ------------------------
Typ=1 Len=10: 99,111,110,110,101,99,116,233,32,224 connectÚ Ó

 

This how the ‘é’ is ascii 233 in the WIN1252 codepage, but is displayed as ‘Ú’ when converted to PC850

And to show the opposite:

SQL> select dump("pc 850"),convert("pc 850",'AL32UTF8','WE8MSWIN1252') "WE8MSWIN1252 to WE8PC850"
 2 from ( select convert('connecté à','WE8PC850') "pc 850" from dual)
 3 ;
DUMP("PC850")                                      WE8MSWIN1252 to WE8PC850
-------------------------------------------------- ------------------------
Typ=1 Len=10: 99,111,110,110,101,99,116,130,32,133 connect‚ …

Here the ‘é’ is ascii 130 in the PC850 codepage, but is displayed as ‘‚’ when converted to WIN1252.

NLS Charactersets can become a nightmare when not correctly set, so it’s better to understand what happens as soon as we see exotic characters.