Introduction

Reimaging an ODA is good practice for several reasons: to clean up an ODA that has been running for many years and has been patched regularly; to simplify patching, because if you are several versions behind you may have to apply multiple intermediate patches to reach the target version; or simply because you need to change the configuration (the network configuration, for example) and want a clean system, with confidence that future patches will apply correctly after the change.

Understand the reimaging process

Reimaging is actually divided into two operations: the pure reimaging of the nodes with a dedicated ISO file (little more than an OS installation), and then the create-appliance part. Reimaging sounds like reinstalling the ODA completely from scratch, so you might think that everything gets cleaned up. But it is slightly different: it only means reinstalling the software on the node. If you have 2 nodes (HA ODAs), you have to do the pure reimaging on both nodes. Most of the time the pure reimaging works fine. Right after it, you need to deploy/create the appliance (for all ODAs using 18.3 or later) from the first node only, and this step can be quite stressful.

Typical problem encountered

This example comes from an ODA X6-2S, but the problem is much the same on all ODAs. You have just successfully reimaged the server and configured the network right away with odacli configure-firstnet. As soon as your ODA is on the network, you can copy the GI and DB clones needed to create the appliance. Creating the appliance means configuring the system for Oracle, installing all the Oracle software, configuring ASM, creating a first database and so on. You'll find these steps in the official documentation. In this case, a few minutes after running create-appliance, you discover that the appliance creation failed.

 /opt/oracle/dcs/bin/odacli describe-job -i 3f93ad2d-7f0f-4f25-90cb-d3937b1270a9

Job details
----------------------------------------------------------------
                     ID:  3f93ad2d-7f0f-4f25-90cb-d3937b1270a9
            Description:  Provisioning service creation
                 Status:  Failure
                Created:  May 24, 2019 5:08:50 PM CEST
                Message:  DCS-10001:Internal error encountered: Fail to run root scripts : .

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Provisioning service creation            May 24, 2019 5:08:50 PM CEST        May 24, 2019 5:21:47 PM CEST        Failure
Provisioning service creation            May 24, 2019 5:08:50 PM CEST        May 24, 2019 5:21:47 PM CEST        Failure
networks updation                        May 24, 2019 5:08:51 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
updating network                         May 24, 2019 5:08:51 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
Setting up Network                       May 24, 2019 5:08:51 PM CEST        May 24, 2019 5:08:51 PM CEST        Success
OS usergroup 'asmdba'creation            May 24, 2019 5:08:59 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
OS usergroup 'asmoper'creation           May 24, 2019 5:08:59 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
OS usergroup 'asmadmin'creation          May 24, 2019 5:08:59 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
OS usergroup 'dba'creation               May 24, 2019 5:08:59 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
OS usergroup 'dbaoper'creation           May 24, 2019 5:08:59 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
OS usergroup 'oinstall'creation          May 24, 2019 5:08:59 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
OS user 'grid'creation                   May 24, 2019 5:08:59 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
OS user 'oracle'creation                 May 24, 2019 5:08:59 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
SSH equivalance setup                    May 24, 2019 5:08:59 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
Grid home creation                       May 24, 2019 5:09:06 PM CEST        May 24, 2019 5:11:57 PM CEST        Success
Creating GI home directories             May 24, 2019 5:09:06 PM CEST        May 24, 2019 5:09:06 PM CEST        Success
Cloning Gi home                          May 24, 2019 5:09:06 PM CEST        May 24, 2019 5:11:56 PM CEST        Success
Updating GiHome version                  May 24, 2019 5:11:56 PM CEST        May 24, 2019 5:11:57 PM CEST        Success
Storage discovery                        May 24, 2019 5:11:57 PM CEST        May 24, 2019 5:16:33 PM CEST        Success
Grid stack creation                      May 24, 2019 5:16:33 PM CEST        May 24, 2019 5:21:47 PM CEST        Failure
Configuring GI                           May 24, 2019 5:16:33 PM CEST        May 24, 2019 5:18:13 PM CEST        Success
Running GI root scripts                  May 24, 2019 5:18:13 PM CEST        May 24, 2019 5:21:47 PM CEST        Failure
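
In a long job report like the one above, it can help to filter out everything except the failed tasks. The following is a minimal sketch, not an official odacli feature: failed_tasks is a hypothetical helper name, and it assumes the report layout shown above (columns separated by at least two spaces, status in the last column).

```shell
# Hypothetical helper: print the names of failed tasks from an
# `odacli describe-job` report read on stdin.
# Assumption: columns are separated by two or more spaces, as in the report above.
failed_tasks() {
  awk -F'  +' '
    { sub(/[[:space:]]+$/, "") }              # drop trailing whitespace, fields are re-split
    NF >= 4 && $NF == "Failure" { print $1 }  # task rows have 4 columns; the header "Status:" line has fewer
  '
}

# Usage (job ID taken from the report above):
#   /opt/oracle/dcs/bin/odacli describe-job -i 3f93ad2d-7f0f-4f25-90cb-d3937b1270a9 | failed_tasks
```

The last task listed is usually the most specific one, here the GI root scripts step.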

You thought that reimaging was a good idea, and now you're afraid that maybe it was not… But don't panic, this is normal behavior.

As this error is related to the GI stack configuration, you need to look into the GI logs, as you would on any other platform.

vi /u01/app/18.0.0.0/grid/install/root_odadbi03_2019-05-24_17-18-13-479398837.log

...
2019/05/24 17:20:47 CLSRSC-594: Executing installation step 17 of 20: 'InitConfig'.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'odadbi'
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'odadbi' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.evmd' on 'odadbi'
CRS-2672: Attempting to start 'ora.mdnsd' on 'odadbi'
CRS-2676: Start of 'ora.mdnsd' on 'odadbi' succeeded
CRS-2676: Start of 'ora.evmd' on 'odadbi' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'odadbi'
CRS-2676: Start of 'ora.gpnpd' on 'odadbi' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'odadbi'
CRS-2672: Attempting to start 'ora.gipcd' on 'odadbi'
CRS-2676: Start of 'ora.cssdmonitor' on 'odadbi' succeeded
CRS-2676: Start of 'ora.gipcd' on 'odadbi' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'odadbi'
CRS-2672: Attempting to start 'ora.diskmon' on 'odadbi'
CRS-2676: Start of 'ora.diskmon' on 'odadbi' succeeded
CRS-2676: Start of 'ora.cssd' on 'odadbi' succeeded
Creating SQL script file /tmp/asminit_sql_2019-05-24-17-21-15.sql
cat: /etc/grub.conf: Permission denied

SQL*Plus: Release 18.0.0.0.0 - Production on Fri May 24 17:21:15 2019
Version 18.3.0.0.0

Copyright (c) 1982, 2018, Oracle.  All rights reserved.

Connected to an idle instance.

ASM instance started

Total System Global Area 1136934472 bytes
Fixed Size                  8666696 bytes
Variable Size            1103101952 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted
ASM diskgroups volume enabled
create diskgroup DATA NORMAL REDUNDANCY
*
ERROR at line 1:
ORA-15018: diskgroup cannot be created
ORA-15030: diskgroup name "DATA" is in use by another diskgroup
...
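
The root script log is long, so rather than scrolling through it in vi you can search it for Oracle error codes directly. This is a minimal sketch: show_ora_errors is a hypothetical helper name, and the pattern only assumes that errors follow the usual ORA-nnnnn format seen in the extract above.

```shell
# Hypothetical helper: list the ORA- errors found in a log file,
# each prefixed with its line number in that file.
show_ora_errors() {
  grep -n -E 'ORA-[0-9]{5}' "$1"
}

# Usage (log file from the example above):
#   show_ora_errors /u01/app/18.0.0.0/grid/install/root_odadbi03_2019-05-24_17-18-13-479398837.log
```

With GNU grep, adding the -B option also prints a few preceding lines, which shows the statement that triggered the error.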

The problem is quite obvious: the creation of the DATA diskgroup failed. As a result, your GI configuration is incomplete, and there is no way to go further.

Why is ASM in trouble when reimaging?

Reimaging an ODA erases all the data on the local disks (the 2 system disks) but does not erase anything on the disks dedicated to ASM (the data disks), even if you're using a lite ODA like most of us these days. As a result, the data disks still carry the previous ASM headers and data, which leads to the failure. You can avoid this error by cleaning up the ODA BEFORE the reimaging (please refer to the dedicated procedure for your version/your ODA). In our case, we completely forgot to clean up beforehand. Fortunately, even after the first deployment failure, there is a script for purging the ASM headers and data on the disks:

/opt/oracle/oak/onecmd/cleanup.pl
INFO: *******************************************************************
INFO: ** Starting process to cleanup provisioned host odadbi03           **
INFO: *******************************************************************
INFO: Default mode being used to cleanup a provisioned system.
INFO: It will change all ASM disk status from MEMBER to FORMER
Do you want to continue (yes/no) : yes
INFO:
Running cleanup will delete Grid User - 'grid' and
INFO: DB user - 'oracle' and also the
INFO: groups 'oinstall,dba,asmadmin,asmoper,asmdba'
INFO: nodes will be rebooted
Do you want to continue (yes/no) : yes
…

After this cleanup, the server reboots, and you can retry the odacli create-appliance:

/opt/oracle/dcs/bin/odacli describe-job -i "3571b291-be91-4cd4-a133-b52ead24ff61"

Job details
----------------------------------------------------------------
                     ID:  3571b291-be91-4cd4-a133-b52ead24ff61
            Description:  Provisioning service creation
                 Status:  Success
                Created:  May 24, 2019 5:53:44 PM CEST
                Message:

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
networks updation                        May 24, 2019 5:53:45 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
updating network                         May 24, 2019 5:53:45 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
Setting up Network                       May 24, 2019 5:53:45 PM CEST        May 24, 2019 5:53:45 PM CEST        Success
OS usergroup 'asmdba'creation            May 24, 2019 5:53:52 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
OS usergroup 'asmoper'creation           May 24, 2019 5:53:52 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
OS usergroup 'asmadmin'creation          May 24, 2019 5:53:52 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
OS usergroup 'dba'creation               May 24, 2019 5:53:52 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
OS usergroup 'dbaoper'creation           May 24, 2019 5:53:52 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
OS usergroup 'oinstall'creation          May 24, 2019 5:53:52 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
OS user 'grid'creation                   May 24, 2019 5:53:52 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
OS user 'oracle'creation                 May 24, 2019 5:53:52 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
SSH equivalance setup                    May 24, 2019 5:53:53 PM CEST        May 24, 2019 5:53:53 PM CEST        Success
Grid home creation                       May 24, 2019 5:54:00 PM CEST        May 24, 2019 5:57:08 PM CEST        Success
Creating GI home directories             May 24, 2019 5:54:00 PM CEST        May 24, 2019 5:54:00 PM CEST        Success
Cloning Gi home                          May 24, 2019 5:54:00 PM CEST        May 24, 2019 5:57:06 PM CEST        Success
Updating GiHome version                  May 24, 2019 5:57:06 PM CEST        May 24, 2019 5:57:08 PM CEST        Success
Storage discovery                        May 24, 2019 5:57:08 PM CEST        May 24, 2019 6:01:46 PM CEST        Success
Grid stack creation                      May 24, 2019 6:01:46 PM CEST        May 24, 2019 6:18:04 PM CEST        Success
Configuring GI                           May 24, 2019 6:01:46 PM CEST        May 24, 2019 6:03:27 PM CEST        Success
Running GI root scripts                  May 24, 2019 6:03:27 PM CEST        May 24, 2019 6:12:52 PM CEST        Success
Running GI config assistants             May 24, 2019 6:12:53 PM CEST        May 24, 2019 6:14:45 PM CEST        Success
Setting AUDIT SYSLOG LEVEL               May 24, 2019 6:14:53 PM CEST        May 24, 2019 6:14:53 PM CEST        Success
Post cluster OAKD configuration          May 24, 2019 6:18:04 PM CEST        May 24, 2019 6:21:17 PM CEST        Success
Disk group 'RECO'creation                May 24, 2019 6:21:25 PM CEST        May 24, 2019 6:21:35 PM CEST        Success
Volume 'datDBTEST'creation               May 24, 2019 6:21:35 PM CEST        May 24, 2019 6:22:06 PM CEST        Success
Volume 'reco'creation                    May 24, 2019 6:22:06 PM CEST        May 24, 2019 6:22:25 PM CEST        Success
Volume 'commonstore'creation             May 24, 2019 6:22:25 PM CEST        May 24, 2019 6:22:44 PM CEST        Success
ACFS File system 'DATA'creation          May 24, 2019 6:22:44 PM CEST        May 24, 2019 6:22:59 PM CEST        Success
ACFS File system 'RECO'creation          May 24, 2019 6:22:59 PM CEST        May 24, 2019 6:23:15 PM CEST        Success
ACFS File system 'DATA'creation          May 24, 2019 6:23:15 PM CEST        May 24, 2019 6:23:30 PM CEST        Success
Database home creation                   May 24, 2019 6:23:30 PM CEST        May 24, 2019 6:27:02 PM CEST        Success
Validating dbHome available space        May 24, 2019 6:23:30 PM CEST        May 24, 2019 6:23:30 PM CEST        Success
Creating DbHome Directory                May 24, 2019 6:23:30 PM CEST        May 24, 2019 6:23:30 PM CEST        Success
Extract DB clones                        May 24, 2019 6:23:30 PM CEST        May 24, 2019 6:25:08 PM CEST        Success
Clone Db home                            May 24, 2019 6:25:08 PM CEST        May 24, 2019 6:26:47 PM CEST        Success
Enable DB options                        May 24, 2019 6:26:47 PM CEST        May 24, 2019 6:26:56 PM CEST        Success
Run Root DB scripts                      May 24, 2019 6:26:56 PM CEST        May 24, 2019 6:26:56 PM CEST        Success
Provisioning service creation            May 24, 2019 6:27:02 PM CEST        May 24, 2019 6:35:18 PM CEST        Success
Database Creation                        May 24, 2019 6:27:02 PM CEST        May 24, 2019 6:33:23 PM CEST        Success
Change permission for xdb wallet files   May 24, 2019 6:33:23 PM CEST        May 24, 2019 6:33:23 PM CEST        Success
Place SnapshotCtrlFile in sharedLoc      May 24, 2019 6:33:23 PM CEST        May 24, 2019 6:33:25 PM CEST        Success
SqlPatch upgrade                         May 24, 2019 6:34:46 PM CEST        May 24, 2019 6:35:16 PM CEST        Success
updating the Database version            May 24, 2019 6:35:16 PM CEST        May 24, 2019 6:35:18 PM CEST        Success
users tablespace creation                May 24, 2019 6:35:18 PM CEST        May 24, 2019 6:35:20 PM CEST        Success
Install TFA                              May 24, 2019 6:35:20 PM CEST        May 24, 2019 6:39:53 PM CEST        Success

Everything is OK now.
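
The successful run above took roughly 45 minutes, so you may prefer to poll the job instead of checking it by hand. The sketch below is only an illustration: job_status and wait_for_job are hypothetical helper names, the 60-second interval is arbitrary, and it assumes the "Status:" line has the format shown in the reports above, with any value other than Success or Failure meaning the job is still in progress.

```shell
# Hypothetical helper: extract the job status from an
# `odacli describe-job` report read on stdin.
job_status() {
  awk '$1 == "Status:" { print $2; exit }'
}

# Hypothetical helper: poll a job until it ends.
# Returns 0 on Success and 1 on Failure; any other status is
# assumed to mean "still running" (assumption, not from the documentation).
wait_for_job() {
  job_id=$1
  while :; do
    status=$(/opt/oracle/dcs/bin/odacli describe-job -i "$job_id" | job_status)
    case $status in
      Success) return 0 ;;
      Failure) return 1 ;;
    esac
    sleep 60
  done
}

# Usage (job ID from the report above):
#   wait_for_job 3571b291-be91-4cd4-a133-b52ead24ff61 && echo "appliance created"
```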

Conclusion

ODA reimaging does not include formatting the data disks, as you now know. The other thing reimaging does not do is patch the BIOS, the firmware and the microcode in your ODA. So right after the create-appliance, don't forget to apply the patch, even if you are already on the target version.

Jérôme Dubar

Consultant