When PRCD-1027 and PRCD-1229 are spoiling your rainy day

Introduction:

More than a year ago I set up an Oracle Restart environment with Grid Infrastructure, ASM and databases all on 11.2.0.2.0, since that was the vendor's requirement at first. Once the server had been handed over to production, I got the request that it should also host EMC-based clones, and those clones were 11.2.0.3.0. That meant I had to upgrade both the Grid Infrastructure and the database software, and of course the databases as well.

So I geared up and upgraded the GI and the Rdbms software, and of course the local databases in place. After that the EMC clones were added and everything looked fine.

Until ……….

Error Messages after Server reboot:

Well, until the server got rebooted. After that reboot, the first sign that things were wrong was that the databases did not start via the Grid Infrastructure, which was not expected!

So there I was again, ready to solve another puzzle, and of course with people waiting for the DBs to come online so they could work.

## First clue:

I checked the resource (the database) in the cluster with: crsctl status resource …. -p

Much to my surprise that showed the wrong Oracle home (it was 11.2.0.2.0, the initial Oracle home from before the upgrade). But I was so sure that I had upgraded the database. What did I miss? Even stranger was that the cluster agent kept altering my oratab entry for that specific database back to the old Oracle home (and it would almost stick out its tongue at me, telling me "#line has been added by agent").
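A minimal sketch of that first check, comparing the Oracle home in oratab with the one the clusterware has registered. It assumes the resource profile printed by crsctl contains a line of the form ORACLE_HOME=/path, and the resource name ora.mydb.db in the usage comment is only an illustration. The function names are made up for this example.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any Oracle tool.

# Print the Oracle home for SID $1 from oratab-format text on stdin
# (format: SID:ORACLE_HOME:Y|N, comment lines start with #).
oratab_home() {
    awk -F: -v sid="$1" '$1 == sid { print $2; exit }'
}

# Print the ORACLE_HOME attribute from resource-profile text on stdin.
# Assumption: the profile has a line "ORACLE_HOME=/some/path".
profile_home() {
    sed -n 's/^ORACLE_HOME=//p' | head -n 1
}

# Compare two homes; returns 0 when they match, 1 otherwise.
compare_homes() {
    if [ "$1" = "$2" ]; then
        echo "OK: both point to $1"
    else
        echo "MISMATCH: oratab=$1 clusterware=$2"
        return 1
    fi
}

# Possible use on a live system (not executed here):
#   tab=$(oratab_home mydb < /etc/oratab)
#   crs=$(crsctl status resource ora.mydb.db -p | profile_home)
#   compare_homes "$tab" "$crs"
```

A mismatch reported here is the situation described above: the database itself is upgraded, but the clusterware still holds the old home.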

## Second clue

When I altered the oratab to show the correct Oracle home I could start the database via sqlplus, which was indeed my second clue.

After a big face-palm it became clear to me that the clusterware did not hold the correct Oracle home for that database.

## Will srvctl modify do the Job:

srvctl modify database -d mydb -o /opt/oracle/product/11203_ee_64/db

## Output:

PRCD-1027 : Failed to retrieve database mydb

PRCD-1229 : An attempt to access configuration of database migoms was rejected because its version 11.2.0.2.0 differs from the program version 11.2.0.3.0. Instead run the program from /opt/oracle/product/11202_ee_64/db

Well, that was not expected, especially since the other clue was that the DB can be started as an 11.2.0.3 DB when the Oracle home is set properly in oratab.

## Solution:

First I tried :

srvctl modify database -d mydb -o /opt/oracle/product/11203_ee_64/db

but that is wrong as we already saw in this post.

Hmm, then I thought of an expression in German: once you start doing it the right way, things will start to work for you.

Plain and simple, this is what I had to do to make things right again:

srvctl upgrade database -d  mydb -o /opt/oracle/product/11203_ee_64/db

After that I started mydb via the cluster, and she is happy now.
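The decision between srvctl modify and srvctl upgrade can be sketched as below: parse the two versions out of the PRCD-1229 text and print (not run) the matching command. The helper names are hypothetical, and the parsing assumes the message wording shown in this post.

```shell
#!/bin/sh
# Sketch only: prints a command, it does not call srvctl.

# Read a PRCD-1229 message on stdin and print
# "<registered version> <program version>".
prcd1229_versions() {
    sed -n 's/.*its version \([0-9.]*[0-9]\) differs from the program version \([0-9.]*[0-9]\)\..*/\1 \2/p'
}

# Print the srvctl command to use.
# $1 = database name, $2 = new Oracle home,
# $3 = version registered in the clusterware, $4 = version of the srvctl in use
srvctl_home_command() {
    if [ "$3" != "$4" ]; then
        # Versions differ: the clusterware entry itself has to be upgraded.
        echo "srvctl upgrade database -d $1 -o $2"
    else
        # Same version: only the home path needs to change.
        echo "srvctl modify database -d $1 -o $2"
    fi
}
```

Feeding it the message from the output above yields the srvctl upgrade database form, which is the command that worked here.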

## Bottom-line ( aka lesson learned ).

If you upgrade your databases in an Oracle Restart / RAC cluster environment, make it part of your upgrade plan to also upgrade the information about that specific database in the cluster layer.

As always,

Happy Reading and till we meet again.

Mathijs

Upgrade to 11.2.0.3 Grid Infra and Rdbms with Psu October 2013

Introduction

It is always a pleasure to be challenged with a new puzzle to solve. This time I was asked to take part in the following scenario, which will be implemented on various boxes:

  • Linux box(es) being upgraded from Red Hat Linux 5.5 to 5.9.
  • Grid Infrastructure (GI) to be upgraded from 11.2.0.2 to 11.2.0.3 (this is needed to work with Rdbms 11.2.0.3).
  • Database software (Rdbms) 11.2.0.3 installation.
  • Implement PSU October 2013 on both GI and Rdbms.
  • Upgrade the databases (24 in total) to 11.2.0.3.

Environment: Linux server with RH 5.5, 11.2 Grid Infra installed together with 11.2.0.2 Rdbms on a single server (Oracle Restart).

Quite a packed program, right? For the Linux part I was merely a spectator: I could only wait and see how the server was bounced, and see that my Oracle environment (both the databases and the listener) came back up after each bounce, so the Restart environment was performing well. After the third reboot it was time for my actions. Below you will see a case study of the scenario I wanted to start with at first, because I thought that would save time. In the end I implemented Plan B (always good to have one available). Better buckle up, and let's get started.

Summary:

Below you will find three scenarios you could follow. Scenario 1 would enable you to do the installations in parallel and even a week in advance, but would need extra steps (relinking the software, enabling the Grid Infrastructure in a number of steps). In the end, when setting this up and testing it, I came to the conclusion that it does not save that much time to do the installs before the Linux upgrade. An average installation takes about 10 to 15 minutes, so I recommend to first have the Linux upgrade in place and, once the databases and listeners come back online after the Linux team boots the server, proceed with the installations and patching. That scenario is less complicated and I think also less error prone. So this will be Scenario 2, and I will follow it on the environments where I am asked to do the upgrade.

Addendum

Meanwhile I have implemented the plan and upgraded the databases (25 in total). I was not very pleased with DBUA (I used it in a script with the silent option) because I indeed felt I had less control over the process. I witnessed two major setbacks during the upgrade with DBUA: 1) it added a local listener to the init.ora files while doing the upgrade, which crashed the upgrade (at a restart with that newly generated, altered init.ora the DB would of course not start); 2) the Grid agent kept altering the Oracle home of the databases (so it was pointing to the wrong environment). Together with a colleague we did save the day in the end, but only because we fell back to the manual upgrade method. I have listed the activities and will add them as Scenario 3: the real life implemented plan.

PS: by customer's request, and because the default of the following parameter changed (it was FALSE before and became TRUE in 11.2.0.3), I had to make sure the parameter was set again in the spfile:

alter system set "_use_adaptive_log_file_sync"=FALSE scope = both;

Important add-on: in all scenarios, as a baseline I ran utlu112i.sql on all databases in scope. The good news was that all installed components were valid! And I created a list of invalid objects per schema to compare with the situation after the upgrade (as proof that this DBA did not break the application).
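Comparing the invalid-object baseline with the situation after the upgrade can be sketched like this. It assumes both lists are plain text files with one OWNER.OBJECT_NAME per line (for instance spooled from a query on dba_objects where status = 'INVALID'). The function name and file layout are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: list objects that are invalid after the upgrade
# but were not in the baseline taken before it.
# $1 = baseline list, $2 = post-upgrade list (one OWNER.OBJECT_NAME per line)
new_invalids() (
    before=$(mktemp) || exit 1
    after=$(mktemp) || { rm -f "$before"; exit 1; }
    # comm needs sorted input; sort -u also drops duplicate lines.
    sort -u "$1" > "$before"
    sort -u "$2" > "$after"
    # -13 suppresses lines unique to the baseline and lines common to both,
    # leaving only the lines that appear in the post-upgrade list alone.
    comm -13 "$before" "$after"
    rm -f "$before" "$after"
)

# Possible use (file names are hypothetical):
#   new_invalids /var/tmp/invalid_before.lst /var/tmp/invalid_after.lst
```

An empty result means the upgrade introduced no new invalid objects compared with the baseline.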

As always happy reading,

Mathijs

Scenario 1  Installing software only  as a preparation:

When I started my preparations it seemed like the best thing to install both software parts (GI and Rdbms) as software only and perform the needed steps after that. In this case the following would have been performed:

  1. Install 11.2.0.3 Rdbms as "software only".
  2. Install 11.2.0.3 GI as "software only".
  3. After the Linux upgrade, relink the software (described below).
  4. Perform various steps to activate the new GI.
  5. Implement PSU October 2013.
  6. Upgrade the databases.

Scenario 1: after the Linux upgrade I would have to relink my software in full again:

Stopping the databases under control in an easy way. As preparation for relinking the software I performed the following step:
srvctl status home -o /opt/oracle/product/112_ee_64/db -s /var/tmp/state_file.status
srvctl stop home -o /opt/oracle/product/112_ee_64/db -s /var/tmp/state_file.dmp

To relink the Rdbms software, after shutting down the databases (see above) and with ORACLE_HOME set properly:
$ORACLE_HOME/bin/relink all
This will show:
writing relink log to: /opt/oracle/product/112_ee_64/db/install/relink.log

To relink the Oracle Restart software, first prepare the Oracle Grid Infrastructure for a Standalone Server home for modification using the following procedure:
  1. Log in as the Oracle Grid Infrastructure software owner and change directory to Grid_home/bin, where Grid_home is the path to the Oracle Grid Infrastructure home:
cd /opt/crs/product/112_ee_64/crs/bin
  2. Shut down the Oracle Restart stack using the following command:
crsctl stop has -f
This will show:
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'MySrvr1hr'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'MySrvr1hr'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'MySrvr1hr' succeeded
CRS-2673: Attempting to stop 'ora.evmd' on 'MySrvr1hr'
CRS-2677: Stop of 'ora.evmd' on 'MySrvr1hr' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'MySrvr1hr' has completed
CRS-4133: Oracle High Availability Services has been stopped.

Then relink Oracle Grid Infrastructure for a Standalone Server using the following procedure:
  1. As root:
# cd /opt/crs/product/112_ee_64/crs/crs/install
# perl roothas.pl -unlock
  2. As the Oracle Grid Infrastructure for a Standalone Server owner:
$ export ORACLE_HOME=/opt/crs/product/112_ee_64/crs
$ $ORACLE_HOME/bin/relink
This will show:
writing relink log to: /opt/crs/product/112_ee_64/crs/install/relink.log
  3. As root again:
# cd /opt/crs/product/112_ee_64/crs/rdbms/install/
# ./rootadd_rdbms.sh
# cd /opt/crs/product/112_ee_64/crs/crs/install
# perl roothas.pl -patch
Note: rootadd_rdbms.sh came back very fast without any output.

Checked with:
export ORACLE_HOME=/opt/crs/product/112_ee_64/crs
$ORACLE_HOME/bin/crsctl check has
This showed:
CRS-4638: Oracle High Availability Services is online

Starting the databases under control in an easy way:
srvctl start home -o /opt/oracle/product/112_ee_64/db -s /var/tmp/state_file.dmp

Scenario 1: after the Linux upgrade I would have to perform various steps to activate the GI environment:

1) First of all, shut down the current database and ASM instances.

oracle@MySrvr1hr:/opt/oracle/admin/tools [+ASM]# stop.ksh

2) Install the new 11.2.0.<#> patchset Grid Infrastructure Standalone as "Software Only" first (in a separate Oracle home/directory, AKA "out of place").

3) Then configure the CSS & OHAS services as root user:

/opt/oracle/product/112_ee_64_a/asm/crs/install/roothas.pl -deconfig -force
and
/opt/crs/product/112_ee_64/crs/crs/install/roothas.pl

4) Please perform the next steps as oracle or grid OS user (Grid Infrastructure OS owner):

/opt/crs/product/112_ee_64/crs/bin/crsctl modify resource "ora.cssd" -attr "AUTO_START=1"

/opt/crs/product/112_ee_64/crs/bin/crsctl modify resource "ora.diskmon" -attr "AUTO_START=1"

Note: on release 11.2.0.3 and onwards (non-Exadata), "ora.diskmon" is no longer required, since this is an Exadata-related process responsible for the I/O fencing.

5) Restart the OHAS stack as grid or oracle OS user:

/opt/crs/product/112_ee_64/crs/bin/crsctl stop has

/opt/crs/product/112_ee_64/crs/bin/crsctl start has

6) Check the CSS & OHAS state as grid or oracle OS user:

/opt/crs/product/112_ee_64/crs/bin/crsctl check has

/opt/crs/product/112_ee_64/crs/bin/crsctl check css

/opt/crs/product/112_ee_64/crs/bin/crsctl stat resource

/opt/crs/product/112_ee_64/crs/bin/crsctl stat res -t

Note: if the CSS and OHAS services did NOT start, you will need to reboot the Linux box and check them again.

7) Remove the old listener running under the old Grid Infrastructure Oracle home through the NETCA GUI (started from the old Grid Infrastructure Oracle home). You can spot the running listener with:
ps -ef|grep inherit

oracle   23430     1  0  2012 ?        1-08:34:36 /opt/oracle/product/112_ee_64_a/db/bin/tnslsnr LISTENER_MYSRVR1HR -inherit

8) Recreate the default listener (LISTENER) on port 1521 through the NETCA GUI of the new Grid Infrastructure Oracle home.

9) Create the init+ASM.ora file in the <Grid Infrastructure Oracle Home>/dbs directory with the following parameters:

asm_diskgroups=<list of diskgroups>

asm_diskstring='ORCL:*' or '/dev/oracleasm/disks/*'

instance_type='asm'

large_pool_size=12M

10) Add the ASM instance as the Grid Infrastructure installation owner (grid or oracle user):

/opt/crs/product/112_ee_64/crs/bin/srvctl add asm


11) Enable ASM instance auto start as follows:

/opt/crs/product/112_ee_64/crs/bin/crsctl modify resource "ora.asm" -attr "AUTO_START=1"


12) Make sure the disks are discovered by kfod:

Example:

/opt/crs/product/112_ee_64/crs/bin/kfod asm_diskstring='/dev/mapper/asm-*p1' disks=all

13) If so, start up the ASM instance as follows, using the init file from step 9:

export ORACLE_SID=+ASM

/opt/crs/product/112_ee_64/crs/bin/sqlplus "/ as sysasm"

SQL> startup pfile=init+ASM.ora

SQL> show parameter asm

14) Validate that the original diskgroup(s) were mounted:

SQL> select name, state from v$asm_diskgroup;

15) Finally, confirm that the OHAS (autostart) services start as follows:

/opt/crs/product/112_ee_64/crs/bin/crsctl stop has

/opt/crs/product/112_ee_64/crs/bin/crsctl start has

/opt/crs/product/112_ee_64/crs/bin/crsctl stat res

/opt/crs/product/112_ee_64/crs/bin/crsctl stat res -t

16) Check the new HAS version as well as follows:

/opt/crs/product/112_ee_64/crs/bin/crsctl query has releaseversion
/opt/crs/product/112_ee_64/crs/bin/crsctl query has softwareversion
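
The version check at the end of the steps above can be scripted along these lines. This is a sketch under the assumption that crsctl prints the version between square brackets, as in "... on the local node is [11.2.0.3.0]". Verify the exact wording on your own system. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: parses text, it does not call crsctl itself.

# Print the first version found between square brackets on stdin.
has_version() {
    sed -n 's/.*\[\([0-9][0-9.]*\)].*/\1/p' | head -n 1
}

# Succeed when version $1 starts with the expected prefix $2.
version_is() {
    case "$1" in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Possible use on a live system (not executed here):
#   v=$(/opt/crs/product/112_ee_64/crs/bin/crsctl query has releaseversion | has_version)
#   version_is "$v" 11.2.0.3. || echo "unexpected HAS version: $v"
```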


Scenario 1: after that I would be able to continue with applying PSU October 2013 on the software:

Copy /opt/oracle/product/11203_ee_64/stage/p6880880_112000_Linux-x86-64.zip to the GI home and extract it.
Create the response file ocm.rsp in both OPatch directories:
/opt/oracle/product/11203_ee_64/db/OPatch/ocm/bin/ocm.rsp
/opt/crs/product/112_ee_64/crs/OPatch/ocm/bin/ocm.rsp
As root:
export PATH=/opt/crs/product/112_ee_64/crs/OPatch:$PATH
which opatch
opatch auto /opt/oracle/product/11203_ee_64/stage -ocmrf /opt/crs/product/112_ee_64/crs/OPatch/ocm/bin/ocm.rsp -oh /opt/crs/product/112_ee_64/crs,/opt/oracle/product/11203_ee_64/db

Note: this command installed the October PSU in both my GI home and my Rdbms home.

After that, upgrade the databases with DBUA.
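
Since the opatch auto line is long and easy to mistype, here is a small sketch that assembles it from its parts and only prints it. It assumes the -oh list has to reach opatch as one comma-separated argument, so no spaces may appear between the homes. The helper names are made up for this example.

```shell
#!/bin/sh
# Sketch only: builds the command line as text, nothing is executed.

# Join all arguments with commas and no spaces: a b c -> a,b,c
join_homes() {
    joined=""
    for home in "$@"; do
        joined="${joined:+$joined,}$home"
    done
    printf '%s\n' "$joined"
}

# $1 = patch stage directory, $2 = OCM response file, remaining args = Oracle homes
opatch_auto_command() {
    stage=$1
    rsp=$2
    shift 2
    echo "opatch auto $stage -ocmrf $rsp -oh $(join_homes "$@")"
}

# Possible use:
#   opatch_auto_command /opt/oracle/product/11203_ee_64/stage \
#       /opt/crs/product/112_ee_64/crs/OPatch/ocm/bin/ocm.rsp \
#       /opt/crs/product/112_ee_64/crs /opt/oracle/product/11203_ee_64/db
```

Printing the command first gives you a chance to review it before running it as root.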

Scenario 2  ( Installing new Grid Infra and Rdbms AFTER Linux patching )

As I wrote, this is my preferred scenario. It involves fewer steps, and the runInstaller enables you to upgrade the existing environment. Please be aware that the 11.2.0.3 installations are so-called out-of-place installations, requiring new Oracle homes. For that purpose I requested (and got) extra space: +15 GB in /opt/oracle and the same in /opt/crs. I followed the steps below bullet by bullet.

  • Create a fresh copy of the init.ora: /opt/oracle/admin/tools/cSpfile.ksh (using scripts to create and save the spfile).
  • Perform status: /opt/oracle/admin/tools/cSrvctlAct.ksh status (checking and holding the status of resources in logfiles).
  • Perform config: /opt/oracle/admin/tools/cSrvctlAct.ksh config (catching details of the cluster config in files).
  • Stop the DBs / listener: /opt/oracle/admin/tools/cSrvctlAct.ksh stop (this was done by Grid Infra during the GI install).
  • Install GI 11.2.0.3 with the option "Oracle Grid Infrastructure for a Standalone Server".
  • Install Rdbms 11.2.0.3 software only (make sure no parallel installs are running before).
  • Copy to the GI home and extract /opt/oracle/product/11203_ee_64/stage/p6880880_112000_Linux-x86-64.zip (this is the most up-to-date OPatch).
  • Create the response file ocm.rsp in /opt/oracle/product/11203_ee_64/db/OPatch/ocm/bin and in /opt/crs/product/112_ee_64/crs/OPatch/ocm/bin (location and name are needed during opatch).
  • As root: export PATH=/opt/crs/product/112_ee_64/crs/OPatch:$PATH
  • As root: which opatch
  • As root: opatch auto /opt/oracle/product/11203_ee_64/stage -ocmrf /opt/crs/product/112_ee_64/crs/OPatch/ocm/bin/ocm.rsp -oh /opt/crs/product/112_ee_64/crs,/opt/oracle/product/11203_ee_64/db (will apply to these two homes).
  • As oracle: perform DBUA per instance with cDbua.ksh. Two odd things:
  • 1) it created the spfile in $ORACLE_HOME/dbs (it was in ASM before).
  • 2) it had a local listener defined all of a sudden, so I needed to get rid of it on this box again.
  • As oracle: /opt/oracle/admin/tools/cCatBundle.ksh
  • As oracle: /opt/oracle/admin/tools/cSrvctlmod.ksh (done by DBUA, so no need for it in a separate script).
  • As oracle: /opt/oracle/admin/tools/cSrvctlAct.ksh status (checking the status in GI).
  • As oracle: /opt/oracle/admin/tools/cSrvctlAct.ksh config (stop and start using the GI).
  • As oracle: ln -s /opt/networker/lib/libnwora.so libobk.so in the new Oracle home (don't forget your Networker library in the lib dir of the new Oracle home like I did :)).
  • As oracle: run an archive backup as a check.
  • As oracle, per DB: alter system set "_use_adaptive_log_file_sync"=FALSE scope = both; (needed in each upgraded DB by customer's request).
  • As oracle, per DB: create a new pfile for the database.
  • As oracle, per DB: remove the local listener entry from the newly created pfile.
  • As oracle, per DB: check the spfile, because after the upgrade it is in $ORACLE_HOME/dbs again instead of in ASM where I had expected it.
  • As oracle, per DB: create a new spfile in ASM.
  • As oracle, per DB: srvctl modify database -d MYDB1 -p '+DATA/MYDB1/spfileMYDB1.ora'
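
The pfile clean-up in the last bullets (removing the local listener entry before creating the spfile in ASM) can be sketched as follows. It assumes the pfile has one parameter per line in the usual *.name=value or SID.name=value form. The function name and the /tmp paths in the comments are illustrative only.

```shell
#!/bin/sh
# Sketch only: copy pfile $1 to $2 without any local_listener entries.
strip_local_listener() {
    # -i: ignore case, -v: keep the lines that do NOT match.
    # grep exits with 1 when it selects no lines at all; treat that as success.
    grep -iv '^[^#]*local_listener[[:space:]]*=' "$1" > "$2" || [ "$?" -eq 1 ]
}

# Possible sequence per database (SQL shown as comments, paths hypothetical):
#   SQL> create pfile='/tmp/initMYDB1.ora' from spfile;
#   strip_local_listener /tmp/initMYDB1.ora /tmp/initMYDB1_clean.ora
#   SQL> create spfile='+DATA/MYDB1/spfileMYDB1.ora' from pfile='/tmp/initMYDB1_clean.ora';
#   srvctl modify database -d MYDB1 -p '+DATA/MYDB1/spfileMYDB1.ora'
```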

Scenario 3 The Real Life implemented Plan.

Install the software and patch:
  • /opt/oracle/admin/tools/cSpfile.ksh (creating a copy of the spfile as init.ora).
  • Create migration pfiles with larger settings for shared_pool (recommended minimum: shared_pool_size > 600M, preferably 1024M).
  • /opt/oracle/admin/tools/cSrvctlAct.ksh status (check the current setup in the clusterware).
  • /opt/oracle/admin/tools/cSrvctlAct.ksh config (check the current setup in the clusterware).
  • Dry run: start ./runInstaller to check the prerequisites before the install (check for warnings and errors and correct them).
  • Install GI with the option "Oracle Grid Infrastructure for a Standalone Server" (install in a separate Oracle home and choose Upgrade).
  • Install Rdbms software only, and make sure no parallel installs are running before (the Oracle home needs to be empty).
  • /opt/oracle/product/11203_ee_64/stage/p6880880_112000_Linux-x86-64.zip (copy and install the latest OPatch to both GI and Rdbms).
  • Create the response file ocm.rsp in /opt/oracle/product/11203_ee_64/db/OPatch/ocm/bin and in /opt/crs/product/112_ee_64/crs/OPatch/ocm/bin (needed during the opatch run).
  • export PATH=/opt/crs/product/112_ee_64/crs/OPatch:$PATH (opatch runs as root, so set the path).
  • which opatch (check which OPatch is used).
  • opatch auto /opt/oracle/product/11203_ee_64/stage -ocmrf /opt/crs/product/112_ee_64/crs/OPatch/ocm/bin/ocm.rsp -oh /opt/crs/product/112_ee_64/crs,/opt/oracle/product/11203_ee_64/db (run opatch).

Upgrade the databases:
  • Perform DBUA per instance. DBUA messed up by adding a local listener to the init.ora, and the Grid agent kept altering the oratab. That is why I recommend against DBUA for bulk upgrades. I would script the upgrade using a fixed ORACLE_HOME (the new one) and a dedicated init.ora / spfile for the migration.

Steps for the manual upgrade (the preferred way!):
  • Create a new spfile from a migration pfile (the migration pfile has a larger shared_pool_size).
  1) Start sqlplus and run the catupgrd.sql script from the NEW $ORACLE_HOME/rdbms/admin. After catupgrd.sql finishes it will shut down the database:
sqlplus "/ as sysdba"
spool /tmp/upgrade<DB>.log
startup upgrade
set echo on
@?/rdbms/admin/catupgrd.sql
  2) Check the catupgrd.sql spool file for errors.
  3) Restart the database in normal mode.
  4) @$ORACLE_HOME/rdbms/admin/catuppst.sql (post steps for the migration).
  5) @$ORACLE_HOME/rdbms/admin/utlrp.sql
  • alter system set "_use_adaptive_log_file_sync"=FALSE scope = both; (requested by the customer).
  • set lines 2000
  • select instance_name from v$instance; (check sanity of the upgrade).
  • select * from v$version; (check sanity of the upgrade).
  • select COMP_NAME,VERSION,STATUS,MODIFIED from dba_registry order by 1; (check sanity of the upgrade: all the installed components should be valid!).
  • select * from DBA_REGISTRY_HISTORY order by action_time desc; (check whether catbundle ran; shows the most recent entry first).
  • Check $ORACLE_HOME/dbs for the presence of a correct init.ora (it should point to an spfile in the ASM diskgroup).
  • srvctl upgrade database -d <Db> -o /opt/oracle/product/11203_ee_64/db (inform the clusterware about the altered Oracle home).
  • srvctl modify database -d <Db> -p '+DATA/<Db>/spfile<Db>.ora' (make sure the clusterware knows about the spfile; alter if needed).
  • srvctl modify database -d <Db> -o '/opt/oracle/product/11203_ee_64/db' (make sure the clusterware knows about the new Oracle home).
  • If you have a listener per database, make sure it is started from the NEW Oracle home with the correct listener.ora.
  • /opt/oracle/admin/tools/cSrvctlAct.ksh status (check the status of the DB in the cluster).
  • /opt/oracle/admin/tools/cSrvctlAct.ksh config (check the configuration of the DB in the cluster).
  • /opt/oracle/admin/tools/cSrvctlAct.ksh stop, then /opt/oracle/admin/tools/cSrvctlAct.ksh start (as a test, stop and start via srvctl stop/start database -d <Db>).
  • ln -s /opt/networker/lib/libnwora.so libobk.so in the new Oracle home (check that the Networker library is present in the new Oracle home).
  • Run an archive or control file backup as a test.
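
For a bulk upgrade, the manual steps above could be scripted roughly like this: one generated SQL*Plus script for the catupgrd.sql phase and one for the post-upgrade phase, since the database is shut down in between. This is a sketch, not the scripts I actually used. The function name, file names and log location are assumptions, and ORACLE_HOME and ORACLE_SID must already point to the new home and the right database.

```shell
#!/bin/sh
# Sketch only: writes two SQL*Plus scripts for database $1 into directory $2.
write_upgrade_scripts() {
    db=$1
    dir=$2
    # Phase 1: catupgrd.sql shuts the database down when it finishes.
    cat > "$dir/upgrade_${db}_1.sql" <<EOF
spool /tmp/upgrade_${db}.log
startup upgrade
set echo on
@?/rdbms/admin/catupgrd.sql
spool off
exit
EOF
    # Phase 2: normal restart, post-upgrade steps, recompile, sanity check.
    cat > "$dir/upgrade_${db}_2.sql" <<EOF
startup
@?/rdbms/admin/catuppst.sql
@?/rdbms/admin/utlrp.sql
select comp_name, version, status from dba_registry order by 1;
exit
EOF
}

# Possible use (check the spool file for errors between the two phases):
#   write_upgrade_scripts MYDB1 /var/tmp
#   sqlplus "/ as sysdba" @/var/tmp/upgrade_MYDB1_1.sql
#   sqlplus "/ as sysdba" @/var/tmp/upgrade_MYDB1_2.sql
```

Keeping the two phases in separate files leaves a natural stop to review the log before continuing.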

Mission completed.