Moving from ASMLIB to multipathing / creating a shared spfile in ASM – Grid Infra

Introduction

No, this is not a note in defense of or against ASMLIB. It is merely a record of the fact that in the past Real Application Clusters have been set up with ASMLIB, and these clusters have recently turned into troubled environments due to wrong allocation of ASMLIB disks. Since the company standards have changed over the years in favor of multipathing, it was decided that the troubled cluster would have to start using multipathing. For a general overview of ASMLIB and multipathing see: http://oracle.su/docs/11g/server.112/e10500/asmprepare.htm.

It was also decided to move away from the standards that ruled a couple of years ago (implementing RAC clusters with ASM on ASMLIB) to the current company standard, which embraces multipathing. So far no one had done this kind of conversion on live systems, so it needed a proper scenario and a dress-rehearsal test before we would implement it on the production environment.

In general terms it was explained to me that ASMLIB is a layer on TOP of multipathing, presenting disks to ASM with a specific label. The asm_diskstring in an ASMLIB environment points to those disks as 'ORCL:*', and the asm_diskstring acts as the key that determines which disks the ASM instance discovers and works with. The scenario turned out to be remarkably simple for the Oracle DBAs. The bottom line was that even though the disks had been labeled for ASMLIB, altering the asm_diskstring to point to the disks via '/dev/mapper/asm*p1' would already do the job, since the ASMLIB labels are simply ignored when discovery happens via the multipath devices. But as always, do not try this at home unless you alter it in a controlled way as described below.
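The asm_diskstring change relies on plain shell-style globbing against device names, so the new pattern can be sanity-checked on dummy files before touching ASM. A minimal sketch (the device names below are made up for illustration):

```shell
# Simulate multipath device names in a scratch directory so the
# '/dev/mapper/asm*p1' glob can be checked without touching real devices.
tmp=$(mktemp -d)
mkdir -p "$tmp/dev/mapper"
touch "$tmp/dev/mapper/asm01p1" "$tmp/dev/mapper/asm02p1"   # would match
touch "$tmp/dev/mapper/data01p1"                            # would not match
ls "$tmp"/dev/mapper/asm*p1        # same pattern as the new asm_diskstring
rm -rf "$tmp"
```

Only the asm* partition-1 devices are listed; anything the pattern does not match stays invisible to ASM discovery.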

Details:

As a preparation I ran this scenario on a preproduction box, and with that being a good dress rehearsal, implemented it on production as well. More details are described below:

Plan of Approach

Wave 1 (Oracle):
- alter system set asm_diskstring = '/dev/mapper/asm*p1' scope=spfile;
- Stop CRS on all 4 nodes
- Start CRS on mysrvr33r
- Start CRS on mysrvr34r
- Start CRS on mysrvr35r
- Start CRS on mysrvr36r

Wave 2:
- Stop CRS on all 4 nodes (Oracle)
- Remove ASMLIB RPMs on 33r 34r 35r 36r (Linux)
- Server reboot of mysrvr33r (Linux)
- Check cluster on mysrvr33r, all up (Oracle)
- Restart of mysrvr34r 35r 36r (Linux)
- Check 33r 34r 35r 36r (Oracle)

Wave 3 (Oracle):
- Stop CRS on nodes 34r 35r 36r
- Create spfile in ASM diskgroup on 33r
- Restart CRS on 33r
- Check gpnp profile / spfile on 33r
- Restart CRS on mysrvr34r 35r 36r
- Check gpnp profile / spfile on 33r - 36r
  1. I used three waves of activities. In the first wave, as preparation, I altered the asm_diskstring in the spfile of the ASM instances (of course not active until after the next restart). After that I stopped the full cluster on all 4 nodes and started with my first node to see the effects (and to see the ASM instance and database instances being started); then all nodes were restarted, and it showed everything was running well.
  2. In this wave I worked together with the Linux admins. The boxes in scope run RedHat, and the admins wanted to get rid of ASMLIB in the kernel as well, so I shut down the full cluster one more time, they removed the RPMs from Linux and rebooted the first box, and all was well after that. After my checks the other three boxes were started in parallel, and the end result was a happy and running cluster again!
  3. In wave three I had to fix some old hurt: these 4 boxes were still working with local spfiles instead of a shared spfile for the ASM instance. When this cluster was built by me some three years ago it was born as an 11.1 cluster environment, and it had been set up with local copies of the spfile:

Setting up a (lost) Spfile in ASM in a Grid infrastructure environment:

## My first thought was to stop the cluster, since I had done that in another scenario. Much to my surprise I was recommended to do this in the running environment, so I did, after preparing a valid init.ora:

 SQL> create spfile ='+CLUSTERDATA'  from pfile = '/opt/oracle/+ASM1/admin/pfile/init+ASM1.NEW' ;
create spfile ='+CLUSTERDATA'  from pfile = '/opt/oracle/+ASM1/admin/pfile/init+ASM1.NEW'
*
ERROR at line 1:
ORA-29780: unable to connect to GPnP daemon [CLSGPNP_ERR]

## As you can see, that did not work. Investigation brought up the following note: Environment Variable ORA_CRS_HOME MUST be UNSET in 11gR2/12c GI (Doc ID 1502996.1). I checked, and indeed that environment variable was present in my .profile:

#### It is bad practice to have ORA_CRS_HOME set in your .profile if you are using GI, so we unset it!

oracle@mysrvr33r:/opt/oracle/+ASM1/admin/pfile [+ASM1]# unset  ORA_CRS_HOME
oracle@mysrvr33r:/opt/oracle/+ASM1/admin/pfile [+ASM1]# echo $ORA_CRS_HOME
## after that  the create of the spfile worked.
SQL> create spfile ='+CLUSTERDATA'  from pfile = '/opt/oracle/+ASM1/admin/pfile/init+ASM1.NEW' ;
File created.
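Since a stray ORA_CRS_HOME cost some time here, a small guard before running the create makes sense. This is just a sketch of such a check, not part of the original procedure:

```shell
# Fail fast if ORA_CRS_HOME is set in a Grid Infrastructure session
# (per Doc ID 1502996.1 it must be unset before working with GPnP).
if [ -n "${ORA_CRS_HOME:-}" ]; then
  echo "ORA_CRS_HOME is set (${ORA_CRS_HOME}) - unset it first" >&2
  exit 1
fi
echo "ORA_CRS_HOME not set - ok to proceed"
```

Dropping this into the session (or a wrapper script) turns the ORA-29780 surprise into an explicit error message before sqlplus is even started.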

## Let's do some checks (in asmcmd); it showed:

ASMCMD [+] > spget
+CLUSTERDATA/mysrvr3_cluster/asmparameterfile/registry.253.841967299

## Second check using the gpnptool:

gpnptool get

Warning: some command line parameters were defaulted. Resulting command line:
         /opt/crs/product/112_ee_64/crs/bin/gpnptool.bin get -o-

<gpnp:GPnP-Profile xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd" ProfileSequence="58" ClusterUId="2b7266b0d5797f65ff0fcf4c8e7931d6" ClusterName="mysrvr3_cluster" PALocation="">
  <gpnp:Network-Profile>...</gpnp:Network-Profile>
  ... SPFile="+CLUSTERDATA/mysrvr3_cluster/asmparameterfile/registry.253.841967299" ...
  <orcl:OCR-Profile id="ocr" OCRId="372102285"/>
  <ds:Signature>... [signature trimmed] ...</ds:Signature>
</gpnp:GPnP-Profile>

## in asmcmd:

ls -l +CLUSTERDATA/mysrvr3_cluster/asmparameterfile/registry.253.841967299

[Option  -p will be used ]

Type              Redund  Striped  Time             Sys  Name

ASMPARAMETERFILE  UNPROT  COARSE   MAR 11 23:00:00  Y    registry.253.841967299

## alert log shows:

Tue Mar 11 23:48:18 2014

NOTE: updated gpnp profile ASM diskstring: /dev/mapper/asm*p1

NOTE: updated gpnp profile ASM diskstring: /dev/mapper/asm*p1

## Looks good on the other servers as well (I checked them all, and they showed similar output):

gpnptool get

## As a next step I moved the existing spfile in $ORACLE_HOME/dbs out of the way:

mv spfile+ASM1.ora spfile+ASM1.ora.20140311.old

## Then I edited the init.ora to make it point to the ASM diskgroup as well.

vi init+ASM1.ora

## I restarted the cluster and after that checked in the ASM instance:

SQL> show parameter spfile
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +CLUSTERDATA/mysrvr3_cluster/as
                                                 mparameterfile/registry.253.84
                                                 1967299

## Checks: asmcmd spget and gpnptool get

## Copied the init.ora to the other boxes:

oracle@mysrvr33r:/opt/crs/product/112_ee_64/crs/dbs [+ASM1]# cat init+ASM1.ora
spfile='+CLUSTERDATA/mysrvr3_cluster/asmparameterfile/registry.253.841967299'

scp init+ASM1.ora oracle@mysrvr34r:/opt/crs/product/112_ee_64/crs/dbs/init+ASM2.ora
scp init+ASM1.ora oracle@mysrvr35r:/opt/crs/product/112_ee_64/crs/dbs/init+ASM3.ora
scp init+ASM1.ora oracle@mysrvr36r:/opt/crs/product/112_ee_64/crs/dbs/init+ASM4.ora
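The three scp commands follow one pattern (same file, instance number incremented per node), so they can also be generated. A small sketch that only prints the commands, using the node names and paths from above:

```shell
# Print the scp command for each remaining node; the local init+ASM1.ora is
# renamed to the target node's instance (+ASM2 .. +ASM4) on the fly.
i=2
for node in mysrvr34r mysrvr35r mysrvr36r; do
  echo "scp init+ASM1.ora oracle@${node}:/opt/crs/product/112_ee_64/crs/dbs/init+ASM${i}.ora"
  i=$((i + 1))
done
```

Dropping the `echo` would execute the copies for real; keeping it makes this a harmless dry run.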

Mission completed. As always, happy reading, and DO test before you implement!

Mathijs .

ORA-01031: insufficient privileges with logon as sysdba

Introduction:

I guess we all sometimes have that famous facepalm moment when we do not have an immediate success story to offer when a customer asks something. In my case I was asked to hand over the sys password to a third party. I did grumble about it, but hey, that is the way it is sometimes. And yes, during the time they have the sys password I am not responsible for that database, AND I will have to audit the system after I take control of it again, but that is another story.

Details:

In this case I had altered the password of sys, tested it locally on the machine, and given it to the customer, and well, he came back to me telling me it did not work …

The error they got and the one I researched for:

sqlplus "sys/password@MYDB.prod.nl as sysdba"

ERROR:
ORA-01031: insufficient privileges

Well, that one made me frown, because this worked:
sqlplus system@MYDB.prod.nl

Connected to:

And then do this locally without connect string:

SQL> connect sys as sysdba 
Enter password: 
Connected.

So with the web on my side I went looking for clues on this one. And again I love the online Oracle community, because soon I found out that I should check whether a password file existed and whether the password in that file matched the sys password.

I checked and frowned, because there was no password file present at all in $ORACLE_HOME/dbs! So I created a password file, tested it, and it worked!

Surely that was a facepalm moment for me, like hey, I have been there, done that 🙂 but still, nice to get it working asap after all.

Syntax, in $ORACLE_HOME/dbs:
orapwd FILE=orapwMYDB1 password=
After that I copied it to the second node with scp and of course renamed  orapwMYDB1 to  orapwMYDB2.
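The rename is easy to get wrong when scripted, because only the trailing instance number changes. A tiny sketch of deriving the node-2 file name with shell parameter expansion (the echo stands in for the actual scp, and "node2" is a placeholder host name):

```shell
# Derive the node-2 password file name from the node-1 name by swapping the
# trailing instance number; the copy command is only printed, not executed.
src=orapwMYDB1
dst=${src%1}2                      # orapwMYDB1 -> orapwMYDB2
echo "scp \$ORACLE_HOME/dbs/$src node2:\$ORACLE_HOME/dbs/$dst"
```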

See, sometimes it is not that complex a matter to keep customer and DBA happy 🙂

Happy reading,

Mathijs

Are we or are we not going to relink Oracle software after a Linux upgrade?

Introduction

Recently I talked to my managers with one big topic on my mind. The Linux department is rapidly patching their servers to higher Linux levels, and they ask for support during and after patching the OS. From the management side I needed a clear directive that the Oracle team should follow those activities closely. With Oracle Support I started to gather information about whether or not to relink the software; experience will tell which way is the right one, because of course there are also risks involved in relinking the Oracle software. Examples from the past seem to tell that relinking brought trouble with it too.

As a central note I was given MOS note: Is It Necessary To Relink Oracle Following OS Upgrade? (Doc ID 444595.1).

In there you see these great references:

Note 883299.1 Oracle 11gR2 Relink New Feature

and

Note 220970.1 RAC: Frequently Asked Questions
Note 284785.1 How to Check Whether Oracle Binary/Instance is RAC Enabled and Relink Oracle Binary in RAC

From the RAC note (220970.1), this is an interesting quote with regard to the Grid Infrastructure environment:

Do I need to relink the Oracle Clusterware / Grid Infrastructure home after an OS upgrade?

Using Oracle Clusterware 10g and 11.1, Oracle Clusterware binaries cannot be relinked. However, the client shared libraries, which are part of the home can be relinked, in most cases there should not be a need to relink them. See Note:743649.1 for more information. 
 
 Using Oracle Grid Infrastructure 11.2 and higher, there are some executables in the Grid home that can and should be relinked after an OS upgrade. The following steps describe how to relink an Oracle Grid Infrastructure for Clusters home: 
 
 As root:
 
 # cd Grid_home/crs/install
 # perl rootcrs.pl -unlock
 
 As the grid infrastructure for a cluster owner:
 
 $ export ORACLE_HOME=Grid_home
 $ Grid_home/bin/relink
 
 As root again:
 
 # cd Grid_home/crs/install
 # perl rootcrs.pl -patch 
 
 Note: If using Oracle Grid Infrastructure for Standalone Environments (Oracle Restart), see the Oracle Documentation for more information: http://docs.oracle.com/cd/E11882_01/install.112/e24321/oraclerestart.htm#BABJCHIJ
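The quoted MOS steps can be collected into a dry-run helper that only prints the commands in order. GRID_HOME below is a placeholder path, and nothing is executed, so this cannot relink anything by accident:

```shell
# Print the relink sequence from Doc ID 220970.1 without running anything.
GRID_HOME=/u01/app/11.2.0/grid   # placeholder - substitute your real Grid home
cat <<EOF
# as root:
cd $GRID_HOME/crs/install && perl rootcrs.pl -unlock
# as the grid infrastructure owner:
export ORACLE_HOME=$GRID_HOME
$GRID_HOME/bin/relink
# as root again:
cd $GRID_HOME/crs/install && perl rootcrs.pl -patch
EOF
```

Keeping the unlock / relink / patch order visible in one place helps when the steps have to be handed to an on-call colleague.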

So should you relink? Well, frankly, I would follow Oracle's recommendations in this matter, with the idea that it should be done on low-priority boxes like test or preprod first before touching production. And should you decide not to relink due to experiences in the past, well, I would recommend being standby once the servers are patched OS-wise and the Grid Infrastructure and databases are starting.


happy reading,


Mathijs

Move an Oracle RAC database to a new diskgroup in ASM (a real-life scenario)

Introduction:

Below you will find detailed steps to move a RAC database (Oracle 11.2.0.3) to a different diskgroup, using a full-maintenance-mode scenario in which the database is not available to the application during the maintenance window.

Happy reading,

Mathijs

Detailed scenario of a diskgroup move of a RAC database

## create fresh pfile  to be used  as a basis for a new spfile in the new diskgroup
create pfile='/opt/oracle/MYDB1/admin/pfile/initMYDB1.ora.20140111_1800' from spfile;
## create new spfile in the new diskgroup
create spfile='+MYDB_DATA01' from pfile='/opt/oracle/MYDB1/admin/pfile/initMYDB1.ora.20140111_1800';

## Shut down the database via the cluster:
## All actions after this shutdown are performed using sqlplus until further notice!
srvctl stop database  -d MYDB

## In a second screen make sure your environment points to the ASM environment, open asmcmd, and go to the new diskgroup to find the new name:

asmcmd -p
cd MYDB_DATA01/MYDB/PARAMETERFILE

## it shows:
ls -ltr
spfile.256.836591237

## Copy the file spfile.256.836591237 within the new diskgroup to a more human-readable file name:

cp +MYDB_DATA01/MYDB/PARAMETERFILE/spfile.256.836591237  +MYDB_DATA01/MYDB/spfileMYDB.ora

### After doing that, on Linux level alter the location of the spfile in the init.ora in $ORACLE_HOME/dbs ON ALL NODES (mysrvr25r / mysrvr26r):

cd /opt/oracle/product/11203_ee_64/db/dbs

ls -ltr  initMYDB*

### current content of initMYDB1.ora (check and adapt on the second node as well):
spfile='+DATA1/MYDB/spfileMYDB.ora'
## After changing the diskgroup my new init.ora looks like this:

SPFILE='+MYDB_DATA01/MYDB/spfileMYDB.ora'

## Start the database to verify that everything is in order:

SQL> startup

## Working with the control files and Perform Backup:
## shows we have three controlfiles in place
SQL> show parameter control_files

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
control_files                        string      +DATA1/MYDB/control01.ctl, +DATA1/MYDB/control02.ctl, +DATA1/MYDB/control03.ctl

## Set new location of controlfile in SPFILE:
alter system set control_files='+MYDB_DATA01/MYDB/control01.ctl', '+MYDB_FRA1/MYDB/control02.ctl', '+MYDB_DATA01/MYDB/control03.ctl' scope=spfile sid='*';
alter system set cluster_database=false scope=spfile;
## Shutdown your database
SQL> shutdown;

## Open asmcmd again with the environment pointing to the +ASM instance:
## Copy the current controlfile from +DATA1 to the correct diskgroups, syncing them by this copy:
ASMCMD
cp +DATA1/MYDB/control01.ctl  +MYDB_DATA01/MYDB/control01.ctl
cp +DATA1/MYDB/control01.ctl  +MYDB_FRA1/MYDB/control02.ctl
cp +DATA1/MYDB/control01.ctl  +MYDB_DATA01/MYDB/control03.ctl
## check it:
ls -l +MYDB_DATA01/MYDB/control01.ctl
ls -l +MYDB_FRA1/MYDB/control02.ctl
ls -l +MYDB_DATA01/MYDB/control03.ctl

##  Start your database with startup nomount

SQL> startup nomount;

## Start an rman session: open "rman target /", restore from the old controlfile, then mount + open the database.
## Not 100% sure this step was needed since we copied the file in asmcmd already, but it won't hurt and takes little time.
rman target /
restore controlfile to '+MYDB_DATA01/MYDB/control01.ctl' from '+DATA1/MYDB/control01.ctl';
restore controlfile to '+MYDB_FRA1/MYDB/control02.ctl'   from '+DATA1/MYDB/control01.ctl';
restore controlfile to '+MYDB_DATA01/MYDB/control03.ctl' from '+DATA1/MYDB/control01.ctl';

##This will show
rman target /

Recovery Manager: Release 11.2.0.3.0 – Production on Sat Jan 11 18:53:29 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: MYDB (not mounted)

RMAN> restore controlfile to '+MYDB_DATA01/MYDB/control01.ctl' from '+DATA1/MYDB/control01.ctl';

Starting restore at 11.01.2014 18:54:00
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1010 device type=DISK

channel ORA_DISK_1: copied control file copy
Finished restore at 11.01.2014 18:54:02

RMAN> restore controlfile to '+MYDB_FRA1/MYDB/control02.ctl'   from '+DATA1/MYDB/control01.ctl';

Starting restore at 11.01.2014 18:54:33
using channel ORA_DISK_1

channel ORA_DISK_1: copied control file copy
Finished restore at 11.01.2014 18:54:34

RMAN> restore controlfile to '+MYDB_DATA01/MYDB/control03.ctl' from '+DATA1/MYDB/control01.ctl';

Starting restore at 11.01.2014 18:54:51
using channel ORA_DISK_1

channel ORA_DISK_1: copied control file copy
Finished restore at 11.01.2014 18:54:52

## Mount the database via Rman

RMAN> sql 'alter database mount';

##This will show

List of instances:
1 (myinst: 1)
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
Sat Jan 11 18:39:32 2014
LCK0 started with pid=31, OS id=4011
Starting background process RSMN
Sat Jan 11 18:39:33 2014
RSMN started with pid=32, OS id=4015
ORACLE_BASE from environment = /opt/oracle
Sat Jan 11 18:39:33 2014
ALTER DATABASE   MOUNT
This instance was first to mount
NOTE: Loaded library: System
SUCCESS: diskgroup DATA1 was mounted
NOTE: dependency between database MYDB and diskgroup resource ora.DATA1.dg is established
Successful mount of redo thread 1, with mount id 2306013605
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT

## Now it is time to make a backup of the database into the new diskgroup (+MYDB_DATA01). If you are in a RAC environment, make sure all other instances are down.

## Issue the following command in rman because this will create a one to one copy of the database in the new Disk group:

RMAN>backup as copy database format '+MYDB_DATA01';

##This will show:

RMAN> backup as copy database format '+MYDB_DATA01';

Starting backup at 11.01.2014 18:58:11
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1766 device type=DISK
channel ORA_DISK_1: starting datafile copy
input datafile file number=00006 name=+DATA1/MYDB/datafile/gemprod.275.786977627
output file name=+MYDB_DATA01/MYDB/datafile/gemprod.260.836593093 tag=TAG20140111T185812 RECID=15 STAMP=836593134
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:45
channel ORA_DISK_1: starting datafile copy
input datafile file number=00002 name=+DATA1/MYDB/datafile/sysaux.262.786977877
output file name=+MYDB_DATA01/MYDB/datafile/sysaux.261.836593139 tag=TAG20140111T185812 RECID=16 STAMP=836593155
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:25
channel ORA_DISK_1: starting datafile copy
input datafile file number=00001 name=+DATA1/MYDB/datafile/system.269.786977709
output file name=+MYDB_DATA01/MYDB/datafile/system.262.836593163 tag=TAG20140111T185812 RECID=17 STAMP=836593175
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:15
channel ORA_DISK_1: starting datafile copy
input datafile file number=00003 name=+DATA1/MYDB/datafile/undotbs1.266.786977783
output file name=+MYDB_DATA01/MYDB/datafile/undotbs1.263.836593179 tag=TAG20140111T185812 RECID=18 STAMP=836593187
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:15
channel ORA_DISK_1: starting datafile copy
input datafile file number=00007 name=+DATA1/MYDB/datafile/undotbs2.267.786977883
output file name=+MYDB_DATA01/MYDB/datafile/undotbs2.264.836593195 tag=TAG20140111T185812 RECID=19 STAMP=836593202
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:15
channel ORA_DISK_1: starting datafile copy
input datafile file number=00004 name=+DATA1/MYDB/datafile/users.259.786977783
output file name=+MYDB_DATA01/MYDB/datafile/users.265.836593209 tag=TAG20140111T185812 RECID=20 STAMP=836593211
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03
channel ORA_DISK_1: starting datafile copy
input datafile file number=00005 name=+DATA1/MYDB/datafile/tools.263.786977709
output file name=+MYDB_DATA01/MYDB/datafile/tools.266.836593213 tag=TAG20140111T185812 RECID=21 STAMP=836593215
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03
channel ORA_DISK_1: starting datafile copy
copying current control file
output file name=+MYDB_DATA01/MYDB/controlfile/backup.267.836593215 tag=TAG20140111T185812 RECID=22 STAMP=836593217
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 11.01.2014 19:00:18
channel ORA_DISK_1: finished piece 1 at 11.01.2014 19:00:19
piece handle=+MYDB_DATA01/MYDB/backupset/2014_01_11/nnsnf0_tag20140111t185812_0.268.836593219 tag=TAG20140111T185812 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 11.01.2014 19:00:19

## Once that has finished, issue the following command in RMAN (note: this switch-to-copy command performs the … set newname for you, which is great):

RMAN>switch database to copy;

##This will show you:

RMAN> switch database to copy;

datafile 1 switched to datafile copy "+MYDB_DATA01/MYDB/datafile/system.262.836593163"
datafile 2 switched to datafile copy "+MYDB_DATA01/MYDB/datafile/sysaux.261.836593139"
datafile 3 switched to datafile copy "+MYDB_DATA01/MYDB/datafile/undotbs1.263.836593179"
datafile 4 switched to datafile copy "+MYDB_DATA01/MYDB/datafile/users.265.836593209"
datafile 5 switched to datafile copy "+MYDB_DATA01/MYDB/datafile/tools.266.836593213"
datafile 6 switched to datafile copy "+MYDB_DATA01/MYDB/datafile/gemprod.260.836593093"
datafile 7 switched to datafile copy "+MYDB_DATA01/MYDB/datafile/undotbs2.264.836593195"

##When that is finished, issue the following command in RMAN:
RMAN>sql 'alter database open';

## Start / open the second instance as well via sqlplus, as a check on the second box:
alter system set cluster_database=true scope=spfile;
startup

##Your alert file has been updated with following information:
Sat Jan 11 19:02:35 2014
WARNING: cataloging database area datafile
+DATA1/MYDB/datafile/system.269.786977709 as recovery area datafilecopy.
This datafilecopy is accounted into used space. Consider incrementing
db_recovery_file_dest_size parameter value by size of datafile.
Switch of datafile 1 complete to datafile copy
checkpoint is 13560558508638
WARNING: cataloging database area datafile
+DATA1/MYDB/datafile/sysaux.262.786977877 as recovery area datafilecopy.
This datafilecopy is accounted into used space. Consider incrementing
db_recovery_file_dest_size parameter value by size of datafile.
Switch of datafile 2 complete to datafile copy
checkpoint is 13560558508638
WARNING: cataloging database area datafile
+DATA1/MYDB/datafile/undotbs1.266.786977783 as recovery area datafilecopy.
This datafilecopy is accounted into used space. Consider incrementing
db_recovery_file_dest_size parameter value by size of datafile.
Switch of datafile 3 complete to datafile copy
checkpoint is 13560558508638
WARNING: cataloging database area datafile
+DATA1/MYDB/datafile/users.259.786977783 as recovery area datafilecopy.
This datafilecopy is accounted into used space. Consider incrementing
db_recovery_file_dest_size parameter value by size of datafile.
Switch of datafile 4 complete to datafile copy
checkpoint is 13560558508638
WARNING: cataloging database area datafile
+DATA1/MYDB/datafile/tools.263.786977709 as recovery area datafilecopy.
This datafilecopy is accounted into used space. Consider incrementing
db_recovery_file_dest_size parameter value by size of datafile.
Switch of datafile 5 complete to datafile copy
checkpoint is 13560558508638
WARNING: cataloging database area datafile
+DATA1/MYDB/datafile/gemprod.275.786977627 as recovery area datafilecopy.
This datafilecopy is accounted into used space. Consider incrementing
db_recovery_file_dest_size parameter value by size of datafile.
Switch of datafile 6 complete to datafile copy
checkpoint is 13560558508638
WARNING: cataloging database area datafile
+DATA1/MYDB/datafile/undotbs2.267.786977883 as recovery area datafilecopy.
This datafilecopy is accounted into used space. Consider incrementing
db_recovery_file_dest_size parameter value by size of datafile.
Switch of datafile 7 complete to datafile copy
checkpoint is 13560558508638
Sat Jan 11 19:04:21 2014
alter database open

## Open a sqlplus session and check the db_create% parameters:

SQL> show parameter db_create

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_create_file_dest                  string
db_create_online_log_dest_1          string
db_create_online_log_dest_2          string
db_create_online_log_dest_3          string
db_create_online_log_dest_4          string
db_create_online_log_dest_5          string

alter system set db_create_file_dest='+MYDB_DATA01' sid='*';
alter system set db_create_online_log_dest_1='+MYDB_FRA1' sid='*';

## check again
show parameter db_create
show parameter cluster

##Now it is time to work with the tempfiles. You will have to create a new temp tablespace in the new diskgroup, make it the default one, and drop the old one:
set lines 200
col tablespace_name format a40
col file_name format a80
select tablespace_name,FILE_NAME, bytes/1024/1024 MB from dba_temp_files;

## This shows:
TABLESPACE_NAME                          FILE_NAME                                                                            MB
---------------------------------------- -------------------------------------------------------------------------------- ----------
TEMP                                     +DATA1/MYDB/tempfile/temp.268.774880131                                            8192

##This will create a new temp tablespace in the new diskgroup, make it the default tablespace, and drop the old one:

create temporary tablespace TEMP02 tempfile size 1024m;
alter database default temporary tablespace TEMP02;
drop tablespace TEMP ;
create temporary tablespace TEMP tempfile size 8192m;
alter database default temporary tablespace TEMP;
drop tablespace TEMP02;

##Check it again:
set lines 200
col tablespace_name format a40
col file_name format a80
select tablespace_name,FILE_NAME, bytes/1024/1024 MB from dba_temp_files;

## Working with the redo logs:
## In this step we first have to add new members to each group (to each thread, in a RAC). After that, and after switching the logfiles, you can delete the members in the old diskgroup.
## First check the environment:

Set lines 2000
select * from v$log;
SQL>
GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIME          NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ------------------- ------------ -------------------
1          1      28288  209715200        512          2 YES INACTIVE            1.3559E+13 08.01.2014 14:54:59   1.3559E+13 08.01.2014 15:24:59
2          1      28289  209715200        512          2 YES INACTIVE            1.3559E+13 08.01.2014 15:24:59   1.3559E+13 08.01.2014 15:54:59
3          1      28290  209715200        512          2 NO  CURRENT             1.3559E+13 08.01.2014 15:54:59   2.8147E+14
5          2      28097  209715200        512          2 YES INACTIVE            1.3559E+13 08.01.2014 15:24:53   1.3559E+13 08.01.2014 15:54:52
6          2      28098  209715200        512          2 NO  CURRENT             1.3559E+13 08.01.2014 15:54:52   2.8147E+14
7          2      28095  209715200        512          2 YES INACTIVE            1.3559E+13 08.01.2014 14:24:51   1.3559E+13 08.01.2014 14:54:53
8          2      28096  209715200        512          2 YES INACTIVE            1.3559E+13 08.01.2014 14:54:53   1.3559E+13 08.01.2014 15:24:53

## and
col member format a80
set pagesize 33
select GROUP#,MEMBER from v$logfile order by 1;
GROUP# MEMBER
---------- --------------------------------------------------------------------------------
1 +DATA1/MYDB/onlinelog/group_1.260.774880079
1 +MYDB_FRA1/MYDB/onlinelog/group_1.258.774880081
2 +DATA1/MYDB/onlinelog/group_2.264.774880083
2 +MYDB_FRA1/MYDB/onlinelog/group_2.257.774880085
3 +DATA1/MYDB/onlinelog/group_3.261.774880087
3 +MYDB_FRA1/MYDB/onlinelog/group_3.256.774880089
5 +DATA1/MYDB/onlinelog/group_5.271.774880749
5 +MYDB_FRA1/MYDB/onlinelog/group_5.259.774880751
6 +DATA1/MYDB/onlinelog/group_6.272.774880753
6 +MYDB_FRA1/MYDB/onlinelog/group_6.260.774880755
7 +DATA1/MYDB/onlinelog/group_7.273.774880757
7 +MYDB_FRA1/MYDB/onlinelog/group_7.261.774880759
8 +DATA1/MYDB/onlinelog/group_8.274.774880763
8 +MYDB_FRA1/MYDB/onlinelog/group_8.262.774880765

##First we add new members in the correct, new diskgroup:
alter database add logfile member '+MYDB_DATA01' to group 1;
alter database add logfile member '+MYDB_DATA01' to group 2;
alter database add logfile member '+MYDB_DATA01' to group 3;
alter database add logfile member '+MYDB_DATA01' to group 5;
alter database add logfile member '+MYDB_DATA01' to group 6;
alter database add logfile member '+MYDB_DATA01' to group 7;
alter database add logfile member '+MYDB_DATA01' to group 8;

###Check again:
select GROUP#,MEMBER from v$logfile order by 1;

## Perform some switches to make sure the new members have been in use (archive log current performs a logswitch in the whole database):

alter system archive log current;
alter system archive log current;
alter system archive log current;
alter system archive log current;
alter system archive log current;
alter system archive log current;
alter system archive log current;
alter system archive log current;
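Rather than typing the statement eight times, the switches can be generated. This sketch only prints the statements so they can be pasted into sqlplus (or fed to it via a heredoc):

```shell
# Emit one 'archive log current' per required switch (8 here, matching the
# number of redo groups above).
for i in 1 2 3 4 5 6 7 8; do
  echo "alter system archive log current;"
done
```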

## First check the environment again:
Set lines 2000
select * from v$log;

##It is time to drop the members from the old diskgroup (DATA1):
select GROUP#,MEMBER from v$logfile order by 1;

GROUP# MEMBER
---------- --------------------------------------------------------------------------------
1 +DATA1/MYDB/onlinelog/group_1.260.774880079
1 +MYDB_FRA1/MYDB/onlinelog/group_1.258.774880081
2 +DATA1/MYDB/onlinelog/group_2.264.774880083
2 +MYDB_FRA1/MYDB/onlinelog/group_2.257.774880085
3 +DATA1/MYDB/onlinelog/group_3.261.774880087
3 +MYDB_FRA1/MYDB/onlinelog/group_3.256.774880089
5 +DATA1/MYDB/onlinelog/group_5.271.774880749
5 +MYDB_FRA1/MYDB/onlinelog/group_5.259.774880751
6 +DATA1/MYDB/onlinelog/group_6.272.774880753
6 +MYDB_FRA1/MYDB/onlinelog/group_6.260.774880755
7 +DATA1/MYDB/onlinelog/group_7.273.774880757
7 +MYDB_FRA1/MYDB/onlinelog/group_7.261.774880759
8 +DATA1/MYDB/onlinelog/group_8.274.774880763
8 +MYDB_FRA1/MYDB/onlinelog/group_8.262.774880765

##So we have to drop the redo members that point to the old (DATA1) diskgroup (but the group cannot be current!):

GROUP# MEMBER
———- ——————————————————————————–
1 +DATA1/MYDB/onlinelog/group_1.260.774880079
2 +DATA1/MYDB/onlinelog/group_2.264.774880083
3 +DATA1/MYDB/onlinelog/group_3.261.774880087
5 +DATA1/MYDB/onlinelog/group_5.271.774880749
6 +DATA1/MYDB/onlinelog/group_6.272.774880753
7 +DATA1/MYDB/onlinelog/group_7.273.774880757
8 +DATA1/MYDB/onlinelog/group_8.274.774880763

alter database drop logfile member '+DATA1/MYDB/onlinelog/group_1.260.774880079';
alter database drop logfile member '+DATA1/MYDB/onlinelog/group_2.264.774880083';
alter database drop logfile member '+DATA1/MYDB/onlinelog/group_3.261.774880087';
alter database drop logfile member '+DATA1/MYDB/onlinelog/group_5.271.774880749';
alter database drop logfile member '+DATA1/MYDB/onlinelog/group_6.272.774880753';
alter database drop logfile member '+DATA1/MYDB/onlinelog/group_7.273.774880757';
alter database drop logfile member '+DATA1/MYDB/onlinelog/group_8.274.774880763';
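Instead of hand-editing each drop statement, they can be derived from the GROUP#/MEMBER listing above. This sketch feeds a shortened sample of that listing to awk and keeps only the +DATA1 members; the awk helper and file names are my own, not part of the original procedure.

```shell
# Sketch: turn the "GROUP# MEMBER" listing into drop statements, keeping only
# the members that live in the old +DATA1 disk group.
awk '$2 ~ /^\+DATA1\// {
  printf "alter database drop logfile member '\''%s'\'';\n", $2
}' <<'EOF' > drop_members.sql
1 +DATA1/MYDB/onlinelog/group_1.260.774880079
1 +MYDB_FRA1/MYDB/onlinelog/group_1.258.774880081
2 +DATA1/MYDB/onlinelog/group_2.264.774880083
EOF
cat drop_members.sql
```

Run the same filter over the full v$logfile output to get all seven statements.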

## Check again:
select GROUP#,MEMBER from v$logfile order by 1;

##Working in the clusterware:

After these activities I tried stopping and starting via srvctl (this is an Oracle Restart environment). The start failed and the alert log showed error messages. I noticed that the environment was using the old spfile in +DATA1 again: the spfile reference was wrong once more.

cd $ORACLE_HOME/dbs

cat initMYDB1.ora
spfile='+DATA1/MYDB/spfileMYDB.ora'

## So the cluster agent had reverted my change.
## I corrected the init.ora again, started the database, and that worked.

##In sqlplus:

select name from v$controlfile
union
select name from v$datafile
union
select name from v$tempfile
union
select member from v$logfile
union
select filename from v$block_change_tracking
union
select name from v$flashback_database_logfile;

## This shows:

NAME
——————————————————————————–
+DATA1/MYDB/changetracking/ctf.265.774880707
+DATA1/MYDB/control01.ctl
+DATA1/MYDB/control02.ctl
+DATA1/MYDB/control03.ctl
+DATA1/MYDB/datafile/gemprod.275.786977627
+DATA1/MYDB/datafile/sysaux.262.786977877
+DATA1/MYDB/datafile/system.269.786977709
+DATA1/MYDB/datafile/tools.263.786977709
+DATA1/MYDB/datafile/undotbs1.266.786977783
+DATA1/MYDB/datafile/undotbs2.267.786977883
+DATA1/MYDB/datafile/users.259.786977783
+DATA1/MYDB/onlinelog/group_1.260.774880079
+DATA1/MYDB/onlinelog/group_2.264.774880083
+DATA1/MYDB/onlinelog/group_3.261.774880087
+DATA1/MYDB/onlinelog/group_5.271.774880749
+DATA1/MYDB/onlinelog/group_6.272.774880753
+DATA1/MYDB/onlinelog/group_7.273.774880757
+DATA1/MYDB/onlinelog/group_8.274.774880763
+DATA1/MYDB/tempfile/temp.268.774880131
+MYDB_FRA1/MYDB/onlinelog/group_1.258.774880081
+MYDB_FRA1/MYDB/onlinelog/group_2.257.774880085
+MYDB_FRA1/MYDB/onlinelog/group_3.256.774880089
+MYDB_FRA1/MYDB/onlinelog/group_5.259.774880751
+MYDB_FRA1/MYDB/onlinelog/group_6.260.774880755
+MYDB_FRA1/MYDB/onlinelog/group_7.261.774880759
+MYDB_FRA1/MYDB/onlinelog/group_8.262.774880765

## This means another action point, with regard to block change tracking:
SELECT filename, status, bytes
FROM v$block_change_tracking;

FILENAME                                     STATUS     BYTES
-------------------------------------------- ---------- ----------
+DATA1/MYDB/changetracking/ctf.265.774880707 ENABLED    11599872

## Disable it and enable it in the new location (the new disk group):
ALTER DATABASE DISABLE BLOCK CHANGE TRACKING;
ALTER DATABASE ENABLE BLOCK CHANGE TRACKING USING FILE '+MYDB_DATA01';

## In an Oracle Restart or RAC environment you now need to check the clusterware setup, since it holds knowledge about the spfile, the disk groups in use, etc.
## First check the configuration of the database in the clusterware:
srvctl config database -d MYDB
Database unique name: MYDB
Database name:
Oracle home: /opt/oracle/product/11203_ee_64/db
Oracle user: oracle
Spfile: +DATA1/MYDB/spfileMYDB.ora
Domain: prod.vis
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: MYDB
Database instances: MYDB1,MYDB2
Disk Groups: DATA1,MYDB_FRA1
Mount point paths:
Services: MYDB_TAF.prod.vis
Type: RAC
Database is administrator managed

## So we have two action points:
  • Make the spfile point to the correct disk group (our new +MYDB_DATA01).
  • The Disk Groups attribute still references the DATA1 disk group (and it should not).

## The first modification is to tell the clusterware which spfile to use:
srvctl modify database -d MYDB -p '+MYDB_DATA01/MYDB/spfileMYDB.ora'

## After that, a similar action for the disk groups:
srvctl modify database -d MYDB -a 'MYDB_DATA01,MYDB_FRA1'

## Don't just believe it, check it:
srvctl config database -d MYDB

## This looks better, so now let's do a stop & start with srvctl:
srvctl stop database -d MYDB
srvctl start database -d MYDB
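A quick sanity check after the srvctl modifications is to grep the saved output of `srvctl config database -d MYDB` for any leftover reference to the old DATA1 disk group. The saved file and its two lines below are a hypothetical, shortened stand-in for the real config output:

```shell
# Sketch: check a SAVED copy of the srvctl config output for the old group.
cat > mydb_config.txt <<'EOF'
Spfile: +MYDB_DATA01/MYDB/spfileMYDB.ora
Disk Groups: MYDB_DATA01,MYDB_FRA1
EOF
# DATA1 must no longer appear as a standalone name (MYDB_DATA01 is fine).
if grep -Eq '(^|[ ,+])DATA1($|[,/])' mydb_config.txt; then
  echo "old disk group still referenced"
else
  echo "config clean"
fi
```

Run the same grep against the live srvctl output once the change is in place.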

## That worked! Happy DBA.

TNS-12514 in an Oracle Rac Environment

Introduction:

On a frequent basis team members do hotline service, acting as the first line of contact when there are issues with the Oracle environments on the various boxes. In the last week of 2013, on Monday the 30th of December, it was my turn to do that duty, when an interesting issue caused a priority-one incident on the server srvr_33r, which is part of a 4-node RAC cluster running the database MYDB. So once again it was time to gear up and investigate what was going on. Users received the following error message: TNS-12514: TNS:listener does not currently know of service requested in connect descriptor.

Summary:

In the end it showed that the Oracle instance on the srvr_33r box was suffering from a communication issue with its ASM instance, because that ASM instance had problems with its shared pool. In the alert file of the Oracle instance, at the start of the incident, I found: ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^928","kglss").

Due to that shared pool issue the instance was unable to communicate with its ASM instance when it needed the local listener information from the spfile. With that information unavailable to the Oracle instance, the cluster agent performed this: ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.20.30.72)(PORT=1521))))' SCOPE=MEMORY SID='MYDB1';

So it altered my LOCAL_LISTENER in memory to the default listener of the cluster, listening on port 1521 with NO services registered for this specific database!

Even worse, the environment was hit by an Oracle bug, which a team member pointed out to me: Bug 11772838 - oraagent modifies LOCAL_LISTENER with an incorrect value when the parameter was set in spfile (Doc ID 11772838.8).

I have implemented the workaround for the listener and will have to schedule a maintenance window on the cluster to fix the Memory issues in the ASM instance(s).

Details:

Environment setting: a 4-node 11.2.0.3 Grid Infrastructure with a mix of databases (and versions) present. Every database has a dedicated listener on a specific port, and the services for that database register with that dedicated listener. Every database has its spfile in an ASM disk group.

When I started performing checks, I first looked at the listener on the first node (instance 1 runs on srvr_33r):

# lsnrctl status listener_MYDB
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 30-DEC-2013 10:02:51
 Copyright (c) 1991, 2011, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=srvr_33r-vip)(PORT=20008)(IP=FIRST)))
 STATUS of the LISTENER
 ------------------------
 Alias LISTENER_MYDB
 Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production
 Start Date 23-OCT-2013 01:21:27
 Uptime 68 days 9 hr. 41 min. 24 sec
 Trace Level off
 Security ON: Local OS Authentication
 SNMP OFF
 Listener Parameter File /opt/oracle/product/11203_ee_64/db/network/admin/listener.ora
 Listener Log File /opt/oracle/MYDB/admin/network/log/listener_MYDB.log
 Listening Endpoints Summary...
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.20.30.72)(PORT=20008)))
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.20.30.71)(PORT=20008)))
 Services Summary...
 Service "MYDB" has 1 instance(s).
 Instance "MYDB", status UNKNOWN, has 1 handler(s) for this service...
 The command completed successfully

 

And i compared that to the information in the listener on the other node: 

(Instance 2 is on srvr_34r)
oracle@srvr_34r:/opt/oracle [MYDB2]# lsnrctl status listener_MYDB
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 30-DEC-2013 10:00:27
 Copyright (c) 1991, 2011, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=srvr_34r-vip)(PORT=20008)(IP=FIRST)))
 STATUS of the LISTENER
 ------------------------
 Alias LISTENER_MYDB
 Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production
 Start Date 23-OCT-2013 03:10:15
 Uptime 68 days 7 hr. 50 min. 12 sec
 Trace Level off
 Security ON: Local OS Authentication
 SNMP OFF
 Listener Parameter File /opt/oracle/product/11203_ee_64/db/network/admin/listener.ora
 Listener Log File /opt/oracle/MYDB/admin/network/log/listener_MYDB.log
 Listening Endpoints Summary...
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.20.30.75)(PORT=20008)))
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.20.30.73)(PORT=20008)))
 Services Summary...
 Service "MYDB" has 1 instance(s).
 Instance "MYDB", status UNKNOWN, has 1 handler(s) for this service...
 Service "MYDB.prod.nl" has 1 instance(s).
 Instance "MYDB2", status READY, has 1 handler(s) for this service...
 Service "MYDB_TAF.prod.nl" has 1 instance(s).
 Instance "MYDB2", status READY, has 1 handler(s) for this service...
 The command completed successfully

 

Conclusion: in short, the listener on srvr_33r was unaware of the services for this specific database, even though the services were, and are, in place.

HOWEVER, when I checked the local listener on the srvr_33r server in a sqlplus session, this is what I saw:

SQL> show parameter listener
NAME TYPE VALUE
 ------------------------------------ ----------- ------------------------------
 listener_networks string
 local_listener string (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.20.30.72)(PORT=1521))))
 remote_listener string srvr_3rcl-scan:33100

Which proves that at the time of the incident (the morning of December 30, as of 09:00) the instance was no longer aware of its LOCAL listener! In the end we fixed the issue by setting the local listener again, and after that the connectivity issues were resolved.
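The fix can be scripted as well. This sketch rebuilds the ALTER SYSTEM statement for instance 1; the host and port are taken from the listener endpoints shown earlier, but treat them as assumptions for your own environment, and run the generated statement in sqlplus as sysdba:

```shell
# Sketch: rebuild the correct LOCAL_LISTENER setting for instance 1.
# HOST/PORT/SID are assumptions based on the listener output above.
HOST=srvr_33r-vip
PORT=20008
SID=MYDB1
printf "alter system set local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=%s)(PORT=%s))))' scope=both sid='%s';\n" \
  "$HOST" "$PORT" "$SID" > fix_listener.sql
cat fix_listener.sql
```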

Note: when I check the listener logs I do see that the listener was unaware of the services at the start of the incident:

/opt/oracle/MYDB/admin/network/log/listener_MYDB.log
30-DEC-2013 00:14:40 * (CONNECT_DATA=(SERVICE_NAME=MYDB.prod.nl)(SERVER=DEDICATED)(CID=(PROGRAM=c:\oraclexe\app\oracle\product\11.2.0\server\bin\ORACLE.EXE)(HOST=X1234567)(USER=SYSTEM))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.60.205.57)(PORT=58445)) * establish * MYDB.prod.nl * 12514
TNS-12514: TNS:listener does not currently know of service requested in connect descriptor
30-DEC-2013 00:14:40 * (CONNECT_DATA=(SERVICE_NAME=MYDB.prod.nl)(SERVER=DEDICATED)(CID=(PROGRAM=c:\oraclexe\app\oracle\product\11.2.0\server\bin\ORACLE.EXE)(HOST=X1234567)(USER=SYSTEM))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.60.205.57)(PORT=58446)) * establish * MYDB.prod.nl * 12514
TNS-12514: TNS:listener does not currently know of service requested in connect descriptor

In the alert logs of the instance I found the following information, which I think caused our prio-1 incident, because the instance could not talk to the ASM instance:

view /opt/oracle/MYDB/admin/Arch/alert_MYDB1.log.20131230_000158
See Note 411.1 at My Oracle Support for error and packaging details.
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
Dumping diagnostic data in directory=[cdmp_20131229112802], requested by (instance=1, osid=29870 (M000)), summary=[incident=48997].
Sun Dec 29 11:28:07 2013
Errors in file /opt/oracle/MYDB/diag/rdbms/MYDB/MYDB1/trace/MYDB1_m000_29870.trc  (incident=48998):
ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^928","kglss")
Incident details in: /opt/oracle/MYDB/diag/rdbms/MYDB/MYDB1/incident/incdir_48998/MYDB1_m000_29870_i48998.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
Sun Dec 29 11:28:08 2013
Dumping diagnostic data in directory=[cdmp_20131229112808], requested by (instance=1, osid=29870 (M000)), summary=[incident=48998].
Sun Dec 29 11:28:33 2013
ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.20.30.72)(PORT=1521))))' SCOPE=MEMORY SID='MYDB1';

Addendum:

After altering the local listener to the required and correct value I closed the ticket. Much to my surprise, a couple of hours later I was hit by the same issue on the same database again. A colleague pointed me to MOS: Bug 11772838 - oraagent modifies LOCAL_LISTENER with an incorrect value when the parameter was set in spfile (Doc ID 11772838.8). So I needed one more step to have a proper workaround in place before I could start planning a maintenance window to fix the root cause of all this (the Oracle instance being unable to communicate with its ASM instance when it needed the local listener information from its spfile).

As Always Happy reading,

Mathijs

Rman Level Backups, Block Change Tracking and the Positive Effect on IO Load

Introduction:

Recently another environment came under surveillance (or: showed up on the radar) due to performance issues within a specific time interval (approx. 20:00 - 22:00). A quick scan showed that various RAC instances were present on that box, all with Rman level 0 and level 1 backups. The customer's first urgent request was to reschedule the various backups so they would no longer run simultaneously. After that first change the performance already improved. Yet it was time to put on the thinking hat and investigate the environment more closely. Fortunately AWR is running on that box, so it was a nice challenge to find out what was going on. In this blog you will find the feedback from those analyses.

Summary

Most important in this matter was having AWR reports on a frequent (hourly) basis. Check your licenses to see whether you are allowed to use AWR reports. From those AWR reports we saw peak disk IO during level 1 backups. That was not supposed to be the case. A dive into the details of the database showed that block change tracking was not enabled. After enabling it, the level 1 backup was much more balanced in its IO needs. So when you have large databases (mine is approx. 1.2 TB), make sure that you or your fellow DBAs have enabled block change tracking if you do level 1 backups with Rman.

Details.

The environment we are working with is a Real Application Cluster on RedHat Linux with multiple instances on the nodes of the cluster. Backups are performed with Rman to tape, with a level 0 and a level 1 on a daily basis. The customer complained about performance going down the tube in a specific interval (20:00 - 22:00), and as always the backups were the prime suspects for the performance drain. Frankly, at first I was frowning at the idea: "let's blame the backups again for the poor performance of the database". The investigation, however, held a couple of shocks for me.

First it came to my notice that the database (11.2.0.3) was set up as a Real Application Cluster and was 1.2 TB in size. Frankly I could not understand why this database was pushed to tape with an Rman backup instead of using disk backups, or snapshot technology to mount the database on a backup server and prevent any overhead on the disks. But as they say: once you are in the kitchen, in the short term you need to cook with the tools you have.

Block change tracking:

I came across this blog and that was once again an eye-opener: http://arup.blogspot.nl/2008/09/magic-of-block-change-tracking.html

From the official oracle Documentation:

http://docs.oracle.com/cd/E11882_01/backup.112/e10642/rcmbckba.htm#BRADV8125

So once it is enabled, Oracle uses a file to keep track of the blocks that are changed in the database, with the goal of easing the burden of incremental backups.

This is how you can find out if that is already in place:

select filename, status, bytes from v$block_change_tracking;

If block change tracking is in place you will see something like this (in my case the file is stored in an ASM disk group since I am working with a RAC database):

FILENAME
------------------------------------------------------------------------------------------------------------------------------------------------
STATUS                BYTES
---------- ----------
+MYDB_DATA01/mydb/changetracking/ctf.407.833981253
ENABLED     158401024

The minimum size for this file is 10 MB, and it grows in 10 MB increments. From the Oracle documentation: the size of the block change tracking file is proportional to the size of the database and the number of enabled threads of redo. The size of the block change tracking file can increase and decrease as the database changes; it is not related to the frequency of updates to the database.

Typically, the space required for block change tracking for a single instance is about 1/30,000 the size of the data blocks to be tracked. For an Oracle RAC environment, it is 1/30,000 of the size of the database, times the number of enabled threads.
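The 1/30,000 rule of thumb quoted above can be turned into a quick estimate. The numbers below (1.2 TB of data blocks, 2 enabled redo threads) are assumptions matching this database:

```shell
# Sketch: the 1/30,000 sizing rule from the Oracle docs, applied to a
# database like the one above. DB size and thread count are assumptions.
DB_BYTES=$((1200 * 1024 * 1024 * 1024))   # ~1.2 TB of data blocks
THREADS=2                                 # enabled redo threads (RAC)
CTF_BYTES=$((DB_BYTES / 30000 * THREADS))
echo "estimated change tracking file: $((CTF_BYTES / 1024 / 1024)) MB"
```

The estimate (roughly 80 MB here) is only an order of magnitude; the real file shown earlier is about 151 MB, and the file grows in 10 MB steps anyway.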

So enabling the block_change_tracking was a first improvement I have implemented for that database.

Awr Reports.

As I mentioned, I am a lucky DBA who is allowed to use AWR reports, so I could analyse whether the Rman backups indeed affected the performance of the database in a dramatic way.

A level 1 Rman backup should be much faster and cause less overhead on the database, since only the changes since the last backup need to be put to tape, right? WRONG! (Well, at least it is wrong if you did not enable block change tracking yet.) Ah, but you want proof. Seeing is believing, right?

AWR without block change tracking enabled, for the hour before the Rman backup kicks in:

Top 5 Timed Foreground Events

Event                         Waits      Time(s)  Avg wait (ms)  % DB time  Wait Class
db file sequential read       92,008     723      8              59.13      User I/O
DB CPU                                   214                     17.49
log file sync                 192,402    194      1              15.85      Commit
control file sequential read  19,000     18       1              1.45       System I/O
enq: TX - index contention    1,604      14       9              1.17       Concurrency

IOStat by Function summary

Function Name       Reads: Data  Reqs/s  Data/s    Writes: Data  Reqs/s  Data/s    Waits: Count  Avg Tm(ms)
Others              2G           10.03   .554437   1.2G          3.35    .346592   36.7K         1.00
LGWR                1M           0.02    .000277   2.1G          109.02  .600501   195.4K        0.23
DBWR                0M           0.00    0M        1.2G          24.90   .342707   0
Buffer Cache Reads  697M         24.55   .193414   0M            0.00    0M        88.4K         7.59
Direct Reads        344M         2.16    .095458   1M            0.02    .000277   0
Direct Writes       0M           0.00    0M        42M           1.49    .011654   0
RMAN                0M           0.01    0M        0M            0.00    0M        42            1.40
TOTAL:              3G           36.77   .843588   4.6G          138.78  1.30173   320.5K        2.35

 

AWR report DURING the Rman backup, without block change tracking enabled:

Top 5 Timed Foreground Events with Level 1 Backup without Block_change_tracking

Event                         Waits      Time(s)  Avg wait (ms)  % DB time  Wait Class
log file sync                 119,690    30,133   252            74.69      Commit
db file sequential read       47,120     6,113    130            15.15      User I/O
enq: TX - index contention    5,454      1,107    203            2.74       Concurrency
direct path write             7,679      586      76             1.45       User I/O
control file sequential read  4,505      436      97             1.08       System I/O

As you can see, the average wait time has increased dramatically. Let's take a look at the IO; this was a real shock to me, since even with a level 1 backup the whole database is being read (Rman reading 1.2T):

IOStat by Function summary

Function Name       Reads: Data  Reqs/s  Data/s    Writes: Data  Reqs/s  Data/s    Waits: Count  Avg Tm(ms)
RMAN                1.2T         350.69  349.559   42M           0.23    .011712   4527          16.02
Others              1.1G         5.50    .305634   819M          3.66    .228389   20.2K         138.86
LGWR                1M           0.01    .000278   1.3G          33.35   .380091   57.1K         57.57
DBWR                0M           0.00    0M        834M          17.08   .232572   0
Buffer Cache Reads  371M         13.04   .103458   0M            0.00    0M        46.6K         124.67
Direct Reads        341M         1.98    .095092   0M            0.02    0M        0
Direct Writes       0M           0.00    0M        12M           0.43    .003346   0
Streams AQ          0M           0.00    0M        0M            0.00    0M        14            56.64
TOTAL:              1.2T         371.22  350.064   3G            54.76   .856112   128.5K        93.26

 

AWR report for the same time interval, with block change tracking active and an Rman level 1 backup:

Top 5 Timed Foreground Events

Event                          Waits      Time(s)  Avg wait (ms)  % DB time  Wait Class
db file sequential read        89,148     675      8              55.22      User I/O
DB CPU                                    222                     18.15
log file sync                  188,028    193      1              15.79      Commit
SQL*Net more data from client  125,490    81       1              6.65       Network
enq: TX - index contention     1,779      18       10             1.50       Concurrency

 

Average wait times are back to normal.

IOStat by Function summary

Function Name       Reads: Data  Reqs/s  Data/s    Writes: Data  Reqs/s  Data/s    Waits: Count  Avg Tm(ms)
Others              2.2G         10.92   .629154   1.7G          5.39    .480530   41.3K         0.85
LGWR                1M           0.02    .000277   2.3G          106.90  .649395   191.2K        0.24
DBWR                0M           0.00    0M        1.2G          23.95   .353535   0
Buffer Cache Reads  679M         24.03   .188274   0M            0.00    0M        86.7K         7.28
Direct Writes       0M           0.00    0M        87M           3.02    .024123   0
RMAN                60M          0.94    .016636   15M           0.20    .004159   4008          0.49
Direct Reads        28M          0.96    .007763   2M            0.04    .000554   0
Streams AQ          0M           0.00    0M        0M            0.00    0M        17            5.53
TOTAL:              3G           36.88   .842107   5.3G          139.50  1.51229   323.2K        2.21


Happy reading,

Mathijs

Upgrade to 11.2.0.3 Grid Infra and Rdbms with Psu October 2013

Introduction

It is always a pleasure to be challenged with a new puzzle to solve. This time I was asked to take part in the following scenario, which will be implemented on various boxes:

  • Linux box(es) being upgraded from Red Hat Linux 5.5 to 5.9.
  • Grid Infrastructure (GI) to be upgraded from 11.2.0.2 to 11.2.0.3 (needed to work with Rdbms 11.2.0.3).
  • Database software (Rdbms) 11.2.0.3 installation.
  • Implement PSU October 2013 on both GI and Rdbms.
  • Upgrade the databases (24 in total) to 11.2.0.3.

Environment: Linux server with RH 5.5, 11.2 Grid Infrastructure installed together with 11.2.0.2 Rdbms on a single server (Oracle Restart).

Quite a packed program, right? Well, for the Linux part I was merely a spectator: I could only wait and see how the server was bounced, and check that my Oracle environment (both the databases and the listener) came back up after bouncing, so the Restart environment was performing well. After the third reboot it was time for my actions. Below you will see a case study I wanted to start with at first because I thought it would save time. In the end I implemented plan B (always good to have one available). Well, better buckle up and let's get started.

Summary:

Below you will find the scenarios you could follow. Scenario 1 would enable you to do the installations in parallel, and even a week in advance, but needs extra steps (relinking the software, enabling the Grid Infrastructure in a number of steps). In the end, when setting this up and testing it, I came to the conclusion that it does not save much time to do the installs before the Linux upgrade. An average installation takes about 10-15 minutes, so I recommend first getting the Linux upgrade in place and, once the databases and listeners come back online when Linux boots the server, proceeding with the installations and patching. That scenario is less complicated and, I think, less error-prone. So this is scenario 2, and I will follow it on the environments where I am asked to do this work.

Addendum

Meanwhile I have implemented the plan and upgraded the databases (25 in total). I was not very pleased with the use of DBUA (used in a script with the silent option) because I indeed felt less in control of the process. I witnessed two major setbacks during the upgrades with DBUA: 1) in my case it bugged me that it added a local listener to the init.oras while doing the upgrade, which crashed the upgrade (at a restart of the database with that newly generated/altered init.ora the db would of course not start); 2) the Grid agent kept altering the Oracle Home of the databases (so it was pointing to the wrong environment). Together with a colleague we saved the day in the end, but only because we fell back to the manual upgrade method. I have listed the activities and will add them to the scenario: Real Life Implemented plan.

PS: by customer request, and due to the fact that the following parameter changed its default behaviour (FALSE before 11.2.0.3, TRUE as of 11.2.0.3), I had to make sure it was set again in the spfile:

alter system set "_use_adaptive_log_file_sync"=FALSE scope=both;

Important add-on: in all scenarios I ran utlu112i.sql as a baseline on all databases in scope. The good news was that all installed components were valid! AND I created a list of invalid objects per schema, to compare with the situation after the upgrade (as proof that this DBA did not break the application).
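The invalid-object baseline mentioned above can be captured with a spooled query. This sketch only writes the sqlplus script; the file names are my own choice, and the script would be run before and after the upgrade so the spool files can be diffed.

```shell
# Sketch: write a sqlplus script that spools the invalid objects per schema.
cat > invalid_baseline.sql <<'EOF'
set pages 0 lines 200
spool invalid_before.lst
select owner, object_type, object_name
from   dba_objects
where  status = 'INVALID'
order  by owner, object_type, object_name;
spool off
EOF
grep -c "spool" invalid_baseline.sql
```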

As always happy reading,

Mathijs

Scenario 1  Installing software only  as a preparation:

When I started my preparations it seemed best to install both software parts (GI and Rdbms) as "software only" and perform the needed steps after that. In this case the following would have been performed:

  1. Install 11.2.0.3 Rdbms as "software only".
  2. Install 11.2.0.3 GI as "software only".
  3. After the Linux upgrade, relink the software (described below).
  4. Perform various steps to activate the 11.2.0.3 GI.
  5. Implement PSU October 2013.
  6. Upgrade the databases.

Scenario 1: after the Linux upgrade the software would have to be relinked in full again.

Stopping the databases under control in an easy way. As preparation for the relinking of the software I performed the following:
srvctl status home -o /opt/oracle/product/112_ee_64/db -s /var/tmp/state_file.status
srvctl stop home -o /opt/oracle/product/112_ee_64/db -s /var/tmp/state_file.dmp

In order to relink the Rdbms software, after shutting down the databases (see above) and with ORACLE_HOME set properly:
$ORACLE_HOME/bin/relink all
Note: writing relink log to: /opt/oracle/product/112_ee_64/db/install/relink.log

In order to relink the Oracle Restart software, first prepare the Oracle Grid Infrastructure for a Standalone Server home for modification using the following procedure:
  1. Log in as the Oracle Grid Infrastructure software owner and change directory to Grid_home/bin, where Grid_home is the path to the Oracle Grid Infrastructure home:
cd /opt/crs/product/112_ee_64/crs/bin
  2. Shut down the Oracle Restart stack:
crsctl stop has -f
This will show:
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'MySrvr1hr'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'MySrvr1hr'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'MySrvr1hr' succeeded
CRS-2673: Attempting to stop 'ora.evmd' on 'MySrvr1hr'
CRS-2677: Stop of 'ora.evmd' on 'MySrvr1hr' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'MySrvr1hr' has completed
CRS-4133: Oracle High Availability Services has been stopped.

Then relink Oracle Grid Infrastructure for a Standalone Server using the following procedure:
  1. Log in as root and unlock the home:
cd /opt/crs/product/112_ee_64/crs/crs/install
perl roothas.pl -unlock
  2. Log in as the Oracle Grid Infrastructure for a Standalone Server owner and relink:
export ORACLE_HOME=/opt/crs/product/112_ee_64/crs
$ORACLE_HOME/bin/relink
This will show: writing relink log to: /opt/crs/product/112_ee_64/crs/install/relink.log
  3. Log in as root again:
cd /opt/crs/product/112_ee_64/crs/rdbms/install/
./rootadd_rdbms.sh
Note: rootadd_rdbms.sh came back very fast, without any output.
cd /opt/crs/product/112_ee_64/crs/crs/install
perl roothas.pl -patch

Checked with:
export ORACLE_HOME=/opt/crs/product/112_ee_64/crs
$ORACLE_HOME/bin/crsctl check has
This showed: CRS-4638: Oracle High Availability Services is online

Starting the databases under control in an easy way:
srvctl start home -o /opt/oracle/product/112_ee_64/db -s /var/tmp/state_file.dmp

Scenario 1 After the Linux upgrade would have to do various steps to activate the GI environment:

1) First of all, please shut down the current database & ASM instances:

oracle@MySrvr1hr:/opt/oracle/admin/tools [+ASM]# stop.ksh

2) You need to install the new 11.2.0.<#> patch set Grid Infrastructure Standalone "software only" first (in a separate Oracle Home/directory, AKA "out of place").

3) Then configure the CSS & OHAS services as the root user:

/opt/oracle/product/112_ee_64_a/asm/crs/install/roothas.pl -deconfig -force
and
/opt/crs/product/112_ee_64/crs/crs/install/roothas.pl

4) Please perform the next steps as oracle or grid OS user (Grid Infrastructure OS owner):

/opt/crs/product/112_ee_64/crs/bin/crsctl modify resource "ora.cssd" -attr "AUTO_START=1"

/opt/crs/product/112_ee_64/crs/bin/crsctl modify resource "ora.diskmon" -attr "AUTO_START=1"

Note: from release 11.2.0.3 onwards (non-Exadata), "ora.diskmon" is no longer required, since this is an Exadata-related process responsible for the I/O fencing.

5) Restart the OHAS stack as grid or oracle OS user:

/opt/crs/product/112_ee_64/crs/bin/crsctl stop has

/opt/crs/product/112_ee_64/crs/bin/crsctl start has

6) Check the CSS & OHAS state as the grid or oracle OS user:

/opt/crs/product/112_ee_64/crs/bin/crsctl check has

/opt/crs/product/112_ee_64/crs/bin/crsctl check css

/opt/crs/product/112_ee_64/crs/bin/crsctl stat resource

/opt/crs/product/112_ee_64/crs/bin/crsctl stat res -t

Note: If the CSS & OHAS service did NOT start, then you will need to reboot the Linux box and check them again.

7) Remove the old listener running under the old Grid Infrastructure Oracle Home, through the NETCA GUI (from the old Grid Infrastructure Oracle Home):
ps -ef | grep inherit

oracle   23430     1  0  2012 ?        1-08:34:36 /opt/oracle/product/112_ee_64_a/db/bin/tnslsnr LISTENER_MYSRVR1HR -inherit

8) Recreate the default listener (LISTENER) using port 1521, through the NETCA GUI, in the new Grid Infrastructure Oracle Home.

9) Please create the init+ASM.ora file in the <Grid Infrastructure Oracle Home>/dbs directory with the following parameters:

asm_diskgroups=<list of diskgroups>

asm_diskstring='ORCL:*' or '/dev/oracleasm/disks/*'

instance_type='asm'

large_pool_size=12M

10) Add the ASM instance as the grid or oracle user (the Grid Infrastructure installation owner) and execute:

/opt/crs/product/112_ee_64/crs/bin/srvctl add asm

11) Enable ASM instance auto start as follows:

/opt/crs/product/112_ee_64/crs/bin/crsctl modify resource "ora.asm" -attr "AUTO_START=1"


12) Make sure the disks are discovered by kfod. Example:

/opt/crs/product/112_ee_64/crs/bin/kfod asm_diskstring='/dev/mapper/asm-*p1' disks=all

13) If so, then start up the ASM instance as follows:

export ORACLE_SID=+ASM

/opt/crs/product/112_ee_64/crs/bin/sqlplus "/ as sysasm"

SQL> startup pfile=init+ASM.ora  -- init file from step 9

SQL> show parameter asm

14) Validate that the original diskgroup(s) were mounted:

SQL> select name, state from v$asm_diskgroup;

15 Finally confirm the OHAS (autostart) services start as follows:

/opt/crs/product/112_ee_64/crs/bin/crsctl stop has

/opt/crs/product/112_ee_64/crs/bin/crsctl start has

/opt/crs/product/112_ee_64/crs/bin/crsctl stat res

/opt/crs/product/112_ee_64/crs/bin/crsctl stat res -t

16) Check the new HAS version as well as follows:

/opt/crs/product/112_ee_64/crs/bin/crsctl query has releaseversion
/opt/crs/product/112_ee_64/crs/bin/crsctl query has softwareversion


Scenario 1: After that we were able to continue by applying the October 2013 PSU to the software:

Copy to the GI home & extract /opt/oracle/product/11203_ee_64/stage/p6880880_112000_Linux-x86-64.zip
Create response file oracle@mysrvr:/opt/oracle/product/11203_ee_64/db/OPatch/ocm/bin [11203]# ocm.rsp
/opt/crs/product/112_ee_64/crs/OPatch/ocm/bin [CRS]# ocm.rsp
As root: export PATH=/opt/crs/product/112_ee_64/crs/OPatch:$PATH
which opatch
opatch auto /opt/oracle/product/11203_ee_64/stage -ocmrf /opt/crs/product/112_ee_64/crs/OPatch/ocm/bin/ocm.rsp -oh /opt/crs/product/112_ee_64/crs,/opt/oracle/product/11203_ee_64/db
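Before running opatch auto it is worth confirming that both homes now carry a sufficiently new OPatch. A sketch with a generic version-compare helper; the loop over the two homes is a hypothetical, commented-out example using the paths from this note:

```shell
#!/bin/sh
# Compare dotted version strings; returns 0 (true) if $1 >= $2.
version_ge() {
  # Sort the two versions numerically field by field; if the smaller
  # one is $2, then $1 is greater than or equal to it.
  [ "$(printf '%s\n%s\n' "$2" "$1" \
      | sort -t. -k1,1n -k2,2n -k3,3n -k4,4n -k5,5n | head -1)" = "$2" ]
}

# Example (hypothetical wiring, not executed here):
# for home in /opt/crs/product/112_ee_64/crs /opt/oracle/product/11203_ee_64/db; do
#   v=$("$home"/OPatch/opatch version | awk '/Version/ {print $NF}')
#   version_ge "$v" 11.2.0.3.0 || echo "$home: OPatch $v too old"
# done
```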

Note: this command installed the October PSU in both my GI home and my RDBMS home.

After that, upgrade the databases with DBUA.

Scenario 2 (Installing new Grid Infra and RDBMS AFTER Linux patching)

As I wrote, this is my preferred scenario. It involves fewer steps, and the runInstaller will let you upgrade the existing environment. Please be aware that the 11.2.0.3 installations are so-called out-of-place installations requiring new Oracle Homes. For that purpose I requested (and got) 15 GB of extra space in /opt/oracle and the same in /opt/crs. I followed the steps below bullet by bullet.
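Since the out-of-place install needs roughly 15 GB per location, a quick space pre-check can save a failed install. A minimal sketch; the paths and the 15 GB threshold are the values from this note, adjust as needed:

```shell
#!/bin/sh
# Report free space (in whole GB) for a directory, using POSIX df output.
free_gb() {
  # df -Pk prints 1K blocks; column 4 is "Available". Convert to GB.
  df -Pk "$1" | awk 'NR==2 { printf "%d\n", $4 / (1024 * 1024) }'
}

# Fail (return 1) when a directory has less than the required GB free.
check_space() {
  dir=$1; need=$2
  have=$(free_gb "$dir")
  if [ "$have" -lt "$need" ]; then
    echo "$dir: only ${have}G free, need ${need}G"
    return 1
  fi
  echo "$dir: ${have}G free, ok"
}

# Example (the two homes used in this note):
# check_space /opt/oracle 15
# check_space /opt/crs 15
```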

  • Create a fresh copy of the init.ora: /opt/oracle/admin/tools/cSpfile.ksh (using scripts to create and save the spfile).
  • Perform status and config: /opt/oracle/admin/tools/cSrvctlAct.ksh status (checking and recording the status of resources in logfiles).
  • /opt/oracle/admin/tools/cSrvctlAct.ksh config (capturing details of the cluster config in files).
  • Stop the dbs / listener: /opt/oracle/admin/tools/cSrvctlAct.ksh stop (this was done by Grid Infra during the GI install).
  • Install GI 11.2.0.3, option "Oracle Grid Infrastructure for a standalone server".
  • Install RDBMS 11.2.0.3, software only (make sure no parallel installs run at the same time).
  • Copy to the GI home & extract /opt/oracle/product/11203_ee_64/stage/p6880880_112000_Linux-x86-64.zip (this is the most up-to-date OPatch).
  • Create response file oracle@mysrvr1:/opt/oracle/product/11203_ee_64/db/OPatch/ocm/bin [11203]# ocm.rsp
  • /opt/crs/product/112_ee_64/crs/OPatch/ocm/bin/ocm.rsp (location and name are needed during opatch).
  • As root: export PATH=/opt/crs/product/112_ee_64/crs/OPatch:$PATH
  • As root: which opatch
  • As root: opatch auto /opt/oracle/product/11203_ee_64/stage -ocmrf /opt/crs/product/112_ee_64/crs/OPatch/ocm/bin/ocm.rsp -oh /opt/crs/product/112_ee_64/crs,/opt/oracle/product/11203_ee_64/db (## will apply to these two homes)
  • As oracle: perform DBUA per instance: cDbua.ksh. Two odd things:
  • 1) it created the spfile in $ORACLE_HOME/dbs (it was in ASM before).
  • 2) all of a sudden a local listener was defined, so I needed to get rid of it on this box again.
  • As oracle perform /opt/oracle/admin/tools/cCatBundle.ksh
  • As oracle perform /opt/oracle/admin/tools/cSrvctlmod.ksh (done by DBUA, so no need for it in a separate script).
  • As oracle perform /opt/oracle/admin/tools/cSrvctlAct.ksh status (checking the status in GI).
  • As oracle perform /opt/oracle/admin/tools/cSrvctlAct.ksh config (checking the configuration in GI).
  • As oracle perform ln -s /opt/networker/lib/libnwora.so libobk.so in the lib dir of the new Oracle Home (don't forget your Networker library like I did :)).
  • As oracle run an archive backup as a check.
  • As oracle perform, per db: alter system set "_use_adaptive_log_file_sync"=FALSE scope=both; (needed in each upgraded db by customer request).
  • As oracle perform, per db: create a new pfile for the database.
  • As oracle perform, per db: remove the local listener entry from the newly created pfile.
  • As oracle perform, per db: check the spfile, because after the upgrade it is in $ORACLE_HOME/dbs again instead of in ASM where I had expected it.
  • As oracle perform, per db: create a new spfile in ASM.
  • As oracle perform, per db: srvctl modify database -d MYDB1 -p '+DATA/MYDB1/spfileMYDB1.ora'

Scenario 3: The plan as implemented in real life.

Install the software and patch:
/opt/oracle/admin/tools/cSpfile.ksh (creates a copy of the spfile as init.ora)
Create migration pfiles with larger settings for the shared pool (recommended minimum value: shared_pool_size > 600M, preferably 1024M)
/opt/oracle/admin/tools/cSrvctlAct.ksh status (check the current setup in the clusterware)
/opt/oracle/admin/tools/cSrvctlAct.ksh config (check the current setup in the clusterware)
Dry run: start ./runInstaller to check the prerequisites before the install; check for warnings and errors and correct them
Option "Oracle Grid Infrastructure for a standalone server": install in a separate Oracle Home and choose Upgrade
Software only (make sure no parallel installs run at the same time; the Oracle Home needs to be empty)
/opt/oracle/product/11203_ee_64/stage/p6880880_112000_Linux-x86-64.zip (copy and install the latest OPatch to both the GI and RDBMS homes)
oracle@mysrvr1:/opt/oracle/product/11203_ee_64/db/OPatch/ocm/bin [11203]# ocm.rsp (create a response file, needed during the opatch run)
/opt/crs/product/112_ee_64/crs/OPatch/ocm/bin [CRS]# ocm.rsp
export PATH=/opt/crs/product/112_ee_64/crs/OPatch:$PATH (opatch runs as root, so set the PATH)
which opatch (check the version of OPatch)
opatch auto /opt/oracle/product/11203_ee_64/stage -ocmrf /opt/crs/product/112_ee_64/crs/OPatch/ocm/bin/ocm.rsp -oh /opt/crs/product/112_ee_64/crs,/opt/oracle/product/11203_ee_64/db (run OPatch)
Upgrade the Databases
Perform DBUA per instance. DBUA messed up by adding a local listener to the init.ora, and the Grid agent then went on altering the oratab. That is why I recommend against DBUA for bulk upgrades. I would script the upgrade using a fixed ORACLE_HOME (the new one) and a dedicated init.ora / spfile for the migration.
Steps for manual upgrade: the preferred way!
Create a new spfile from a migration pfile (the migration pfile has a larger shared_pool_size).
1) Start sqlplus and run catupgrd.sql script from the NEW  $ORACLE_HOME/rdbms/admin
sqlplus "/ as sysdba"
spool /tmp/upgrade<DB>.log
startup upgrade
set echo on
@?/rdbms/admin/catupgrd.sql; (after catupgrd.sql finishes it will shut down the database)
2) Check catupgrd.sql spool file for errors.
3) Restart the database in normal mode.
4)  @$ORACLE_HOME/rdbms/admin/catuppst.sql; Post steps for the migration
5)  @$ORACLE_HOME/rdbms/admin/utlrp.sql;
alter system set "_use_adaptive_log_file_sync"=FALSE scope = both; (requested by customer)
set lines 2000
select instance_name from v$instance; (check sanity of the upgrade)
select * from v$version; (check sanity of the upgrade)
select COMP_NAME,VERSION,STATUS,MODIFIED from dba_registry order by 1; (check sanity of the upgrade; all installed components should be valid!)
select * from DBA_REGISTRY_HISTORY order by action_time desc; (check that catbundle ran; shows the most recent entry first)
Check $ORACLE_HOME/dbs for the presence of a correct init.ora; it should point to the spfile in the ASM disk group.
srvctl upgrade database -d <Db> -o /opt/oracle/product/11203_ee_64/db (inform the clusterware about the altered Oracle Home)
srvctl modify database -d <Db> -p '+DATA/<Db>/spfile<Db>.ora' (make sure the clusterware knows about the spfile; alter if needed)
srvctl modify database -d <Db> -o '/opt/oracle/product/11203_ee_64/db' (make sure the clusterware knows about the new Oracle Home)
If you have a listener per Database make sure it is started from the NEW oracle Home with the correct listener.ora
/opt/oracle/admin/tools/cSrvctlAct.ksh status Check status of db in cluster
/opt/oracle/admin/tools/cSrvctlAct.ksh config Check configuration of db in cluster
As a test, stop and start via srvctl: /opt/oracle/admin/tools/cSrvctlAct.ksh stop and /opt/oracle/admin/tools/cSrvctlAct.ksh start (i.e. srvctl stop/start database -d <Db>).
ln -s /opt/networker/lib/libnwora.so libobk.so in the new Oracle Home (check that the Networker lib is present in the new Oracle Home)
Run an archive or control file backup as a test.
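As said, I would script the upgrade rather than use DBUA for bulk work. A sketch that merely generates the SQL*Plus scripts for the manual steps above, one database at a time; nothing is executed, and the SID in the example is hypothetical:

```shell
#!/bin/sh
# Generate SQL*Plus scripts for the manual upgrade steps above.
# Phase "upgrade": runs catupgrd.sql (which shuts the database down).
# Phase "post":    run in a NEW session after restarting in normal mode.
gen_upgrade_script() {
  sid=$1; phase=$2
  case "$phase" in
    upgrade) cat <<EOF
spool /tmp/upgrade${sid}.log
startup upgrade
set echo on
@?/rdbms/admin/catupgrd.sql
EOF
    ;;
    post) cat <<EOF
spool /tmp/post_upgrade${sid}.log
@?/rdbms/admin/catuppst.sql
@?/rdbms/admin/utlrp.sql
spool off
EOF
    ;;
  esac
}

# Example: gen_upgrade_script MYDB1 upgrade > /tmp/upg_MYDB1.sql
```

Review the generated scripts before running them; the point is only that a fixed script beats DBUA's surprises for bulk upgrades.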

Mission completed.

Moving Your 11.2 Database to a new Diskgroup in ASM

Introduction:

I have been assigned a new challenge with 11.2 Oracle on Red Hat Linux with Grid Infrastructure and ASM. Let's start with some background information. When I started building Real Application Clusters with ASM, it was the standard to have merely one disk group for data (often +DATA01) and one disk group for the flash recovery area (+FRA01). Over time that standard was altered: still one data disk group, but at least a dedicated disk group for the flash recovery area. Adding more databases to the same data disk group can cause bottlenecks when one database grows fast and claims all the space. Today's standard requires a dedicated disk group for both data and FRA. For one of the production databases I have been asked to deal with a scenario where there will be a new dedicated data disk group (the database already had a dedicated FRA disk group).

Below you will find the steps I performed in a sandbox as a scenario for upcoming change:

Details:

Working with the Spfile:

## First we start with finding location of the spfile that is being used.

SQL> show parameter spfile
NAME                                                      TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile                                                string       +DATA01/ASMdummy/spfileASMdummy.ora

## Next step is creating a pfile from this current spfile which will be used to recreate the spfile in the new disk group:

SQL> create pfile='/opt/oracle/ASMDUMMY/admin/pfile/initASMDUMMY.ora' from spfile ;
File created.

## Then create an spfile in the new disk group:

create spfile='+DATA02' from pfile='/opt/oracle/ASMDUMMY/admin/pfile/initASMDUMMY.ora';

## note alertlog shows:

Tue Nov 12 15:43:51 2013
SUCCESS: disk group DATA02 was mounted
Tue Nov 12 15:43:51 2013

NOTE: dependency between database ASMDUMMY and disk group resource ora.DATA02.dg is established

## Shutdown the database:

SQL> shutdown immediate;

## In a second screen, make sure your environment points to the ASM instance, start asmcmd, and go to the new disk group:

asmcmd -p

## it shows:

ASMCMD [+DATA02/ASMDUMMY/PARAMETERFILE] > ls -ltr
WARNING:option 'r' is deprecated for 'ls'
please use 'reverse'
Type           Redund  Striped  Time             Sys  Name
PARAMETERFILE  UNPROT  COARSE   NOV 12 15:00:00  Y    spfile.256.831311031

## Copy the file spfile.256.831311031 within the disk group to a more human-readable file name:

cp +DATA02/ASMDUMMY/PARAMETERFILE/spfile.256.831311031 +DATA02/ASMDUMMY/spfileASMdummy.ora

## After doing that, alter the location of the spfile in the init.ora in $ORACLE_HOME/dbs:

cd /opt/oracle/product/11203_ee_64_a/db/dbs [ASMDUMMY]# ls -ltr  initASMDUMMY.ora
-rw-r----- 1 oracle dba       45 Nov 12 14:07 initASMDUMMY.ora
## current content of initASMDUMMY.ora:
SPFILE='+DATA01/ASMDUMMY/spfileASMDUMMY.ora'

## After changing the disk group my new init.ora looks like this:

SPFILE='+DATA02/ASMDUMMY/spfileASMDUMMY.ora'

## Start the database to confirm that you did a good thing:

SQL> startup

Working with the control files and Perform Backup:

SQL> show parameter control_files
NAME                                                      TYPE VALUE
---------------------------- ----------- ------------------------------
control_files                  string  +DATA01/ASMdummy/control01.ctl,+FRA01/ASMdummy/control02.ctl

## Set new location of controlfile in SPFILE:

SQL> alter system set control_files='+DATA02/ASMdummy/control01.ctl' scope=spfile sid='*';

## If you work with one control file, at the end you have to do an extra step to get back to two control files (I have documented that step below). Re-reading this note, I think it would have been better to:

## Set new location of controlfile in SPFILE:

SQL> alter system set control_files='+DATA02/ASMdummy/control01.ctl', '+FRA01/ASMdummy/control02.ctl' scope=spfile sid='*';

## Shutdown your database

SQL> shutdown;

## Open ASMcmd again with the environment pointing to +ASM instance:

ASMCMD> cp +DATA01/ASMdummy/control01.ctl +DATA02/ASMdummy/control01.ctl

## check it:

ls -l  +DATA02/ASMdummy/control01.ctl
ASMCMD [+] > ls -l  +DATA02/ASMdummy/control01.ctl
Type         Redund  Striped  Time             Sys  Name
N    control01.ctl => +DATA02/ASM/CONTROLFILE/control01.ctl.258.831312753

##  Start your database with startup nomount

SQL> startup nomount;

## Start an rman session: open "rman target /" and restore from the old controlfile, then mount + open the database:

rman target /
restore controlfile to '+DATA02/ASMdummy/control01.ctl' from '+DATA01/ASMdummy/control01.ctl';

##This will show

Starting restore at 12.11.2013 16:15:54
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=35 device type=DISK
channel ORA_DISK_1: copied control file copy
Finished restore at 12.11.2013 16:15:55

##In Rman you should mount the database:

RMAN> sql 'alter database mount';

## this shows :

sql statement: alter database mount
released channel: ORA_DISK_1
## and in the alert:
Tue Nov 12 16:16:51 2013
SUCCESS: disk group DATA02 was mounted
Tue Nov 12 16:16:51 2013
NOTE: dependency between database ASMDUMMY and disk group resource ora.DATA02.dg is established
Tue Nov 12 16:16:55 2013
Successful mount of redo thread 1, with mount id 2267329971
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: alter database mount

## Now it is time to make a backup of the database into the new Disk group (Data02). If you are in a rac environment make sure all other instances are down. Issue the following command in rman because this will create a one to one copy of the database in the new Disk group:

backup as copy database format '+DATA02';
##This will show:
Starting backup at 12.11.2013 16:18:54
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=35 device type=DISK
channel ORA_DISK_1: starting datafile copy
input datafile file number=00001 name=+DATA01/ASMdummy/datafile/system.256.831304997
output file name=+DATA02/ASMdummy/datafile/system.259.831313135 tag=TAG20131112T161854 RECID=2 STAMP=831313144
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:15
channel ORA_DISK_1: starting datafile copy
input datafile file number=00002 name=+DATA01/ASMdummy/datafile/sysaux.257.831304997
output file name=+DATA02/ASMdummy/datafile/sysaux.260.831313151 tag=TAG20131112T161854 RECID=3 STAMP=831313157
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:15
channel ORA_DISK_1: starting datafile copy
input datafile file number=00005 name=+DATA01/ASMdummy/datafile/example.262.831305069
output file name=+DATA02/ASMdummy/datafile/example.261.831313165 tag=TAG20131112T161854 RECID=4 STAMP=831313170
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:07
channel ORA_DISK_1: starting datafile copy
input datafile file number=00003 name=+DATA01/ASMdummy/datafile/undotbs1.258.831304997
output file name=+DATA02/ASMdummy/datafile/undotbs1.262.831313173 tag=TAG20131112T161854 RECID=5 STAMP=831313173
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:01
channel ORA_DISK_1: starting datafile copy
copying current control file
output file name=+DATA02/ASMdummy/controlfile/backup.263.831313173 tag=TAG20131112T161854 RECID=6 STAMP=831313175
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03
channel ORA_DISK_1: starting datafile copy
input datafile file number=00004 name=+DATA01/ASMdummy/datafile/users.259.831304997
output file name=+DATA02/ASMdummy/datafile/users.264.831313177 tag=TAG20131112T161854 RECID=7 STAMP=831313177
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:01
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 12.11.2013 16:19:38
channel ORA_DISK_1: finished piece 1 at 12.11.2013 16:19:39
piece handle=+DATA02/ASMdummy/backupset/2013_11_12/nnsnf0_tag20131112t161854_0.265.831313179 tag=TAG20131112T161854 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 12.11.2013 16:19:39

## Once that has finished, issue the following command in RMAN (note: this switch-to-copy command performs the 'set newname' for you):

switch database to copy;
##This will show you:
datafile 1 switched to datafile copy "+DATA02/ASMdummy/datafile/system.259.831313135"
datafile 2 switched to datafile copy "+DATA02/ASMdummy/datafile/sysaux.260.831313151"
datafile 3 switched to datafile copy "+DATA02/ASMdummy/datafile/undotbs1.262.831313173"
datafile 4 switched to datafile copy "+DATA02/ASMdummy/datafile/users.264.831313177"
datafile 5 switched to datafile copy "+DATA02/ASMdummy/datafile/example.261.831313165"

##When that is finished issue following command in rman :

alter database open;

##Your alert file has been updated with following information:

Tue Nov 12 16:21:17 2013
Switch of datafile 1 complete to datafile copy
checkpoint is 1040434
Switch of datafile 2 complete to datafile copy
checkpoint is 1040434
Switch of datafile 3 complete to datafile copy
checkpoint is 1040434
Switch of datafile 4 complete to datafile copy
checkpoint is 1040434
Switch of datafile 5 complete to datafile copy
checkpoint is 1040434
Tue Nov 12 16:22:00 2013
alter database open
Tue Nov 12 16:22:00 2013
Thread 1 opened at log sequence 10
Current log# 1 seq# 10 mem# 0: +FRA01/ASMdummy/onlinelog/group_1.257.831305063
Current log# 1 seq# 10 mem# 1: +DATA01/ASMdummy/onlinelog/group_1.264.831305765
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Nov 12 16:22:00 2013
SMON: enabling cache recovery
[29312] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:3669618038 end:3669618518 diff:480 (4 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is WE8MSWIN1252
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Tue Nov 12 16:22:03 2013
QMNC started with pid=33, OS id=8677
Tue Nov 12 16:22:06 2013
db_recovery_file_dest_size of 4122 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM disk group.
Completed: alter database open
Tue Nov 12 16:22:06 2013
Starting background process CJQ0
Tue Nov 12 16:22:06 2013
CJQ0 started with pid=34, OS id=8763

## At that moment it feels as if we are almost there.

Create Destination(s)

## Open a Sqlplus session and check for the %create% parameter:

show parameter create
##This shows:
NAME                                                      TYPE VALUE
------------------------------------ ----------- ------------------------------
create_bitmap_area_size                   integer            8388608
create_stored_outlines                       string
db_create_file_dest                             string               +DATA01
db_create_online_log_dest_1           string               +FRA01
alter system set db_create_file_dest='DATA02' sid='*';
alter system set db_create_file_dest='DATA02' sid='*'
*
ERROR at line 1:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-01261: Parameter db_create_file_dest destination string cannot be translated
ORA-01262: Stat failed on a file destination directory
Linux-x86_64 Error: 2: No such file or directory
## Pff, a typo; of course you should add the + in front of the disk group name:
SQL> alter system set db_create_file_dest='+DATA02' sid='*';

Working with the temp files:

##Now it is time to work with the temp files. You will have to create a New temp tablespace in the new disk group, make that the default one and drop the old one:

SQL> select FILE_NAME from dba_temp_files;

## This shows:

FILE_NAME
--------------------------------------------------------------------------------
+DATA01/ASMdummy/tempfile/temp.261.831305069

##This will create a new temp tablespace , in the new disk group , make it the default tablespace and drop old

create temporary tablespace TEMP02 tempfile size 100m;
alter database default temporary tablespace TEMP02;
drop tablespace TEMP including contents;

##Check it again:

SQL> select FILE_NAME from dba_temp_files;
FILE_NAME
--------------------------------------------------------------------------------
+DATA02/ASMdummy/tempfile/temp02.266.831313635

Working with the redo logs:

In this step we first have to add new members to each log group (to each thread, in a RAC). After that, and after switching the log files, you can delete the members in the old disk group.

##First check the environment:

set lines 2000
select * from v$log;
SQL>
GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE        MEMBERS ARC STATUS              FIRST_CHANGE# FIRST_TIME    NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ------------------- ------------ -------------------
1         1        10   52428800    512       2 YES INACTIVE                  1035793 12.11.2013 14:17:13      1067305 13.11.2013 06:00:48
2         1        11   52428800    512       2 NO  CURRENT            1067305 13.11.2013 06:00:48   2.8147E+14
3         1         9   52428800    512       2 YES INACTIVE                  1035790 12.11.2013 14:17:11      1035793 12.11.2013 14:17:13
## and
SQL> select GROUP#,MEMBER from v$logfile order by 1;
GROUP# MEMBER
---------- --------------------------------------------------------------------------------
1 +FRA01/ASMdummy/onlinelog/group_1.257.831305063
1 +DATA01/ASMdummy/onlinelog/group_1.264.831305765
2 +DATA01/ASMdummy/onlinelog/group_2.265.831305777
2 +FRA01/ASMdummy/onlinelog/group_2.258.831305065
3 +DATA01/ASMdummy/onlinelog/group_3.266.831305783
3 +FRA01/ASMdummy/onlinelog/group_3.259.831305065

##First we add new members to the correct , new disk group:

alter database add logfile member '+DATA02' to group 1;
alter database add logfile member '+DATA02' to group 2;
alter database add logfile member '+DATA02' to group 3;

###Check again:

select GROUP#,MEMBER from v$logfile order by 1;

## Perform several log switches to make sure the new members have been used:

SQL> alter system switch logfile;

##Alert showed:

Wed Nov 13 07:54:28 2013
Thread 1 advanced to log sequence 12 (LGWR switch)
Current log# 3 seq# 12 mem# 0: +FRA01/ASMdummy/onlinelog/group_3.259.831305065
Current log# 3 seq# 12 mem# 1: +DATA01/ASMdummy/onlinelog/group_3.266.831305783
Current log# 3 seq# 12 mem# 2: +DATA02/ASMdummy/onlinelog/group_3.269.831369169
Wed Nov 13 07:54:28 2013
Errors in file /opt/oracle/diag/rdbms/ASMdummy/ASMDUMMY/trace/ASMDUMMY_arc0_2197.trc:
ORA-00313: open failed for members of log group 2 of thread 1
Errors in file /opt/oracle/diag/rdbms/ASMdummy/ASMDUMMY/trace/ASMDUMMY_arc0_2197.trc:
ORA-00313: open failed for members of log group 2 of thread 1
Wed Nov 13 07:54:29 2013
Errors in file /opt/oracle/diag/rdbms/ASMdummy/ASMDUMMY/trace/ASMDUMMY_m000_22231.trc:
Errors in file /opt/oracle/diag/rdbms/ASMdummy/ASMDUMMY/trace/ASMDUMMY_m000_22231.trc:
Archived Log entry 8 added for thread 1 sequence 11 ID 0x8724b1a2 dest 1:
Wed Nov 13 07:55:38 2013
Thread 1 advanced to log sequence 13 (LGWR switch)
Current log# 1 seq# 13 mem# 0: +FRA01/ASMdummy/onlinelog/group_1.257.831305063
Current log# 1 seq# 13 mem# 1: +DATA01/ASMdummy/onlinelog/group_1.264.831305765
Current log# 1 seq# 13 mem# 2: +DATA02/ASMdummy/onlinelog/group_1.267.831369167
Wed Nov 13 07:55:38 2013
Archived Log entry 9 added for thread 1 sequence 12 ID 0x8724b1a2 dest 1:
Thread 1 cannot allocate new log, sequence 14
Checkpoint not complete
Current log# 1 seq# 13 mem# 0: +FRA01/ASMdummy/onlinelog/group_1.257.831305063
Current log# 1 seq# 13 mem# 1: +DATA01/ASMdummy/onlinelog/group_1.264.831305765
Current log# 1 seq# 13 mem# 2: +DATA02/ASMdummy/onlinelog/group_1.267.831369167
Thread 1 advanced to log sequence 14 (LGWR switch)
Current log# 2 seq# 14 mem# 0: +FRA01/ASMdummy/onlinelog/group_2.258.831305065
Current log# 2 seq# 14 mem# 1: +DATA01/ASMdummy/onlinelog/group_2.265.831305777
Current log# 2 seq# 14 mem# 2: +DATA02/ASMdummy/onlinelog/group_2.268.831369169
Wed Nov 13 07:55:45 2013
Archived Log entry 10 added for thread 1 sequence 13 ID 0x8724b1a2 dest 1:
Wed Nov 13 07:55:59 2013
Thread 1 advanced to log sequence 15 (LGWR switch)
Current log# 3 seq# 15 mem# 0: +FRA01/ASMdummy/onlinelog/group_3.259.831305065
Current log# 3 seq# 15 mem# 1: +DATA01/ASMdummy/onlinelog/group_3.266.831305783
Current log# 3 seq# 15 mem# 2: +DATA02/ASMdummy/onlinelog/group_3.269.831369169
Wed Nov 13 07:55:59 2013
Archived Log entry 11 added for thread 1 sequence 14 ID 0x8724b1a2 dest 1:

## It is time to drop the members from the old disk group (DATA01):

select GROUP#,MEMBER from v$logfile order by 1;
GROUP# MEMBER
---------- -------------------------------------------------------------------------------- -------
1 +DATA01/ASMdummy/onlinelog/group_1.264.831305765
1 +DATA02/ASMdummy/onlinelog/group_1.267.831369167
1 +FRA01/ASMdummy/onlinelog/group_1.257.831305063
2 +DATA02/ASMdummy/onlinelog/group_2.268.831369169
2 +DATA01/ASMdummy/onlinelog/group_2.265.831305777
2 +FRA01/ASMdummy/onlinelog/group_2.258.831305065
3 +DATA02/ASMdummy/onlinelog/group_3.269.831369169
3 +DATA01/ASMdummy/onlinelog/group_3.266.831305783
3 +FRA01/ASMdummy/onlinelog/group_3.259.831305065
9 rows selected.

##So we have to drop the redo members that point to the old ( DATA01 ) Disk group:

alter database drop logfile member '+DATA01/ASMdummy/onlinelog/group_1.264.831305765';
alter database drop logfile member '+DATA01/ASMdummy/onlinelog/group_2.265.831305777';
alter database drop logfile member  '+DATA01/ASMdummy/onlinelog/group_3.266.831305783';
## When I did this, the following error occurred:
SQL> alter database drop logfile member  '+DATA01/ASMdummy/onlinelog/group_3.266.831305783'
*
ERROR at line 1:
ORA-01609: log 3 is the current log for thread 1 - cannot drop members
ORA-00312: online log 3 thread 1: '+FRA01/ASMdummy/onlinelog/group_3.259.831305065'
ORA-00312: online log 3 thread 1: '+DATA01/ASMdummy/onlinelog/group_3.266.831305783'
ORA-00312: online log 3 thread 1: '+DATA02/ASMdummy/onlinelog/group_3.269.831369169'
## So I needed another logfile switch, since group 3 was current.
SQL> alter system switch logfile;
System altered.
SQL> alter database drop logfile member  '+DATA01/ASMdummy/onlinelog/group_3.266.831305783';
Database altered.

##Checked again

select GROUP#,MEMBER from v$logfile order by 1;
GROUP# MEMBER
---------- --------------------------------------------------------------------------------
1 +FRA01/ASMdummy/onlinelog/group_1.257.831305063
1 +DATA02/ASMdummy/onlinelog/group_1.267.831369167
2 +DATA02/ASMdummy/onlinelog/group_2.268.831369169
2 +FRA01/ASMdummy/onlinelog/group_2.258.831305065
3 +DATA02/ASMdummy/onlinelog/group_3.269.831369169
3 +FRA01/ASMdummy/onlinelog/group_3.259.831305065
6 rows selected.

Working in the clusterware:

After these activities I tried stopping and starting via srvctl (this is an Oracle Restart environment), and it crashed. The alert log showed error messages and the start failed. I noticed that the environment was using the old spfile in +DATA01 again; I checked the spfile pointer, which was wrong again:

cd /opt/oracle/product/11203_ee_64_a/db/dbs [ASMDUMMY]# cat initASMDUMMY.ora
SPFILE='+DATA01/ASMDUMMY/spfileASMDUMMY.ora'                        # line added by Agent

## So the cluster agent had altered my changes.

## I altered the init.ora again and started the database; that worked.

##In sqlplus:
select name from v$controlfile
union
select name from v$datafile
union
select name from v$tempfile
union
select member from v$logfile
union
select filename from v$block_change_tracking
union
select name from v$flashback_database_logfile;
## This shows:
NAME
--------------------------------------------------------------------------------
+DATA02/ASMdummy/control01.ctl
+DATA02/ASMdummy/datafile/example.261.831313165
+DATA02/ASMdummy/datafile/sysaux.260.831313151
+DATA02/ASMdummy/datafile/system.259.831313135
+DATA02/ASMdummy/datafile/undotbs1.262.831313173
+DATA02/ASMdummy/datafile/users.264.831313177
+DATA02/ASMdummy/onlinelog/group_1.267.831369167
+DATA02/ASMdummy/onlinelog/group_2.268.831369169
+DATA02/ASMdummy/onlinelog/group_3.269.831369169
+DATA02/ASMdummy/tempfile/temp02.266.831313635
+FRA01/ASMdummy/onlinelog/group_1.257.831305063
+FRA01/ASMdummy/onlinelog/group_2.258.831305065
+FRA01/ASMdummy/onlinelog/group_3.259.831305065
14 rows selected.

## In an Oracle Restart or RAC environment you now need to check the clusterware setup, since it has knowledge about the spfile, the disk groups being used, etc.

##First check the configuration in the Clusterware for the database:

srvctl config database -d ASMDUMMY

##This shows:

Database unique name: ASMDUMMY
Database name: ASMDUMMY
Oracle home: /opt/oracle/product/11203_ee_64_a/db
Oracle user: oracle
Spfile: +DATA01/ASMDUMMY/spfileASMDUMMY.ora <<- OLD DISK GROUP
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Database instance: ASMDUMMY
Disk Groups: DATA01,FRA01,DATA02 <<- STILL INCLUDES THE OLD DISK GROUP
Services:

##So we have to perform two action points:

  • Make the spfile point to the correct disk group (our new +DATA02).
  • The Disk Groups attribute still references the DATA01 disk group (and it should not).

##First modification will be to inform the Clusterware which spfile to use:

srvctl modify database -d ASMdummy -p '+DATA02/ASMDUMMY/spfileASMDUMMY.ora'

##After that similar action for the disk groups:

srvctl modify database -d ASMdummy -a 'DATA02,FRA01'

## Don't believe it? Check it:
srvctl config database -d ASMdummy
Database unique name: ASMDUMMY
Database name: ASMDUMMY
Oracle home: /opt/oracle/product/11203_ee_64_a/db
Oracle user: oracle
Spfile: +DATA02/ASMDUMMY/spfileASMDUMMY.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Database instance: ASMDUMMY
Disk Groups: DATA02,FRA01

##This looks better, so now let's do a stop & start with srvctl:

srvctl stop database -d ASMdummy
srvctl start database -d ASMdummy

##That worked! Happy DBA.

##Since I started the scenario with one control file I needed to go back to two control files:

##Set new location of both ctlfile in SPFILE:

alter system set control_files='+DATA02/ASMdummy/control01.ctl','+FRA01/ASMdummy/control02.ctl' scope=spfile sid='*';

##Perform a shutdown

SQL> shutdown;

## Open ASMcmd (make sure you point to the ASM instance) and copy the controlfile to the Fra01 Disk group to get them in sync again:

ASMCMD> cp +DATA02/ASMdummy/control01.ctl +FRA01/ASMdummy/control02.ctl

##Check it:

ls -l  +FRA01/ASMdummy/control02.ctl

##In SQL*Plus, startup nomount:

SQL> startup nomount;

## Start an “rman target /”, restore from the old control file, then mount & open the database:

RMAN> restore controlfile to '+FRA01/ASMdummy/control02.ctl' from '+DATA02/ASMdummy/control01.ctl';
sql 'alter database mount';
sql 'alter database open';
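For repeatability, that same nomount/restore/mount/open sequence could be kept in one small RMAN command file. This is only a sketch using the paths from above; the file name restore_ctl.rcv is an example of mine, and you would run it with `rman target / cmdfile=restore_ctl.rcv`:

```
# restore_ctl.rcv -- recreate the second control file copy and open the database
startup nomount;
restore controlfile to '+FRA01/ASMdummy/control02.ctl' from '+DATA02/ASMdummy/control01.ctl';
sql 'alter database mount';
sql 'alter database open';
```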

##In a SQL*Plus session, a final check:

SQL> show parameter control
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
control_files                        string      +DATA02/ASMdummy/control01.ctl, +FRA01/ASMdummy/control02.ctl

As always Happy reading,

Mathijs.

ASM Instance not starting after Cluster Node Reboot

Introduction.

I have been involved again in a situation where a RAC cluster did not start after a reboot of the server during a maintenance window. As always, that was a true challenge. In such cases the alert log of the node and the ohasd logging will be your best friends (well, together with Metalink and Google of course).

Details:

After an OS patching action on one of the nodes of one of my 11.2 RACs (Grid Infrastructure) I was contacted: could I please take a look, because the clusterware was not starting. A first investigation showed that this statement was not entirely true. The clusterware itself had started, but the log file for the ohasd showed that it was not able to start the ASM resource due to ORA-01031: insufficient privileges.

This is what it showed:

## /opt/crs/product/11.2.0.2_a/crs/log/Mysrvr1r/ohasd [+ASM1]# view ohasd.log
2013-08-06 11:05:01.643: [    AGFW][1980881216] {0:0:2} Received the reply to the message: RESOURCE_CLEAN[ora.asm 1 1] ID 4100:411 from the agent /opt/crs/product/11.2.0.2_a/crs/bin/oraagent_oracle
2013-08-06 11:05:01.644: [    AGFW][1980881216] {0:0:2} Agfw Proxy Server sending the reply to PE for message:RESOURCE_CLEAN[ora.asm 1 1] ID 4100:410
2013-08-06 11:05:01.644: [   CRSPE][1991387456] {0:0:2} Received reply to action [Clean] message ID: 410
2013-08-06 11:05:01.644: [   CRSPE][1991387456] {0:0:2} Got agent-specific msg: ORA-01031: insufficient privileges
2013-08-06 11:05:01.646: [    AGFW][1980881216] {0:0:2} Received the reply to the message: RESOURCE_CLEAN[ora.asm 1 1] ID 4100:411 from the agent /opt/crs/product/11.2.0.2_a/crs/bin/oraagent_oracle
2013-08-06 11:05:01.646: [    AGFW][1980881216] {0:0:2} Agfw Proxy Server sending the reply to PE for message:RESOURCE_CLEAN[ora.asm 1 1] ID 4100:410
2013-08-06 11:05:01.646: [   CRSPE][1991387456] {0:0:2} Received reply to action [Clean] message ID: 410
2013-08-06 11:05:01.829: [    AGFW][1980881216] {0:0:2} Received the reply to the message: RESOURCE_CLEAN[ora.asm 1 1] ID 4100:411 from the agent /opt/crs/product/11.2.0.2_a/crs/bin/oraagent_oracle
2013-08-06 11:05:01.829: [    AGFW][1980881216] {0:0:2} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_CLEAN[ora.asm 1 1] ID 4100:410
2013-08-06 11:05:01.829: [   CRSPE][1991387456] {0:0:2} Received reply to action [Clean] message ID: 410
2013-08-06 11:05:01.829: [   CRSPE][1991387456] {0:0:2} RI [ora.asm 1 1] new internal state: [STABLE] old value: [CLEANING]
2013-08-06 11:05:01.829: [   CRSPE][1991387456] {0:0:2} CRS-2681: Clean of 'ora.asm' on 'Mysrvr1r' succeeded

That did not look all too good. I got a first clue about what was going on by trying to connect to the ASM instance on that box via SQL*Plus (sqlplus / as sysasm), which also returned the ORA-01031: insufficient privileges.

I had to giggle, because when looking for that message on the web I ended up at my own blog. Which proves once again that you can help yourself by helping others by sharing in the Oracle community. Basically I focused on three Metalink notes that might apply:

Troubleshooting ORA-1031: Insufficient Privileges While Connecting As SYSDBA [ID 730067.1]

UNIX: Checklist for Resolving Connect AS SYSDBA Issues [ID 69642.1]

UNIX: Diagnostic C program for ORA-1031 from CONNECT INTERNAL / AS SYSDBA [ID 67984.1]

The third note (67984.1) was my bingo! It proved that my group id (dba) had been altered from 101 to some other value by an LDAP lookup. I asked a Linux colleague to disable these lookups, and after that the ASM instance started, and all the database instances as well. As a workaround they added the oracle user to nss_initgroups_ignoreusers in /etc/ldap.conf to prevent this from happening again.
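For reference, that workaround boils down to one line in /etc/ldap.conf. This is only a sketch; the exact parameter spelling and file location can vary between nss_ldap versions and distributions, so check yours first:

```
# /etc/ldap.conf -- skip LDAP-supplied supplementary groups for these users,
# so the local /etc/group entry (dba, gid 101) stays authoritative for oracle:
nss_initgroups_ignoreusers oracle
```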

Happy reading,

Mathijs

Plan to set up OID (Oracle's implementation of LDAP) in a RAC environment

Introduction:

For some time it was planned to implement OID, and frankly, from the start I knew it would be something different compared to the things I have done so far. But then again, it does represent a new frontier, does it not? So I felt challenged by the implementation. As an extra aspect of this setup we implemented the OID with RAC databases as a backend. As always, the quote of the day was: success just loves good preparation. So we started with a plan. The plan we implemented captures the following steps to success:

  • Check and plan the layout of your LDAP environment and your ldap.ora. For us that meant creating one context to hold all production environments, one to hold the test environments, and a third context as a concatenation of the other two. That way, with only one setting in ldap.ora (as far as I know, only one DEFAULT_ADMIN_CONTEXT = “dc=central,dc=env,dc=EU” can be set, which you can then use without fully qualified names), clients on the Citrix servers could contact all environments:

CN=ORCLADMIN

And

dc=prod,dc=EU

And

dc=test,dc=EU

And

“dc=central,dc=env,dc=EU”

### Think about the ports you will need for LDAP: one port for non-SSL traffic and one for SSL. Ask yourself whether you will come across firewalls during setup & operation.

  • Install a database (in this setup we use version 11.2). Since we wanted to deliver a highly available solution, we implemented 3 RAC databases with 2 Oracle instances each.
  • Install Weblogic 10.3.2 —> I read in some articles that it should not be needed in the first place, but colleagues I spoke to recommended downloading the Weblogic part as well. The only download I see is Weblogic Server 12c, so I downloaded the installer with Oracle Weblogic Server and Oracle Coherence: http://www.oracle.com/technetwork/middleware/weblogic/downloads/index.html. Since I am installing on 64-bit Linux I considered the generic download file, but given that these boxes do not have Java set up properly I decided to go with wls1211_linux32.bin.
  • Setup and implement rules with regard to bulk load and bulk delete.
  • Implement Replication between the Environments.
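The contexts from the first step would translate into an ldap.ora along these lines. This is only a sketch: the host names are placeholders, the ports shown are the common OID defaults for non-SSL and SSL (pick your own with the firewall question above in mind), and remember that only one DEFAULT_ADMIN_CONTEXT can be set:

```
# ldap.ora -- client-side pointer to the OID servers (placeholder hosts/ports)
DIRECTORY_SERVERS = (oidhost1.example.eu:3060:3131, oidhost2.example.eu:3060:3131)
DEFAULT_ADMIN_CONTEXT = "dc=central,dc=env,dc=EU"
DIRECTORY_SERVER_TYPE = OID
```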

Happy reading,

Mathijs