Move Oracle Rac Database to a new diskgroup in Asm (A real life scenario)

Introduction:

Below you will find detailled steps to move a Rac database ( Oracle 11.2.0.3) to a different Diskgroup using a total maintenance mode scenario where the Database is not available during the maintenance window for the appliciaton.

Happy reading,

Mathijs

Detailled  Scenario of a diskgroup move of a Rac Database

## create fresh pfile  to be used  as a basis for a new spfile in the new diskgroup
create pfile=’/opt/oracle/MYDB1/admin/pfile/initMYDB1.ora.20140111_1800′ from spfile;
## create new spfile in the new diskgroup
create spfile=’+MYDB_DATA01′ from pfile=’/opt/oracle/MYDB1/admin/pfile/initMYDB1.ora.20140111_1800′;

## Shutdown the database vi the cluster :
##  all actions being performed after this shutdown using sqlplus till further notice !
srvctl stop database  -d MYDB

## In a second screen make sure your environment is pointing to the ASM environment and go to Asmcmd and go to the new disk group to find new name

asmcmd -p
cd MYDB_DATA01/MYDB/PARAMETERFILE

## it shows:
ls -ltr
spfile.256.836591237

## Copy the file spfile.256.831311031 to the new disk group which will work as a more human readable file name:

cp +MYDB_DATA01/MYDB/PARAMETERFILE/spfile.256.836591237  +MYDB_DATA01/MYDB/spfileMYDB.ora

### After doing that on Linux level alter the location of the spfile in the init.ora in the $ORACLE_HOME/dbs  ON ALL NODES  ( mysrvr25r / mysrvr26r)

cd/opt/oracle/product/11203_ee_64/db/dbs

ls -ltr  initMYDB*

### current content:  initMYDB1.ora check and adapt on second node as well
spfile=’+DATA1/MYDB/spfileMYDB.ora’
## After changing the disk group my new init.ora looks like this:

SPFILE=’+MYDB_DATA01/MYDB/spfileMYDB.ora’

## Start the  database to find out that you did good thing :

SQL> startup

## Working with the control files and Perform Backup:
## shows we have three controlfiles in place
SQL> show parameter control_files

NAME                                                      TYPE VALUE
—————————- ———– ——————————
+DATA1/MYDB/control01.ctl,+DATA1/MYDB/control02.ctl,+DATA1/MYDB/control03.ctl

## Set new location of controlfile in SPFILE:
alter system set control_files=’+MYDB_DATA01/MYDB/control01.ctl’, ‘+MYDB_FRA1/MYDB/control02.ctl’, ‘+MYDB_DATA01/MYDB/control03.ctl’ scope=spfile sid=’*’;
alter system set cluster_database=false scope=spfile;
## Shutdown your database
SQL> shutdown;

## Open Asmcmd again with the environment pointing to +ASM instance:
##  Copy the current control file from +DATA01 to the correct Diskgroups and sync them by this copy
ASMCMD
cp +DATA1/MYDB/control01.ctl  +MYDB_DATA01/MYDB/control01.ctl
cp +DATA1/MYDB/control01.ctl  +MYDB_FRA1/MYDB/control02.ctl
cp +DATA1/MYDB/control01.ctl  +MYDB_DATA01/MYDB/control03.ctl
## check it:
ls -l +MYDB_DATA01/MYDB/control01.ctl
ls -l +MYDB_FRA1/MYDB/control02.ctl
ls -l +MYDB_DATA01/MYDB/control03.ctl

##  Start your database with startup nomount

SQL> startup nomount;

## Start an rman session : Open “rman target /” and restore from old control and mount + open database:
## Not 100 % sure if this step was needed since we copied file in asmcmd already but  it wont hurt and takes little time
rman target /
restore controlfile to ‘+MYDB_DATA01/MYDB/control01.ctl’ from ‘+DATA1/MYDB/control01.ctl’;
restore controlfile to ‘+MYDB_FRA1/MYDB/control02.ctl’   from ‘+DATA1/MYDB/control01.ctl’;
restore controlfile to ‘+MYDB_DATA01/MYDB/control03.ctl’ from ‘+DATA1/MYDB/control01.ctl’;

##This will show
rman target /

Recovery Manager: Release 11.2.0.3.0 – Production on Sat Jan 11 18:53:29 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: MYDB (not mounted)

RMAN> restore controlfile to ‘+MYDB_DATA01/MYDB/control01.ctl’ from ‘+DATA1/MYDB/control01.ctl’;

Starting restore at 11.01.2014 18:54:00
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1010 device type=DISK

channel ORA_DISK_1: copied control file copy
Finished restore at 11.01.2014 18:54:02

RMAN> restore controlfile to ‘+MYDB_FRA1/MYDB/control02.ctl’   from ‘+DATA1/MYDB/control01.ctl’;

Starting restore at 11.01.2014 18:54:33
using channel ORA_DISK_1

channel ORA_DISK_1: copied control file copy
Finished restore at 11.01.2014 18:54:34

RMAN> restore controlfile to ‘+MYDB_DATA01/MYDB/control03.ctl’ from ‘+DATA1/MYDB/control01.ctl’;

Starting restore at 11.01.2014 18:54:51
using channel ORA_DISK_1

channel ORA_DISK_1: copied control file copy
Finished restore at 11.01.2014 18:54:52

## Mount the database via Rman

RMAN> sql ‘alter database mount’;

##This will show

List of instances:
1 (myinst: 1)
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
Sat Jan 11 18:39:32 2014
LCK0 started with pid=31, OS id=4011
Starting background process RSMN
Sat Jan 11 18:39:33 2014
RSMN started with pid=32, OS id=4015
ORACLE_BASE from environment = /opt/oracle
Sat Jan 11 18:39:33 2014
ALTER DATABASE   MOUNT
This instance was first to mount
NOTE: Loaded library: System
SUCCESS: diskgroup DATA1 was mounted
NOTE: dependency between database MYDB and diskgroup resource ora.DATA1.dg is established
Successful mount of redo thread 1, with mount id 2306013605
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT

## Now it is time to make a backup of the database into the new Disk group (+MYDB_DATA01). If you are in a rac environment make sure all other instances are down.

## Issue the following command in rman because this will create a one to one copy of the database in the new Disk group:

RMAN>backup as copy database format ‘+MYDB_DATA01’;

##This will show:

RMAN> backup as copy database format ‘+MYDB_DATA01’;

Starting backup at 11.01.2014 18:58:11
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1766 device type=DISK
channel ORA_DISK_1: starting datafile copy
input datafile file number=00006 name=+DATA1/MYDB/datafile/gemprod.275.786977627
output file name=+MYDB_DATA01/MYDB/datafile/gemprod.260.836593093 tag=TAG20140111T185812 RECID=15 STAMP=836593134
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:45
channel ORA_DISK_1: starting datafile copy
input datafile file number=00002 name=+DATA1/MYDB/datafile/sysaux.262.786977877
output file name=+MYDB_DATA01/MYDB/datafile/sysaux.261.836593139 tag=TAG20140111T185812 RECID=16 STAMP=836593155
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:25
channel ORA_DISK_1: starting datafile copy
input datafile file number=00001 name=+DATA1/MYDB/datafile/system.269.786977709
output file name=+MYDB_DATA01/MYDB/datafile/system.262.836593163 tag=TAG20140111T185812 RECID=17 STAMP=836593175
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:15
channel ORA_DISK_1: starting datafile copy
input datafile file number=00003 name=+DATA1/MYDB/datafile/undotbs1.266.786977783
output file name=+MYDB_DATA01/MYDB/datafile/undotbs1.263.836593179 tag=TAG20140111T185812 RECID=18 STAMP=836593187
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:15
channel ORA_DISK_1: starting datafile copy
input datafile file number=00007 name=+DATA1/MYDB/datafile/undotbs2.267.786977883
output file name=+MYDB_DATA01/MYDB/datafile/undotbs2.264.836593195 tag=TAG20140111T185812 RECID=19 STAMP=836593202
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:15
channel ORA_DISK_1: starting datafile copy
input datafile file number=00004 name=+DATA1/MYDB/datafile/users.259.786977783
output file name=+MYDB_DATA01/MYDB/datafile/users.265.836593209 tag=TAG20140111T185812 RECID=20 STAMP=836593211
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03
channel ORA_DISK_1: starting datafile copy
input datafile file number=00005 name=+DATA1/MYDB/datafile/tools.263.786977709
output file name=+MYDB_DATA01/MYDB/datafile/tools.266.836593213 tag=TAG20140111T185812 RECID=21 STAMP=836593215
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03
channel ORA_DISK_1: starting datafile copy
copying current control file
output file name=+MYDB_DATA01/MYDB/controlfile/backup.267.836593215 tag=TAG20140111T185812 RECID=22 STAMP=836593217
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 11.01.2014 19:00:18
channel ORA_DISK_1: finished piece 1 at 11.01.2014 19:00:19
piece handle=+MYDB_DATA01/MYDB/backupset/2014_01_11/nnsnf0_tag20140111t185812_0.268.836593219 tag=TAG20140111T185812 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 11.01.2014 19:00:19

## Once that has finished issue the following command in Rman (note this switch to command performs the  … set newname for you  which is great:

RMAN>switch database to copy;

##This will show you:

RMAN> switch database to copy;

datafile 1 switched to datafile copy “+MYDB_DATA01/MYDB/datafile/system.262.836593163”
datafile 2 switched to datafile copy “+MYDB_DATA01/MYDB/datafile/sysaux.261.836593139”
datafile 3 switched to datafile copy “+MYDB_DATA01/MYDB/datafile/undotbs1.263.836593179”
datafile 4 switched to datafile copy “+MYDB_DATA01/MYDB/datafile/users.265.836593209”
datafile 5 switched to datafile copy “+MYDB_DATA01/MYDB/datafile/tools.266.836593213”
datafile 6 switched to datafile copy “+MYDB_DATA01/MYDB/datafile/gemprod.260.836593093”
datafile 7 switched to datafile copy “+MYDB_DATA01/MYDB/datafile/undotbs2.264.836593195”

##When that is finished issue following command in rman :
RMAN>sql ‘alter database open’;

## Start / open  Second instance as well via Sqlplus  as a check on the second box.
alter system set cluster_database=true scope=spfile;
startup

##Your alert file has been updated with following information:
Sat Jan 11 19:02:35 2014
WARNING: cataloging database area datafile
+DATA1/MYDB/datafile/system.269.786977709 as recovery area datafilecopy.
This datafilecopy is accounted into used space. Consider incrementing
db_recovery_file_dest_size parameter value by size of datafile.
Switch of datafile 1 complete to datafile copy
checkpoint is 13560558508638
WARNING: cataloging database area datafile
+DATA1/MYDB/datafile/sysaux.262.786977877 as recovery area datafilecopy.
This datafilecopy is accounted into used space. Consider incrementing
db_recovery_file_dest_size parameter value by size of datafile.
Switch of datafile 2 complete to datafile copy
checkpoint is 13560558508638
WARNING: cataloging database area datafile
+DATA1/MYDB/datafile/undotbs1.266.786977783 as recovery area datafilecopy.
This datafilecopy is accounted into used space. Consider incrementing
db_recovery_file_dest_size parameter value by size of datafile.
Switch of datafile 3 complete to datafile copy
checkpoint is 13560558508638
WARNING: cataloging database area datafile
+DATA1/MYDB/datafile/users.259.786977783 as recovery area datafilecopy.
This datafilecopy is accounted into used space. Consider incrementing
db_recovery_file_dest_size parameter value by size of datafile.
Switch of datafile 4 complete to datafile copy
checkpoint is 13560558508638
WARNING: cataloging database area datafile
+DATA1/MYDB/datafile/tools.263.786977709 as recovery area datafilecopy.
This datafilecopy is accounted into used space. Consider incrementing
db_recovery_file_dest_size parameter value by size of datafile.
Switch of datafile 5 complete to datafile copy
checkpoint is 13560558508638
WARNING: cataloging database area datafile
+DATA1/MYDB/datafile/gemprod.275.786977627 as recovery area datafilecopy.
This datafilecopy is accounted into used space. Consider incrementing
db_recovery_file_dest_size parameter value by size of datafile.
Switch of datafile 6 complete to datafile copy
checkpoint is 13560558508638
WARNING: cataloging database area datafile
+DATA1/MYDB/datafile/undotbs2.267.786977883 as recovery area datafilecopy.
This datafilecopy is accounted into used space. Consider incrementing
db_recovery_file_dest_size parameter value by size of datafile.
Switch of datafile 7 complete to datafile copy
checkpoint is 13560558508638
Sat Jan 11 19:04:21 2014
alter database open

## Open a Sqlplus session and check for the %create% parameter:

show parameter create
SQL> show parameter db_create

NAME                                 TYPE        VALUE
———————————— ———– ——————————
db_create_file_dest                  string
db_create_online_log_dest_1          string
db_create_online_log_dest_2          string
db_create_online_log_dest_3          string
db_create_online_log_dest_4          string
db_create_online_log_dest_5          string

alter system set db_create_file_dest=’+MYDB_DATA01′ sid=’*’;
alter system set db_create_online_log_dest_1=’+MYDB_FRA1′ sid=’*’;

## check again
show parameter db_create
show parameter cluster

##Now it is time to work with the temp files. You will have to create a New temp tablespace in the new disk group, make that the default one and drop the old one:
set lines 200
col tablespace_name format a40
col file_name format a80
select tablespace_name,FILE_NAME, bytes/1024/1024 MB from dba_temp_files;

## This shows: FILE_NAME
TABLESPACE_NAME                          FILE_NAME                                                                                MB
—————————————- ——————————————————————————– ———-
TEMP                                     +DATA1/MYDB/tempfile/temp.268.774880131                                             8192

##This will create a new temp tablespace , in the new disk group , make it the default tablespace and drop old

create temporary tablespace TEMP02 tempfile size 1024m;
alter database default temporary tablespace TEMP02;
drop tablespace TEMP ;
create temporary tablespace TEMP tempfile size 8192m;
alter database default temporary tablespace TEMP;
drop tablespace TEMP02;

##Check it again:
set lines 200
col tablespace_name format a40
col file_name format a80
select tablespace_name,FILE_NAME, bytes/1024/1024 MB from dba_temp_files;

## Working with the redo logs:
## In this step first we have to add new members to each group (to each thread (in a rac)). After that and after switching the log files you can delete the members in the old disk group
## First check the environment:

Set lines 2000
select * from v$log;
SQL>
GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIME          NEXT_CHANGE# NEXT_TIME
———- ———- ———- ———- ———- ———- — —————- ————- ——————- ———— ——————-
1          1      28288  209715200        512          2 YES INACTIVE            1.3559E+13 08.01.2014 14:54:59   1.3559E+13 08.01.2014 15:24:59
2          1      28289  209715200        512          2 YES INACTIVE            1.3559E+13 08.01.2014 15:24:59   1.3559E+13 08.01.2014 15:54:59
3          1      28290  209715200        512          2 NO  CURRENT             1.3559E+13 08.01.2014 15:54:59   2.8147E+14
5          2      28097  209715200        512          2 YES INACTIVE            1.3559E+13 08.01.2014 15:24:53   1.3559E+13 08.01.2014 15:54:52
6          2      28098  209715200        512          2 NO  CURRENT             1.3559E+13 08.01.2014 15:54:52   2.8147E+14
7          2      28095  209715200        512          2 YES INACTIVE            1.3559E+13 08.01.2014 14:24:51   1.3559E+13 08.01.2014 14:54:53
8          2      28096  209715200        512          2 YES INACTIVE            1.3559E+13 08.01.2014 14:54:53   1.3559E+13 08.01.2014 15:24:53

## and
col member format a80
set pagesize 33
select GROUP#,MEMBER from v$logfile order by 1;
GROUP# MEMBER
———- ——————————————————————————–
1 +DATA1/MYDB/onlinelog/group_1.260.774880079
1 +MYDB_FRA1/MYDB/onlinelog/group_1.258.774880081
2 +DATA1/MYDB/onlinelog/group_2.264.774880083
2 +MYDB_FRA1/MYDB/onlinelog/group_2.257.774880085
3 +DATA1/MYDB/onlinelog/group_3.261.774880087
3 +MYDB_FRA1/MYDB/onlinelog/group_3.256.774880089
5 +DATA1/MYDB/onlinelog/group_5.271.774880749
5 +MYDB_FRA1/MYDB/onlinelog/group_5.259.774880751
6 +DATA1/MYDB/onlinelog/group_6.272.774880753
6 +MYDB_FRA1/MYDB/onlinelog/group_6.260.774880755
7 +DATA1/MYDB/onlinelog/group_7.273.774880757
7 +MYDB_FRA1/MYDB/onlinelog/group_7.261.774880759
8 +DATA1/MYDB/onlinelog/group_8.274.774880763
8 +MYDB_FRA1/MYDB/onlinelog/group_8.262.774880765

##First we add new members to the correct , new disk group:
alter database add logfile member ‘+MYDB_DATA01’ to group 1;
alter database add logfile member ‘+MYDB_DATA01’ to group 2;
alter database add logfile member ‘+MYDB_DATA01’ to group 3;
alter database add logfile member ‘+MYDB_DATA01’ to group 5;
alter database add logfile member ‘+MYDB_DATA01’ to group 6;
alter database add logfile member ‘+MYDB_DATA01’ to group 7;
alter database add logfile member ‘+MYDB_DATA01’ to group 8;

###Check again:
select GROUP#,MEMBER from v$logfile order by 1;

## Perform some switches to make sure the new members have been in use  ( archive log current performs logswitch in  the whole  Database )

alter system archive log current;

alter system archive log current;

alter system archive log current;

alter system archive log current;

alter system archive log current;

alter system archive log current;

alter system archive log current;

alter system archive log current;

## First check the environment again:
Set lines 2000
select * from v$log;

##It is time to drop the members from the old ( data1 )
select GROUP#,MEMBER from v$logfile order by 1;

GROUP# MEMBER
———- ——————————————————————————–
1 +DATA1/MYDB/onlinelog/group_1.260.774880079
1 +MYDB_FRA1/MYDB/onlinelog/group_1.258.774880081
2 +DATA1/MYDB/onlinelog/group_2.264.774880083
2 +MYDB_FRA1/MYDB/onlinelog/group_2.257.774880085
3 +DATA1/MYDB/onlinelog/group_3.261.774880087
3 +MYDB_FRA1/MYDB/onlinelog/group_3.256.774880089
5 +DATA1/MYDB/onlinelog/group_5.271.774880749
5 +MYDB_FRA1/MYDB/onlinelog/group_5.259.774880751
6 +DATA1/MYDB/onlinelog/group_6.272.774880753
6 +MYDB_FRA1/MYDB/onlinelog/group_6.260.774880755
7 +DATA1/MYDB/onlinelog/group_7.273.774880757
7 +MYDB_FRA1/MYDB/onlinelog/group_7.261.774880759
8 +DATA1/MYDB/onlinelog/group_8.274.774880763
8 +MYDB_FRA1/MYDB/onlinelog/group_8.262.774880765

##So we have to drop the redo members that point to the old ( DATA01 ) Disk group ( but the group can not be current !!):

GROUP# MEMBER
———- ——————————————————————————–
1 +DATA1/MYDB/onlinelog/group_1.260.774880079
2 +DATA1/MYDB/onlinelog/group_2.264.774880083
3 +DATA1/MYDB/onlinelog/group_3.261.774880087
5 +DATA1/MYDB/onlinelog/group_5.271.774880749
6 +DATA1/MYDB/onlinelog/group_6.272.774880753
7 +DATA1/MYDB/onlinelog/group_7.273.774880757
8 +DATA1/MYDB/onlinelog/group_8.274.774880763

alter database drop logfile member ‘+DATA1/MYDB/onlinelog/group_1.260.774880079’;
alter database drop logfile member ‘+DATA1/MYDB/onlinelog/group_2.264.774880083’;
alter database drop logfile member ‘+DATA1/MYDB/onlinelog/group_3.261.774880087’;
alter database drop logfile member ‘+DATA1/MYDB/onlinelog/group_5.271.774880749’;
alter database drop logfile member ‘+DATA1/MYDB/onlinelog/group_6.272.774880753’;
alter database drop logfile member ‘+DATA1/MYDB/onlinelog/group_7.273.774880757’;
alter database drop logfile member ‘+DATA1/MYDB/onlinelog/group_8.274.774880763’;

##Checked again
select GROUP#,MEMBER from v$logfile order by 1;

##Working in the clusterware:

After these activities I tried Stopping and starting via srvctl ( this is an oracle restart environment crashed ) ..   Alertlog was having error messages and the start failed …   I did notice that the environment was using the old SPFILE in +Data01 again … I checked the spfile which was wrong again..

cd  $ORACLE_HOME/db/dbs

cat initMYDB1.ora
spfile=’+DATA1/MYDB/spfileMYDB.ora’

## so the clusteragent had altered my changes
##altered init.ora again and started that worked ..

##In sqlplus:

select name from v$controlfile
union
select name from v$datafile
union
select name from v$tempfile
union
select member from v$logfile
union
select filename from v$block_change_tracking
union
select name from v$flashback_database_logfile;

## This shows:

NAME
——————————————————————————–
+DATA1/MYDB/changetracking/ctf.265.774880707
+DATA1/MYDB/control01.ctl
+DATA1/MYDB/control02.ctl
+DATA1/MYDB/control03.ctl
+DATA1/MYDB/datafile/gemprod.275.786977627
+DATA1/MYDB/datafile/sysaux.262.786977877
+DATA1/MYDB/datafile/system.269.786977709
+DATA1/MYDB/datafile/tools.263.786977709
+DATA1/MYDB/datafile/undotbs1.266.786977783
+DATA1/MYDB/datafile/undotbs2.267.786977883
+DATA1/MYDB/datafile/users.259.786977783
+DATA1/MYDB/onlinelog/group_1.260.774880079
+DATA1/MYDB/onlinelog/group_2.264.774880083
+DATA1/MYDB/onlinelog/group_3.261.774880087
+DATA1/MYDB/onlinelog/group_5.271.774880749
+DATA1/MYDB/onlinelog/group_6.272.774880753
+DATA1/MYDB/onlinelog/group_7.273.774880757
+DATA1/MYDB/onlinelog/group_8.274.774880763
+DATA1/MYDB/tempfile/temp.268.774880131
+MYDB_FRA1/MYDB/onlinelog/group_1.258.774880081
+MYDB_FRA1/MYDB/onlinelog/group_2.257.774880085
+MYDB_FRA1/MYDB/onlinelog/group_3.256.774880089
+MYDB_FRA1/MYDB/onlinelog/group_5.259.774880751
+MYDB_FRA1/MYDB/onlinelog/group_6.260.774880755
+MYDB_FRA1/MYDB/onlinelog/group_7.261.774880759
+MYDB_FRA1/MYDB/onlinelog/group_8.262.774880765

##  means another action point as with regard to the block change tracking
SELECT filename, status, bytes
FROM v$block_change_tracking;
2
FILENAME                                                                         STATUS          BYTES
——————————————————————————– ———- ———-
+DATA1/MYDB/changetracking/ctf.265.774880707                                  ENABLED      11599872

## Disable it and enable it on the new location ( the new disk group )
ALTER DATABASE DISABLE BLOCK CHANGE TRACKING;
ALTER DATABASE ENABLE BLOCK CHANGE TRACKING USING FILE ‘+MYDB_DATA01’;

## In an Oracle restart or Rac Environment you need to check the Clusterware setup now since it has knowledge about spfile, disk groups being used etc.
## First check the configuration in the Clusterware for the database:
srvctl config database -d MYDB
Database unique name: MYDB
Database name:
Oracle home: /opt/oracle/product/11203_ee_64/db
Oracle user: oracle
Spfile: +DATA1/MYDB/spfileMYDB.ora
Domain: prod.vis
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: MYDB
Database instances: MYDB1,MYDB2
Disk Groups: DATA1,MYDB_FRA1
Mount point paths:
Services: MYDB_TAF.prod.vis
Type: RAC
Database is administrator managed

##So we have to perform two action points:
·        Make the spfile point to the correct disk group ( our new +MYDB_DATA01)
·        Disk groups attribute still knows about that data1 disk group ( and it should not)

##First modification will be to inform the Clusterware which spfile to use:
srvctl modify database -d MYDB -p ‘+MYDB_DATA01/MYDB/spfileMYDB.ora’

##After that similar action for the disk groups:
srvctl modify database -d  MYDB -a ‘MYDB_DATA01,MYDB_FRA1’

## don’t believe it check it
srvctl config database -d MYDB

##This looks better so now let’s do a stop & start with srvctl
srvctl stop database -d MYDB
srvctl start database -d MYDB

##That worked !  happy dba

TNS-12514 in an Oracle Rac Environment

Introduction:

On a frequent bases team members do hotline service where they act as a first line of contact when there are issues with the Oracle Environment on the various boxes. In the last week of 2013 Monday 30st of December  it was my turn to do that duty when one of the interesting  issues showed that  even caused a Priority One incident on the server srvr_33r which is part of a  4 node Rac cluster with the  database MYDB. So once again time to gear up and go investigate what was going on. Users received the following Error Message:  TNS-12514: TNS:listener does not know of service requested in connect descriptor

Summary:

In the end it showed that the Oracle Instance on the 33r Box was suffering from a communication issue with its ASM instance since this ASM instance had issues with its shared_pool ( in the alert file of the Oracle Instance at the start of the incident i found : ORA-04031: unable to allocate 760 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^928″,”kglss”).

Due to that  shared_pool issue the Instance was unable to communicate with its ASM Instance when it needed the information about  its Local listener in the spfile. That made that information unavailable to the Oracle instance so the Cluster agent performed this :  ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.20.30.72)(PORT=1521))))’ SCOPE=MEMORY SID=’MYDB1′;

So It altered my LOCAL_LISTENER in memory to the default listener on the cluster, listening on 1521 with NO services defined for this specific database !!

Even worse was that the environment was hit by an Oracle Bug ( which was pointed out by a team member to me : Bug 11772838 – oraagent modifies LOCAL_LISTENER with an incorrect value, when the parameter was set in spfile (Doc ID 11772838.8)

I have implemented the workaround for the listener and will have to schedule a maintenance window on the cluster to fix the Memory issues in the ASM instance(s).

Details:

Environment setting:  4 node 11.2.0.3 Grid Infrastructure  with a mix of  Databases (and versions) present. Every database has a dedicated listener on a specific port. And services for that database register with that dedicated listener. Every database has its spfile in an ASM Diskgroup.

When i started performing checks i first looked at the listener on the first Node ( Instance1 is on srvr_33r)

# lsnrctl status listener_MYDB
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 30-DEC-2013 10:02:51
 Copyright (c) 1991, 2011, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=srvr_33r-vip)(PORT=20008)(IP=FIRST)))
 STATUS of the LISTENER
 ------------------------
 Alias LISTENER_MYDB
 Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production
 Start Date 23-OCT-2013 01:21:27
 Uptime 68 days 9 hr. 41 min. 24 sec
 Trace Level off
 Security ON: Local OS Authentication
 SNMP OFF
 Listener Parameter File /opt/oracle/product/11203_ee_64/db/network/admin/listener.ora
 Listener Log File /opt/oracle/MYDB/admin/network/log/listener_MYDB.log
 Listening Endpoints Summary...
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.20.30.72)(PORT=20008)))
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.20.30.71)(PORT=20008)))
 Services Summary...
 Service "MYDB" has 1 instance(s).
 Instance "MYDB", status UNKNOWN, has 1 handler(s) for this service...
 The command completed successfully

 

And i compared that to the information in the listener on the other node: 

(Instance 2 is on srvr_34r)
oracle@srvr_34r:/opt/oracle [MYDB2]# lsnrctl status listener_MYDB
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 30-DEC-2013 10:00:27
 Copyright (c) 1991, 2011, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=srvr_34r-vip)(PORT=20008)(IP=FIRST)))
 STATUS of the LISTENER
 ------------------------
 Alias LISTENER_MYDB
 Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production
 Start Date 23-OCT-2013 03:10:15
 Uptime 68 days 7 hr. 50 min. 12 sec
 Trace Level off
 Security ON: Local OS Authentication
 SNMP OFF
 Listener Parameter File /opt/oracle/product/11203_ee_64/db/network/admin/listener.ora
 Listener Log File /opt/oracle/MYDB/admin/network/log/listener_MYDB.log
 Listening Endpoints Summary...
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.20.30.75)(PORT=20008)))
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.20.30.73)(PORT=20008)))
 Services Summary...
 Service "MYDB" has 1 instance(s).
 Instance "MYDB", status UNKNOWN, has 1 handler(s) for this service...
 Service "MYDB.prod.nl" has 1 instance(s).
 Instance "MYDB2", status READY, has 1 handler(s) for this service...
 Service "MYDB_TAF.prod.nl" has 1 instance(s).
 Instance "MYDB2", status READY, has 1 handler(s) for this service...
 The command completed successfully

 

Conclusion: So in short the listener on srvr_33r for this specific database was unaware of its services even though  the services where / are in place.

HOWEVER  when I checked for the local listener on the srvr_33r server   in an sqlplus session  this is what I saw:

SQL> show parameter listener
NAME TYPE VALUE
 ------------------------------------ ----------- ------------------------------
 listener_networks string
 local_listener string (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.20.30.72)(PORT=1521))))
 remote_listener string srvr_3rcl-scan:33100

Which proves that the Spfile was at this morning time during incident ( December 30 as of 09:00)  no longer aware of its LOCAL listener !!! In the end we fixed the issue by adding the local listener again and after that issues with connectivity where resolved.

Note when I check the listener Logs I do see that listener was unaware of services at the start of  the incident .

/opt/oracle/MYDB/admin/network/log/listener_MYDB.log
30-DEC-2013 00:14:40 * (CONNECT_DATA=(SERVICE_NAME=MYDB.prod.nl)(SERVER=DEDICATED)(CID=(PROGRAM=c:\oraclexe\app\oracle\product\11.2.0\server\bin\ORACLE.EXE)(HOST=X1234567)(USER=SYSTEM))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.60.205.57)(PORT=58445)) * establish * MYDB.prod.nl * 12514
TNS-12514: TNS:listener does not currently know of service requested in connect descriptor
30-DEC-2013 00:14:40 * (CONNECT_DATA=(SERVICE_NAME=MYDB.prod.nl)(SERVER=DEDICATED)(CID=(PROGRAM=c:\oraclexe\app\oracle\product\11.2.0\server\bin\ORACLE.EXE)(HOST=X1234567)(USER=SYSTEM))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.60.205.57)(PORT=58446)) * establish * MYDB.prod.nl * 12514
TNS-12514: TNS:listener does not currently know of service requested in connect descriptor

In the alert logs of the Instance I found this information and I think that this has caused our PRiO 1 incident due to the instance could not talk to the asm instance:

view /opt/oracle/MYDB/admin/Arch/alert_MYDB1.log.20131230_000158
See Note 411.1 at My Oracle Support for error and packaging details.
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
Dumping diagnostic data in directory=[cdmp_20131229112802], requested by (instance=1, osid=29870 (M000)), summary=[incident=48997].
Sun Dec 29 11:28:07 2013
Errors in file /opt/oracle/MYDB/diag/rdbms/MYDB/MYDB1/trace/MYDB1_m000_29870.trc  (incident=48998):
ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^928","kglss")
Incident details in: /opt/oracle/MYDB/diag/rdbms/MYDB/MYDB1/incident/incdir_48998/MYDB1_m000_29870_i48998.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
Sun Dec 29 11:28:08 2013
Dumping diagnostic data in directory=[cdmp_20131229112808], requested by (instance=1, osid=29870 (M000)), summary=[incident=48998].
Sun Dec 29 11:28:33 2013
ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.20.30.72)(PORT=1521))))' SCOPE=MEMORY SID='MYDB1';

Addendum:

After Altering the local listener to the required and correct listener i closed the ticket. Much to my surprise after couple of hours to be kicked by the same issue on the same database again. A  Colleague pointed me to Mos to  Bug 11772838 – oraagent modifies LOCAL_LISTENER with an incorrect value, when the parameter was set in spfile (Doc ID 11772838.8) .  So that proved i would need to make another step to have a proper workaround in place before i could start plan a maintenance window that would help to fix the cause of all this ( which was the Oracle Instance unable to communicate to its ASM instance when it needed  information about  the local listener in its spfile).

As Always Happy reading,

Mathijs

Rman Level backup and Block change Tracking and the Positive effects in IO load.

Introduction:

Recently another environment came under surveillance ( or was it showed up on the radar) due to performance issues within a specific time interval ( app. 20:00 – 22:00). A quick scan showed that on that box various Rac Instances where present and they all have Rman level 0 and level 1 backups . First urgent request by customer was to reschedule the various backups so they would not run together anymore. After that first adoption the performance already changed for the better.  Yet it was time again to wear the thinking hat and investigate the environment closer. Fortunately we have awr running on that box so it was a nice challenge to go find out.  In this blog you will find the feedback of those analyses.

Summary

Most important in this matter was to have AWR reports running on a frequent (hour) bases. Check with your licences if you are allowed to use the awr reports. From those Awr Reports we saw Peek  Disk IO during level 1 backups. Well That as not supposed to be the case .. So a dive into the details of the database showed that there was no block_change_tracking enabled. After doing so the level 1 backup was so much more balanced in its io – needs . So when you have large Databases ( mine is app 1.2 TB) make sure that you or your fellow Dbas have enabled this block_change_tracking if  you do level 1 backups with Rman.

Details.

Environment we are working with is a Real application Cluster on RedHat Linux with multiple Instances on the Nodes of the cluster. Backups are performed with Rman to Tape  with Level 0 and Level 1  on a daily basis . Customer complained about performance going down the tube in a specific time interval ( 20.00 )  – ( 22.00). And as always the backups where first prime suspects causing the performance drain. Frankly at first I was frowning with the idea.. ‘Let’s blame the backups again for the poor performance of the database’.. Investigations however showed a couple of shocks to me.

First it came to my notice that the database (11.2.0.3) was setup as a Real application cluster and it was 1.2 TB in size. Frankly I could not understand why this Database was pushed to tape with Rman backup instead of using disk backups or using Snapshot technology to mount the database on a backup server and prevent every kind of overhead on the disks.  But as they say once you are in the kitchen for the short-term you need to cook with the tools you  have present.

Block change tracking:

I came across this blog and  that was once again an eye Opener: http://arup.blogspot.nl/2008/09/magic-of-block-change-tracking.html

From the official oracle Documentation:

http://docs.oracle.com/cd/E11882_01/backup.112/e10642/rcmbckba.htm#BRADV8125

So once it is enabled Oracle uses a file to keep track of the blocks that are being  changed  in the database with the goal to ease the burden of incremental backups.

This is how you can find out if that is already in place:

select filename, status, bytes from v$block_change_tracking;

If  block change tracking is in place you will see something like this ( in my case the file is stored in an ASM diskgroup since I am working with a Rac database) :

FILENAME
------------------------------------------------------------------------------------------------------------------------------------------------
STATUS                BYTES
---------- ----------
+MYDB_DATA01/mydb/changetracking/ctf.407.833981253
ENABLED     158401024

The smallest size for this file is 10MB and that it will expand with 10 Mb intervals. From the Oracle documentation:  The size of the block change tracking file is proportional to the size of the database and the number of enabled threads of redo. The size of the block change tracking file can increase and decrease as the database changes. The size is not related to the frequency of updates to the database.

Typically, the space required for block change tracking for a single instance is about 1/30,000 the size of the data blocks to be tracked. For an Oracle RAC environment, it is 1/30,000 of the size of the database, times the number of enabled threads.

So enabling the block_change_tracking was a first improvement I have implemented for that database.

Awr Reports.

As I mentioned I am a lucky Dba to be allowed to use the Awr reports to do analyses to prove if the Rman backups indeed affected the performance of the database in a dramatic way:

A level 1Rman backup should be much faster – causing less overhead on the database since only the changes since last backup need to be put to tape right ?  WRONG!  ( well at least it is wrong if you did not enable the block_change_tracking yet). Ah but you want proof ..  See is believe right :

AWR without the block_change tracking enabled  the hour before Rman backup kicks in:

Top 5 Timed Foreground Events

Event

Waits

Time(s)

Avg wait (ms)

% DB time

Wait Class

db file sequential read

92,008

723

8

59.13

User I/O
DB CPU  

214

 

17.49

 
log file sync

192,402

194

1

15.85

Commit
control file sequential read

19,000

18

1

1.45

System I/O
enq: TX – index contention

1,604

14

9

1.17

Concurrency

IOStat by Function summary

Function Name

Reads: Data

Reqs per sec

Data per sec

Writes: Data

Reqs per sec

Data per sec

Waits: Count

Avg Tm(ms)

Others 2G

10.03

.554437 1.2G

3.35

.346592 36.7K

1.00

LGWR 1M

0.02

.000277 2.1G

109.02

.600501 195.4K

0.23

DBWR 0M

0.00

0M 1.2G

24.90

.342707 0  
Buffer Cache Reads 697M

24.55

.193414 0M

0.00

0M 88.4K

7.59

Direct Reads 344M

2.16

.095458 1M

0.02

.000277 0  
Direct Writes 0M

0.00

0M 42M

1.49

.011654 0  
RMAN 0M

0.01

0M 0M

0.00

0M 42

1.40

TOTAL: 3G

36.77

.843588 4.6G

138.78

1.30173 320.5K

2.35

 

 AWR report DURING the Rman backup without the block change tracking enabled:

Top 5 Timed Foreground Events with Level 1 Backup without Block_change_tracking

Event

Waits

Time(s)

Avg wait (ms)

% DB time

Wait Class

log file sync

119,690

30,133

252

74.69

Commit
db file sequential read

47,120

6,113

130

15.15

User I/O
enq: TX – index contention

5,454

1,107

203

2.74

Concurrency
direct path write

7,679

586

76

1.45

User I/O
control file sequential read

4,505

436

97

1.08

System I/O

As you can see the Avg wait time has increased dramatically. Let’s take a look at the IO and this was a real shock to me since even with a level 1 Backup all the database is being read (RMAN reading 1.2T):

IOStat by Function summary

Function Name

Reads: Data

Reqs per sec

Data per sec

Writes: Data

Reqs per sec

Data per sec

Waits: Count

Avg Tm(ms)

RMAN 1.2T

350.69

349.559 42M

0.23

.011712 4527

16.02

Others 1.1G

5.50

.305634 819M

3.66

.228389 20.2K

138.86

LGWR 1M

0.01

.000278 1.3G

33.35

.380091 57.1K

57.57

DBWR 0M

0.00

0M 834M

17.08

.232572 0  
Buffer Cache Reads 371M

13.04

.103458 0M

0.00

0M 46.6K

124.67

Direct Reads 341M

1.98

.095092 0M

0.02

0M 0  
Direct Writes 0M

0.00

0M 12M

0.43

.003346 0  
Streams AQ 0M

0.00

0M 0M

0.00

0M 14

56.64

TOTAL: 1.2T

371.22

350.064 3G

54.76

.856112 128.5K

93.26

 

Awr Report same time interval with block_change_tracking active and Rman Level 1 backup

Top 5 Timed Foreground Events

Event

Waits

Time(s)

Avg wait (ms)

% DB time

Wait Class

db file sequential read

89,148

675

8

55.22

User I/O
DB CPU  

222

 

18.15

 
log file sync

188,028

193

1

15.79

Commit
SQL*Net more data from client

125,490

81

1

6.65

Network
enq: TX – index contention

1,779

18

10

1.50

Concurrency

 

Average wait times are back to normal.

 IOStat by Function summary

Function Name

Reads: Data

Reqs per sec

Data per sec

Writes: Data

Reqs per sec

Data per sec

Waits: Count

Avg Tm(ms)

Others 2.2G

10.92

.629154 1.7G

5.39

.480530 41.3K

0.85

LGWR 1M

0.02

.000277 2.3G

106.90

.649395 191.2K

0.24

DBWR 0M

0.00

0M 1.2G

23.95

.353535 0  
Buffer Cache Reads 679M

24.03

.188274 0M

0.00

0M 86.7K

7.28

Direct Writes 0M

0.00

0M 87M

3.02

.024123 0  
RMAN 60M

0.94

.016636 15M

0.20

.004159 4008

0.49

Direct Reads 28M

0.96

.007763 2M

0.04

.000554 0  
Streams AQ 0M

0.00

0M 0M

0.00

0M 17

5.53

TOTAL: 3G

36.88

.842107 5.3G

139.50

1.51229 323.2K

2.21


Happy reading,

Mathijs

Upgrade to 11.2.0.3 Grid Infra and Rdbms with Psu October 2013

Introduction

It is always a pleasure to be challenged with a new puzzle to solve. This time i was asked to take part in  the following scenario which will be implemented on various boxes:

  • Linux box(es) being upgraded from Red Hat Linux  5.5 to 5.9 .
  • Grid infra structure (GI) to be updated from 11.2.0.2 to 11.2.0.3 ( this is a need to work with Rdbms 11.2.0.3).
  • Database software (Rdbms) 11.2.0.3 Installation.
  • Implement Psu October 2013 on both GI and Rdbms
  • Upgrade the databases ( 24 in total ) to  11.2.0.3.

Environment: Linux Server with RH 5.5, 11.2 Grid Infra installed together with 11.2.0.2 Rdbms  on a single server ( Oracle Restart).

Quite a packed program right ?  Well the Linux part i was merely a spectator i  could only wait and see how the server was bounced  and see that  My Oracle Environment ( both the databases and the listener) came back up after bouncing  so the Restart environment was performing well.  After the third reboot it became time to do my actions. Below you will see a case study which i wanted to start at first cause  I thought that would save time .. In the end I implemented Plan B ( always good to have one available). Well better buckle up and let’s get started with it.

Summary:

Below you will find three scenarios you could follow, Scenario 1 would enable you to do installation in parallel and even a week in advance but  would need extra steps ( relinking software, enable the Grid Infra structure in a number of steps). In the End when setting this up and testing with it i came to the conclusion that it does not save that much time to do the installs before  the Linux upgrade .  Average installation is about  10 – 15 Minutes so i recommend to first have the Linux upgrade in place and after the Databases and listeners come back online when they (Linux) boot the server proceed with the Installations and patching .  The scenario is less complicated and i think even more error prone. So  this will be the scenario 2 and i will follow it on the environments where i am asked to do.

Addendum

Meanwhile i have implemented the plan and upgraded the databases ( 25 in total ). I was not very pleased with the  use of DBUA  (used it in a script with silent option) because  i indeed felt less control on the process. Two major setbacks i witnessed during the upgrade with DBU: 1) in my case it bugged me that it added a local listener to the init.oras while doing the upgrade cause that crashed the upgrade ( at a restart of database with that new generated – altered init.ora the db would of course not restart )  2) the Grid agent kept altering  the Oracle Home of the databases ( so was pointing to wrong env.). Well  Together with a colleague in the end we did save the day . But that was because we failed back to the manual upgrade Method. I have listed the activities and i will add it to the  Scenario : Real Life Implemented plan.

PS . by customers Request and due to the fact that the following Parameter altered its Default behavior ( was FALSE in 11.2.0.3 became TRUE)  i had to make sure the following parameter was set again in spfile:

alter system set “_use_adaptive_log_file_sync”=FALSE scope = both;

Important Add on. In all scenarios as a baseline  i ran utlu112i.sql on all Databases  in scope. And the good news was that all components  installed where  valid! AND i created a list of invalid objects per schema to compare to the situation after the upgrade ( as proof that this dba did not break the application).

As always happy reading,

Mathijs

Scenario 1  Installing software only  as a preparation:

When I started my preparations it seemed like the best thing to install  both the Software parts ( GI and Rdbms) as software only and perform the needed steps after that.  In this case following would have been performed:

  1. Install 11.2.0. 3 Rdbms as “software only”
  2. Install 11.2.0.3 GI as software only
  3. After  The Linux upgrade would have to relink the software (described below)
  4. Would have to perform various steps to activate  the 11.2.0. GI.
  5. Would have to implement  PSU October 2013
  6. Upgrade the Databases.

Scenario 1 After the Linux upgrade would have to relink my software in full again:

Stopping the databases under control in an easy way:
 As prep for the relinking of the software I performed following step:
srvctl status home -o /opt/oracle/product/112_ee_64/db -s /var/tmp/state_file.status
srvctl stop home -o /opt/oracle/product/112_ee_64/db -s /var/tmp/state_file.dmp
In Order to relink the Rdbms software:
After shutting down the databases (see above):
Had the ORACLE_HOME  set properly
$ORACLE_HOME/bin/relink all
Note: writing relink log to: /opt/oracle/product/112_ee_64/db/install/relink.log
In Order to relink the Oracle Restart software:
 Prepare the Oracle Grid Infrastructure for a Standalone Server home for modification using the following procedure:
  1. Log in as the Oracle Grid Infrastructure software owner user and change the directory to the path Grid_home/bin, where Grid_home is the path to the Oracle Grid Infrastructure home:
cd /opt/crs/product/112_ee_64/crs/bin
  1. Shut down the Oracle Restart stack using the following command:
crsctl stop has –f
 This will show:
 CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'MySrvr1hr'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'MySrvr1hr'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'MySrvr1hr' succeeded
CRS-2673: Attempting to stop 'ora.evmd' on 'MySrvr1hr'
CRS-2677: Stop of 'ora.evmd' on 'MySrvr1hr' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'MySrvr1hr' has completed
CRS-4133: Oracle High Availability Services has been stopped.
oracle@MySrvr1hr:/opt/crs/product/112_ee_64/crs/bin [+ASM]#
oracle@MySrvr1hr:/opt/crs/product/112_ee_64/crs/bin [+ASM]#
 Then:
 Relink Oracle Grid Infrastructure for a Standalone Server using the following procedure:
  1. Login as root
  2. Log in as the Oracle Grid Infrastructure for a Standalone Server owner:
  3. Login as root again:
4.     # cd /opt/crs/product/112_ee_64/crs/crs/install
5.     # perl roothas.pl -unlock
7.     $ export ORACLE_HOME=/opt/crs/product/112_ee_64/crs
8.     $ $ORACLE_HOME/bin/relink
This will show: oracle@MySrvr1hr:/opt/oracle [+ASM]# $ORACLE_HOME/bin/relink
writing relink log to: /opt/crs/product/112_ee_64/crs/install/relink.log
10.  cd/opt/crs/product/112_ee_64/crs/rdbms/install/
11.  ./rootadd_rdbms.sh 
Note. Rootadd_rdbms  came back very fast  without any output.
12.  cd /opt/crs/product/112_ee_64/crs/crs/install
13.  perl roothas.pl -patch
Checked with:
  1. export $ORACLE_HOME=/opt/crs/product/112_ee_64/crs/
  2. ./crsctl check has
 This showed:CRS-4638: Oracle High Availability Services is online Starting the databases under control in an easy way: srvctl start home -o /opt/oracle/product/112_ee_64/db -s /var/tmp/state_file.dmp

Scenario 1 After the Linux upgrade would have to do various steps to activate the GI environment:

1)      First of all, please shutdown the current database & ASM instances.

oracle@MySrvr1hr:/opt/oracle/admin/tools [+ASM]# stop.ksh

2) You need to install the new 11.2.0.<#> Patchset Grid Infrastructure Standalone  “Software Only” first (on a separated Oracle Home/directory AKA “out of place”).

3) Then configure the CSS & OHAS services as root user:

/opt/oracle/product/112_ee_64_a/asm/crs/install/roothas.pl -deconfig -force
and
/opt/crs/product/112_ee_64/crs/crs/install/roothas.pl

4) Please perform the next steps as oracle or grid OS user (Grid Infrastructure OS owner):

/opt/crs/product/112_ee_64/crs/bin/crsctl modify resource “ora.cssd” -attr “AUTO_START=1”

/opt/crs/product/112_ee_64/crs/bin/crsctl modify resource “ora.diskmon” -attr “AUTO_START=1”

Note: On release & onwards 11.2.0.3 (non-Exadata), “ora.diskmon” is not required anymore since this is an Exadata related process responsible for the I/O fencing.

5) Restart the OHAS stack as grid or oracle OS user:

/opt/crs/product/112_ee_64/crs/bin/crsctl stop has

/opt/crs/product/112_ee_64/crs/bin/crsctl start has

6) Check the CSS & OHAS state as grid or oracle OS user:

/opt/crs/product/112_ee_64/crs/bin/crsctl check has

/opt/crs/product/112_ee_64/crs/bin/crsctl check css

/opt/crs/product/112_ee_64/crs/bin/ crsctl stat resource

/opt/crs/product/112_ee_64/crs/bin/crsctl stat res –t

Note: If the CSS & OHAS service did NOT start, then you will need to reboot the Linux box and check them again.

7) Remove the Old listener running under the old Grid Infrastructure Oracle Home, thru the NETCA GUI (from the old Grid Infrastructure Oracle Home).
ps -ef|grep inherit

oracle   23430     1  0  2012 ?        1-08:34:36 /opt/oracle/product/112_ee_64_a/db/bin/tnslsnr LISTENER_MYSRVR1HR -inherit

8) Recreate the default listener (LISTENER) using port 1521, thru the NETCA GUI on the new Grid Infrastructure Oracle Home.

9) Please create the init+ASM.ora file on the $ Grid Infrastructure Oracle Home>/dbs directory with the next parameters:

asm_diskgroups= <list of diskgroups>

asm_diskstring= ‘ORCL:*’ or ‘/dev/oracleasm/disks/*’

instance_type=’asm’

large_pool_size=12M

10) Add the ASM instance as grid or oracle user or the Grid Infrastructure installation owner and execute:

/opt/crs/product/112_ee_64/crs bin/srvctl add asm


11) Enable ASM instance Auto Start as follow:

/opt/crs/product/112_ee_64/crs /bin/crsctl modify resource “ora.asm” -attr “AUTO_START=1”


12) Make sure the disks are discovered by kfod:

Example:

/opt/crs/product/112_ee_64/crs /bin/kfod asm_diskstring=’/dev/mapper/asm-*p1′ disks=all

13) If so, then startup the ASM instance as follow:

export ORACLE_SID=+ASM

/opt/crs/product/112_ee_64/crs /bin/sqlplus “/as sysasm”

SQL> startup pfile=init+ASM.ora –#init file from point #9

SQL> show parameter asm

14) Validate that the original diskgroup(s) were mounted:

SQL> select name, state from v$asm_diskgroup;

15 Finally confirm the OHAS (autostart) services start as follows:

/opt/crs/product/112_ee_64/crs /bin/crsctl stop has

/opt/crs/product/112_ee_64/crs //bin/crsctl start has

/opt/crs/product/112_ee_64/crs //bin/crsctl stat res

/opt/crs/product/112_ee_64/crs //bin/crsctl stat res -t

16) Check the new HAS version as well as follows:

/opt/crs/product/112_ee_64/crs /bin/crsctl query has releaseversion
/opt/crs/product/112_ee_64/crs /bin/crsctl query has softwareversion


Scenario 1 And after that would be able to continue with applying the Psu October 2013 on the software:

Copy to GI home & extract /opt/oracle/product/11203_ee_64/stage/p6880880_112000_Linux-x86-64.zip
Create Response file oracle@mysrvr:/opt/oracle/product/11203_ee_64/db/OPatch/ocm/bin [11203]# ocm.rsp
/opt/crs/product/112_ee_64/crs/OPatch/ocm/bin [CRS]# ocm.rsp
As Root export PATH=/opt/crs/product/112_ee_64/crs/OPatch:$PATH
which opatch
opatch auto /opt/oracle/product/11203_ee_64/stage -ocmrf /opt/crs/product/112_ee_64/crs/OPatch/ocm/bin/ocm.rsp -oh /opt/crs/product/112_ee_64/crs, /opt/oracle/product/11203_ee_64/db

Note this command  performed an installation of the psu october both in my GI home and Rdbms home.).

After that Upgrade the databases with DBUA.

Scenario 2  ( Installing new Grid Infra and Rdbms AFTER Linux patching )

As i wrote this is my preferred scenario. It will involve less steps and the runInstaller will enable you to  Upgrade the existing environement. Please be aware that the 11.2.0.3 Installations are a so-called out-of-place installation requiring new Oracle Homes . For that purpose i have requested ( and got ) extra space  +15 Gb in /opt/oracle  and same space in /opt/crs. I followed the steps below bullet by bullet.

  • Create fresh copy init.ora /opt/oracle/admin/tools/cSpfile.ksh. (using scripts to create and save spfile).
  • Perform status and config /opt/oracle/admin/tools/cSrvctlAct.ksh status. (Checking and holding status of resources in logfiles).
  • /opt/oracle/admin/tools/cSrvctlAct.ksh config (Catching details on cluster config in files).
  • Stop the dbs / Listener /opt/oracle/admin/tools/cSrvctlAct.ksh stop (This was done by Grid Infra during GI install
  • Install Gi 1120.3 Option Oracle Grid infra structuren for a standalone server
  • Install Rdbms 1120.3 Software only. ( make sure no parallel installs before )
  • Copy to GI home & extract /opt/oracle/product/11203_ee_64/stage/p6880880_112000_Linux-x86-64.zip ( this is the most up to date opatch ).
  • Create Response file oracle@mysrvr1:/opt/oracle/product/11203_ee_64/db/OPatch/ocm/bin [11203]# ocm.rsp
  • /opt/crs/product/112_ee_64/crs/OPatch/ocm/bin/ocm.rsp (location and name are needed during opatch ).
  • As Root export PATH=/opt/crs/product/112_ee_64/crs/OPatch:$PATH
  • As Root which opatch
  • As Root opatch auto /opt/oracle/product/11203_ee_64/stage -ocmrf /opt/crs/product/112_ee_64/crs/OPatch/ocm/bin/ocm.rsp -oh /opt/crs/product/112_ee_64/crs, /opt/oracle/product/11203_ee_64/db (## will apply to these two homes)
  • As Oracle Perform DBUA per Instance cDbua.kshTwo odd things:
  • 1) created spfile in $ORACLE_HOME/dbs  (was before in ASM).
  • 2) had a local listener defined alll over sudden so needed get rid of it on this box again.
  • As oracle perform /opt/oracle/admin/tools/cCatBundle.ksh
  • As oracle perform /opt/oracle/admin/tools/cSrvctlmod.ksh (done by dbua so no need for it in sep script).
  • As oracle perform /opt/oracle/admin/tools/cSrvctlAct.ksh status (Checking the status in GI).
  • As oracle perform /opt/oracle/admin/tools/cSrvctlAct.ksh config  (Stop and start using the GI).
  • As oracle perform ln -s /opt/networker/lib/libnwora.so libobk.so in new oracle home  (Dont forget your Networker Library in the lib dir in the new oracle home like i did :)).
  • As oracle perform run an archive backup as check
  • As oracle perform  per Db alter system set "_use_adaptive_log_file_sync"=FALSE scope = both; (Needed in each upgraded db by customers request).
  • As oracle perform  per Db create new pfile for the database
  • As oracle perform  per Db remover local listener entry from created new pfile
  • As oracle perform  per Db Check spfile because after upgrade it is on $ORACLE_HOME/dbs again instead of in asm where i had expected it
  • As oracle perform  per Db create new spfile in asm
  • As oracle perform  per Db srvctl modify database -dMYDB1 -p '+DATA/MYDB1/spfileMYDB1.ora'

Scenario 3 The Real Life implemented Plan.

Install the software and patch
/opt/oracle/admin/tools/cSpfile.ksh Creating a copy of spfile to Init.ora
create Migration pfiles with larger settings for shared_pool Recommended setting Min. Value Shared_pool_size > 600M ( prefer 1024M)
/opt/oracle/admin/tools/cSrvctlAct.ksh status Check the current setup in the clusterware
/opt/oracle/admin/tools/cSrvctlAct.ksh config Check the current setup in the clusterware
Dryrun . Start ./runInstaller to check Prerequisites before install and correct if needed. check for warnings and errors and correct them
Option Oracle Grid infra structuren for a standalone server Install in separate Oracle Home And choose Upgrade
Software only. ( make sure no parallel installs before ) Oracle Home Needs 2 b empty
/opt/oracle/product/11203_ee_64/stage/p6880880_112000_Linux-x86-64.zip Copy and install the Latest Opatch to both GI and Rdbms
oracle@mysrvr1:/opt/oracle/product/11203_ee_64/db/OPatch/ocm/bin [11203]# ocm.rsp Create a response File  needed during the opatch .
/opt/crs/product/112_ee_64/crs/OPatch/ocm/bin [CRS]# ocm.rsp
export PATH=/opt/crs/product/112_ee_64/crs/OPatch:$PATH Opatch runs as root , Set Path
which opatch Check version of Opatch
opatch auto /opt/oracle/product/11203_ee_64/stage -ocmrf /opt/crs/product/112_ee_64/crs/OPatch/ocm/bin/ocm.rsp -oh /opt/crs/product/112_ee_64/crs, /opt/oracle/product/11203_ee_64/db Run Opatch
Upgrade the Databases
Perform DBUA per Instance DBUA messed up by adding local listener to Init.ora and continued altering the  oratab by the Grid agent. That is why i recommend against the DBU for bulk upgrades . I would script the Upgrade using a Fixed Oracle_HOME ( the new one ) and a dedicated initora / spfile  for the MIG.
Steps for Manual Upgrade: Preferred WAY !
Create a new spfile from a  migration pfile mig pfile has larger shared_pool-size
1) Start sqlplus and run catupgrd.sql script from the NEW  $ORACLE_HOME/rdbms/admin
sqlplus ” / as sysdba “
spool /tmp/upgrade<DB>.log
startup upgrade
set echo on
@?/rdbms/admin/catupgrd.sql; After Catupgrd.sql finishes it will shutdown the database
2) Check catupgrd.sql spool file for errors.
3) Restart the database in normal mode.
4)  @$ORACLE_HOME/rdbms/admin/catuppst.sql; Post steps for the migration
5)  @$ORACLE_HOME/rdbms/admin/utlrp.sql;
alter system set “_use_adaptive_log_file_sync”=FALSE scope = both; Requested by customer
set lines 2000
select instance_name from v$instance; Check sanity of upgrade
select * from v$version; Check sanity of upgrade
select COMP_NAME,VERSION,STATUS,MODIFIED from dba_registry order by 1; Check sanity of upgrade all the installed components should be valid !
select * from DBA_REGISTRY_HISTORY order by action_time desc Check if Catbundle Ran (shows most  recent Entry first.
Check $ORACLE_HOME/dbs for the presence / correct init.ora Should point to the Asm Diskgroup to an Spfile.
srvctl upgrade database -d <Db>  -o /opt/oracle/product/11203_ee_64/db Inform clusterware about altered  Oracle home
srvctl modify database -d<Db> -p ‘+DATA/<Db>/spfile<Db>.ora’ make sure Clusterware knows about the Spfile alter if needed
srvctl modify database  -d <Db>  -o ‘/opt/oracle/product/11203_ee_64/db’ make sure Clusterware knows about the New Oracle Home
If you have a listener per Database make sure it is started from the NEW oracle Home with the correct listener.ora
/opt/oracle/admin/tools/cSrvctlAct.ksh status Check status of db in cluster
/opt/oracle/admin/tools/cSrvctlAct.ksh config Check configuration of db in cluster
srvctl stop database -d  & start database -d as check . /opt/oracle/admin/tools/cSrvctlAct.ksh stop /opt/oracle/admin/tools/cSrvctlAct.ksh start As a test stop and start via srvctl stop/start database -d  <Db>
ln -s /opt/networker/lib/libnwora.so libobk.so in new oracle home Check For the Networker Lib present in the new Oracle Home
run an archive or control file  backup as check Run an archive backup as a  test

Mission completed.

When truncating a table does not work ORA-01426

Introduction.

This mornings incident brought a big smile to my face so i thought highest time to share it with the community again. I got a  call from a colleague who takes care of an application that he had issues with the 11.1 Oracle environment. He explained to me that he tried to truncate a table and that in stead of being rewarded with an empty table he got punished with a ORA-01426.  As often the web was a Dbas best friend again so the puzzle got solved.

Details:

This incident had two staging tables.  Both of them  had 1,000,000,000 ( that is right that is  1,000 million rows this is not a typo) , and  Oracle would not allow the Application  /  user to truncate that staging table in one blow by truncating it  because it brought ORA-01426 the horror! Hmm do i sound sarcastic yet cause frankly i was bobbing head when hearing these details.

Anyhow as always  Internet ( Metalink too)  is  your friend so in the end we had two options:

either patch it

Apply Patch 8226471

OR

a. Flush in-memory monitoring information for all tables in the dictionary.

exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO()

b. For problematic table set a small value for NUMROWS using
## for both tables in scope
exec DBMS_STATS.SET_TABLE_STATS( ‘MYUSER’,’TABLE_UPLD2′,NUMROWS=> 10000 )
exec DBMS_STATS.SET_TABLE_STATS( ‘MYUSER’,’TABLE_UPLD1′,NUMROWS=> 10000 )

c. Issue truncate/exchange partition statement.
exec  DBMS_UTILITY.EXEC_DDL_STATEMENT(‘TRUNCATE TABLE TABLE_UPLD2’);
exec  DBMS_UTILITY.EXEC_DDL_STATEMENT(‘TRUNCATE TABLE TABLE_UPLD1’);

Given the fact that this was a production issue i performed workaround and recommended customer to check with his software provider if this was an out of control cleanup issue of staging tables  or just a bug in the application software.

As always happy reading

Mathijs

First Thoughts on 12C from a Dba Perspective

Introduction.

Last few weeks ago I invested some time in attending  a webinar on Oracle 12CRac Upgrade  and  i was also present at the 2 Days  Oracle Roadshow for Partners in the Netherlands  and i must say I was really impressed with the community an Oracle sharing new Technology this way.  Of course an article like this can only give you a glimpse of all the great new features  that are out there for you  but I will try to show you some of my favorites  from a Dbas perspective.

It was not much of a surprise to see that Oracle in the Database environment will focus on Big Data (seems that it never stops growing)  on Engineered Systems ( still have to put my hands some time on an Exadata but so far no project brought me near that) and of course into the Cloud computing ( Not without a reason they called their latest Release 12C where C stands for Cloud computing and for consolidation of data(bases)).

From an architecture point it is clear that Oracle will focus on a continued drive towards shared servers with a common OS and shared Databases in the Multi tenant environment (on the web and in the Debates already know as the Pluggable Databases).  As a dba working in a large Data-center that makes extra curious and cautious cause in consolidation this way is also something to bear in mind  when consolidating more and more databases on less and less ( but more powerful machine). You and your team should try hard to bring similar environments together on such multi tenant environments cause from experience it can be very hard to patch the very  diverse environments on limited boxes ( even with rolling upgrades , Rac being present etc.).

Lets take a look at some of the key Focus Areas:

Multi tenant Architecture.

In order to save resources both in hardware,  memory by consolidate databases  Oracle has introduced  Container databases (aka CDB). A Container databases in itself will act like literally a container cause it will be able to hold multiple Pluggable Databases ( meanwhile already well-known as pluggable Databases  (PDB).  AS in mentioned in my intro please be careful what you bring together though  try to look for similar environments with regard to security , patching requirements etc.

The big promise the Container databases will offer:

  • Consolidation of (pluggable) databases in to a container save Resources , offer the ability to handle multiple as on and even  give separate PDBs different priorities in the Container Database.
  • With changing workload unplugging and plugging a Pluggable database into another container.
  • In separating duties between Infra structure dbas and Application Dbas  the Multi tenant Dbs  offer you the option to manage the containers  using the Infra structure Dbas and it offers you the option in the dedicated PDB to set up Application Dba environment  limited to that one PDB only.
  • Manage the PDBS as one since you set up  your backups on the Container Level ( restore on the  Pluggable Db level is of course possible).
  • Flexible management with the PDBS since you can move them towards different tiers when they become more mission critical. ( And  with a specific tier you cover Slas with regard to (High) availability and  restores).
  • Fast provisioning with regard to setting up cloning , copy of the Pluggable Databases. Which as a Dba i particularly like since it offers  option to refresh preproduction from  Prod, maybe even move from Preproduction to production that way.

Automatic Data Optimizing.

Even though i will not cover this in this Article i think it is good to mention key development as with regard to  Performance tuning and Information life cycle management.  You really should check out the great features of:

  • Heat Map
  • (Smart) compression ( somehow i think  this technology was adopted from the Exadata environment)
  • The Automated tiering . It looks really like dbas together with their customers can put an effort in automating Information life-cycle management  in 12c.

Flex ASM.

In my day to day business dealing with a lot of Rac environments , and indeed sofar ( 11.2 included)  it can be a real bad day when an ASM instance crashes cause it will take all other instances on that box with her.  IN 12c  there is no longer a 1:1 map between Node-ASM instance. So  you will be able to have nodes without ASM instances  that are connected ( indeed most likely using the same Interconnect) to nodes that Do have ASM instances. And  of course you are right ASM could still be a single point of failure..  but of course Oracle thought of that  by allowing that non ASM node to connect to a different ASM Instance and this concept really makes me excited !  Well Done Oracle .

Real Application Clusters (RAC).

  • Till 12c it was a requirement to have all servers same in capacities like cpu, memory that is no longer needed .
  • Adding – Removing nodes become more easy.
  • Setting up pools and being able to move nodes between pools .
  • Multiple scans possible.
  • Support of Ipv6.
  • Of course supporting the Multi tenant options that 12c is offering .
  • Flex clustering where  Central ( Hub nodes ) can be easily connected to so-called Leaf Nodes which  are not directly connected to the storage nor via interconnects to different nodes . The promise this offers is  that it will be easier to add or remove (leaf) nodes since they only connected to their hub node.

Hmm i guess i could be writing on an and on. But the purpose of this article is only to share some insights and to make you want more by looking for more yourself .

Some hints to get you even more started:

  • Two great book recommendations i recently purchased myself:
    • Expert Oracle Rac 12c ( reading it  and it is an asset to read !)
    • Expert Oracle Enterprise Manager 12c ( will be great reading about 12C Grind control since i really need to get my hand back on the Enterprise Manager).
  • The web  is really a Candy store with regard to information being present on the topic of 12c. My recommendation would be to get involved in blogging and twitter to have early insight on the developments in the community that is out there waiting for you
  • A big recommendation ( also a big Thank you to Steve Karam on the Web also know as the Oracle Alchemist ) who did  excellent job in offering a Summary on a lot of articles on 12c that are already out there. so TY Steve ! : http://www.oraclealchemist.com/news/install-oracle-12c-12-1/

AS always  happy reading.

Mathijs

Asm Instance not starting after Cluster Node Reboot

Introduction.

I have been involved again in a situation where  the Rac cluster did not start after a reboot of the server during a maintenance window. And as always a true challenge that was. In such cases it is true that the alert log of the node and the ohasd logging  will be your best friends ( well together with Metalink and Google of course).

Details:

After a Os patching action on one of the nodes on one of my 11.2 Racs (Grid Infrastructure)  i was contacted can you please take a look cause the clusterware is not starting. After first investigation it showed that statement was not entirely true .  The cluster ware itself had been started but the  log file for the ohasd. showed following details , that it was not able to start the asm Resource due to ORA-01031: insufficient privileges.

this is what it showed:

## /opt/crs/product/11.2.0.2_a/crs/log/Mysrvr1r/ohasd [+ASM1]# view ohasd.log
2013-08-06 11:05:01.643: [    AGFW][1980881216] {0:0:2} Received the reply to the message: RESOURCE_CLEAN[ora.asm 1 1] ID 4100:411 from the agent /opt/crs/product/11.2.0.2_a/crs/bin/oraagent_oracle
2013-08-06 11:05:01.644: [    AGFW][1980881216] {0:0:2} Agfw Proxy Server sending the reply to PE for message:RESOURCE_CLEAN[ora.asm 1 1] ID 4100:410
2013-08-06 11:05:01.644: [   CRSPE][1991387456] {0:0:2} Received reply to action [Clean] message ID: 410
2013-08-06 11:05:01.644: [   CRSPE][1991387456] {0:0:2} Got agent-specific msg: ORA-01031: insufficient privileges
2013-08-06 11:05:01.646: [    AGFW][1980881216] {0:0:2} Received the reply to the message: RESOURCE_CLEAN[ora.asm 1 1] ID 4100:411 from the agent /opt/crs/product/11.2.0.2_a/crs/bin/oraagent_oracle
2013-08-06 11:05:01.646: [    AGFW][1980881216] {0:0:2} Agfw Proxy Server sending the reply to PE for message:RESOURCE_CLEAN[ora.asm 1 1] ID 4100:410
2013-08-06 11:05:01.646: [   CRSPE][1991387456] {0:0:2} Received reply to action [Clean] message ID: 410
2013-08-06 11:05:01.829: [    AGFW][1980881216] {0:0:2} Received the reply to the message: RESOURCE_CLEAN[ora.asm 1 1] ID 4100:411 from the agent /opt/crs/product/11.2.0.2_a/crs/bin/oraagent_oracle
2013-08-06 11:05:01.829: [    AGFW][1980881216] {0:0:2} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_CLEAN[ora.asm 1 1] ID 4100:410
2013-08-06 11:05:01.829: [   CRSPE][1991387456] {0:0:2} Received reply to action [Clean] message ID: 410
2013-08-06 11:05:01.829: [   CRSPE][1991387456] {0:0:2} RI [ora.asm 1 1] new internal state: [STABLE] old value: [CLEANING]
2013-08-06 11:05:01.829: [   CRSPE][1991387456] {0:0:2} CRS-2681: Clean of 'ora.asm' on 'Mysrvr1r' succeeded

That did not look all to good. I had a first guess about what was going on by trying to connect to the asm instance on that box via sqlplus ( sqlplus / as sysasm). When that showed  also the ORA-01031: insufficient privileges.

I had to giggle cause when  looking for that  message on the web  i ended up with my blog. Which proves once again that you can help yourself by helping others by sharing in the Oracle community.   Basically i  focused on  three metalink notes that might apply:

Troubleshooting ORA-1031: Insufficient Privileges While Connecting As SYSDBA [ID 730067.1]

UNIX: Checklist for Resolving Connect AS SYSDBA Issues [ID 69642.1]

UNIX: Diagnostic C program for ORA-1031 from CONNECT INTERNAL / AS SYSDBA [ID 67984.1]

The third note (67984.1) was my bingo !  So it was proved that my groupid ( dba) altered from 101 to some other value by a ldap lookup.. I have asked the Linux colleague  to disable these lookups and after that the asm instance started and all the instances as well.  As a workaround , in the /etc/ldap.conf they have added the oracle user to the nss_initgroups_ignoreusers to prevent this from happening.

Happy reading,

Mathijs