The Case of a Missing or Corrupt Spfile in your Grid Infra (11.2.0.3.0)

Introduction

On a Grid Infrastructure (11.2.0.3.0) cluster (SRVR1R, SRVR2R, SRVR3R) we received more and more tickets reporting that the ASM instances were suffering from a fragmented shared pool. The investigation showed that there was no SPFILE present (let alone one with automatic memory management configured), so this environment clearly deserved better. The only thing to be found was an out-of-date INIT.ORA. From its history, this environment had been set up as an 11gR1 RAC cluster and had later been upgraded to 11gR2 Grid Infra.

In this blog you will see, step by step, how to move to an SPFILE in a cluster environment. The same procedure can (and should) also be followed when an SPFILE is lost in a RAC cluster.

Note: what was interesting to find out was the way Oracle uses a specific configuration file (the GPnP profile) as a properties file when starting the Clusterware. It is VERY important to understand that the activities below have to be performed on ONE node in the cluster!

After the work had finished I noticed that Oracle had created a new property file. But there was more to be impressed about: after starting the Grid Infra on the other two nodes I saw that it took care of those files on the other machines too. So apparently the order of precedence when starting the ASM instances is (a quick way to inspect that profile is shown right after this list):

  1. The property file
  2. SPFILE if present
  3. INIT.ORA  if present
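For reference, the SPFILE location that ASM uses at startup is recorded in that property file (the GPnP profile). A quick way to peek at it on a running node is the sketch below; I am assuming the profile stores the location in an SPFile attribute, and the grep is just one way to pull it out of the XML.

## as the grid owner, with the grid home first in the PATH
gpnptool get 2>/dev/null | grep -o 'SPFile="[^"]*"'
asmcmd spget        ## reports the spfile location registered for the ASM instance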

I have performed my activities on the first NODE in the cluster:

oracle@SRVR1R:/opt/crs/product/112_ee_64/crs/gpnp/SRVR1R/profiles/peer [+ASM1]# ls -al
total 20
drwxr-x--- 2 oracle dba 4096 Sep 25 20:03 .
drwxr-x--- 3 oracle dba 4096 Mar 16  2012 ..
-rw-r--r-- 1 oracle dba 1946 Sep 25 19:41 profile.old
-rw-r--r-- 1 oracle dba 2004 Mar 16  2012 profile_orig.xml
-rw-r--r-- 1 oracle dba 2007 Sep 25 19:41 profile.xml

When I had finished and the first node was happy, I started the Clusterware on the other two nodes. And to my positive surprise I saw this:

oracle@SRVR2R:/opt/crs/product/112_ee_64/crs/gpnp/SRVR2R/profiles/peer []# ls -ltr
total 16
-rw-r--r-- 1 oracle dba 2004 Mar 16  2012 profile_orig.xml
-rw-r--r-- 1 oracle dba 2004 Mar 16  2012 profile.old
-rw-r--r-- 1 oracle dba 2004 Sep 25 20:03 pending.old
-rw-r--r-- 1 oracle dba 2007 Sep 25 20:03 profile.xml

oracle@SRVR3R:/opt/oracle [+ASM3]# cd /opt/crs/product/112_ee_64/crs/gpnp/SRVR3R/profiles/peer
oracle@SRVR3R:/opt/crs/product/112_ee_64/crs/gpnp/SRVR3R/profiles/peer [+ASM3]# ls -ltr
total 20
-rw-r--r-- 1 oracle dba 2004 Mar 16  2012 profile_orig.xml
-rw-r--r-- 1 oracle dba 2004 Mar 16  2012 profile.old
-rw-r--r-- 1 oracle dba 2004 Sep 25 20:08 pending.old
-rw-r--r-- 1 oracle dba 2007 Sep 25 20:08 profile.xml
-rw-r--r-- 1 oracle dba 2007 Sep 25 20:08 pending.xml

Investigate the environments.

Since I had the memory issues on the SRVR1R node (+ASM1), I first checked the settings on the two other nodes. The interesting part is that a lot of parameters, such as cluster_database = true, were already at their defaults; I am sure that is because of the property file I mentioned in the previous chapter. I used the asm_diskgroups values from those nodes as preparation for my own new INIT.ORA file, in which I would also increase the memory settings.
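Before touching anything, a simple way to capture what a healthy instance is running with is the sketch below (the /tmp file name is just an example); the 11g FROM MEMORY clause dumps the currently active parameters into a plain pfile you can use as a reference for the new INIT.ORA.

## on SRVR2R or SRVR3R, connected to the local ASM instance as sysasm
SQL> show parameter asm_diskgroups
SQL> show parameter memory
SQL> create pfile='/tmp/init_asm_reference.ora' from memory;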

Plan of approach

I learned that my only chance to get things right was to stop the FULL Clusterware stack on ALL nodes in the cluster. This cannot be done on a per-node basis (trust me, I investigated that option). After shutting everything down, you then work with the Clusterware on one node in a kind of restricted mode.

Once that situation is in place (everything down, working from one node without CRS active), I would present a prepared INIT.ORA and create an SPFILE in a disk group in ASM.

The steps below follow that scenario, describe it, and show the aftercare that turned out to be needed.

Stopping the grid infra:

First I stopped the databases using the corresponding srvctl commands. NOTE: this is a mixed environment of 11.1 and 11.2 databases and listeners, all under control of the 11.2.0.3.0 Grid Infrastructure, so you NEED to set the appropriate environment for each of them before going to work.
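As a sketch only (the database names below are placeholders), stopping the databases boils down to running srvctl from the home that owns each database:

## 11.2 database, using the srvctl of its own 11.2 home (MYDB112 is a hypothetical name)
srvctl stop database -d MYDB112
## 11.1 database, using the srvctl of its 11.1 home (MYDB111 is a hypothetical name)
srvctl stop database -d MYDB111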

As ROOT:

This has to be done on ALL nodes!

export PATH=/opt/crs/product/112_ee_64/crs/bin:$PATH

which crsctl

crsctl stop crs -f

Stopping the Grid Infra on SRVR1R

The first attempt showed the errors below. That was to be expected, since SRVR1R (+ASM1) was the node having the memory issues in the first place.

SRVR1R:root:/opt/crs/product/112_ee_64/crs/bin # crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'SRVR1R'
CRS-2673: Attempting to stop 'ora.crsd' on 'SRVR1R'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'SRVR1R'
CRS-2673: Attempting to stop 'ora.oc4j' on 'SRVR1R'
CRS-2673: Attempting to stop 'ora.cvu' on 'SRVR1R'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN3.lsnr' on 'SRVR1R'
CRS-2673: Attempting to stop 'ora.DATA03.dg' on 'SRVR1R'
CRS-2673: Attempting to stop 'ora.DATA05.dg' on 'SRVR1R'
CRS-2673: Attempting to stop 'ora.DATA1.dg' on 'SRVR1R'
CRS-2673: Attempting to stop 'ora.FRA1.dg' on 'SRVR1R'
.
CRS-2677: Stop of 'ora.scan3.vip' on 'SRVR1R' succeeded
CRS-2672: Attempting to start 'ora.scan3.vip' on 'SRVR2R'
CRS-5017: The resource action "ora.UNICAT_FRA01.dg stop" encountered the following error:
ORA-04031: unable to allocate 4000 bytes of shared memory ("shared pool","ALTER DISKGROUP UNICAT_FRA01...","sga heap(1,0)","kglsim heap")
. For details refer to "(:CLSN00108:)" in "/opt/crs/product/112_ee_64/crs/log/SRVR1R/agent/crsd/oraagent_oracle/oraagent_oracle.log".
CRS-5017: The resource action "ora.DATA1.dg stop" encountered the following error:
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","ALTER DISKGROUP DATA1 DISMOU...","sga heap(1,0)","kglsim object batch")
. For details refer to "(:CLSN00108:)" in "/opt/crs/product/112_ee_64/crs/log/SRVR1R/agent/crsd/oraagent_oracle/oraagent_oracle.log".
.
CRS-2794: Shutdown of Cluster Ready Services-managed resources on 'SRVR1R' has failed
CRS-2675: Stop of 'ora.crsd' on 'SRVR1R' failed
CRS-2799: Failed to shut down resource 'ora.crsd' on 'SRVR1R'
CRS-2795: Shutdown of Oracle High Availability Services-managed resources on 'SRVR1R' has failed
CRS-4687: Shutdown command has completed with errors.
CRS-4000: Command Stop failed, or completed with errors.

Now that did not look good.

Stopping the Grid Infra on SRVR2R

This one went smoothly, the way it is supposed to.

SRVR2R:root:/root # crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'SRVR2R'
CRS-2673: Attempting to stop 'ora.crsd' on 'SRVR2R'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'SRVR2R'
.
.
.
CRS-2673: Attempting to stop 'ora.ons' on 'SRVR2R'
CRS-2677: Stop of 'ora.ons' on 'SRVR2R' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'SRVR2R'
CRS-2677: Stop of 'ora.net1.network' on 'SRVR2R' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'SRVR2R' has completed
CRS-2677: Stop of 'ora.crsd' on 'SRVR2R' succeeded
.
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'SRVR2R' has completed
CRS-4133: Oracle High Availability Services has been stopped.

I checked whether there were any processes still hanging around:

ps -ef|grep d.bin

That was not the case: nothing was showing, so that was great.

Stopping the Grid Infra on SRVR3R

SRVR3R:root:/root # crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'SRVR3R'
CRS-2673: Attempting to stop 'ora.crsd' on 'SRVR3R'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'SRVR3R'
.
.
.
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'SRVR3R' has completed
CRS-4133: Oracle High Availability Services has been stopped.

ps -ef|grep d.bin

Again I checked this node with the command above. Nothing was showing, so that was great too. Some would say two out of three is not bad, but of course we wanted the full score.

Issues on SRVR1R

As we saw earlier, the shutdown of the grid on this node was not happy. An additional check:

SRVR1R:root:/opt/crs/product/112_ee_64/crs/bin # ps -ef|grep d.bin

It showed the following processes still out there. Note that I had checked the other two nodes as well, and there they were not present.

root      7868     1  0 May17 ?        04:14:54 /opt/crs/product/112_ee_64/crs/bin/ohasd.bin reboot
oracle    8466     1  0 May17 ?        00:04:07 /opt/crs/product/112_ee_64/crs/bin/mdnsd.bin
oracle    8494     1  0 May17 ?        01:12:02 /opt/crs/product/112_ee_64/crs/bin/gpnpd.bin
oracle    8549     1  0 May17 ?        05:22:23 /opt/crs/product/112_ee_64/crs/bin/gipcd.bin
oracle    9507     1  5 May17 ?        7-10:59:30 /opt/crs/product/112_ee_64/crs/bin/ocssd.bin
root     10113     1  0 May17 ?        01:12:38 /opt/crs/product/112_ee_64/crs/bin/octssd.bin reboot
root     10146     1  7 May17 ?        10-01:10:41 /opt/crs/product/112_ee_64/crs/bin/crsd.bin reboot
oracle   10159     1  0 May17 ?        00:43:47 /opt/crs/product/112_ee_64/crs/bin/evmd.bin
root     13975     1  0 Sep01 ?        04:47:23 /opt/crs/product/112_ee_64/crs/bin/osysmond.bin
root     28633 13319  0 18:45 pts/3    00:00:00 grep d.bin

The documentation showed I would have to kill them with kill -15.

## So I issued:

kill -15 7868 8466 8494 8549 9507 10113 10146 10159 13975

Oh well, that did not work out very well. The system was hanging, and after some time I noticed that the node had panicked and rebooted itself. That was not great at all. I patiently waited for the node to come back in full, and asked a Linux colleague to take a look; he confirmed that the system was struggling but would come back online. So I waited it out.

Second attempt to stop the Grid Infra on SRVR1R

OK, to let the cat out of the bag: this time it looked MUCH better!

SRVR1R:root:/root # crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'SRVR1R'
CRS-2673: Attempting to stop 'ora.crsd' on 'SRVR1R'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'SRVR1R'
.
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'SRVR1R' has completed
CRS-4133: Oracle High Availability Services has been stopped.

Ah, so this time there was not only a happy node but also a happy DBA. Of course I did a final check:

ps -ef|grep d.bin

And all looked great now.

Starting the Grid Infra on one node in exclusive (restricted) mode:

As the root user I had to start the Clusterware (Grid Infra) in exclusive mode, without starting CRS:

SRVR1R:root:/root # crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'SRVR1R'
CRS-2676: Start of 'ora.mdnsd' on 'SRVR1R' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'SRVR1R'
CRS-2676: Start of 'ora.gpnpd' on 'SRVR1R' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'SRVR1R'
CRS-2672: Attempting to start 'ora.gipcd' on 'SRVR1R'
CRS-2676: Start of 'ora.cssdmonitor' on 'SRVR1R' succeeded
CRS-2676: Start of 'ora.gipcd' on 'SRVR1R' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'SRVR1R'
CRS-2672: Attempting to start 'ora.diskmon' on 'SRVR1R'
CRS-2676: Start of 'ora.diskmon' on 'SRVR1R' succeeded
CRS-2676: Start of 'ora.cssd' on 'SRVR1R' succeeded

Starting the ASM instance manually.

I had prepared an INIT.ORA file with new, higher settings, and had included the automatic memory management that Oracle recommends. With that specific INIT.ORA I started the ASM instance on the first node manually.
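For the record, below is a minimal sketch of what such an INIT.ORA could contain. The disk group list and the exact values are site-specific assumptions (the memory, large pool and processes values match the ones checked later in this post); instance_type=asm is what marks this as an ASM instance.

## /opt/oracle/initNew.ora -- sketch only, values are examples
*.instance_type='asm'
*.asm_diskgroups='DATA1','FRA1'        ## take the list from the surviving nodes
*.asm_power_limit=1
*.memory_target=2G
*.memory_max_target=4G
*.large_pool_size=256M
*.processes=999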

SQL> startup pfile='/opt/oracle/initNew.ora';
ASM instance started

Total System Global Area 4275781632 bytes
Fixed Size                  2235208 bytes
Variable Size            4239991992 bytes
ASM Cache                  33554432 bytes
ASM diskgroups mounted

## I checked that all the disk groups I wanted mounted for all ASM instances were indeed there, and was happy to see them.

SQL> select * from v$asm_diskgroup;

SQL> select name,state from v$asm_diskgroup order by 1;
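A small variation on those queries that only lists anything that failed to mount (just a convenience sketch):

SQL> select name, state from v$asm_diskgroup where state <> 'MOUNTED';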

Note: there was no specific disk group available for the voting disk, nor for my soon-to-be-born SPFILE, so that was a bummer. I therefore chose the +DATA1 disk group and created the SPFILE there:

create SPFILE='+DATA1' from pfile='/opt/oracle/initNew.ora';

And then I saw the error message below:

SQL> create SPFILE='+DATA1' from pfile='/opt/oracle/initNew.ora';
create SPFILE='+DATA1' from pfile='/opt/oracle/initNew.ora'
*
ERROR at line 1:
ORA-17502: ksfdcre:4 Failed to create file +DATA1
ORA-15221: ASM operation requires compatible.asm of 11.2.0.0.0 or higher

 

So I had to check another thing:

col COMPATIBILITY form a10
col DATABASE_COMPATIBILITY form a10
col NAME form a20
select group_number, name, compatibility, database_compatibility from v$asm_diskgroup order by 1;

GROUP_NUMBER NAME                 COMPATIBIL DATABASE_C
------------ -------------------- ---------- ----------
.
.
           6 DATA1                10.1.0.0.0 10.1.0.0.0
.
.
17 rows selected.

 

So I had to change compatible.asm before I could proceed with the disk group I had selected (DATA1):

alter diskgroup data1 set attribute 'compatible.asm'='11.2';

Diskgroup altered.

Then I created the SPFILE from pfile:

SQL> create SPFILE='+DATA1' from pfile='/opt/oracle/initNew.ora';

File created.

SQL> shutdown immediate;

As a non-believer, of course I wanted to make sure about the new automatic memory management settings, because maintenance windows like these should be very rare on a RAC cluster. So I restarted the instance manually one more time, because I wanted to be happy with the final checks.

The parameters shown below include the ones I had increased or newly set.

ASM instance started

Total System Global Area 4275781632 bytes
Fixed Size                  2235208 bytes
Variable Size            4239991992 bytes
ASM Cache                  33554432 bytes
ASM diskgroups mounted

SQL> show parameter SPFILE

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +DATA1/srvr1r/asmparameterfile/registry.253.794950891

SQL> show parameter asm

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string      ALL disk groups to be mounted
asm_diskstring                       string
asm_power_limit                      integer     1
asm_preferred_read_failure_groups    string

SQL> show parameter memory

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
memory_max_target                    big integer 4G
memory_target                        big integer 2G

SQL> show parameter sga

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
lock_sga                             boolean     FALSE
sga_max_size                         big integer 4G
sga_target                           big integer 0

SQL> show parameter large

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
large_pool_size                      big integer 256M
use_large_pages                      string      TRUE

SQL> show parameter proce

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
processes                            integer     999
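Outside SQL*Plus, asmcmd offers a quick cross-check that this is indeed the spfile registered for ASM (a sketch, run as the grid owner with the +ASM1 environment set); it should report the same +DATA1 registry path as show parameter SPFILE above.

asmcmd spget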

This was everything I had planned for, so it was time to shut down the instance.

SQL> shutdown immediate;
ASM diskgroups dismounted
ASM instance shutdown

SQL> exit

Time for the normal restart of the first node in the Grid Infra

Since the node had been started in exclusive mode, I first had to stop the Clusterware on it again.

As the user ROOT:

crsctl stop crs

I was very curious, and crossed my fingers about the outcome.

As part of the shop's standards, each node has a mount point per database, so I established the link in /opt/oracle:

ln -s +ASM +ASM1

And then I started the Grid Infrastructure:

crsctl start crs

Once the first node was up and ASM was on its feet, I did the same for the two other nodes.
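To verify the restart on each node, a few standard checks (a sketch; run with the 11.2 grid home in the PATH):

## as root or the grid owner
crsctl check crs              ## the Clusterware daemons should all report online
crsctl status resource -t     ## overall resource status
srvctl status asm             ## ASM instances across the cluster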

Last things

I had to restart the 11.2 databases manually, and noticed that the 11.1 databases had restarted themselves; my stop procedure was causing that. If this scenario is performed on a production box, this part will need improvement.
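As a sketch (the database name is a placeholder again), restarting an 11.2 database under Grid Infra control is once more a srvctl call from its own home:

srvctl start database -d MYDB112
srvctl status database -d MYDB112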

The ASM listeners did not come up as resources, because of:

  • a missing log directory (I added that one);
  • starting the listener from the CRS environment failed as well.

ora.SRVR1R.ASM1.asm                              ONLINE     ONLINE on SRVR1R
ora.SRVR1R.listener_+ASM1.lsnr                   ONLINE     OFFLINE
ora.SRVR2R.ASM2.asm                              ONLINE     ONLINE on SRVR2R
ora.SRVR2R.listener_+ASM2.lsnr                   ONLINE     ONLINE on SRVR2R
ora.SRVR3R.ASM3.asm                              ONLINE     ONLINE on SRVR3R
ora.SRVR3R.listener_+ASM3.lsnr                   ONLINE     OFFLINE

Manually I could start the listener after adding the log file directory (also part of the standards). So I had to understand what was going on. I think the 11gR2 Grid Infra does not accept the '+' in the name, which was perfectly valid in 11.1. This is an action point for the future.

The Grid Infra did start the listener for ASM2, but when I try to check that resource:

oracle@SRVR1R:/opt/oracle [+ASM1]# crsctl status resource ora.SRVR2R.listener_+ASM2.lsnr -p
CRS-4037: The name 'ora.SRVR2R.listener_+ASM2.lsnr' contains the following invalid characters: '+'
CRS-4000: Command Status failed, or completed with errors.

I have started the listeners manually, and they work. If I use srvctl to start listener_+ASM1.lsnr I get errors similar to the ones above.

In the end I had to start them with the ORACLE_HOME pointed to 11gR1, and that worked.
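For illustration, starting such a listener manually with the 11.1 home could look like the sketch below; the 11.1 home path is an assumption, and the listener name is taken from the resource output above.

## hypothetical 11.1 home path
export ORACLE_HOME=/opt/oracle/product/111_ee_64/asm
export PATH=$ORACLE_HOME/bin:$PATH
lsnrctl start LISTENER_+ASM1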

 Happy END
