Changing the Heartbeat in Oracle RAC.

 
1. As of 11.2 Grid Infrastructure, the private network configuration is stored not only in the OCR but also in the gpnp profile. If the private network is not available or its definition is incorrect, the CRSD process will not start and any subsequent changes to the OCR will be impossible. Therefore, care needs to be taken when making modifications to the configuration of the private network, and it is important to perform the changes in the correct order. Please also note that manual modification of the gpnp profile is not supported.
Please take a backup of profile.xml on all cluster nodes before proceeding, as the grid user:
cd $GRID_HOME/gpnp/<hostname>/profiles/peer/
cd /app/oracle/product/12.x.x/grid/gpnp/mysrvrahr/profiles/peer
cd /app/oracle/product/12.x.x/grid/gpnp/mysrvrbhr/profiles/peer
cp -p profile.xml profile.xml.bk
2. Ensure Oracle Clusterware is running on ALL cluster nodes and save the current status of the resources.
/app/oracle/product/12.x.x/grid/bin/crsctl check cluster -all
/app/oracle/product/12.x.x/grid/bin/crsctl status resource -t>/tmp/beforeNewIps.lst
3. As the grid user (always curious who that might be; to me it was the oracle user, by the way):
Get the existing information.
Here you will see only one interconnect in place. For example:
/app/oracle/product/12.x.x/grid/bin/oifcfg getif
bond1  172.18.112.208  global  cluster_interconnect
bond0  195.233.190.64  global  public
##
Check the interfaces; the subnet addresses can be identified with the command below, filtered for eth interfaces specifically:
/app/oracle/product/12.x.x/grid/bin/oifcfg iflist|grep -i eth|sort
eth0  172.18.32.0
eth2  192.168.10.0
eth6  192.168.11.0
or check the interfaces / subnets in general on the OS with oifcfg:
/app/oracle/product/12.x.x/grid/bin/oifcfg iflist|sort
4. Add the new cluster_interconnect information:
/app/oracle/product/12.x.x/grid/bin/oifcfg setif -global eth2/192.168.10.0:cluster_interconnect,asm
/app/oracle/product/12.x.x/grid/bin/oifcfg setif -global eth6/192.168.11.0:cluster_interconnect,asm
5. Verify the change:
/app/oracle/product/12.x.x/grid/bin/oifcfg getif

With this information checked and in place, it is time to set up new listeners for ASM, since the original ASM listener created during the installation used eth0, and that eth0 will be removed from the cluster configuration in the steps below:

The existing listener ASMNET1LSNR will be replaced by new listeners (one per interconnect subnet):
srvctl add listener -asmlistener -l ASMNET1221LSNR -subnet 192.168.10.0 (as mentioned, this is the eth2 interface that we are going to use).
srvctl add listener -asmlistener -l ASMNET1222LSNR -subnet 192.168.11.0 (as mentioned, this is the eth6 interface that we are going to use).

As always, seeing is believing: use
crsctl status resource -t to see the details.
Note: the new ASM listeners are created as resources and are in status OFFLINE OFFLINE on all nodes in the cluster at this point in time.

In the next step we will remove the old ASM listener, using the -force option to prevent error messages with regard to dependencies.

srvctl update listener -listener ASMNET1LSNR_ASM -asm -remove -force
I have checked again with crsctl status resource -t to make sure the old resource is gone now.

Removing the old ASM listener
The MOS note contains a small inconsistency: it states that as a next step the old ASM listener should be stopped. I could still see the listener process at OS level (ps -ef|grep -i inherit) on the machines, but I was not able to stop it, since the cluster resource was already gone and lsnrctl did not work. Solution: when I skipped this step and stopped and started the cluster (which is mandatory in this scenario anyway), the listener was gone on all nodes.
This command should have done it, but it does NOT work: lsnrctl stop ASMNET1LSNR_ASM
Check configuration before restarting GI:

First command:
srvctl config listener -asmlistener
Name: ASMNET122LSNR_ASM
Type: ASM Listener
Owner: oracle
Subnet: 192.168.10.0
Home: <CRS home>
End points: TCP:1527
Listener is enabled.
Listener is individually enabled on nodes:
Listener is individually disabled on nodes:

Second command:
srvctl config asm
ASM home: <CRS home>
Password file: +VOTE/orapwASM
Backup of Password file:
ASM listener: LISTENER
ASM instance count: ALL
Cluster ASM listener: ASMNET122LSNR_ASM
6. In Grid Infrastructure: shut down Oracle Clusterware on all nodes and disable Oracle Clusterware, as the root user (in my example I was allowed to sudo):
sudo su -
/app/oracle/product/12.x.x/grid/bin/crsctl stop crs
/app/oracle/product/12.x.x/grid/bin/crsctl disable crs
7. Make the network configuration change at OS level as required and ensure the new interfaces are available on all nodes after the change (check by pinging the interfaces from all nodes):
for x in 10 11;do for xx in 75 76 77 78;do ping -c2 192.168.${x}.${xx}|egrep 'icmp_seq|transmitted';done;echo;done
for x in a b c d; do for xx in 1 2;do ping -c2 mysrvr${x}hr-hb$xx|egrep 'icmp_seq|transmitted';done;echo;done
8. Restart Oracle Clusterware on all nodes as the root user:
sudo su -
/app/oracle/product/12.x.x/grid/bin/crsctl start crs
9. Check the cluster and compare the resource status with the pre-change snapshot:
/app/oracle/product/12.x.x/grid/bin/crsctl check cluster -all
/app/oracle/product/12.x.x/grid/bin/crsctl status resource -t>/tmp/afterNewIps.lst
sdiff /tmp/afterNewIps.lst /tmp/beforeNewIps.lst
Enable Oracle Clusterware on all nodes as the root user:
/app/oracle/product/12.x.x/grid/bin/crsctl enable crs
10. Remove the old interface if required:
/app/oracle/product/12.x.x/grid/bin/oifcfg delif -global bond1/172.18.112.208:cluster_interconnect
11. Verify the removal:
/app/oracle/product/12.x.x/grid/bin/oifcfg getif

Missing or Corrupted Spfile in an ASM Instance in RAC

Introduction:

There is an old saying: always expect the unexpected, and this time was another proof of that. During patching of an 8-node cluster we came across two issues on the first node, both requiring a workaround. Issue one was that after applying the January 2019 patch the cluster did not start (that workaround is not part of this note, by the way). The second issue was that once the first workaround was in place, the ASM instance on the first node would not start. This note explains the steps followed to create a new spfile for the ASM instance(s) in a RAC cluster.

General information:

This was the scenario we found ourselves in: rolling patching with the January 2019 patch had started on the first node of a 12.2 Oracle environment (GI and RDBMS). The first node was patched but CRS would not start (and, to be honest, I have never liked opatchauto much). Together with Oracle Support a workaround was provided, but after that the ASM instance still would not start. A comparison of environments showed one very significant memory setting on this cluster (most likely a MEMORY_* parameter, while hugepages are in use on that cluster) that prevented ASM from starting once the patching on the first node completed.

As a workaround we created a pfile, and once the cluster on node one was up we started the ASM instance with that pfile (which I had altered).

However, we could no longer update the spfile for the other ASM instances, since Oracle told us that in rolling upgrade mode you cannot make changes to the spfile. That meant we performed the patching on all 8 nodes, and once CRS was up we had to start ASM with a copy of the init.ora used on the first node. In our case that meant 8 nodes, each with a local copy of the init.ora, which did not make us happy at all. That brought us to the scenario below, where you need to bring ASM back to using an spfile.

Important note: since Oracle 11.2 the GPnP profile is the key to such a change.

From earlier days I came up with the scenario below: create a pfile, alter that file to meet your needs, and bring it back as the spfile for the ASM instance:

Scenario

  • Can be used with a missing or corrupted spfile.
  • Can be used with an existing spfile with wrong settings that you cannot alter because patching has already started (spfile updates are prohibited in a rolling upgrade scenario), so a best practice is to analyse the spfile beforehand (see the sketch after this list).
  • Important: this scenario requires that the FULL cluster stack is down, and you will work with ONE node only.
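As a precaution, the current spfile settings can be preserved in a plain pfile before patching starts; a minimal sketch (the target path is just an example), run while connected to the ASM instance as sysasm on a healthy node:

SQL> CREATE PFILE='/app/oracle/admin/+ASM1/pfile/initASM_before_patch.ora' FROM SPFILE;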

#### With spget you can check the current location of the spfile in asmcmd.

ASMCMD [+] > spget

+VOTE/mysrvr18cl/ASMPARAMETERFILE/registry.253.978015605

#### Created and altered the pfile on the first node (and copied it to all other nodes during the workaround).

oracle@mysrvr1dr:/app/grid/product/12201/grid/dbs []# cd /app/oracle/admin/+ASM1/pfile

oracle@mysrvr1dr:/app/oracle/admin/+ASM1/pfile []# ls -ltr

total 4

-rw-r--r--. 1 oracle dba 2433 Feb  8 09:23 initASM.ora

##### Starting the cluster, first attempt (I recalled that the cluster needed to be in some kind of restricted mode for this, so the whole cluster was stopped, then the command below was issued). Oracle showed mercy and told me to use the correct syntax:

mysrvr1dr:root:/app/grid/product/12201/grid/bin $ ./crsctl start crs restrict

Parse error:

  'restrict' is an invalid argument

Usage:

  crsctl start crs [-excl [-nocrs | -cssonly]] | [-wait | -waithas | -nowait] | [-noautostart]

     Start OHAS on this server

where

     -excl        Start Oracle Clusterware in exclusive mode

     -nocrs       Start Oracle Clusterware in exclusive mode without starting CRS

     -nowait      Do not wait for OHAS to start

     -wait        Wait until startup is complete and display all progress and status messages

     -waithas     Wait until startup is complete and display OHASD progress and status messages

     -cssonly     Start only CSS

     -noautostart Start only OHAS

## Then started the cluster in exclusive mode; that failed too, by the way, since it tried to start the ASM instance, which was still using the original spfile with the incorrect information.

mysrvr1dr:root:/app/grid/product/12201/grid/bin $ ./crsctl start crs  -excl

  • CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘mysrvr1dr’
  • CRS-2676: Start of ‘ora.cssdmonitor’ on ‘mysrvr1dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.cssd’ on ‘mysrvr1dr’
  • CRS-2672: Attempting to start ‘ora.diskmon’ on ‘mysrvr1dr’
  • CRS-2676: Start of ‘ora.diskmon’ on ‘mysrvr1dr’ succeeded
  • CRS-2676: Start of ‘ora.cssd’ on ‘mysrvr1dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.ctssd’ on ‘mysrvr1dr’
  • CRS-2672: Attempting to start ‘ora.cluster_interconnect.haip’ on ‘mysrvr1dr’
  • CRS-2676: Start of ‘ora.ctssd’ on ‘mysrvr1dr’ succeeded
  • CRS-2676: Start of ‘ora.cluster_interconnect.haip’ on ‘mysrvr1dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.asm’ on ‘mysrvr1dr’
  • CRS-2674: Start of ‘ora.asm’ on ‘mysrvr1dr’ failed
  • CRS-2672: Attempting to start ‘ora.storage’ on ‘mysrvr1dr’
  • ORA-15077: could not locate ASM instance serving a required diskgroup
  • CRS-2674: Start of ‘ora.storage’ on ‘mysrvr1dr’ failed
  • CRS-2679: Attempting to clean ‘ora.storage’ on ‘mysrvr1dr’
  • CRS-2681: Clean of ‘ora.storage’ on ‘mysrvr1dr’ succeeded
  • CRS-2673: Attempting to stop ‘ora.cluster_interconnect.haip’ on ‘mysrvr1dr’
  • CRS-2677: Stop of ‘ora.cluster_interconnect.haip’ on ‘mysrvr1dr’ succeeded
  • CRS-2673: Attempting to stop ‘ora.ctssd’ on ‘mysrvr1dr’
  • CRS-2677: Stop of ‘ora.ctssd’ on ‘mysrvr1dr’ succeeded
  • CRS-4000: Command Start failed, or completed with errors.

## Next attempt: the whole cluster was down, and on the first node I performed:

  • mysrvr1dr:root:/app/grid/product/12201/grid/bin $ ./crsctl start crs -excl -nocrs
  • ## In this nocrs mode we were able to start the +ASM1 instance manually.
  • SQL> STARTUP PFILE='/app/oracle/admin/+ASM1/pfile/initASM.ora';
  • ## Once ASM had started, create the new SPFILE:
  • SQL> CREATE SPFILE='+VOTE' FROM PFILE='/app/oracle/admin/+ASM1/pfile/initASM.ora';
  • ## In the alert log I noticed the following, which is good SINCE it shows the gpnp profile had been updated accordingly:
  • 2019-02-08T16:31:32.839547+01:00
  • NOTE: updated gpnp profile ASM SPFILE to
  • NOTE: header on disk 0 advanced to format #2 using fcn 0.0
  • NOTE: header on disk 2 advanced to format #2 using fcn 0.0
  • NOTE: updated gpnp profile ASM diskstring: /dev/mapper/ASM_*
  • NOTE: updated gpnp profile ASM diskstring: /dev/mapper/ASM_*
  • 2019-02-08T16:31:34.381619+01:00
  • NOTE: updated gpnp profile ASM SPFILE to +VOTE/mysrvr18cl/ASMPARAMETERFILE/registry.253.999707493

##### Checked our activities  in ASMCMD:

  • oracle@mysrvr1dr:/app/oracle/admin/+ASM1/pfile [+ASM1]# asmcmd
  • [Option  -p will be used ]
  • ASMCMD [+] > spget
  • +VOTE/mysrvr18cl/ASMPARAMETERFILE/registry.253.999707493

### Checked our activities in gpnptool

oracle@mysrvr1dr:/app/oracle/admin/+ASM1/pfile [+ASM1]# gpnptool get

Warning: some command line parameters were defaulted. Resulting command line:

         /app/grid/product/12201/grid/bin/gpnptool.bin get -o-

<?xml version="1.0" encoding="UTF-8"?><gpnp:GPnP-Profile Version="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd" ProfileSequence="7" ClusterUId="afc024ecfd5ffff8ffbeda0a212bebe1" ClusterName="mysrvr18cl" PALocation=""><gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*"><gpnp:Network id="net1" IP="198.19.11.0" Adapter="bond0" Use="public"/><gpnp:Network id="net2" IP="192.168.10.0" Adapter="eth3" Use="asm,cluster_interconnect"/><gpnp:Network id="net3" IP="192.168.11.0" Adapter="eth5" Use="cluster_interconnect"/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/><orcl:ASM-Profile id="asm" DiscoveryString="/dev/mapper/ASM_*" SPFile="+VOTE/mysrvr18cl/ASMPARAMETERFILE/registry.253.999707493" Mode="remote" Extended="false"/><ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/><ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/><ds:Reference URI=""><ds:Transforms><ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><ds:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"> <InclusiveNamespaces xmlns="http://www.w3.org/2001/10/xml-exc-c14n#" PrefixList="gpnp orcl xsi"/></ds:Transform></ds:Transforms><ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><ds:DigestValue>QH9UPO559zhufkrc7tFxQts6oF0=</ds:DigestValue></ds:Reference></ds:SignedInfo><ds:SignatureValue>aL2hOnxyLt5YwMcPjGg8LUDx2KD97Y75eLv+yqvcfQ5O705K8ceQPCnwnsTs4Wn5E1jNeYCEzXnrVp5zM3hMbz9LdEEP2GKk9XJInQprWc39z7JKxm4uEwNX3Ocs54FqxP1JdBX7PRiMh/ePd8CoJIVtIaVMD29giX078uGwXcQ=</ds:SignatureValue></ds:Signature></gpnp:GPnP-Profile>

### Since we started the cluster with -excl -nocrs, it is time to stop the cluster and start it normally.

mysrvr1dr:root:/app/grid/product/12201/grid/bin $ ./crsctl stop crs

  • CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on ‘mysrvr1dr’
  • CRS-2673: Attempting to stop ‘ora.crsd’ on ‘mysrvr1dr’
  • CRS-2677: Stop of ‘ora.crsd’ on ‘mysrvr1dr’ succeeded
  • CRS-2673: Attempting to stop ‘ora.cluster_interconnect.haip’ on ‘mysrvr1dr’
  • CRS-2673: Attempting to stop ‘ora.crf’ on ‘mysrvr1dr’
  • CRS-2673: Attempting to stop ‘ora.drivers.acfs’ on ‘mysrvr1dr’
  • CRS-2673: Attempting to stop ‘ora.gpnpd’ on ‘mysrvr1dr’
  • CRS-2673: Attempting to stop ‘ora.mdnsd’ on ‘mysrvr1dr’
  • CRS-2677: Stop of ‘ora.drivers.acfs’ on ‘mysrvr1dr’ succeeded
  • CRS-2677: Stop of ‘ora.cluster_interconnect.haip’ on ‘mysrvr1dr’ succeeded
  • CRS-2677: Stop of ‘ora.crf’ on ‘mysrvr1dr’ succeeded
  • CRS-2677: Stop of ‘ora.gpnpd’ on ‘mysrvr1dr’ succeeded
  • CRS-2673: Attempting to stop ‘ora.ctssd’ on ‘mysrvr1dr’
  • CRS-2673: Attempting to stop ‘ora.storage’ on ‘mysrvr1dr’
  • CRS-2677: Stop of ‘ora.storage’ on ‘mysrvr1dr’ succeeded
  • CRS-2677: Stop of ‘ora.mdnsd’ on ‘mysrvr1dr’ succeeded
  • CRS-2677: Stop of ‘ora.ctssd’ on ‘mysrvr1dr’ succeeded
  • CRS-2673: Attempting to stop ‘ora.cssd’ on ‘mysrvr1dr’

### Starting the cluster on the first node in normal mode

mysrvr1dr:root:/app/grid/product/12201/grid/bin $ ./crsctl start crs

### Had a small issue, so decided to stop the cluster on node 1 with the force option

mysrvr1dr:root:/app/grid/product/12201/grid/bin $ ./crsctl stop crs  -f

  • CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on ‘mysrvr1dr’
  • CRS-2673: Attempting to stop ‘ora.mdnsd’ on ‘mysrvr1dr’
  • CRS-2673: Attempting to stop ‘ora.gpnpd’ on ‘mysrvr1dr’
  • CRS-2677: Stop of ‘ora.mdnsd’ on ‘mysrvr1dr’ succeeded
  • CRS-2677: Stop of ‘ora.gpnpd’ on ‘mysrvr1dr’ succeeded
  • CRS-2673: Attempting to stop ‘ora.ctssd’ on ‘mysrvr1dr’
  • CRS-2673: Attempting to stop ‘ora.evmd’ on ‘mysrvr1dr’
  • CRS-2673: Attempting to stop ‘ora.asm’ on ‘mysrvr1dr’
  • CRS-2673: Attempting to stop ‘ora.drivers.acfs’ on ‘mysrvr1dr’
  • CRS-2677: Stop of ‘ora.drivers.acfs’ on ‘mysrvr1dr’ succeeded
  • CRS-2677: Stop of ‘ora.ctssd’ on ‘mysrvr1dr’ succeeded
  • CRS-2677: Stop of ‘ora.evmd’ on ‘mysrvr1dr’ succeeded
  • CRS-2677: Stop of ‘ora.asm’ on ‘mysrvr1dr’ succeeded
  • CRS-2673: Attempting to stop ‘ora.cluster_interconnect.haip’ on ‘mysrvr1dr’
  • CRS-2677: Stop of ‘ora.cluster_interconnect.haip’ on ‘mysrvr1dr’ succeeded
  • CRS-2673: Attempting to stop ‘ora.cssd’ on ‘mysrvr1dr’
  • CRS-2677: Stop of ‘ora.cssd’ on ‘mysrvr1dr’ succeeded
  • CRS-2673: Attempting to stop ‘ora.gipcd’ on ‘mysrvr1dr’
  • CRS-2677: Stop of ‘ora.gipcd’ on ‘mysrvr1dr’ succeeded
  • CRS-2793: Shutdown of Oracle High Availability Services-managed resources on ‘mysrvr1dr’ has completed
  • CRS-4133: Oracle High Availability Services has been stopped.

#### Time to start the cluster in normal mode for all nodes

mysrvr1dr:root:/root $ cd /app/grid/product/12201/grid/bin

mysrvr1dr:root:/app/grid/product/12201/grid/bin $ ./crsctl start cluster -all

  • CRS-2672: Attempting to start ‘ora.evmd’ on ‘mysrvr6dr’
  • CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘mysrvr6dr’
  • CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘mysrvr4dr’
  • CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘mysrvr2dr’
  • CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘mysrvr3dr’
  • CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘mysrvr8dr’
  • CRS-2672: Attempting to start ‘ora.evmd’ on ‘mysrvr2dr’
  • CRS-2672: Attempting to start ‘ora.evmd’ on ‘mysrvr4dr’
  • CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘mysrvr7dr’
  • CRS-2672: Attempting to start ‘ora.evmd’ on ‘mysrvr3dr’
  • CRS-2672: Attempting to start ‘ora.evmd’ on ‘mysrvr8dr’
  • CRS-2672: Attempting to start ‘ora.evmd’ on ‘mysrvr7dr’
  • CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘mysrvr5dr’
  • CRS-2672: Attempting to start ‘ora.evmd’ on ‘mysrvr5dr’
  • CRS-2676: Start of ‘ora.cssdmonitor’ on ‘mysrvr4dr’ succeeded
  • CRS-2676: Start of ‘ora.cssdmonitor’ on ‘mysrvr8dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.cssd’ on ‘mysrvr4dr’
  • CRS-2672: Attempting to start ‘ora.diskmon’ on ‘mysrvr4dr’
  • CRS-2672: Attempting to start ‘ora.cssd’ on ‘mysrvr8dr’
  • CRS-2676: Start of ‘ora.cssdmonitor’ on ‘mysrvr2dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.diskmon’ on ‘mysrvr8dr’
  • CRS-2676: Start of ‘ora.cssdmonitor’ on ‘mysrvr3dr’ succeeded
  • CRS-2676: Start of ‘ora.cssdmonitor’ on ‘mysrvr6dr’ succeeded
  • CRS-2676: Start of ‘ora.cssdmonitor’ on ‘mysrvr7dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.cssd’ on ‘mysrvr2dr’
  • CRS-2672: Attempting to start ‘ora.cssd’ on ‘mysrvr6dr’
  • CRS-2672: Attempting to start ‘ora.diskmon’ on ‘mysrvr2dr’
  • CRS-2672: Attempting to start ‘ora.cssd’ on ‘mysrvr3dr’
  • CRS-2672: Attempting to start ‘ora.diskmon’ on ‘mysrvr6dr’
  • CRS-2672: Attempting to start ‘ora.cssd’ on ‘mysrvr7dr’
  • CRS-2672: Attempting to start ‘ora.diskmon’ on ‘mysrvr3dr’
  • CRS-2676: Start of ‘ora.diskmon’ on ‘mysrvr4dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.diskmon’ on ‘mysrvr7dr’
  • CRS-2676: Start of ‘ora.diskmon’ on ‘mysrvr8dr’ succeeded
  • CRS-2676: Start of ‘ora.diskmon’ on ‘mysrvr2dr’ succeeded
  • CRS-2676: Start of ‘ora.diskmon’ on ‘mysrvr6dr’ succeeded
  • CRS-2676: Start of ‘ora.diskmon’ on ‘mysrvr3dr’ succeeded
  • CRS-2676: Start of ‘ora.diskmon’ on ‘mysrvr7dr’ succeeded
  • CRS-2676: Start of ‘ora.cssdmonitor’ on ‘mysrvr5dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.cssd’ on ‘mysrvr5dr’
  • CRS-2672: Attempting to start ‘ora.diskmon’ on ‘mysrvr5dr’
  • CRS-2676: Start of ‘ora.diskmon’ on ‘mysrvr5dr’ succeeded
  • CRS-2676: Start of ‘ora.evmd’ on ‘mysrvr6dr’ succeeded
  • CRS-2676: Start of ‘ora.evmd’ on ‘mysrvr2dr’ succeeded
  • CRS-2676: Start of ‘ora.evmd’ on ‘mysrvr4dr’ succeeded
  • CRS-2676: Start of ‘ora.evmd’ on ‘mysrvr8dr’ succeeded
  • CRS-2676: Start of ‘ora.evmd’ on ‘mysrvr3dr’ succeeded
  • CRS-2676: Start of ‘ora.evmd’ on ‘mysrvr7dr’ succeeded
  • CRS-2676: Start of ‘ora.evmd’ on ‘mysrvr5dr’ succeeded
  • CRS-2676: Start of ‘ora.cssd’ on ‘mysrvr8dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.ctssd’ on ‘mysrvr8dr’
  • CRS-2672: Attempting to start ‘ora.cluster_interconnect.haip’ on ‘mysrvr8dr’
  • CRS-2676: Start of ‘ora.cssd’ on ‘mysrvr2dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.ctssd’ on ‘mysrvr2dr’
  • CRS-2672: Attempting to start ‘ora.cluster_interconnect.haip’ on ‘mysrvr2dr’
  • CRS-2676: Start of ‘ora.cssd’ on ‘mysrvr5dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.ctssd’ on ‘mysrvr5dr’
  • CRS-2672: Attempting to start ‘ora.cluster_interconnect.haip’ on ‘mysrvr5dr’
  • CRS-2676: Start of ‘ora.ctssd’ on ‘mysrvr8dr’ succeeded
  • CRS-2676: Start of ‘ora.ctssd’ on ‘mysrvr2dr’ succeeded
  • CRS-2676: Start of ‘ora.cssd’ on ‘mysrvr7dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.ctssd’ on ‘mysrvr7dr’
  • CRS-2672: Attempting to start ‘ora.cluster_interconnect.haip’ on ‘mysrvr7dr’
  • CRS-2676: Start of ‘ora.ctssd’ on ‘mysrvr5dr’ succeeded
  • CRS-2676: Start of ‘ora.cssd’ on ‘mysrvr4dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.ctssd’ on ‘mysrvr4dr’
  • CRS-2672: Attempting to start ‘ora.cluster_interconnect.haip’ on ‘mysrvr4dr’
  • CRS-2676: Start of ‘ora.cssd’ on ‘mysrvr3dr’ succeeded
  • CRS-2676: Start of ‘ora.cssd’ on ‘mysrvr6dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.ctssd’ on ‘mysrvr3dr’
  • CRS-2672: Attempting to start ‘ora.cluster_interconnect.haip’ on ‘mysrvr3dr’
  • CRS-2672: Attempting to start ‘ora.ctssd’ on ‘mysrvr6dr’
  • CRS-2672: Attempting to start ‘ora.cluster_interconnect.haip’ on ‘mysrvr6dr’
  • CRS-2676: Start of ‘ora.ctssd’ on ‘mysrvr7dr’ succeeded
  • CRS-2676: Start of ‘ora.cluster_interconnect.haip’ on ‘mysrvr8dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.asm’ on ‘mysrvr8dr’
  • CRS-2676: Start of ‘ora.asm’ on ‘mysrvr8dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.storage’ on ‘mysrvr8dr’
  • CRS-2676: Start of ‘ora.ctssd’ on ‘mysrvr4dr’ succeeded
  • CRS-2676: Start of ‘ora.ctssd’ on ‘mysrvr3dr’ succeeded
  • CRS-2676: Start of ‘ora.ctssd’ on ‘mysrvr6dr’ succeeded
  • CRS-2676: Start of ‘ora.cluster_interconnect.haip’ on ‘mysrvr2dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.asm’ on ‘mysrvr2dr’
  • CRS-2676: Start of ‘ora.asm’ on ‘mysrvr2dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.storage’ on ‘mysrvr2dr’
  • CRS-2676: Start of ‘ora.storage’ on ‘mysrvr8dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.crsd’ on ‘mysrvr8dr’
  • CRS-2676: Start of ‘ora.crsd’ on ‘mysrvr8dr’ succeeded
  • CRS-2676: Start of ‘ora.storage’ on ‘mysrvr2dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.crsd’ on ‘mysrvr2dr’
  • CRS-2676: Start of ‘ora.cluster_interconnect.haip’ on ‘mysrvr5dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.asm’ on ‘mysrvr5dr’
  • CRS-2676: Start of ‘ora.asm’ on ‘mysrvr5dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.storage’ on ‘mysrvr5dr’
  • CRS-2676: Start of ‘ora.crsd’ on ‘mysrvr2dr’ succeeded
  • CRS-2676: Start of ‘ora.storage’ on ‘mysrvr5dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.crsd’ on ‘mysrvr5dr’
  • CRS-2676: Start of ‘ora.cluster_interconnect.haip’ on ‘mysrvr7dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.asm’ on ‘mysrvr7dr’
  • CRS-2676: Start of ‘ora.asm’ on ‘mysrvr7dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.storage’ on ‘mysrvr7dr’
  • CRS-2676: Start of ‘ora.crsd’ on ‘mysrvr5dr’ succeeded
  • CRS-2676: Start of ‘ora.storage’ on ‘mysrvr7dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.crsd’ on ‘mysrvr7dr’
  • CRS-2676: Start of ‘ora.cluster_interconnect.haip’ on ‘mysrvr6dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.asm’ on ‘mysrvr6dr’
  • CRS-2676: Start of ‘ora.asm’ on ‘mysrvr6dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.storage’ on ‘mysrvr6dr’
  • CRS-2676: Start of ‘ora.cluster_interconnect.haip’ on ‘mysrvr4dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.asm’ on ‘mysrvr4dr’
  • CRS-2676: Start of ‘ora.asm’ on ‘mysrvr4dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.storage’ on ‘mysrvr4dr’
  • CRS-2676: Start of ‘ora.cluster_interconnect.haip’ on ‘mysrvr3dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.asm’ on ‘mysrvr3dr’
  • CRS-2676: Start of ‘ora.asm’ on ‘mysrvr3dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.storage’ on ‘mysrvr3dr’
  • CRS-2676: Start of ‘ora.crsd’ on ‘mysrvr7dr’ succeeded
  • CRS-2676: Start of ‘ora.storage’ on ‘mysrvr6dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.crsd’ on ‘mysrvr6dr’
  • CRS-2676: Start of ‘ora.storage’ on ‘mysrvr4dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.crsd’ on ‘mysrvr4dr’
  • CRS-2676: Start of ‘ora.storage’ on ‘mysrvr3dr’ succeeded
  • CRS-2672: Attempting to start ‘ora.crsd’ on ‘mysrvr3dr’
  • CRS-2676: Start of ‘ora.crsd’ on ‘mysrvr6dr’ succeeded
  • CRS-2676: Start of ‘ora.crsd’ on ‘mysrvr4dr’ succeeded
  • CRS-2676: Start of ‘ora.crsd’ on ‘mysrvr3dr’ succeeded
  • CRS-4690: Oracle Clusterware is already running on 'mysrvr1dr' -> fine, since we kept the cluster running on node 1

CRS-4000: Command Start failed, or completed with errors.

### checks performed :

On each node: ps -ef|grep d.bin

On each node: crsctl stat res -t -init

On a node: crsctl check cluster -all

Happy reading,

And till next time

Mathijs

Reconfigure RAC 12.2 Grid Infrastructure network

Summary.

In 12.2 Grid Infrastructure Oracle has made Flex ASM the default. This blog focuses on re-configuring the Oracle RAC 12.2 Grid Infrastructure network components, such as the interconnect and the public network, or changing an interface to do-not-use, whenever that applies or improves the situation at hand. Read this carefully and in full before performing it on one of your clusters. The baseline for this action is a document on MOS: How to Modify Private Network Information in Oracle Clusterware (Doc ID 283684.1).

Details:

As with any change, when going through input - processing - output, it is important to have a clear picture of the situation as it is. So a first and very mandatory step is to check with the oifcfg getif command how things are before starting the changes:

When entering that command, information about the network interfaces known to the RAC cluster, similar to the output below, should be shown:

oracle@mysrvr1dr:/app/oracle/stage/27468969 [+ASM1]# oifcfg getif

bond0  198.19.11.0  global  public
eth0  10.217.210.0  global  cluster_interconnect,asm
eth2  192.168.10.0  global  cluster_interconnect
eth7  192.168.11.0  global  cluster_interconnect

Here bond0 is used as public, eth0 at the moment is carrying both the cluster interconnect and ASM traffic, and eth2 and eth7 are dedicated to the interconnect. Eth0 is also defined as the admin LAN for various activities. In this setup the cluster is unstable and nodes are being evicted. Time to perform the steps to stabilize it.

From the MOS note we follow Case IV: Changing private network interface name, subnet or netmask, for 12c Oracle Clusterware with Flex ASM.

Precaution: take a backup of profile.xml on each node.

Take a backup of profile.xml on all cluster nodes before proceeding, as the grid user. In this specific case that is the user that installed the Grid Infrastructure (in this scenario the oracle user):
Command:

$ cd $GRID_HOME/gpnp/<hostname>/profiles/peer/
 $ cp -p profile.xml profile.xml.bk
cd /app/grid/product/12201/grid/gpnp/mysrvr1dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr2dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr3dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr4dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr5dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr6dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr7dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr8dr/profiles/peer
cp -p profile.xml profile.xml.bk
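Instead of repeating this on every node, the backup can also be scripted; a minimal sketch, assuming passwordless ssh between the nodes as the grid software owner (hostnames and paths as used above):

GRID_HOME=/app/grid/product/12201/grid
for node in mysrvr1dr mysrvr2dr mysrvr3dr mysrvr4dr mysrvr5dr mysrvr6dr mysrvr7dr mysrvr8dr; do
  ssh ${node} "cd ${GRID_HOME}/gpnp/${node}/profiles/peer && cp -p profile.xml profile.xml.bk"
done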

Altering the interconnect:

One of the interconnects should be altered to make sure that the ASM listener is able to communicate over that interface too. In this scenario eth2 was used for this. When doing so, take note of the subnet, since it will be needed to configure a new ASM listener.

oifcfg setif -global eth2/192.168.10.0:cluster_interconnect,asm
oifcfg setif -global eth7/192.168.11.0:cluster_interconnect

Now eth2 shows that it is set up for interconnect and ASM (only one interconnect should be set up to combine cluster_interconnect and asm).

peer [+ASM1]# oifcfg getif

bond0  198.19.11.0  global  public
eth0  10.217.210.0  global  cluster_interconnect,asm
eth2  192.168.10.0  global  cluster_interconnect,asm
eth7  192.168.11.0  global  cluster_interconnect

With this information checked and in place, it is time to set up a new listener for ASM, since the original ASM listener created during the installation used eth0, and that eth0 will be removed from the cluster configuration in the steps below:

The existing listener ASMNET1LSNR will be replaced by the new ASMNET122LSNR.

srvctl add listener -asmlistener -l ASMNET122LSNR -subnet 192.168.10.0
(as mentioned, this is the eth2 interface that we are going to use).

As always, seeing is believing: use crsctl status resource -t to see details similar to those below. The new ASM listener is created as a resource and is in status OFFLINE OFFLINE on all nodes in the cluster at this point in time:

--------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------
ora.ASMNET122LSNR_ASM.lsnr
               OFFLINE OFFLINE      mysrvr1dr                 STABLE
               OFFLINE OFFLINE      mysrvr2dr                 STABLE
               OFFLINE OFFLINE      mysrvr3dr                 STABLE
               OFFLINE OFFLINE      mysrvr4dr                 STABLE
               OFFLINE OFFLINE      mysrvr5dr                 STABLE
               OFFLINE OFFLINE      mysrvr6dr                 STABLE
               OFFLINE OFFLINE      mysrvr7dr                 STABLE
               OFFLINE OFFLINE      mysrvr8dr                 STABLE

In the next step we will remove the old ASM listener, using the -force option to prevent error messages with regard to dependencies.

srvctl update listener -listener ASMNET1LSNR_ASM -asm -remove -force

I have checked again with crsctl status resource -t to make sure the old resource is gone now.

Removing the old ASM listener

The MOS note contains a small inconsistency: it states that as a next step the old ASM listener should be stopped. I could still see the listener process at OS level (ps -ef|grep -i inherit) on the machines, but I was not able to stop it, since the cluster resource was already gone and lsnrctl did not work. Solution: when I skipped this step and stopped and started the cluster (which is mandatory in this scenario anyway), the listener was gone on all nodes.

This command should have done it, but it does NOT work:
lsnrctl stop ASMNET1LSNR_ASM

Check configuration before restarting GI:

First command:
srvctl config listener -asmlistener

Name: ASMNET122LSNR_ASM
Type: ASM Listener
Owner: oracle
Subnet: 192.168.10.0
Home: <CRS home>
End points: TCP:1527
Listener is enabled.
Listener is individually enabled on nodes:
Listener is individually disabled on nodes:

Second Command:
srvctl config asm

ASM home: <CRS home>
Password file: +VOTE/orapwASM
Backup of Password file:
ASM listener: LISTENER
ASM instance count: ALL
Cluster ASM listener: ASMNET122LSNR_ASM

Both results look good, so it is time to move to the next step: restarting the Grid Infrastructure on all nodes.

Restarting Grid infrastructure on all Nodes:

For the next steps you have to become root (or sudo su -). First, and importantly, make sure that the Grid Infrastructure does not restart automatically should a cluster node reboot (disable crs), then stop the Grid Infrastructure software:

As root

/app/grid/product/12201/grid/bin/crsctl disable crs
/app/grid/product/12201/grid/bin/crsctl stop crs
To be done on: mysrvr[1-8]dr
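A sketch of how this could be looped over the nodes, assuming root ssh equivalence between them (otherwise simply log in to each node and run both commands there):

for n in 1 2 3 4 5 6 7 8; do
  ssh mysrvr${n}dr /app/grid/product/12201/grid/bin/crsctl disable crs
  ssh mysrvr${n}dr /app/grid/product/12201/grid/bin/crsctl stop crs
done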

Checking network configuration on all nodes.

mysrvr1dr:root:/root $ ifconfig -a

Starting cluster again:

As root

/app/grid/product/12201/grid/bin/crsctl enable crs
/app/grid/product/12201/grid/bin/crsctl start crs

To be done on: mysrvr[1-8]dr

Final checks:

oifcfg getif

bond0  198.19.11.0  global  public
eth0  10.217.210.0  global  cluster_interconnect,asm
eth2  192.168.10.0  global  cluster_interconnect,asm
eth7  192.168.11.0  global  cluster_interconnect

Time to delete eth0

Since eth0 is the admin LAN, after our reconfiguration steps it is time to get rid of eth0 (remove it from the Grid Infrastructure configuration).

oifcfg delif -global eth0/10.217.210.0 

And a last check again:

oifcfg getif

bond0  198.19.11.0  global  public
eth2  192.168.10.0  global  cluster_interconnect,asm
eth7  192.168.11.0  global  cluster_interconnect

Happy reading, and till we meet again,

Mathijs.

 

 

Install & Patch 12.2 GI in Oracle Restart

Introduction:

Where would we be without challenges. I have become a member of a project team for a new billing environment, and this team is aiming to use (and go live with) Oracle 12.2 Grid Infrastructure and Database. The information in this article will become a baseline for the installation of several Oracle environments on Linux. Oracle refers to this single-server setup as Oracle Restart. Next in line after that (and I love it) will be Real Application Clusters.

General Preparations 12.2 Grid Kata:

## Identifying ORACLE_BASE and layout of Grid Infrastructure.

echo $ORACLE_BASE
/app/oracle

echo $ORACLE_HOME
/app/grid/product/12.2.0.1/grid

## Identifying ORACLE_BASE  and Db software
echo $ORACLE_BASE
/app/oracle

echo $ORACLE_HOME
/app/oracle/product/12.2.0.1/db

## So this is the 12.2 layout in scope for the actions on a Restart or RAC environment:

+ASM1     /app/grid/product/12.2.0.1/grid          
CRS       /app/grid/product/12.2.0.1/grid           
-MGMTDB   /app/grid/product/12.2.0.1/grid          
MYDB     /app/oracle/product/12.2.0.1/db

## Checking Red Hat release:

oracle@mysrvr1hr:/dev/mapper []# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.9 (Santiago)

## Oracle Restart installation instructions for 12.2 can be found here:

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/ladbi/installing-oracle-grid-infrastructure-for-a-standalone-server-with-a-new-database-installation.html#GUID-0B1CEE8C-C893-46AA-8A6A-7B5FAAEC72B3

An interesting point is that in 12.2 the famous, well-known runInstaller is replaced by ./gridSetup.sh (when opening runInstaller I even ran into errors: OUI-10133 when running runInstaller in 12.2). A second point of interest is that you have to pre-create the directory where the software will be installed.

(screenshot: OUI-10133 error dialog)

## Preparations  for Installation:

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/ladbi/installing-oracle-grid-infrastructure-for-a-standalone-server-with-a-new-database-installation.html#GUID-0B1CEE8C-C893-46AA-8A6A-7B5FAAEC72B3

  1. On the server where you will install the Grid Infrastructure, create the directory where you want to install the software (the location you will later call your ORACLE_HOME). In my specific case that meant running mkdir -p /app/grid/product/12.2.0.1/grid on that server.
  2. From Solltau: oracle@myhost:/opt/oracle/Odrive/depot/software/oracle/12c/GI []# scp linuxx64_12201_grid_home.zip oracle@mysrvr1hr:/app/grid/product/12.2.0.1/grid
  3. UNSET your environment variables if any on the installation box:
unset ORACLE_BASE 
unset ORACLE_HOME 
unset GI_HOME 
unset ORA_CRS_HOME 
unset TNS_ADMIN
unset ORACLE_SID
unset ORA_NLS10

echo $ORACLE_BASE etc.

  4. ## Check the zip file in the destination that will also become your ORACLE_HOME for this install.
oracle@mysrvr1hr:/app/grid/product/12.2.0.1/grid []# ls -ltr
total 2924504
-rw-r--r--. 1 oracle dba 2994687209 Jan 3 16:28 linuxx64_12201_grid_home.zip

Make sure you unzip the file in the future ORACLE_HOME destination. This is mandatory because, unlike previous installations where you could alter the software installation directory, during the 12.2 installation there will NOT be an option to choose the destination for this installation. ## Make sure you are in the directory /app/grid/product/12.2.0.1/grid and extract the zip there.
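A short sketch of that step, using the paths from above:

cd /app/grid/product/12.2.0.1/grid
unzip -q linuxx64_12201_grid_home.zip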

  5. ## Once the zip file is extracted, run the script below:
./gridSetup.sh

In one of the following detailed screens, make sure you use "Change Discovery Path" to discover the disks that you will be using for this installation. In my case this meant that a Linux admin colleague had already set up and labelled dedicated LUNs (disks) in preparation for my actions.

root # ls -lH /dev/mapper/ASM_*
brw-rw----. 1 oracle dba 253,  6 Dec 22 16:01 /dev/mapper/ASM_ACFS_035_001
brw-rw----. 1 oracle dba 253, 33 Dec 22 16:01 /dev/mapper/ASM_OCRM_008_001
brw-rw----. 1 oracle dba 253, 34 Dec 22 16:01 /dev/mapper/ASM_OCRM_008_002

brw-rw----. 1 oracle dba 253, 25 Dec 22 16:01 /dev/mapper/ASM_VOTE_008_001
brw-rw----. 1 oracle dba 253, 26 Dec 22 16:01 /dev/mapper/ASM_VOTE_008_002
brw-rw----. 1 oracle dba 253, 30 Dec 22 16:01 /dev/mapper/ASM_VOTE_008_003

 

## Once you have made all the required selections, the screen below will appear:

(screenshot: grid installation summary screen)

Once you have selected install, you will be updated by the next progress screen:

(screenshot: grid installation progress screen)

 

## In a separate session, as the root user, ./root.sh has to be run, which will show:

mysrvr1hr:root:/app/grid/product/12.2.0.1/grid # ./root.sh

Performing root user operation.
The following environment variables are set as:

ORACLE_OWNER= oracle
ORACLE_HOME=  /app/grid/product/12.2.0.1/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /app/grid/product/12.2.0.1/grid/crs/install/crsconfig_params
The log of current session can be found at:
/app/oracle/crsdata/mysrvr1hr/crsconfig/roothas_2018-01-03_05-02-27PM.log

## logging details of root.sh :
LOCAL ADD MODE
Creating OCR keys for user 'oracle', privgrp 'dba'..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.

CRS-4664: Node mysrvr1hr successfully pinned.
2018/01/03 17:02:50 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.conf'
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'mysrvr1hr'
CRS-2673: Attempting to stop 'ora.evmd' on 'mysrvr1hr'
CRS-2677: Stop of 'ora.evmd' on 'mysrvr1hr' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'mysrvr1hr' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.

mysrvr1hr     2018/01/03 17:03:44     /app/grid/product/12.2.0.1/grid/cdata/mysrvr1hr/backup_20180103_170344.olr     0
2018/01/03 17:03:49 CLSRSC-327: Successfully configured Oracle Restart for a standalone server
mysrvr1hr:root:/app/grid/product/12.2.0.1/grid #

 

Resolving possible issues:  12.2 GI standalone : [INS-20802] Automatic Storage Management Configuration Assistant failed (Doc ID 2277224.1)

(screenshot: grid configuration assistant completing)

 

## The installation will create the ASM instance with a default spfile. Due to company standards and best practice (knowing that size does matter and that default settings will not do well in a heavily used environment), you should connect to the ASM instance and alter the values below:

## Specific setup for asm Instance

ALTER SYSTEM SET memory_max_target=4096M SCOPE=SPFILE;
ALTER SYSTEM SET memory_target=1536M SCOPE=SPFILE;
ALTER SYSTEM SET large_pool_size=100M SCOPE=SPFILE;
ALTER SYSTEM SET shared_pool_size=512M SCOPE=BOTH;
ALTER SYSTEM SET shared_pool_reserved_size=100M SCOPE=SPFILE;

## Nothing to do with performance but mandatory due to Standards.
ALTER SYSTEM SET audit_file_dest='/app/oracle/+ASM/admin/adump' SCOPE=SPFILE;
ALTER SYSTEM SET background_dump_dest='/app/oracle/diag/asm/+asm/+ASM/trace' SCOPE=BOTH;
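## Note that the directory referenced by audit_file_dest must exist before the instance picks up the new setting; a small sketch (path as used above, run as the grid software owner, assuming it is not already present):
mkdir -p /app/oracle/+ASM/admin/adump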

 

## Company standards with regard to the listener:

  • Log destination: /app/oracle/diag/tnslsnr/mysrvr1hr/listener
  • One listener per vip

## So I added a listener with the netca tool, run from the Grid Infrastructure home.

## /app/oracle/diag/tnslsnr/<servername>/<listenername>/trace

oracle@mysrvr1hr:/app/grid/product/12.2.0.1/grid/network/admin [+ASM]# lsnrctl status LISTENER_MYSRVR1HR

## Deinstallation, when needed. As always, you might need a way out (to go back again).

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/ladbi/deinstalling-previous-release-grid-home.html#GUID-D84F94CE-251C-4D3B-9D6E-F584FB30F5B3

Note: For upgrades from previous releases, if you want to uninstall the previous release Grid home, then perform the following steps:

  1. Log in as the root user.
  2. Manually change the permissions of the previous release Grid home (see below).
  3. Run the /app/grid/product/12.2.0.1/grid/deinstall/deinstall command (as oracle User).
For example, on Grid Infrastructure for a standalone server:

# chown -R oracle:dba /app/grid/product/12.2.0.1
# chmod -R 775 /app/grid/product/12.2.0.1

In this example:
/u01/app/oracle/product/11.2.0/grid is the previous release Oracle Grid Infrastructure for a standalone server home

oracle is the Oracle Grid Infrastructure installation owner user
dba is the name of the Oracle Inventory group (OINSTALL group)

For example, on Oracle Database:

# chown -R oracle:dba /app/oracle/product/12.2.0.1
# chmod -R 775 /app/oracle/product/12.2.0.1

 

If all is well, it is time to start patching the environment!

## patching : GI : p26737266_122010_Linux-x86-64.zip

## oracle@soltau2:/opt/oracle/Odrive/depot/software/oracle/patches/Linuxx86 []# scp p26737266_122010_Linux-x86-64.zip oracle@mysrvr1hr:/app/grid/product/12.2.0.1/stage

## check current situation with opatch before patching.

opatch lsinventory -detail -oh /app/grid/product/12.2.0.1/grid

This shows:

Oracle Interim Patch Installer version 12.2.0.1.6
Copyright (c) 2018, Oracle Corporation.  All rights reserved.

Oracle Home       : /app/grid/product/12.2.0.1/grid
Central Inventory : /app/oraInventory
from           : /app/grid/product/12.2.0.1/grid/oraInst.loc

OPatch version    : 12.2.0.1.6
OUI version       : 12.2.0.1.4

Log file location : /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatch/opatch2018-01-05_14-39-27PM_1.log
Lsinventory Output file location : /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatch/lsinv/lsinventory2018-01-05_14-39-27PM.txt
--------------------------------------------------------------------------------
Local Machine Information::
Hostname: mysrvr1hr.mydomain 

ARU platform id: 226
ARU platform description:: Linux x86-64
Installed Top-level Products (1):
Oracle Grid Infrastructure 12c                                       12.2.0.1.0
There are 1 products installed in this Oracle Home.

Installed Products (99):
etc.
.
.
There are 99 products installed in this Oracle Home.
There are no Interim patches installed in this Oracle Home.
--------------------------------------------------------------------------------
OPatch succeeded.

## Use opatch to  check for conflicts:

$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /app/grid/product/12.2.0.1/stage/26737266/26710464
$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /app/grid/product/12.2.0.1/stage/26737266/26925644
$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /app/grid/product/12.2.0.1/stage/26737266/26737232
$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /app/grid/product/12.2.0.1/stage/26737266/26839277
$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /app/grid/product/12.2.0.1/stage/26737266/26928563

## This did not show any conflicts

## The next step will be to use opatch to check the space requirements (you would not want the installation to fail due to a lack of storage):

For Grid Infrastructure Home, as home user:

Create the file /tmp/patch_list_gihome.txt with the following content, and verify it with cat /tmp/patch_list_gihome.txt:

/app/grid/product/12.2.0.1/stage/26737266/26928563
/app/grid/product/12.2.0.1/stage/26737266/26839277
/app/grid/product/12.2.0.1/stage/26737266/26737232
/app/grid/product/12.2.0.1/stage/26737266/26925644
/app/grid/product/12.2.0.1/stage/26737266/26710464
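One way to create that file in a single step (a sketch using the same stage paths):

cat > /tmp/patch_list_gihome.txt <<EOF
/app/grid/product/12.2.0.1/stage/26737266/26928563
/app/grid/product/12.2.0.1/stage/26737266/26839277
/app/grid/product/12.2.0.1/stage/26737266/26737232
/app/grid/product/12.2.0.1/stage/26737266/26925644
/app/grid/product/12.2.0.1/stage/26737266/26710464
EOF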

 

Run the opatch command to check if enough free space is available in the Grid Infrastructure Home:

$ORACLE_HOME/OPatch/opatch prereq CheckSystemSpace -phBaseFile /tmp/patch_list_gihome.txt

## this shows

oracle@mysrvr1hr:/app/grid/product/12.2.0.1/stage [+ASM]# $ORACLE_HOME/OPatch/opatch prereq CheckSystemSpace -phBaseFile /tmp/patch_list_gihome.txt
Oracle Interim Patch Installer version 12.2.0.1.6
Copyright (c) 2018, Oracle Corporation.  All rights reserved.

PREREQ session

Oracle Home       : /app/grid/product/12.2.0.1/grid
Central Inventory : /app/oraInventory
from           : /app/grid/product/12.2.0.1/grid/oraInst.loc

OPatch version    : 12.2.0.1.6
OUI version       : 12.2.0.1.4

Log file location : /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatch/opatch2018-01-05_14-55-06PM_1.log
Invoking prereq "checksystemspace"

Prereq "checkSystemSpace" passed.
OPatch succeeded.


 

## To patch only the GI home:

# opatchauto apply /app/grid/product/12.2.0.1/stage/26737266 -oh /app/grid/product/12.2.0.1/grid

## failed with:

OPATCHAUTO-72046: Invalid wallet parameters.
OPATCHAUTO-72046: The wallet path or wallet password provided is not valid.
OPATCHAUTO-72046: Please provide valid wallet information.

opatchauto bootstrapping failed with error code 46.

## Thank you Mos for elaborating.

 

OPATCHAUTO-72046: Invalid wallet parameters (Doc ID 2150070.1)

 

opatchauto command is not being run as root user.  Opatchauto for Grid PSUs should always be run as root user.

## So as the root user :

/app/grid/product/12.2.0.1/grid/OPatch/opatchauto apply /app/grid/product/12.2.0.1/stage/26737266 -oh /app/grid/product/12.2.0.1/grid

## and it failed again !!!

mysrvr1hr:root:/root # /app/grid/product/12.2.0.1/grid/OPatch/opatchauto apply /app/grid/product/12.2.0.1/stage/26737266 -oh /app/grid/product/12.2.0.1/grid

System initialization log file is /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchautodb/systemconfig2018-01-05_03-09-09PM.log.
Session log file is /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/opatchauto2018-01-05_03-09-12PM.log

The id for this session is 5LQ1
[init:init] Executing OPatchAutoBinaryAction action on home /app/grid/product/12.2.0.1/grid
Executing OPatch prereq operations to verify patch applicability on SIHA Home........
[init:init] OPatchAutoBinaryAction action completed on home /app/grid/product/12.2.0.1/grid with failure
Execution of [OPatchAutoBinaryAction] patch action failed, check log for more details. Failures:
Patch Target : mysrvr1hr->/app/grid/product/12.2.0.1/grid Type[siha]

Details: [
---------------------------Patching Failed---------------------------------
Command execution failed during patching in home: /app/grid/product/12.2.0.1/grid, host: mysrvr1hr.
Command failed:  /app/grid/product/12.2.0.1/grid/OPatch/opatchauto  apply /app/grid/product/12.2.0.1/stage/26737266 -oh /app/grid/product/12.2.0.1/grid -target_type has -binary -invPtrLoc /app/grid/product/12.2.0.1/grid/oraInst.loc -persistresult /app/grid/product/12.2.0.1/grid/OPatch/auto/dbsessioninfo/sessionresult_analyze_mysrvr1hr_siha.ser -analyze -online
Command failure output:
==Following patches FAILED in analysis for apply:

Patch: /app/grid/product/12.2.0.1/stage/26737266/26925644
Log: /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-05_15-09-16PM_1.log

Reason: Failed during Analysis: CheckNApplyReport Failed, [ Prerequisite Status: FAILED, Prerequisite output:
The details are:
Prerequisite check "CheckMinimumOPatchVersion" failed.]
Failed during Analysis: CheckMinimumOPatchVersion Failed, [ Prerequisite Status: FAILED, Prerequisite output:

The details are:
The OPatch being used has version 12.2.0.1.6 while the following patch(es) require higher versions: 

Patch 26710464 requires OPatch version 12.2.0.1.7.

Please download latest OPatch from My Orac ...

etc.
.
.

OPatchAuto failed.

opatchauto failed with error code 42

mysrvr1hr:root:/root #

 

## So I downloaded the latest OPatch version, parked it in a temporary directory on the target server, and unzipped it into the Grid home:

unzip p6880880_122011_Linux-x86-64.zip -d /app/grid/product/12.2.0.1/grid

## now  Opatch shows:

oracle@mysrvr1hr:/app/grid/product/12.2.0.1/opatch [+ASM]# opatch version
OPatch Version: 12.2.0.1.11

OPatch succeeded.

## Sometimes you just have to be patient to hear the lambs being silent:

## Next run as the root user :

/app/grid/product/12.2.0.1/grid/OPatch/opatchauto apply /app/grid/product/12.2.0.1/stage/26737266 -oh /app/grid/product/12.2.0.1/grid

 

## logfiles:

oracle@mysrvr1hr:/app/oracle/crsdata/mysrvr1hr/crsconfig

-rw-rw----. 1 oracle dba  17364 Jan  5 15:35 hapatch_2018-01-05_03-34-42PM.log

-rw-rw----. 1 oracle dba  23725 Jan  5 15:42 hapatch_2018-01-05_03-42-41PM.log

## This showed:
mysrvr1hr:root:/root # /app/grid/product/12.2.0.1/grid/OPatch/opatchauto apply /app/grid/product/12.2.0.1/stage/26737266 -oh /app/grid/product/12.2.0.1/grid
OPatchauto session is initiated at Fri Jan  5 15:33:54 2018
System initialization log file is /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchautodb/systemconfig2018-01-05_03-33-58PM.log.
Session log file is /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/opatchauto2018-01-05_03-34-02PM.log

The id for this session is XLE2
Executing OPatch prereq operations to verify patch applicability on home /app/grid/product/12.2.0.1/grid
Patch applicability verified successfully on home /app/grid/product/12.2.0.1/grid

Bringing down CRS service on home /app/grid/product/12.2.0.1/grid
Prepatch operation log file location: /app/oracle/crsdata/mysrvr1hr/crsconfig/hapatch_2018-01-05_03-34-42PM.log
CRS service brought down successfully on home /app/grid/product/12.2.0.1/grid

Start applying binary patch on home /app/grid/product/12.2.0.1/grid
Binary patch applied successfully on home /app/grid/product/12.2.0.1/grid

Starting CRS service on home /app/grid/product/12.2.0.1/grid
Postpatch operation log file location: /app/oracle/crsdata/mysrvr1hr/crsconfig/hapatch_2018-01-05_03-42-41PM.log
CRS service started successfully on home /app/grid/product/12.2.0.1/grid

OPatchAuto successful.

--------------------------------Summary--------------------------------
Patching is completed successfully. Please find the summary as follows:
Host:mysrvr1hr
SIHA Home:/app/grid/product/12.2.0.1/grid

Summary:
==Following patches were SUCCESSFULLY applied:
Patch: /app/grid/product/12.2.0.1/stage/26737266/26710464
Log: /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-05_15-35-04PM_1.log

Patch: /app/grid/product/12.2.0.1/stage/26737266/26737232
Log: /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-05_15-35-04PM_1.log

Patch: /app/grid/product/12.2.0.1/stage/26737266/26839277
Log: /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-05_15-35-04PM_1.log

Patch: /app/grid/product/12.2.0.1/stage/26737266/26925644
Log: /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-05_15-35-04PM_1.log

Patch: /app/grid/product/12.2.0.1/stage/26737266/26928563
Log: /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-05_15-35-04PM_1.log

OPatchauto session completed at Fri Jan  5 15:43:05 2018
Time taken to complete the session 9 minutes, 11 seconds


 

Happy DBA: installed 12.2 GI and patched it with the October 2017 RU.

Thanks for reading and till we meet again,

Mathijs.

Altering the Heartbeat in an Oracle RAC environment.

Introduction

When asked about part of the job description of a DBA, I would like to take a moment, smile, and reply: to serve and to protect the cluster, the databases, and of course the data. In the matter at hand this means that this mission/assignment is all about increasing high availability and improving the functionality of the cluster interconnect between nodes, and all that for one price: doubling your cluster interconnect IPs on the various layers.

And to be honest, this assignment is even more interesting because, after all, how often is there a need to alter IP addresses once you have set up the cluster? The only valid occasions I could think of would be an action like lifting and shifting the servers to other rooms, or, as in this case, because we simply want to improve availability.

Some days before the action was scheduled, I looked into the matter of the required dedicated IPs together with colleagues from the team. In /etc/hosts, which is the source for this, we looked for IPs that had hb in the alias, following the naming convention that is standard at OS level. This information was used to set up the IPs at the OS level. Once that task was completed, a first and very important test had to be: can you ping these new and dedicated IPs from every node that is part of the cluster? (A quick ping loop for this is sketched below, after the host list.) Since this is a happy-flow scenario, that of course was the case. During the maintenance window itself the detailed steps below were performed to make these addresses known and usable at the cluster layer in the Grid Infrastructure.

Detailed Plan

In this scenario we are using a 4-node RAC cluster on Red Hat with 11.2.0.4 Grid Infrastructure. Looking at the OS level, this is what we found already present in the hosts file. Even better, these IPs were all available and not in use.

grep -i hb /etc/hosts
 • 10.124.10.75 mysrvrahr-hb1.dc.nl mysrvrahr-hb1
 • 10.124.11.75 mysrvrahr-hb2.dc.nl mysrvrahr-hb2
 • 10.124.10.76 mysrvrbhr-hb1.dc.nl mysrvrbhr-hb1
 • 10.124.11.76 mysrvrbhr-hb2.dc.nl mysrvrbhr-hb2
 • 10.124.10.77 mysrvrchr-hb1.dc.nl mysrvrchr-hb1
 • 10.124.11.77 mysrvrchr-hb2.dc.nl mysrvrchr-hb2
 • 10.124.10.78 mysrvrdhr-hb1.dc.nl mysrvrdhr-hb1
 • 10.124.11.78 mysrvrdhr-hb2.dc.nl mysrvrdhr-hb2
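A quick way to verify that all new heartbeat addresses are reachable (a sketch using the host aliases above; run it from every node of the cluster):

for host in mysrvrahr-hb1 mysrvrahr-hb2 mysrvrbhr-hb1 mysrvrbhr-hb2 mysrvrchr-hb1 mysrvrchr-hb2 mysrvrdhr-hb1 mysrvrdhr-hb2; do
  ping -c2 ${host} | egrep 'icmp_seq|transmitted'
done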

The steps below, based on a great MOS note, were followed to complete the tasks needed to make the Grid Infrastructure (cluster) aware of the new IPs. The scenario runs through a number of steps to be well prepared, and of course to be on the safe side before and during the changes on the cluster layer. It is about paying respect and being brave but cautious.

Preparation steps:
As of 11.2 Grid Infrastructure, the private network configuration is not only stored in OCR but also in the gpnp profile. Documentation was very clear on this: If the private network is not available or its definition is incorrect, the CRSD process will not start and any subsequent changes to the OCR will be impossible.

  • Therefore care needs to be taken when making modifications to the configuration of the private network.
  • It is important to perform the changes in the correct order.

Note that manual modification of gpnp profile is not supported so it is best to stick to proper actions and not go into hacking mode!

So let’s take a backup of profile.xml on all cluster nodes before proceeding:

As the grid user (in my case the oracle user), move to the correct directory (cd $GRID_HOME/gpnp/<hostname>/profiles/peer/).

cd /app/oracle/product/11.2.0/grid/gpnp/mysrvrahr/profiles/peer 
cd /app/oracle/product/11.2.0/grid/gpnp/mysrvrbhr/profiles/peer 
cd /app/oracle/product/11.2.0/grid/gpnp/mysrvrchr/profiles/peer 
cd /app/oracle/product/11.2.0/grid/gpnp/mysrvrdhr/profiles/peer 

During startup of the cluster Oracle relies on this very important XML file
for specific data like the spfile, the diskgroups and of course the IPs.

cp -p profile.xml profile.xml.bk
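Since the same backup has to exist on every node, a small loop saves some typing. This is a minimal sketch only, assuming passwordless ssh between the nodes for the grid/oracle owner and the node names used in this post; adjust the grid home and hostnames to your own environment:

## run once from any node as the grid/oracle user:
for node in mysrvrahr mysrvrbhr mysrvrchr mysrvrdhr; do
  ssh ${node} "cd /app/oracle/product/11.2.0/grid/gpnp/${node}/profiles/peer && cp -p profile.xml profile.xml.bk"
done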

Ensure Oracle Clusterware is running on ALL cluster nodes in the cluster and save the current status of the resources. (Better safe than sorry, and to make sure you know about the health of the cluster and its resources as they have been defined.) So do check the cluster and save the current status of the resources in a file as a pre-change image.

/app/oracle/product/11.2.0/grid/bin/crsctl check cluster -all 
/app/oracle/product/11.2.0/grid/bin/crsctl status resource -t>/tmp/beforeNewIps.lst

As the grid user (in my case the oracle user): get the existing information.

## Showing which interfaces are defined in the cluster.
## Below you will see that the current (single) cluster interconnect is set up at 64.18.112.208

/app/oracle/product/11.2.0/grid/bin/oifcfg getif
bond1 64.18.112.208  global  cluster_interconnect
bond0  195.233.190.64  global  public

The command iflist will show you the network information known on the OS, listing all (or specific) interfaces and their subnets. The interfaces / subnet addresses can be identified for the eth devices specifically with:

/app/oracle/product/11.2.0/grid/bin/oifcfg iflist|grep -i eth|sort 
eth0  64.18.32.0
eth2  10.124.10.0
eth6  10.124.11.0

or

## check  interfaces / subnets in general:
 /app/oracle/product/11.2.0/grid/bin/oifcfg iflist|sort

Since we now have a good picture of the status of the cluster, and since we know more about the IPs being used (oifcfg getif) and the IPs present on the system (oifcfg iflist), all is set to add the new cluster_interconnect information. As you can see, both the eth2 and the eth6 addresses are defined, and with the -global parameter the information is shared across the complete cluster on all nodes.

/app/oracle/product/11.2.0/grid/bin/oifcfg setif -global eth2/10.124.10.0:cluster_interconnect 
/app/oracle/product/11.2.0/grid/bin/oifcfg setif -global eth6/10.124.11.0:cluster_interconnect

Of course there cannot be a change without verifying it, so I checked on all nodes with the command below.

/app/oracle/product/11.2.0/grid/bin/oifcfg getif  

Since we are using 11gR2 Grid Infrastructure, the steps below are to be followed now: shut down Oracle Clusterware on all nodes and disable Oracle Clusterware, as root.

Action is to be performed as the root user: 
sudo su -  
/app/oracle/product/11.2.0/grid/bin/crsctl stop crs 
/app/oracle/product/11.2.0/grid/bin/crsctl disable crs
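The two commands above have to be run on every node. A minimal sketch, assuming root can ssh to all four nodes (if not, simply log on to each node in turn and run the two crsctl commands locally):

for node in mysrvrahr mysrvrbhr mysrvrchr mysrvrdhr; do
  ssh root@${node} "/app/oracle/product/11.2.0/grid/bin/crsctl stop crs; /app/oracle/product/11.2.0/grid/bin/crsctl disable crs"
done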

In this specific scenario my Linux brothers in arms had already made the network configuration change at OS level as was required and that great job was seen in the oifcfg iflist command. They made sure that the new interfaces were available on all nodes after their change.

(Check by pinging the interfaces on all nodes with a script kindly provided by a Linux team member.) 
for x in 10 11;do for xx in 75 76 77 78;do ping -c2 10.124.${x}.${xx}|egrep 'icmp_seq|transmitted';done;echo;done 
for x in a b c d; do for xx in 1 2;do ping -c2 mysrvr${x}hr-hb$xx|egrep 'icmp_seq|transmitted';done;echo;done 

Well, all went well and has been checked, so it is time to restart Oracle Clusterware and, once completed, enable Oracle Clusterware again.

On all nodes in the cluster:

## as root user: 
sudo su -  
/app/oracle/product/11.2.0/grid/bin/crsctl start crs

Seeing = believing in this matter, so after some time check:

/app/oracle/product/11.2.0/grid/bin/crsctl check cluster -all 

In the step below we are checking the status of the resources in the cluster again and adding that information to a file. This "post" operation file is then used to compare the status of the cluster resources before and after.

/app/oracle/product/11.2.0/grid/bin/crsctl status resource -t>/tmp/afterNewIps.lst 
sdiff /tmp/afterNewIps.lst /tmp/beforeNewIps.lst

This compare showed me that a 10g RAC database resource and its services needed my attention, so via the cluster commands I checked and observed their status after starting them with the srvctl command as the oracle user. Once completed I ran another check as described and, happy me, all resources in the post status file were in a similar status (online online) as in the pre status file.
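For completeness, this is roughly what starting such a leftover resource looks like; a minimal sketch assuming a hypothetical database name MYDB10G and service myapp_srv (the real names come from the sdiff output):

## as the oracle user, with the environment of that database set:
srvctl start database -d MYDB10G
srvctl start service -d MYDB10G -s myapp_srv
srvctl status database -d MYDB10G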

as root user: 
sudo su -  
/app/oracle/product/11.2.0/grid/bin/crsctl enable crs

Time to wrap up this scenario. As part of housekeeping remove the old interface:

/app/oracle/product/11.2.0/grid/bin/oifcfg delif -global bond1/64.18.112.208:cluster_interconnect

Verified the environment one more time.

/app/oracle/product/11.2.0/grid/bin/oifcfg getif

The Clusterware had already proven itself of course, but I checked the databases and listeners as a last sanity check, and that completed the tasks for this time.

Time to inform Apps team that they can perform their sanity checks and start the applications again.

Happy reading and till next Time.

Mathijs

Upgrade to 12C GridInfra lessons learned

Introduction:

Not sure if it was the word of a wise DBA or just from a fortune cookie (it might even have been from a Pink Panther movie). It said: always expect the unexpected, and as an add-on: success just loves preparation.

This week one of my tasks was to upgrade a 4 node Oracle RAC cluster from 11.2.0.4 to 12c (12.1.0.2.0) Grid Infrastructure. And even though I came well prepared (see also the detailed other blog for that: https://mathijsbruggink.com/2017/05/01/upgrading-11g-gridinfra-to-12c-in-linux/), several small surprises occurred which will be used as lessons learned in upcoming upgrades of the Grid Infrastructure. I would also like to offer some timeline with regard to how long the upgrade process really took.

Lessons learned:

  • During the preparations I needed to order extra disks for ASM storage for the Grid Infrastructure Management Repository (GIMR). When I started the runInstaller as a first check whether all was well prepared, I noticed that the installer software is most likely looking for a diskgroup called +OCR or +VOTING. This could be a trap if you had not extended one of them (but instead a +GRID diskgroup). So when preparing, look for either OCR or VOTING (best both if present) to add extra disks (and keep some disks spare); a quick way to check the free space is sketched right after this list.
  • During the start of the maintenance window the Linux colleague mentioned that he would have to stop the Hyperion services. This activity took some 45 minutes of the change window. I will have to find out if this was a justified claim to stop those services, and will need to add an extra step to the pre-checks to find out about other services / daemons running on the cluster that might be impacted when doing an upgrade.
  • Purpose of rootupgrade.sh: it runs after the installation part via the runInstaller completes and the upgrade part via the runInstaller commences. rootupgrade.sh will perform the actual ASM upgrade and will configure the OLR (local registry), amongst other things.
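A quick way to check whether the +OCR / +VOTING diskgroup has enough free space before starting the runInstaller is asmcmd lsdg. A minimal sketch, assuming the pre-upgrade 11.2.0.4 grid home and ASM instance name used elsewhere in this blog:

## as the grid / oracle user on one of the nodes:
export ORACLE_HOME=/opt/crs/product/11204/crs
export ORACLE_SID=+ASM1
$ORACLE_HOME/bin/asmcmd lsdg | egrep -i 'State|OCR|VOT|GRID'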

Timelines:

Every change on a test or production environment has to come with a plan with regard to the estimated time the change will take. First and most important of course: choose the strategy. Will a rolling window be used (thus minimizing impact since at least one node will be up; think of a kind of batch approach where the first batch holds the first node, the second batch holds nodes 2 and 3 in my 4 node example, and a last batch holds the last node)?

Start of Change Window : 20:00 CET ( 6:00 UTC ) .
According to the Linux expert, the Hyperion services needed to be stopped before we could continue.

Start of installation: 20:45 CET.
Started the runInstaller on the first node. Software was deployed to first node and all the nodes in the cluster (4 Node Rac).

Upgrade part of the existing 11.2.0.4 Grid Infrastructure:
21:30 – 21:58 on the first node (MYSRVR09hr) the rootupgrade.sh was started. (I used the manual upgrade, still a bit unwilling to leave it all to the automated option; this means setting up a root session on the first node and running: /app/grid/product/12102/grid/rootupgrade.sh).

In the runInstaller it was offered to automate and run rootupgrade.sh in parallel on nodes 2 and 3. To me it felt better to open a terminal session as root on each server in parallel and run the script myself.

22:06 – 22:13 on MYSRVR10hr : /app/grid/product/12102/grid/rootupgrade.sh
22:06 – 22:24 on MYSRVR11hr : /app/grid/product/12102/grid/rootupgrade.sh

On the last node MYSRVR12hr:
22:28 – 22:48 /app/grid/product/12102/grid/rootupgrade.sh

After that the install continued with the Grid Infrastructure Management Repository (GIMR) database and, once completed, I ran a number of sanity checks in the cluster:

22:50 – 23:55

At 23:59:59 reported: mission completed.

Happy reading and till next time,

Mathijs

 

 

Upgrading 11G GridInfra to 12C in Linux

Introduction:

With spring 2017 around, new initiatives are being developed. As a preparation to start doing database upgrades to 12c, it is a mandatory step to upgrade the Clusterware (Grid Infrastructure) first before doing the database part. So in this case: very happy me that finally the time has come that one of the customers requests an upgrade of a number of clusters to 12c Grid Infrastructure. In this document I will share thoughts and my plan to tackle this interesting puzzle. Since the first cluster upgrade will happen pretty soon (this week), the document might evolve with the lessons learned from that first upgrade. Happy reading in advance.

Preparations:

It could be the text of a fortune cookie, but every success just loves preparation, and in this case that will not be any different. The first thing to do was to identify the scope of clusters that had to be upgraded. Together with the customer an inventory list had to be created, and in the end 10 clusters were defined as in scope for this action: 8 test clusters and 2 production environments. An interesting detail is that all clusters have been patched pretty recently, all holding 11.2.0.4 Grid Infrastructure, with the extra challenge that the underlying operating system comes in two flavors (Red Hat Linux server release 5.11 (Tikanga) and 6.5 (Santiago)). I am curious in advance to see whether these different versions of Red Hat will have an influence on the steps to be performed. In the details below you will find more on the detailed preparations and actions of the upgrade.

Operating System:

One of the first steps to investigate is of course whether the operating system versions at hand are supported for the upgrade. Oracle Support confirmed that, even though it would be recommended to upgrade the 5.11 Red Hat version first to Red Hat 7, it should work with the 5.11 version at hand. The 6.5 OS version was okay anyhow. The project decided however that an OS upgrade of the 5.11 boxes would delay things, so upgrading the OS will be done in a different project.

Storage:

Before even considering running the upgrade of the Grid Infrastructure, some extra time needs to be spent investigating the storage in place in the cluster for such an upgrade. Often the Oracle software is first set up locally on each box on volume group VG0, but with the out-of-place installation these days that might become a challenge if there is not enough local storage present anymore in the box. Due to standards those root disks become nearly untouchable. For my project this storage requirement has been defined as an absolute minimum, which means there will be a need for extra local storage per node, or even for SAN storage per node which will be presented to me as the required mount points. If such storage is not (or no longer) present locally, I have to request and receive additional storage for it.

/app/grid        50GB
/app/oracle      70GB
/var/opt/oracle  32M
/tmp             1GB
San 4 lvm dbs    (size per mount point: see below)

A short explanation of this:

/app/grid : 12C Grid-Infra software will be installed.
/app/oracle: For the 12C Database software.
/var/opt/oracle and /tmp: required minimum space.
San 4 lvm dbs: will be set up as 4GB mount points, one for each instance on the local node, in order to hold logfiles.

When migrating to 12c coming from 11g, please be aware that you might need extra storage in your OCR / VOTING diskgroup due to a new feature as well. This new repository database has to be implemented during the upgrade. The Grid Infrastructure Management Repository (GIMR) database has become mandatory in Oracle GI 12.1.0.2. Data files associated with it will be created in the same diskgroup as OCR or voting. (Average growth per day per node = approx. 750 MB, so a 4 node cluster at the default retention of 3 days would lead to approx. 9 GB storage requirement in the OCR or VOTING diskgroup.) A fortunate note is that the retention can be changed. Well, in my case this means that more ASM disks will need to be added to the specific diskgroup. At work most OCR and VOTING diskgroups are set up as a bare minimum (normal redundancy with three disks of, say, 4 GB each). (Extra info on this topic: https://blogs.oracle.com/UPGRADE/entry/grid_infrastructure_management_repository_gimr)
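As a quick worked example of that sizing rule (the numbers are the defaults mentioned above, so treat this as a rough estimate only):

## 4 nodes * ~750 MB per node per day * 3 days default retention:
echo $(( 4 * 750 * 3 ))   ## = 9000 MB, so roughly 9 GB extra needed in the OCR / VOTING diskgroup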

Detailed preparations and health checks.

One of the quotes in IT sometimes is that you should not touch a well running system. Well, in this case I would like to add: but if you do, come well prepared. I have put the focus on the three tools below to prove that the current system is in good shape to run the upgrade, which can also be regarded as a health check of the environment. These preps are based on the MOS note (1579762.1) and on reading Chapter 13 in the great book "Expert Oracle RAC 12c" by Syed Jaffar Hussain, Tariq Farooq, Riyaj Shamsudeen and Kai Yu (ISBN-13 (electronic): 978-1-4302-5045-6).

  • Opatch
  • RACcheck: Orachk
  • Runcluvfy

Opatch

Using opatch in order to make sure that the oraInventory is in good shape on all nodes in the cluster. The command issued investigates the current Grid Infrastructure home:

opatch lsinventory -oh /opt/crs/product/11204/crs -detail

-oh means for the specific ORACLE_HOME.

-detail shows all details.
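Since the inventory has to be consistent on every node, it can help to run the same opatch command on all of them and compare the output. A minimal sketch, assuming passwordless ssh between the cluster nodes (the node names here are the ones from this post):

for node in mysrvr23hr mysrvr24hr; do
  echo "### ${node}"
  ssh ${node} /opt/crs/product/11204/crs/OPatch/opatch lsinventory -oh /opt/crs/product/11204/crs | grep -i "patch"
done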

RACcheck: Orachk

I looked on Metalink and downloaded and installed this tool on the cluster (nodes).

orachk Version
12.2.0.1.2_20161215

Following Quick start guide for this tool:

http://docs.oracle.com/cd/E68491_01/OEXUG/quick-start-guide.htm#OEXUG-GUID-CB4224DA-F389-4E9C-AB6A-C57F46A80C61

Clear information can be found in MOS:

ORAchk Upgrade Readiness Assessment (Doc ID 1457357.1)

With the tool downloaded, the steps below have been performed:

According to the documentation the tool needs to be copied, unpacked (and installed) in the suptools subdirectory of the cluster software installation.

scp orachk.zip oracle@mysrvr23hr:/opt/crs/product/11204/crs/suptools
scp orachk.zip oracle@mysrvr24hr:/opt/crs/product/11204/crs/suptools

Once unzipped the tool can run in two modes, a pre upgrade mode and a post upgrade mode:

./orachk -u -o pre |tee Orachk_pre_20170124.log
./orachk -u -o post |tee Orachk_post_20170124.log

Note: the tee command will also create a log file holding all the steps / progress information during run time.
Note: /opt/oracle/.orachk should be empty before the start, otherwise you get the message: 'Another instance of orachk is running on ...'.

Runcluvfy

Working with runcluvfy is like meeting an old friend again. Yet each time it is a bit of a struggle to find the optimal syntax / parameters for your setup.

#Wrong setup was
./runcluvfy.sh stage -pre crsinst -upgrade -n mysrvr23hr,mysrvr24hr -rolling -fixup -src_crshome /opt/crs/product/11204/crs -dest_home /app/grid/product/12102/grid -dest_version 12.1.0 -verbose
## working version
./runcluvfy.sh stage -pre crsinst -n mysrvr23hr,mysrvr24hr -verbose|tee runcluvfy_20170130_pre.lst
Or
./runcluvfy.sh stage -pre crsinst -upgrade -rolling -src_crshome /opt/crs/product/11204/crs -dest_crshome /app/grid/product/12102/grid -dest_version 12.1.0.2.0 -verbose|tee runcluvfy_20170130_preUpgrade.lst

Upgrade steps:

Now it is time to plan and set up your upgrade steps, with the confidence built during the preparation. In the upgrade multiple approaches are possible, but my goal is plain and simple: minimum impact on the cluster and on the databases hosted on it, so I will be aiming for this scenario: rolling upgrade of ASM + Clusterware. A baseline for this is the URL below:

https://docs.oracle.com/database/121/CWLIN/procstop.htm#CWLIN10001

Working according to company standards requires the following specific settings for $ORACLE_BASE and $ORACLE_HOME for the GI installation, and a different $ORACLE_HOME for the database software.

oracle@mysrvrhr:/home/oracle [CRS]# echo $ORACLE_BASE
/app/oracle
oracle@mysrvrhr:/home/oracle [CRS]# echo $ORACLE_HOME
/app/grid/product/12102/grid

oracle@mysrvrhr:/home/oracle [MYDB1]# echo $ORACLE_HOME
/app/oracle/product/12102/db

The bullets below go through the steps, with comments where needed.

  • Due to the Grid Infrastructure Management Repository (GIMR) database I had to add larger disks to the VOTING diskgroup to have enough storage in place (the steps on how to add the new disks and drop the old ones are too detailed for this blog, after all it is a blog and not a book 🙂 , so I will blog about that separately).
  • Check /tmp because the upgrade requires at least 1GB free in /tmp. Either clean up or have /tmp extended (use the ls -lSh command to find the largest files).
  • Check OCR integrity with:
cluvfy comp ocr -n all -verbose
  • Check backup of ocr and voting disk in the cluster:
    ocrconfig -showbackup

Note: this command can be performed as the oracle user and shows info similar to the information below. An interesting aspect here was that I issued the command on the first node (but the automated backups are all on node 11hr).

oracle@mysrvr09hr:/opt/oracle [CRS]# ocrconfig -showbackup
mysrvr11hr 2017/04/21 05:20:36 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup00.ocr
mysrvr11hr 2017/04/21 01:20:29 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup01.ocr
mysrvr11hr 2017/04/20 21:20:07 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup02.ocr
mysrvr11hr 2017/04/20 01:19:42 /opt/crs/product/11204/crs/cdata/mysrvr03cl/day.ocr
mysrvr11hr 2017/04/12 17:16:11 /opt/crs/product/11204/crs/cdata/mysrvr03cl/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available
  • As the root user run a manual backup of the OCR information. Run the ocrconfig -manualbackup command on a node where the Oracle Clusterware stack is up and running to force Oracle Clusterware to perform a backup of the OCR at any time, rather than wait for the automatic backup. Note: the -manualbackup option is especially useful when you want to obtain a binary backup on demand, such as before you make changes to the OCR. The OLR only supports manual backups. NOTE: in 11gR2, the voting files are backed up automatically as part of the OCR. Oracle recommends NOT using the dd command to backup or restore, as this can lead to loss of the voting disk.
mysrvr09hr:root:/root # cd /opt/crs/product/11204/crs/bin/
mysrvr09hr:root:/opt/crs/product/11204/crs/bin # ./ocrconfig -manualbackup
mysrvr11hr 2017/04/21 09:12:40 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup_20170421_091240.ocr

## Checking a second time will now also show a manual backup to be in place:
mysrvr09hr:root:/opt/crs/product/11204/crs/bin # ./ocrconfig -showbackup
mysrvr11hr 2017/04/21 05:20:36 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup00.ocr
mysrvr11hr 2017/04/21 01:20:29 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup01.ocr
mysrvr11hr 2017/04/20 21:20:07 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup02.ocr
mysrvr11hr 2017/04/20 01:19:42 /opt/crs/product/11204/crs/cdata/mysrvr03cl/day.ocr
mysrvr11hr 2017/04/12 17:16:11 /opt/crs/product/11204/crs/cdata/mysrvr03cl/week.ocr
mysrvr11hr 2017/04/21 09:12:40 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup_20170421_091240.ocr

The last line now shows the manual backup
(since it has the format backup_yyyymmdd_hhmmss.ocr).
  • Check the location of OCR and voting disk (they need to be in a diskgroup):
##How:
cat /etc/oracle/ocr.loc
## Shows output similar to this
## (if ocr is already mirrored in other Diskgroup with normal Redundancy)
#Device/file getting replaced by device +OCR
ocrconfig_loc=+VOTE
ocrmirrorconfig_loc=+OCR

 

 

##How: 
crsctl query css votedisk

## Will show 3 voting disks in diskgroup VOTE due to normal redundancy (and 3 disks)
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
 1. ONLINE 36b26f862b9a4f54bfba3096e3d50afa (/dev/mapper/asm-vote01) [VOTE]
 2. ONLINE 9d45d791c1124febbf0a093d5a185c13 (/dev/mapper/asm-vote02) [VOTE]
 3. ONLINE 1b7e510a302e4f03bfdea942d55d7067 (/dev/mapper/asm-vote03) [VOTE]
Located 3 voting disk(s).
## check in ASM:
select a.name dg_name,
       a.GROUP_NUMBER dg_number,
       a.state dg_state,
       b.DISK_NUMBER d_number,
       b.name d_name,
       b.mount_status d_mount_status,
       b.header_status d_header_status,
       b.mode_status d_mode_status,
       b.state d_state,
       b.FAILGROUP d_failgroup,
       b.path d_path
from   v$asm_diskgroup a,
       v$asm_disk b
where  a.GROUP_NUMBER(+) = b.GROUP_NUMBER
order by 2,4;
  • Unset environment Variables:
unset ORACLE_BASE 
unset ORACLE_HOME 
unset GI_HOME 
unset ORA_CRS_HOME 
unset TNS_ADMIN
unset ORACLE_SID
unset ORA_NLS10
  • Check active crs version and software version:
## using the current CRS to document current active - and software version
/opt/crs/product/11204/crs/bin/crsctl query crs activeversion
/opt/crs/product/11204/crs/bin/crsctl query crs softwareversion
  • Performing a Standard Upgrade from an Earlier Release
## Use the following procedure to upgrade the cluster from an earlier release:
Start the installer, and select the option to upgrade an existing Oracle Clusterware and Oracle ASM installation.
On the node selection page, select all nodes.
Select installation options as prompted. 
Note: Oracle recommends that you configure root script automation,
so that the rootupgrade.sh script can be run automatically during the upgrade.
Run root scripts, using either automatically or manually:

Running root scripts automatically:
TIP: If you have configured root script automation, 
then use the pause between batches to relocate services from the nodes running the previous release to the new release.
Comment Mathijs: I have not decided yet on this automation step. 
In the documentation read as prep for the upgrade you see the option to create multiple batches:
like batch 1 starting node, 
batch 2 all but last node,
batch 3 last node. 
I will use both the automated way for one cluster and then use the below manual (old school method mentioned below) on another cluster.

Running root scripts manually:
If you have not configured root script automation, then when prompted, 
run the rootupgrade.sh script on each node in the cluster that you want to upgrade.

If you run root scripts manually, then run the script on the local node first. 
The script shuts down the earlier release installation, replaces it with the new Oracle Clusterware release, and starts the new Oracle Clusterware installation.
After the script completes successfully, you can run the script in parallel on all nodes except for one, which you select as the last node. 
When the script is run successfully on all the nodes except the last node, run the script on the last node.
After running the rootupgrade.sh script on the last node in the cluster, if you are upgrading from a release earlier than Oracle Grid Infrastructure 11g Release 2 (11.2.0.2), 
and left the check box labeled ASMCA checked, which is the default, then Oracle Automatic Storage Management Configuration Assistant ASMCA runs automatically, 
and the Oracle Grid Infrastructure upgrade is complete. 
If you unchecked the box during the interview stage of the upgrade, then ASMCA is not run automatically.

If an earlier release of Oracle Automatic Storage Management (Oracle ASM) is installed, then the installer starts ASMCA to upgrade Oracle ASM to 12c Release 1 (12.1). 
You can choose to upgrade Oracle ASM at this time, or upgrade it later.
Oracle recommends that you upgrade Oracle ASM at the same time that you upgrade Oracle Clusterware. 
Until Oracle ASM is upgraded, Oracle Databases that use Oracle ASM cannot be created and the Oracle ASM management tools in the Oracle Grid Infrastructure 12c Release 1 (12.1) home (for example, srvctl) do not work.

Note: 
Because the Oracle Grid Infrastructure home is in a different location than the former Oracle Clusterware and Oracle ASM homes, 
update any scripts or applications that use utilities, libraries, or other files that reside in the Oracle Clusterware and Oracle ASM homes.
  • Check active crs version and software version:
/opt/crs/product/11204/crs/bin/crsctl query crs activeversion
/opt/crs/product/11204/crs/bin/crsctl query crs softwareversion
  • Post upgrade checks:
 ps -ef|grep d.bin should show daemons started from 12C.
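A minimal post-upgrade check sketch, assuming the new 12c grid home is /app/grid/product/12102/grid as used earlier in this post:

ps -ef | grep d.bin | grep -v grep                                 ## daemons should now run from the 12c home
/app/grid/product/12102/grid/bin/crsctl query crs activeversion    ## expect 12.1.0.2.0
/app/grid/product/12102/grid/bin/crsctl stat res -t                ## resources back online on all nodes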

Thoughts on Rollback:

Of course each migration will be as good as its preparation. But still your plan should at least hold the steps for a rollback in case you do not make it to a successfully completed task. Below you will find the steps in general.

On all remote nodes, use the command syntax Grid_home/crs/install/rootcrs.sh -downgrade to stop the 12c Release 1 (12.1) resources and shut down the stack.
On the local node use the command syntax Grid_home/crs/install/rootcrs.sh -downgrade -lastnode
On any of the cluster member nodes where the rootupgrade.sh script has run successfully:

cd /u01/app/12.1.0/grid/oui/bin
./runInstaller -nowait -waitforcompletion -ignoreSysPrereqs -updateNodeList
-silent CRS=false ORACLE_HOME=/u01/app/12.1.0/grid

On any of the cluster member nodes where the rootupgrade script has run successfully:
In the old ORACLE_HOME (the earlier Oracle Clusterware installation):
$ cd /opt/crs/product/11204/crs/oui/bin/
$ ./runInstaller -nowait -waitforcompletion -ignoreSysPrereqs -updateNodeList -silent CRS=true ORACLE_HOME=/u01/app/crs

Start the Oracle Clusterware stack manually:
On each node, start Oracle Clusterware from the earlier release Oracle Clusterware home:
/opt/crs/product/11204/crs/bin/crsctl start crs

As always thank you for taking an interest in my blog. Happy reading and till the next time.

Mathijs

 

 

Transport Tablespace as Mig to 11.2.0.4 with rman.

Introduction:

For one of the projects the question came in to investigate and set up an 11.2.0.4 Real Application Clusters database, with the extra challenge that the migration had to be done cross-platform from Oracle 10.2.0.3 on the Solaris platform to 11.2.0.4.0 on Linux. From the application provider came the suggestion to investigate a backup / restore scenario with an upgrade on the new server (the Linux environment). Due to the fact that the source environment was 10.2.0.3 on Solaris, and because we were heading towards a RAC cluster environment on Linux, that suggestion was the first to be sent to the dustbin.

A normal export / import was the second scenario that was explored. Of course this is a valid scenario, but given the fact that the database was more than 1.x TB it is not exactly the most favorite way to bring the data across. Still, with scripting, using multiple parfiles and/or moving partitioned data across in waves, it would be a fair plan B.

From reading, though, I had put my mind to the use of transportable tablespaces as the way forward with this challenging question.

Requirements:

As preparation for the job I requested an NAS filesystem mounted between the source server (MySunServer), holding the 10g database, and the target server (MyLinuxcluster). This NAS filesystem would hold the datapump dumps to be created, and the scripts and parfiles / config files, as was suggested in MOS note 1389592.1. The NAS filesystem was read-writable from both servers. The perl scripts that come with the note support the transport of the tablespaces but also help in the conversion from big endian to little endian, and as a bonus in my case they do the copy into ASM.

Due to the layout of the database in the source environment  Rman was chosen as the best way forward with the scenario.

As a preparation an 11.2.0.4 RAC database was set up on the target cluster. This database only holds the normal tablespaces and a small temporary tablespace for the users. (In a TTS solution the names of the data tablespaces that come across to the new environment may not already exist in the new environment.) All data / application users have been pre-created in the new environment with a new default user tablespace.

Details & Comments

Configuration file for the Perl scripts:

This is a file that is part of the unzipped file from the MOS note. It needs to be set up to match your specific needs. I will only show the settings I have used, with their comments:

xtt.properties:
## Reduce Transportable Tablespace Downtime using Incremental Backups
## (Doc ID 1389592.1)

## Properties file for xttdriver.pl

## See documentation below and My Oracle Support Note 1389592.1 for details.
## Tablespaces to transport

## Specify tablespace names in CAPITAL letters.

tablespaces=MYDB_DATA,MYDB_EUC_DATA,MYDB_EUC_INDEX,MYDB_INDEX,MYTS,USERS
##tablespaces=MYCOMPTTS
## Source database platform ID

## platformid

## Source database platform id, obtained from V$DATABASE.PLATFORM_ID

platformid=2

## srclink

## Database link in the destination database that refers to the source

## database. Datafiles will be transferred over this database link using
## dbms_file_transfer.
srclink=TTSLINK

## Location where datafile copies are created during the "-p prepare" step.

## This location must have sufficient free space to hold copies of all
## datafiles being transported.

dfcopydir=/mycomp_mig_db_2_linux/mybox/rman

## backupformat

## Location where incremental backups are created.

backupformat=/mycomp_mig_db_2_linux/mybox/rman

## Destination system file locations

## stageondest

## Location where datafile copies are placed by the user when they are

## transferred manually from the souce system. This location must have
## sufficient free space to hold copies of all datafiles being transported.

stageondest=/mycomp_mig_db_2_linux/mybox/rman

# storageondest

## This parameter is used only when Prepare phase method is RMAN backup.

## Location where the converted datafile copies will be written during the

## "-c conversion of datafiles" step. This is the final location of the
## datafiles where they will be used by the destination database.
storageondest=+MYDBP_DATA01/mydbp/datafile
## backupondest

## Location where converted incremental backups on the destination system

## will be written during the "-r roll forward datafiles" step.

## NOTE: If this is set to an ASM location then define properties

##      asm_home and asm_sid below. If this is set to a file system
##       location, then comment out asm_home and asm_sid below
backupondest=+MYDBP_DATA01/mydbp/datafile

## asm_home, asm_sid

## Grid home and SID for the ASM instance that runs on the destination

asm_home=/opt/crs/product/11204/crs

asm_sid=+ASM1
## Parallel parameters

parallel=8

## rollparallel

## Defines the level of parallelism for the -r roll forward operation.

## If undefined, default value is 0 (serial roll forward).

rollparallel=2
## getfileparallel

## Defines the level of parallelism for the -G operation

getfileparallel=4

## desttmpdir

## This should be defined to same directory as TMPDIR for getting the

## temporary files. The incremental backups will be copied to directory pointed
## by stageondest parameter.
desttmpdir=/mycomp_mig_db_2_linux/MYDBP/scripts
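The srclink=TTSLINK entry above assumes that a database link with that name already exists in the destination database and points to the source. A minimal sketch of creating it, assuming a hypothetical TNS alias MYSUNSRC for the source 10.2.0.3 database and placeholder credentials (use whatever privileged account your security rules allow):

## run on the destination 11.2.0.4 database, as a user allowed to create (public) database links:
sqlplus / as sysdba <<EOF
CREATE PUBLIC DATABASE LINK TTSLINK CONNECT TO system IDENTIFIED BY "MySecretPwd" USING 'MYSUNSRC';
SELECT name, platform_id FROM v\$database@TTSLINK;
EOF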

 

Below in a Table format you will see the steps performed with comments.

The steps are qualified as:

  • I for Initial steps – activities
  • P for Preparation
  • R for Roll Forward activities
  • T for Transport activities

Server column shows where the action needs to be done.

Step / Server / What needs to be done
I1.3 Source Identify the tablespace(s) in the source database that will be transported (the application owner needs to support this with schema owner information):
tablespaces=MYDB_DATA,MYDB_EUC_DATA,MYDB_EUC_INDEX,

MYDB_INDEX,MYTS,USERS

I1.5 Source + Target In my case the project offered an NFS filesystem which I could use: /mycomp_mig_db_2_linux
I1.6 Source Together with the MOS note came this zip file: unzip rman-xttconvert.zip.
I1.7 Source Tailor the extracted xtt.properties file on the source system to match your environment.
I1.8 Target As the oracle software owner copy all xttconvert scripts and the modified xtt.properties file to the destination system. This was not needed since we used the nas filesystem.
P1.9 Source + Target On both environments set up this:

export TMPDIR=/mycomp_mig_db_2_linux/MYDBP/scripts

P2B.1 Source perl xttdriver.pl -p
Note: do not use $ORACLE_HOME/perl/bin/perl; this did not work.
P2B.2 Source Copy files to destination. N/A since we use NFS
P2B3 Target On the destination system, logged in as the oracle user with the environment (ORACLE_HOME and ORACLE_SID environment variables) pointing to the destination database, copy the rmanconvert.cmd file created in step 2B.1 from the source system and run the convert datafiles step as follows:
[oracle@dest]$ scp oracle@source:/home/oracle/xtt/rmanconvert.cmd /home/oracle/xtt N/A since we use NFS.
perl xttdriver.pl -c
R3.1 Source On the source system, logged in as the oracle user with the environment (ORACLE_HOME and ORACLE_SID environment variables) pointing to the source database, run the create incremental step as follows:
perl xttdriver.pl -i
R3.3 Target [oracle@dest]$ scp oracle@source:/home/oracle/xtt/xttplan.txt /home/oracle/xtt
[oracle@dest]$ scp oracle@source:/home/oracle/xtt/tsbkupmap.txt /home/oracle/xtt
 Since we are using Nas shared filesystem no need to copy with scp  between source and target.
perl xttdriver.pl -r
R3.4 Source On the source system, logged in as the oracle user with the environment (ORACLE_HOME and ORACLE_SID environment variables) pointing to the source database, run the determine new FROM_SCN step as follows:
perl xttdriver.pl -s
R3.5 Source 1.     If you need to bring the files at the destination database closer in sync with the production system, then repeat the Roll Forward phase, starting with step 3.1.
2.     If the files at the destination database are as close as desired to the source database, then proceed to the Transport phase.
T4.0 Source As found in the note: Alter Tablespace Read Only Hanging When There Are Active TX In Any Tablespace (Doc ID 554832.1). A restart of the database is required to have no active transactions, or alternatively do this during off hours. Actually, during a first test with one dedicated tablespace holding only one object it took more than 7 hrs: Oracle seems to look at and wait for ALL active transactions, not only the ones that would impact the object in the test tablespace I worked with.
T4.1 Source On the source system, logged in as the oracle user with the environment (ORACLE_HOME and ORACLE_SID environment variables) pointing to the source database, make the tablespaces being transported READ ONLY.
alter tablespace MYDB_DATA read only;
alter tablespace MYDB_EUC_DATA read only;
alter tablespace MYDB_EUC_INDEX read only;
alter tablespace MYDB_INDEX read only;
alter tablespace MYTS read only;
alter tablespace USERS read only;
T4.2 Source Repeat steps 3.1 through 3.3 one last time to create, transfer, convert, and apply the final incremental backup to the destination datafiles.
perl xttdriver.pl -i
T4.2 Target [oracle@dest]$ scp oracle@source:/home/oracle/xtt/xttplan.txt /home/oracle/xtt
[oracle@dest]$ scp oracle@source:/home/oracle/xtt/tsbkupmap.txt /home/oracle/xtt
perl xttdriver.pl -r
T4.3 Target On the destination system, logged in as the oracle user with the environment (ORACLE_HOME and ORACLE_SID environment variables) pointing to the destination database, run the generate Data Pump TTS command step as follows:
perl xttdriver.pl -e
The generate Data Pump TTS command step creates a sample Data Pump network_link transportable import command in the file xttplugin.txt. It will hold list of all the TTS you have configured and all its transport_datafiles in details.
Example of that generated file : cat xttplugin.txt
impdp directory=MYDB_XTT_DIR logfile=tts_imp.log \
network_link=TTSLINK.PROD.NL transport_full_check=no \
transport_tablespaces=MYCOMPTTS,A,B,C \
transport_datafiles='+MYDBP_DATA01/mycomptts_152.xtf'
Note: in our example, once edited we chmodded xttplugin.txt to 744 and ran it as a script.
T4.3 Source After the object metadata being transported has been extracted from the source database, the tablespaces in the source database may be made READ WRITE again, if desired.
T4.4 Target At this step, the transported data is READ ONLY in the destination database.  Perform application specific validation to verify the transported data.
Also, run RMAN to check for physical and logical block corruption by running VALIDATE TABLESPACE as follows:
In rman:
validate tablespace MYDB_DATA, MYDB_EUC_DATA, MYDB_EUC_INDEX, MYDB_INDEX, MYTS, USERS check logical;
T4.5 Target alter tablespace MYDB_DATA read write;
alter tablespace MYDB_EUC_DATA read write;
alter tablespace MYDB_EUC_INDEX read write;
alter tablespace MYDB_INDEX read write;
alter tablespace MYTS read write;
alter tablespace USERS read write;
T5 Source + Target Cleanup of NFS filesystem.
Put the source DB in restricted mode as a fallback after the go-live for a couple of days, then put it to tape and decommission it.

Adding Vip Address and Goldengate to Grid Infra structure 11.2.0.3

Introduction:

Earlier this week preparations were started to add the Goldengate software to the Grid Infrastructure of 11.2.0.3 on the billing environment in production. As part of that scenario I also had to add a VIP address that is to be used by the Goldengate software as part of high(er) availability. In my concept the Goldengate daemons run on one node only by default. During a node crash (of course not wanted nor desired), or as a way to load balance work on the cluster, the VIP address and the Goldengate software need to stop and restart on the other node. Below you will find a working example as part of the preparations I have performed. Some comments have been added to the specific steps.

Commands will be typed in italic in this blog.

Details

## The first step will be adding the VIP address 10.97.242.40 oradb-gg-localp1.dc-lasvegas.us to the Grid Infra (GI). Note: the IP address and the description have been defined in DNS. Once I got feedback that the address was added I was able to perform an nslookup. Of course it was not yet possible to ping the IP, because we first have to add it to the cluster, as is done here.

## As root:
/opt/crs/product/11203/crs/bin/appvipcfg create -network=1 -ip=10.97.242.40 -vipname=oradb-gg-localp1.dc-lasvegas.us -user=root

## Once that is in place, grant permissions to the oracle user to work with the VIP address:
(As root, allow the Oracle Grid Infrastructure software owner (e.g. oracle) to run the script to start the VIP.)

/opt/crs/product/11203/crs/bin/crsctl setperm resource oradb-gg-localp1.dc-lasvegas.us -u user:oracle:r-x

## Now it is time to start  the Vip:
## As Oracle, start the VIP:
/opt/crs/product/11203/crs/bin/crsctl start resource oradb-gg-localp1.dc-lasvegas.us

##Check our activities:
## As Oracle:

/opt/crs/product/11203/crs/bin/crsctl status resource oradb-gg-localp1.dc-lasvegas.us -p

## In my setup Goldengate is defined to be able to run on either node one (usapb1hr) or node 2 (usapb2hr) in my four node cluster. And since I want to make sure it only runs on those two servers, I set the placement to restricted.
## As root:

/opt/crs/product/11203/crs/bin/crsctl modify resource oradb-gg-localp1.dc-lasvegas.us -attr "HOSTING_MEMBERS=usapb1hr usapb2hr"
/opt/crs/product/11203/crs/bin/crsctl modify resource oradb-gg-localp1.dc-lasvegas.us -attr "PLACEMENT=restricted"

## As always the taste of the creme brulee is in the details, so let's check:
## As Oracle:
/opt/crs/product/11203/crs/bin/crsctl status resource oradb-gg-localp1.dc-lasvegas.us -p

## Great, that worked. Now let's relocate the VIP to the other node as a test:
## As Oracle:

/opt/crs/product/11203/crs/bin/crsctl relocate resource oradb-gg-localp1.dc-lasvegas.us

## Completed the action with a smile, because it worked as planned.

## As always the taste of the creme brulee is in the details, so let's check:
## As Oracle:
/opt/crs/product/11203/crs/bin/crsctl status resource oradb-gg-localp1.dc-lasvegas.us -p

## As part of making sure that the setup from scratch was the same on all machines (we had the same solution in the pre-prod environment), let us first remove the existing resource for Goldengate and then add it to the GI again.

/opt/crs/product/11203/crs/bin/crsctl delete resource myGoldengate

## As oracle (the white paper was very specific about that; I performed it as root the first time, ending up with the wrong primary group in the ACL, which I checked in the end). So stick to the plan and do this as ORACLE. Add the resource to the GI and put it in a relationship to the VIP address that was created in the GI earlier, AND inform the cluster about the action script that is to be used during a relocate, server boot or node crash. (This script is in my case a shell script holding conditions like stop, start, status etc. and the corresponding Goldengate commands that are to be used by the GI; a sketch of such a script follows right after the add command below.)

/opt/crs/product/11203/crs/bin/crsctl add resource myGoldengate -type cluster_resource -attr "ACTION_SCRIPT=/opt/crs/product/11203/crs/crs/public/gg.local.active, CHECK_INTERVAL=30, START_DEPENDENCIES='hard(oradb-gg-localp1.dc-lasvegas.us) pullup(oradb-gg-localp1.dc-lasvegas.us)', STOP_DEPENDENCIES='hard(oradb-gg-localp1.dc-lasvegas.us)'"
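To give an impression of what such an action script can look like, here is a minimal sketch. The GoldenGate home /app/oracle/goldengate and the idea of only managing the manager (mgr) process are assumptions for the example, not taken from the real gg.local.active used above; the Grid Infra calls the script with start, stop, check or clean as the first argument and expects exit code 0 on success:

#!/bin/bash
## hypothetical GoldenGate home, adjust to your own installation
GG_HOME=/app/oracle/goldengate
case "$1" in
  start)
    cd ${GG_HOME} && echo "START MGR" | ./ggsci
    ;;
  stop|clean)
    cd ${GG_HOME} && echo "STOP MGR!" | ./ggsci
    ;;
  check)
    ## exit 0 when the manager process is running, non-zero otherwise
    pgrep -f "${GG_HOME}/dirprm/mgr.prm" > /dev/null
    exit $?
    ;;
esac
exit 0

The real gg.local.active will typically also take care of the extract / replicat processes; the start / stop / check / clean skeleton is the part the Grid Infra cares about.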

## Altering hosting members and placement again (by default only one node is part of hosting_members and placement=balanced).

## As root:
/opt/crs/product/11203/crs/bin/crsctl modify resource myGoldengate -attr "HOSTING_MEMBERS=usapb1hr usapb2hr"

/opt/crs/product/11203/crs/bin/crsctl modify resource myGoldengate -attr "PLACEMENT=restricted"

## so in the end you should check it with this:

/opt/crs/product/11203/crs/crs/public [CRS]# crsctl status resource myGoldengate -p

## Time to set permissions on myGoldengate (altering ownership to the myGoldengate user, which is my OS user for this).
### As root:
/opt/crs/product/11203/crs/bin/crsctl setperm resource myGoldengate -o myGoldengate

## Needed and sometimes forgotten: make sure that the oracle user (who is also the owner of the Grid Infra software on these boxes) can run the script.
### As root, allow oracle to run the script to start the goldengate_app application:
/opt/crs/product/11203/crs/bin/crsctl setperm resource myGoldengate -u user:oracle:r-x

 

Wrap-up:

All preparations are now in place. During an already scheduled maintenance window the following steps will be performed to bring this scenario to an HA solution for Goldengate.

  • Stop the Goldengate software daemons (at the moment stopped and started by hand).
  • Start the Goldengate resource via the Grid Infra (giving her control of status and activities).
  • Perform checks that Goldengate is starting its activities.
  • Perform a relocate of the Goldengate resource via the Grid Infra to the other node.
  • Perform checks that Goldengate is starting its activities.

As an old quote states: success just loves preparation. With these preparations in place I feel confident about the maintenance window to put this solution live.

 

As always, happy reading and till next time,

 

Mathijs

When PRCD-1027 and PRCD-1229 are spoiling your rainy day

Introduction:

More than one year ago I had set up an Oracle Restart environment with Grid Infra, ASM and databases all in 11.2.0.2.0, since that was the requirement from the vendor at first. Once the server had been handed over to production I got the request that it should also host EMC based clones, and those clones were 11.2.0.3.0. That meant I had to upgrade both the Grid Infrastructure and the database software, and of course the databases as well.

So I geared up, did an upgrade of the GI and the RDBMS software and of course of the local databases in place. After that the EMC clones were added and everything looked fine.

Until ……….

Error Messages after Server reboot:

Well, until the server got rebooted. After that server reboot a first sign that things were wrong was that the databases did not start via the Grid Infrastructure, which was not expected!

So there I was again, ready to solve another puzzle, and of course with people waiting for the DBs to come online so they could work.

## First clue:

I checked the Resource ( the database ) in the cluster with:   crsctl status resource ….  –p

Much to my surprise that showed the wrong Oracle Home (it was 11.2.0.2.0, the initial Oracle Home before the upgrade). But I was so sure that I had upgraded the database. What did I miss? Even more strange was that the cluster agent kept altering my oratab for the specific database to show the old Oracle Home (and it would almost stick out its tongue at me, telling me: # line has been added by agent).

## Second clue

When I altered the oratab to show the correct Oracle Home I could start the database via sqlplus, which was indeed my second clue.

After a big face-palm it became clear to me that the clusterware did not have the correct status for that Oracle Home.

## Will srvctl modify do the Job:

srvctl modify database -d mydb -o /opt/oracle/product/11203_ee_64/db

##output:

PRCD-1027 : Failed to retrieve database mydb

PRCD-1229 : An attempt to access configuration of database migoms was rejected because its version 11.2.0.2.0 differs from the program version 11.2.0.3.0. Instead run the program from /opt/oracle/product/11202_ee_64/db

Well, that was not expected. Especially since the other clue was that the db could be started as an 11.2.0.3 db when the Oracle environment was set properly in the oratab.

##Solution:

First I tried :

srvctl modify database -d mydb -o /opt/oracle/product/11203_ee_64/db

but that is wrong as we already saw in this post.

Hmm, then I thought of an expression in German: once you start doing it the right way, things will start to work for you.

Plain and simple, this is what I had to do to make things right again:

srvctl upgrade database -d  mydb -o /opt/oracle/product/11203_ee_64/db

After that I started mydb via the cluster and she is happy now.
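As a quick sanity check after the srvctl upgrade database command, something along these lines can be used (a minimal sketch, same database name as above):

srvctl config database -d mydb | grep -i "oracle home"   ## should now show /opt/oracle/product/11203_ee_64/db
srvctl status database -d mydb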

## Bottom-line ( aka lesson learned ).

If you upgrade your databases in an Oracle Restart / RAC cluster environment, make it part of your upgrade plan to also upgrade the information in the cluster layer for that specific database.

As always,

Happy Reading and till we meet again.

Mathijs