1 | As of 11.2 Grid Infrastructure, the private network configuration is not only stored in OCR but also in the gpnp profile. If the private network is not available or its definition is incorrect, the CRSD process will not start and any subsequent changes to the OCR will be impossible. Therefore care needs to be taken when making modifications to the configuration of the private network. It is important to perform the changes in the correct order. Please also note that manual modification of gpnp profile is not supported. |
Please take a backup of profile.xml on all cluster nodes before proceeding, as grid user:
cd $GRID_HOME/gpnp/<hostname>/profiles/peer/
cd /app/oracle/product/12.x.x/grid/gpnp/mysrvrahr/profiles/peer
cd /app/oracle/product/12.x.x/grid/gpnp/mysrvrbhr/profiles/peer
cp -p profile.xml profile.xml.bk
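A hedged sketch of the same backup done from one node, assuming passwordless SSH as the grid owner and that the GPnP directory is named after the short hostname:
# back up profile.xml on every node in one pass (adjust GRID_HOME and node names)
GRID_HOME=/app/oracle/product/12.x.x/grid
for node in mysrvrahr mysrvrbhr; do
  ssh "$node" "cd $GRID_HOME/gpnp/$node/profiles/peer && cp -p profile.xml profile.xml.bk"
done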
2 | Ensure Oracle Clusterware is running on ALL cluster nodes and save the current status of the resources.
/app/oracle/product/12.x.x/grid/bin/crsctl check cluster -all
/app/oracle/product/12.x.x/grid/bin/crsctl status resource -t>/tmp/beforeNewIps.lst
3 | As the grid user (always curious who that might be; to me it was the oracle user, btw):
Get the existing information. Here you will see only one interconnect in place. For example:
/app/oracle/product/12.x.x/grid/bin/oifcfg getif
bond1 172.18.112.208 global cluster_interconnect
bond0 195.233.190.64 global public
##
Check which interfaces / subnet addresses can be identified, for the eth devices specifically:
/app/oracle/product/12.x.x/grid/bin/oifcfg iflist|grep -i eth|sort
eth0 172.18.32.0
eth2 192.168.10.0
eth6 192.168.11.0
or check the interfaces / subnets in general on the OS with oifcfg:
/app/oracle/product/12.x.x/grid/bin/oifcfg iflist|sort
4 | Add the new cluster_interconnect information: |
/app/oracle/product/12.x.x/grid/bin/oifcfg setif -global eth2/192.168.10.0:cluster_interconnect,asm
/app/oracle/product/12.x.x/grid/bin/oifcfg setif -global eth6/192.168.11.0:cluster_interconnect,asm
5 | Verify the change: |
/app/oracle/product/12.x.x/grid/bin/oifcfg getif
With this information checked and in place it is time to set up new listeners for ASM, since the original ASM listener was created on eth0 during the installation and eth0 will be dropped / removed from the cluster configuration in the steps below. The existing listener ASMNET1LSNR will be replaced by new ASM listeners:
srvctl add listener -asmlistener -l ASMNET1221LSNR -subnet 192.168.10.0 (as mentioned, this is the eth2 interface that we are going to use)
srvctl add listener -asmlistener -l ASMNET1222LSNR -subnet 192.168.11.0 (as mentioned, this is the eth6 interface that we are going to use)
As always, seeing is believing: use crsctl status resource -t to see the details. Note: the new ASM listeners are created as resources and are in status OFFLINE OFFLINE on all nodes in the cluster at this point in time.
In the next step we remove the old ASM listener, using the -force option to prevent errors / messages with regard to dependencies:
srvctl update listener -listener ASMNET1LSNR_ASM -asm -remove -force
I checked again with crsctl status resource -t to make sure the old resource is gone now.
Removing the old ASM listener: in the MOS note there is a little inconsistency, because it claims that as a next step the old ASM listener should be stopped. I was able to grep for the listener (ps -ef|grep -i inherit) and saw it at OS level on the machine(s), but I was not able to stop that listener since the cluster resource was already gone and lsnrctl did not work. This would have been the command to give, but it does NOT work: lsnrctl stop ASMNET1LSNR_ASM. Solution: when I skipped this step and stopped and started the cluster (which is mandatory in this scenario anyway), the listener was gone on all nodes.
Check the configuration before restarting GI:
First command:
srvctl config listener -asmlistener
Name: ASMNET122LSNR_ASM
Type: ASM Listener
Owner: oracle
Subnet: 192.168.10.0
Home: <CRS home>
End points: TCP:1527
Listener is enabled.
Listener is individually enabled on nodes:
Listener is individually disabled on nodes:
Second command:
srvctl config asm
ASM home: <CRS home>
Password file: +VOTE/orapwASM
Backup of Password file:
ASM listener: LISTENER
ASM instance count: ALL
Cluster ASM listener: ASMNET122LSNR_ASM
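To keep an eye on just the new ASM listener resources without scrolling through the full output, a filtered query could be used; a hedged sketch (the -w filter syntax and the name pattern are assumptions to adapt to your release and naming):
# show only resources whose name contains ASMNET
/app/oracle/product/12.x.x/grid/bin/crsctl status resource -w "NAME co 'ASMNET'" -t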
6 | In the Grid Infrastructure: shut down Oracle Clusterware on all nodes and disable Oracle Clusterware as the root user (in my example I was allowed to sudo):
sudo su -
/app/oracle/product/12.x.x/grid/bin/crsctl stop crs
/app/oracle/product/12.x.x/grid/bin/crsctl disable crs
7 | Make the network configuration change at OS level as required and ensure the new interfaces are available on all nodes after the change (check by pinging the interfaces on all nodes).
for x in 10 11;do for xx in 75 76 77 78;do ping -c2 192.168.${x}.${xx}|egrep 'icmp_seq|transmitted';done;echo;done
for x in a b c d; do for xx in 1 2;do ping -c2 mysrvr${x}hr-hb$xx|egrep 'icmp_seq|transmitted';done;echo;done
8 | Restart Oracle Clusterware on all nodes as root user: |
sudo su -
/app/oracle/product/12.x.x/grid/bin/crsctl start crs
9 | Check |
/app/oracle/product/12.x.x/grid/bin/crsctl check cluster -all
/app/oracle/product/12.x.x/grid/bin/crsctl status resource -t>/tmp/afterNewIps.lst
sdiff /tmp/afterNewIps.lst /tmp/beforeNewIps.lst
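Next to the sdiff, a quick hedged sanity check is to compare how many lines report ONLINE in both snapshots (the counts should match when everything came back):
grep -c ONLINE /tmp/beforeNewIps.lst /tmp/afterNewIps.lst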
Enable Oracle Clusterware on all nodes as root user:
/app/oracle/product/12.x.x/grid/bin/crsctl enable crs
10 | Remove the old interface if required: |
/app/oracle/product/12.x.x/grid/bin/oifcfg delif -global bond1/172.18.112.208:cluster_interconnect
11 | Verify the removal:
/app/oracle/product/12.x.x/grid/bin/oifcfg getif |
Missing or Corrupted Spfile in ASM instance in Rac
Introduction:
The old saying is: always expect the unexpected, and this time was another proof of that. During patching of an 8-node cluster we came across two issues on the first node, both requiring a workaround. The first issue was that after applying the January 2019 patches the cluster did not start (that workaround is not part of this note, btw). The second issue was that once the first workaround was in place, the ASM instance on the first node would not start. This note explains the steps followed to create a new spfile for the ASM instance(s) in a RAC cluster.
General information:
This was the scenario we found ourselves in: rolling patching with the January 2019 patches had started on the first node of a 12.2 Oracle environment (GI and RDBMS). The first node was patched, but CRS would not start (and hmm, never liked opatchauto a lot, to be honest). Together with Oracle Support a workaround was provided, but after that the ASM instance still would not start. A comparison of environments showed one very significant memory setting on this cluster (possibly a MEMORY_* parameter, while we are using hugepages on that cluster) that prevented ASM from starting once the patching on the first node completed.
As a workaround we created a pfile, and once the cluster on node one was up we started the ASM instance with that pfile (which I had altered).
However, we could no longer update the spfile for the other ASM instances, since Oracle told us that in rolling upgrade mode you cannot make changes to the spfile. That meant that we performed the patching on all 8 nodes, and once CRS was up we had to start ASM with a copy of the init.ora we used on the first node (see the sketch below for distributing it). So we ended up with 8 nodes each running on a local copy of the init.ora, which did not make us happy at all. That, however, brought us to the scenario below, where you need to bring ASM back to using an spfile.
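For reference, a hedged sketch of distributing such a hand-edited pfile from node 1 to the remaining nodes; the node names and target directories are assumptions to adapt:
for n in 2 3 4 5 6 7 8; do
  ssh "mysrvr${n}dr" "mkdir -p /app/oracle/admin/+ASM${n}/pfile"
  scp /app/oracle/admin/+ASM1/pfile/initASM.ora "mysrvr${n}dr:/app/oracle/admin/+ASM${n}/pfile/"
done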
Important note: since Oracle 11.2 the GPnP profile is the key for such a change!
From the old days I came up with the scenario below: create a pfile, alter that file to meet your needs, and bring it back as an spfile for the ASM instance:
Scenario
- Can be used with a missing or corrupted spfile.
- Can be used with an existing spfile with wrong settings that you cannot alter because you already started patching (spfile updates are prohibited in a rolling upgrade scenario), so best practice is probably to analyse the spfile beforehand.
- Important: this scenario requires that the FULL cluster stack is down and that you work on ONE node only!
#### With spget you can check current location of spfile in asmcmd.
ASMCMD [+] > spget
+VOTE/mysrvr18cl/ASMPARAMETERFILE/registry.253.978015605
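Not part of the original flow, but worth noting: while ASM is still up (for instance before the patching starts), the current spfile can be kept safe with asmcmd spbackup; a hedged sketch, the destination name is just an assumption:
asmcmd spbackup +VOTE/mysrvr18cl/ASMPARAMETERFILE/registry.253.978015605 +VOTE/spfileASM.ora.bk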
#### created and altered the pfile on the first node. ( and copied it to all other nodes during the workaround).
oracle@mysrvr1dr:/app/grid/product/12201/grid/dbs []# cd /app/oracle/admin/+ASM1/pfile
oracle@mysrvr1dr:/app/oracle/admin/+ASM1/pfile []# ls -ltr
total 4
-rw-r--r--. 1 oracle dba 2433 Feb 8 09:23 initASM.ora
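In this case the pfile was written and edited by hand, but when an ASM instance is still reachable (for example on another node), a starting pfile could be generated from the existing spfile and then adjusted; a hedged sketch, assuming the environment points at the grid home and ORACLE_SID=+ASM1:
sqlplus -S "/ as sysasm" <<'EOF'
CREATE PFILE='/app/oracle/admin/+ASM1/pfile/initASM.ora'
  FROM SPFILE='+VOTE/mysrvr18cl/ASMPARAMETERFILE/registry.253.978015605';
EOF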
##### Starting the cluster, 1st attempt (I recalled that the cluster needed to be in some kind of restricted mode for this, so the whole cluster was stopped and the command below was issued). But Oracle showed mercy, telling me to use the correct syntax:
mysrvr1dr:root:/app/grid/product/12201/grid/bin $ ./crsctl start crs restrict
Parse error:
'restrict' is an invalid argument
Usage:
crsctl start crs [-excl [-nocrs | -cssonly]] | [-wait | -waithas | -nowait] | [-noautostart]
Start OHAS on this server
where
-excl Start Oracle Clusterware in exclusive mode
-nocrs Start Oracle Clusterware in exclusive mode without starting CRS
-nowait Do not wait for OHAS to start
-wait Wait until startup is complete and display all progress and status messages
-waithas Wait until startup is complete and display OHASD progress and status messages
-cssonly Start only CSS
-noautostart Start only OHAS
## Then started the cluster in exclusive mode; that failed too btw, since it tried to start the ASM instance, which was still using the original spfile with the incorrect information.
mysrvr1dr:root:/app/grid/product/12201/grid/bin $ ./crsctl start crs -excl
- CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘mysrvr1dr’
- CRS-2676: Start of ‘ora.cssdmonitor’ on ‘mysrvr1dr’ succeeded
- CRS-2672: Attempting to start ‘ora.cssd’ on ‘mysrvr1dr’
- CRS-2672: Attempting to start ‘ora.diskmon’ on ‘mysrvr1dr’
- CRS-2676: Start of ‘ora.diskmon’ on ‘mysrvr1dr’ succeeded
- CRS-2676: Start of ‘ora.cssd’ on ‘mysrvr1dr’ succeeded
- CRS-2672: Attempting to start ‘ora.ctssd’ on ‘mysrvr1dr’
- CRS-2672: Attempting to start ‘ora.cluster_interconnect.haip’ on ‘mysrvr1dr’
- CRS-2676: Start of ‘ora.ctssd’ on ‘mysrvr1dr’ succeeded
- CRS-2676: Start of ‘ora.cluster_interconnect.haip’ on ‘mysrvr1dr’ succeeded
- CRS-2672: Attempting to start ‘ora.asm’ on ‘mysrvr1dr’
- CRS-2674: Start of ‘ora.asm’ on ‘mysrvr1dr’ failed
- CRS-2672: Attempting to start ‘ora.storage’ on ‘mysrvr1dr’
- ORA-15077: could not locate ASM instance serving a required diskgroup
- CRS-2674: Start of ‘ora.storage’ on ‘mysrvr1dr’ failed
- CRS-2679: Attempting to clean ‘ora.storage’ on ‘mysrvr1dr’
- CRS-2681: Clean of ‘ora.storage’ on ‘mysrvr1dr’ succeeded
- CRS-2673: Attempting to stop ‘ora.cluster_interconnect.haip’ on ‘mysrvr1dr’
- CRS-2677: Stop of ‘ora.cluster_interconnect.haip’ on ‘mysrvr1dr’ succeeded
- CRS-2673: Attempting to stop ‘ora.ctssd’ on ‘mysrvr1dr’
- CRS-2677: Stop of ‘ora.ctssd’ on ‘mysrvr1dr’ succeeded
- CRS-4000: Command Start failed, or completed with errors.
## Next attempt: the whole cluster was down, and on the first node I performed:
- mysrvr1dr:root:/app/grid/product/12201/grid/bin $ ./crsctl start crs -excl -nocrs
- ## In this -nocrs mode we were able to start the +ASM1 instance manually.
- SQL> STARTUP PFILE='/app/oracle/admin/+ASM1/pfile/initASM.ora';
- ## Once ASM had started, create the new SPFILE:
- create spfile='+VOTE' from pfile='/app/oracle/admin/+ASM1/pfile/initASM.ora';
- ## In the alert log I noticed the following, which is good since it shows the gpnp profile had been updated accordingly:
- 2019-02-08T16:31:32.839547+01:00
- NOTE: updated gpnp profile ASM SPFILE to
- NOTE: header on disk 0 advanced to format #2 using fcn 0.0
- NOTE: header on disk 2 advanced to format #2 using fcn 0.0
- NOTE: updated gpnp profile ASM diskstring: /dev/mapper/ASM_*
- NOTE: updated gpnp profile ASM diskstring: /dev/mapper/ASM_*
- 2019-02-08T16:31:34.381619+01:00
- NOTE: updated gpnp profile ASM SPFILE to +VOTE/mysrvr18cl/ASMPARAMETERFILE/registry.253.999707493
##### Checked our activities in ASMCMD:
- oracle@mysrvr1dr:/app/oracle/admin/+ASM1/pfile [+ASM1]# asmcmd
- [Option -p will be used ]
- ASMCMD [+] > spget
- +VOTE/mysrvr18cl/ASMPARAMETERFILE/registry.253.999707493
### Checked our activities in gpnptool
oracle@mysrvr1dr:/app/oracle/admin/+ASM1/pfile [+ASM1]# gpnptool get
Warning: some command line parameters were defaulted. Resulting command line:
/app/grid/product/12201/grid/bin/gpnptool.bin get -o-
<?xml version="1.0" encoding="UTF-8"?>
<gpnp:GPnP-Profile Version="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd" ProfileSequence="7" ClusterUId="afc024ecfd5ffff8ffbeda0a212bebe1" ClusterName="mysrvr18cl" PALocation="">
<gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*">
<gpnp:Network id="net1" IP="198.19.11.0" Adapter="bond0" Use="public"/>
<gpnp:Network id="net2" IP="192.168.10.0" Adapter="eth3" Use="asm,cluster_interconnect"/>
<gpnp:Network id="net3" IP="192.168.11.0" Adapter="eth5" Use="cluster_interconnect"/>
</gpnp:HostNetwork></gpnp:Network-Profile>
<orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
<orcl:ASM-Profile id="asm" DiscoveryString="/dev/mapper/ASM_*" SPFile="+VOTE/mysrvr18cl/ASMPARAMETERFILE/registry.253.999707493" Mode="remote" Extended="false"/>
<ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/><ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/><ds:Reference URI=""><ds:Transforms><ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><ds:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"> <InclusiveNamespaces xmlns="http://www.w3.org/2001/10/xml-exc-c14n#" PrefixList="gpnp orcl xsi"/></ds:Transform></ds:Transforms><ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><ds:DigestValue>QH9UPO559zhufkrc7tFxQts6oF0=</ds:DigestValue></ds:Reference></ds:SignedInfo>
<ds:SignatureValue>aL2hOnxyLt5YwMcPjGg8LUDx2KD97Y75eLv+yqvcfQ5O705K8ceQPCnwnsTs4Wn5E1jNeYCEzXnrVp5zM3hMbz9LdEEP2GKk9XJInQprWc39z7JKxm4uEwNX3Ocs54FqxP1JdBX7PRiMh/ePd8CoJIVtIaVMD29giX078uGwXcQ=</ds:SignatureValue></ds:Signature></gpnp:GPnP-Profile>
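As a hedged shortcut (assuming your gpnptool version supports the getpval option), the ASM spfile entry can also be read from the profile without scanning the full XML:
gpnptool getpval -asm_spf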
### Since we started the cluster with -excl -nocrs, it is time to stop the cluster and start it normally
mysrvr1dr:root:/app/grid/product/12201/grid/bin $ ./crsctl stop crs
- CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on ‘mysrvr1dr’
- CRS-2673: Attempting to stop ‘ora.crsd’ on ‘mysrvr1dr’
- CRS-2677: Stop of ‘ora.crsd’ on ‘mysrvr1dr’ succeeded
- CRS-2673: Attempting to stop ‘ora.cluster_interconnect.haip’ on ‘mysrvr1dr’
- CRS-2673: Attempting to stop ‘ora.crf’ on ‘mysrvr1dr’
- CRS-2673: Attempting to stop ‘ora.drivers.acfs’ on ‘mysrvr1dr’
- CRS-2673: Attempting to stop ‘ora.gpnpd’ on ‘mysrvr1dr’
- CRS-2673: Attempting to stop ‘ora.mdnsd’ on ‘mysrvr1dr’
- CRS-2677: Stop of ‘ora.drivers.acfs’ on ‘mysrvr1dr’ succeeded
- CRS-2677: Stop of ‘ora.cluster_interconnect.haip’ on ‘mysrvr1dr’ succeeded
- CRS-2677: Stop of ‘ora.crf’ on ‘mysrvr1dr’ succeeded
- CRS-2677: Stop of ‘ora.gpnpd’ on ‘mysrvr1dr’ succeeded
- CRS-2673: Attempting to stop ‘ora.ctssd’ on ‘mysrvr1dr’
- CRS-2673: Attempting to stop ‘ora.storage’ on ‘mysrvr1dr’
- CRS-2677: Stop of ‘ora.storage’ on ‘mysrvr1dr’ succeeded
- CRS-2677: Stop of ‘ora.mdnsd’ on ‘mysrvr1dr’ succeeded
- CRS-2677: Stop of ‘ora.ctssd’ on ‘mysrvr1dr’ succeeded
- CRS-2673: Attempting to stop ‘ora.cssd’ on ‘mysrvr1dr’
### Starting the cluster on the first node in normal mode
mysrvr1dr:root:/app/grid/product/12201/grid/bin $ ./crsctl start crs
### Had a small issue, so decided to stop the cluster on node 1 with the force option
mysrvr1dr:root:/app/grid/product/12201/grid/bin $ ./crsctl stop crs -f
- CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on ‘mysrvr1dr’
- CRS-2673: Attempting to stop ‘ora.mdnsd’ on ‘mysrvr1dr’
- CRS-2673: Attempting to stop ‘ora.gpnpd’ on ‘mysrvr1dr’
- CRS-2677: Stop of ‘ora.mdnsd’ on ‘mysrvr1dr’ succeeded
- CRS-2677: Stop of ‘ora.gpnpd’ on ‘mysrvr1dr’ succeeded
- CRS-2673: Attempting to stop ‘ora.ctssd’ on ‘mysrvr1dr’
- CRS-2673: Attempting to stop ‘ora.evmd’ on ‘mysrvr1dr’
- CRS-2673: Attempting to stop ‘ora.asm’ on ‘mysrvr1dr’
- CRS-2673: Attempting to stop ‘ora.drivers.acfs’ on ‘mysrvr1dr’
- CRS-2677: Stop of ‘ora.drivers.acfs’ on ‘mysrvr1dr’ succeeded
- CRS-2677: Stop of ‘ora.ctssd’ on ‘mysrvr1dr’ succeeded
- CRS-2677: Stop of ‘ora.evmd’ on ‘mysrvr1dr’ succeeded
- CRS-2677: Stop of ‘ora.asm’ on ‘mysrvr1dr’ succeeded
- CRS-2673: Attempting to stop ‘ora.cluster_interconnect.haip’ on ‘mysrvr1dr’
- CRS-2677: Stop of ‘ora.cluster_interconnect.haip’ on ‘mysrvr1dr’ succeeded
- CRS-2673: Attempting to stop ‘ora.cssd’ on ‘mysrvr1dr’
- CRS-2677: Stop of ‘ora.cssd’ on ‘mysrvr1dr’ succeeded
- CRS-2673: Attempting to stop ‘ora.gipcd’ on ‘mysrvr1dr’
- CRS-2677: Stop of ‘ora.gipcd’ on ‘mysrvr1dr’ succeeded
- CRS-2793: Shutdown of Oracle High Availability Services-managed resources on ‘mysrvr1dr’ has completed
- CRS-4133: Oracle High Availability Services has been stopped.
#### Time to start the cluster in normal mode for all nodes
mysrvr1dr:root:/root $ cd /app/grid/product/12201/grid/bin
mysrvr1dr:root:/app/grid/product/12201/grid/bin $ ./crsctl start cluster -all
- CRS-2672: Attempting to start ‘ora.evmd’ on ‘mysrvr6dr’
- CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘mysrvr6dr’
- CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘mysrvr4dr’
- CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘mysrvr2dr’
- CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘mysrvr3dr’
- CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘mysrvr8dr’
- CRS-2672: Attempting to start ‘ora.evmd’ on ‘mysrvr2dr’
- CRS-2672: Attempting to start ‘ora.evmd’ on ‘mysrvr4dr’
- CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘mysrvr7dr’
- CRS-2672: Attempting to start ‘ora.evmd’ on ‘mysrvr3dr’
- CRS-2672: Attempting to start ‘ora.evmd’ on ‘mysrvr8dr’
- CRS-2672: Attempting to start ‘ora.evmd’ on ‘mysrvr7dr’
- CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘mysrvr5dr’
- CRS-2672: Attempting to start ‘ora.evmd’ on ‘mysrvr5dr’
- CRS-2676: Start of ‘ora.cssdmonitor’ on ‘mysrvr4dr’ succeeded
- CRS-2676: Start of ‘ora.cssdmonitor’ on ‘mysrvr8dr’ succeeded
- CRS-2672: Attempting to start ‘ora.cssd’ on ‘mysrvr4dr’
- CRS-2672: Attempting to start ‘ora.diskmon’ on ‘mysrvr4dr’
- CRS-2672: Attempting to start ‘ora.cssd’ on ‘mysrvr8dr’
- CRS-2676: Start of ‘ora.cssdmonitor’ on ‘mysrvr2dr’ succeeded
- CRS-2672: Attempting to start ‘ora.diskmon’ on ‘mysrvr8dr’
- CRS-2676: Start of ‘ora.cssdmonitor’ on ‘mysrvr3dr’ succeeded
- CRS-2676: Start of ‘ora.cssdmonitor’ on ‘mysrvr6dr’ succeeded
- CRS-2676: Start of ‘ora.cssdmonitor’ on ‘mysrvr7dr’ succeeded
- CRS-2672: Attempting to start ‘ora.cssd’ on ‘mysrvr2dr’
- CRS-2672: Attempting to start ‘ora.cssd’ on ‘mysrvr6dr’
- CRS-2672: Attempting to start ‘ora.diskmon’ on ‘mysrvr2dr’
- CRS-2672: Attempting to start ‘ora.cssd’ on ‘mysrvr3dr’
- CRS-2672: Attempting to start ‘ora.diskmon’ on ‘mysrvr6dr’
- CRS-2672: Attempting to start ‘ora.cssd’ on ‘mysrvr7dr’
- CRS-2672: Attempting to start ‘ora.diskmon’ on ‘mysrvr3dr’
- CRS-2676: Start of ‘ora.diskmon’ on ‘mysrvr4dr’ succeeded
- CRS-2672: Attempting to start ‘ora.diskmon’ on ‘mysrvr7dr’
- CRS-2676: Start of ‘ora.diskmon’ on ‘mysrvr8dr’ succeeded
- CRS-2676: Start of ‘ora.diskmon’ on ‘mysrvr2dr’ succeeded
- CRS-2676: Start of ‘ora.diskmon’ on ‘mysrvr6dr’ succeeded
- CRS-2676: Start of ‘ora.diskmon’ on ‘mysrvr3dr’ succeeded
- CRS-2676: Start of ‘ora.diskmon’ on ‘mysrvr7dr’ succeeded
- CRS-2676: Start of ‘ora.cssdmonitor’ on ‘mysrvr5dr’ succeeded
- CRS-2672: Attempting to start ‘ora.cssd’ on ‘mysrvr5dr’
- CRS-2672: Attempting to start ‘ora.diskmon’ on ‘mysrvr5dr’
- CRS-2676: Start of ‘ora.diskmon’ on ‘mysrvr5dr’ succeeded
- CRS-2676: Start of ‘ora.evmd’ on ‘mysrvr6dr’ succeeded
- CRS-2676: Start of ‘ora.evmd’ on ‘mysrvr2dr’ succeeded
- CRS-2676: Start of ‘ora.evmd’ on ‘mysrvr4dr’ succeeded
- CRS-2676: Start of ‘ora.evmd’ on ‘mysrvr8dr’ succeeded
- CRS-2676: Start of ‘ora.evmd’ on ‘mysrvr3dr’ succeeded
- CRS-2676: Start of ‘ora.evmd’ on ‘mysrvr7dr’ succeeded
- CRS-2676: Start of ‘ora.evmd’ on ‘mysrvr5dr’ succeeded
- CRS-2676: Start of ‘ora.cssd’ on ‘mysrvr8dr’ succeeded
- CRS-2672: Attempting to start ‘ora.ctssd’ on ‘mysrvr8dr’
- CRS-2672: Attempting to start ‘ora.cluster_interconnect.haip’ on ‘mysrvr8dr’
- CRS-2676: Start of ‘ora.cssd’ on ‘mysrvr2dr’ succeeded
- CRS-2672: Attempting to start ‘ora.ctssd’ on ‘mysrvr2dr’
- CRS-2672: Attempting to start ‘ora.cluster_interconnect.haip’ on ‘mysrvr2dr’
- CRS-2676: Start of ‘ora.cssd’ on ‘mysrvr5dr’ succeeded
- CRS-2672: Attempting to start ‘ora.ctssd’ on ‘mysrvr5dr’
- CRS-2672: Attempting to start ‘ora.cluster_interconnect.haip’ on ‘mysrvr5dr’
- CRS-2676: Start of ‘ora.ctssd’ on ‘mysrvr8dr’ succeeded
- CRS-2676: Start of ‘ora.ctssd’ on ‘mysrvr2dr’ succeeded
- CRS-2676: Start of ‘ora.cssd’ on ‘mysrvr7dr’ succeeded
- CRS-2672: Attempting to start ‘ora.ctssd’ on ‘mysrvr7dr’
- CRS-2672: Attempting to start ‘ora.cluster_interconnect.haip’ on ‘mysrvr7dr’
- CRS-2676: Start of ‘ora.ctssd’ on ‘mysrvr5dr’ succeeded
- CRS-2676: Start of ‘ora.cssd’ on ‘mysrvr4dr’ succeeded
- CRS-2672: Attempting to start ‘ora.ctssd’ on ‘mysrvr4dr’
- CRS-2672: Attempting to start ‘ora.cluster_interconnect.haip’ on ‘mysrvr4dr’
- CRS-2676: Start of ‘ora.cssd’ on ‘mysrvr3dr’ succeeded
- CRS-2676: Start of ‘ora.cssd’ on ‘mysrvr6dr’ succeeded
- CRS-2672: Attempting to start ‘ora.ctssd’ on ‘mysrvr3dr’
- CRS-2672: Attempting to start ‘ora.cluster_interconnect.haip’ on ‘mysrvr3dr’
- CRS-2672: Attempting to start ‘ora.ctssd’ on ‘mysrvr6dr’
- CRS-2672: Attempting to start ‘ora.cluster_interconnect.haip’ on ‘mysrvr6dr’
- CRS-2676: Start of ‘ora.ctssd’ on ‘mysrvr7dr’ succeeded
- CRS-2676: Start of ‘ora.cluster_interconnect.haip’ on ‘mysrvr8dr’ succeeded
- CRS-2672: Attempting to start ‘ora.asm’ on ‘mysrvr8dr’
- CRS-2676: Start of ‘ora.asm’ on ‘mysrvr8dr’ succeeded
- CRS-2672: Attempting to start ‘ora.storage’ on ‘mysrvr8dr’
- CRS-2676: Start of ‘ora.ctssd’ on ‘mysrvr4dr’ succeeded
- CRS-2676: Start of ‘ora.ctssd’ on ‘mysrvr3dr’ succeeded
- CRS-2676: Start of ‘ora.ctssd’ on ‘mysrvr6dr’ succeeded
- CRS-2676: Start of ‘ora.cluster_interconnect.haip’ on ‘mysrvr2dr’ succeeded
- CRS-2672: Attempting to start ‘ora.asm’ on ‘mysrvr2dr’
- CRS-2676: Start of ‘ora.asm’ on ‘mysrvr2dr’ succeeded
- CRS-2672: Attempting to start ‘ora.storage’ on ‘mysrvr2dr’
- CRS-2676: Start of ‘ora.storage’ on ‘mysrvr8dr’ succeeded
- CRS-2672: Attempting to start ‘ora.crsd’ on ‘mysrvr8dr’
- CRS-2676: Start of ‘ora.crsd’ on ‘mysrvr8dr’ succeeded
- CRS-2676: Start of ‘ora.storage’ on ‘mysrvr2dr’ succeeded
- CRS-2672: Attempting to start ‘ora.crsd’ on ‘mysrvr2dr’
- CRS-2676: Start of ‘ora.cluster_interconnect.haip’ on ‘mysrvr5dr’ succeeded
- CRS-2672: Attempting to start ‘ora.asm’ on ‘mysrvr5dr’
- CRS-2676: Start of ‘ora.asm’ on ‘mysrvr5dr’ succeeded
- CRS-2672: Attempting to start ‘ora.storage’ on ‘mysrvr5dr’
- CRS-2676: Start of ‘ora.crsd’ on ‘mysrvr2dr’ succeeded
- CRS-2676: Start of ‘ora.storage’ on ‘mysrvr5dr’ succeeded
- CRS-2672: Attempting to start ‘ora.crsd’ on ‘mysrvr5dr’
- CRS-2676: Start of ‘ora.cluster_interconnect.haip’ on ‘mysrvr7dr’ succeeded
- CRS-2672: Attempting to start ‘ora.asm’ on ‘mysrvr7dr’
- CRS-2676: Start of ‘ora.asm’ on ‘mysrvr7dr’ succeeded
- CRS-2672: Attempting to start ‘ora.storage’ on ‘mysrvr7dr’
- CRS-2676: Start of ‘ora.crsd’ on ‘mysrvr5dr’ succeeded
- CRS-2676: Start of ‘ora.storage’ on ‘mysrvr7dr’ succeeded
- CRS-2672: Attempting to start ‘ora.crsd’ on ‘mysrvr7dr’
- CRS-2676: Start of ‘ora.cluster_interconnect.haip’ on ‘mysrvr6dr’ succeeded
- CRS-2672: Attempting to start ‘ora.asm’ on ‘mysrvr6dr’
- CRS-2676: Start of ‘ora.asm’ on ‘mysrvr6dr’ succeeded
- CRS-2672: Attempting to start ‘ora.storage’ on ‘mysrvr6dr’
- CRS-2676: Start of ‘ora.cluster_interconnect.haip’ on ‘mysrvr4dr’ succeeded
- CRS-2672: Attempting to start ‘ora.asm’ on ‘mysrvr4dr’
- CRS-2676: Start of ‘ora.asm’ on ‘mysrvr4dr’ succeeded
- CRS-2672: Attempting to start ‘ora.storage’ on ‘mysrvr4dr’
- CRS-2676: Start of ‘ora.cluster_interconnect.haip’ on ‘mysrvr3dr’ succeeded
- CRS-2672: Attempting to start ‘ora.asm’ on ‘mysrvr3dr’
- CRS-2676: Start of ‘ora.asm’ on ‘mysrvr3dr’ succeeded
- CRS-2672: Attempting to start ‘ora.storage’ on ‘mysrvr3dr’
- CRS-2676: Start of ‘ora.crsd’ on ‘mysrvr7dr’ succeeded
- CRS-2676: Start of ‘ora.storage’ on ‘mysrvr6dr’ succeeded
- CRS-2672: Attempting to start ‘ora.crsd’ on ‘mysrvr6dr’
- CRS-2676: Start of ‘ora.storage’ on ‘mysrvr4dr’ succeeded
- CRS-2672: Attempting to start ‘ora.crsd’ on ‘mysrvr4dr’
- CRS-2676: Start of ‘ora.storage’ on ‘mysrvr3dr’ succeeded
- CRS-2672: Attempting to start ‘ora.crsd’ on ‘mysrvr3dr’
- CRS-2676: Start of ‘ora.crsd’ on ‘mysrvr6dr’ succeeded
- CRS-2676: Start of ‘ora.crsd’ on ‘mysrvr4dr’ succeeded
- CRS-2676: Start of ‘ora.crsd’ on ‘mysrvr3dr’ succeeded
- CRS-4690: Oracle Clusterware is already running on 'mysrvr1dr' -> fine, since we kept the cluster running on node 1
CRS-4000: Command Start failed, or completed with errors.
### Checks performed:
On each node: ps -ef|grep d.bin
On each node: crsctl stat res -t -init
On a node: crsctl check cluster -all
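The checks above could be scripted from one node; a hedged sketch assuming passwordless SSH as the grid owner and the node names used in this post:
for n in 1 2 3 4 5 6 7 8; do
  echo "== mysrvr${n}dr =="
  ssh "mysrvr${n}dr" "ps -ef | grep -c '[d]\.bin'"
  ssh "mysrvr${n}dr" "/app/grid/product/12201/grid/bin/crsctl stat res -t -init | grep -c ONLINE"
done
/app/grid/product/12201/grid/bin/crsctl check cluster -all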
Happy reading,
And till next time
Mathijs
Reconfigure Rac 12.2 Gridinfra network
Summary.
In 12.2 Grid Infrastructure, Oracle has changed ASM to Flex ASM as the default. This blog focuses on reconfiguring the network components of an Oracle RAC 12.2 Grid Infrastructure, such as the interconnect and the public interface, or changing an interface to do-not-use, whenever that applies / is an improvement to the situation at hand. Read this carefully and in full before performing it on one of your clusters. The baseline for this action is a document on MOS: How to Modify Private Network Information in Oracle Clusterware (Doc ID 283684.1).
Details:
As with any change, when going through input – processing – output it is important to have a clear picture of the situation as is. So a first and very mandatory step should be to check with the oifcfg getif command how things are before starting the changes:
When entering the command, information about the known network interfaces in the RAC cluster similar to the output below should show up:
oracle@mysrvr1dr:/app/oracle/stage/27468969 [+ASM1]# oifcfg getif
bond0 198.19.11.0 global public
eth0 10.217.210.0 global cluster_interconnect,asm
eth2 192.168.10.0 global cluster_interconnect
eth7 192.168.11.0 global cluster_interconnect
Here bond0 is used as public, eth0 at the moment is carrying both the cluster interconnect and ASM traffic, and eth2 and eth7 are dedicated to the interconnect. Eth0 is also defined as the admin LAN for various activities. In this setup the cluster is unstable and nodes are being evicted. Time to perform the steps to stabilize it.
From the MOS note we follow Case IV: changing the private network interface name, subnet or netmask, for 12c Oracle Clusterware with Flex ASM.
Precaution, taking backup of profile.xml on each node.
Take a backup of profile.xml on all cluster nodes before proceeding, as grid user. In this specific case this is the user that has installed the Grid Infrastructure ( in this scenario that was the oracle user):
Command:
$ cd $GRID_HOME/gpnp/<hostname>/profiles/peer/
$ cp -p profile.xml profile.xml.bk
cd /app/grid/product/12201/grid/gpnp/mysrvr1dr/profiles/peer
cp -p profile.xml profile.xml.bk
cd /app/grid/product/12201/grid/gpnp/mysrvr2dr/profiles/peer
cp -p profile.xml profile.xml.bk
cd /app/grid/product/12201/grid/gpnp/mysrvr3dr/profiles/peer
cp -p profile.xml profile.xml.bk
cd /app/grid/product/12201/grid/gpnp/mysrvr4dr/profiles/peer
cp -p profile.xml profile.xml.bk
cd /app/grid/product/12201/grid/gpnp/mysrvr5dr/profiles/peer
cp -p profile.xml profile.xml.bk
cd /app/grid/product/12201/grid/gpnp/mysrvr6dr/profiles/peer
cp -p profile.xml profile.xml.bk
cd /app/grid/product/12201/grid/gpnp/mysrvr7dr/profiles/peer
cp -p profile.xml profile.xml.bk
cd /app/grid/product/12201/grid/gpnp/mysrvr8dr/profiles/peer
cp -p profile.xml profile.xml.bk
Altering the interconnect:
One of the interconnects should be altered to make sure that the ASM listener is able to communicate using that interface too. In this scenario eth2 was used for that. When doing this, take note of the subnet, since it will be needed to configure a new ASM listener.
oifcfg setif -global eth2/192.168.10.0:cluster_interconnect,asm
oifcfg setif -global eth7/192.168.11.0:cluster_interconnect
Now eth2 shows that it is set up for the interconnect and ASM (only one interconnect should be set up to combine cluster_interconnect and asm).
peer [+ASM1]# oifcfg getif
bond0 198.19.11.0 global public
eth0 10.217.210.0 global cluster_interconnect,asm
eth2 192.168.10.0 global cluster_interconnect,asm
eth7 192.168.11.0 global cluster_interconnect
With this information checked and in place it is time to set up a new listener for ASM, since the original ASM listener was created on eth0 during the installation and eth0 will be dropped / removed from the cluster configuration in the steps below:
The existing listener ASMNET1LSNR will be replaced by the new one, ASMNET122LSNR:
srvctl add listener -asmlistener -l ASMNET122LSNR -subnet 192.168.10.0
(as mentioned, this is the subnet of the eth2 interface that we are going to use).
As always, seeing is believing: use crsctl status resource -t to see details similar to below. The new ASM listener is created as a resource and is in status OFFLINE OFFLINE on all nodes in the cluster at this point in time:
--------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------
ora.ASMNET122LSNR_ASM.lsnr
               OFFLINE OFFLINE      mysrvr1dr                STABLE
               OFFLINE OFFLINE      mysrvr2dr                STABLE
               OFFLINE OFFLINE      mysrvr3dr                STABLE
               OFFLINE OFFLINE      mysrvr4dr                STABLE
               OFFLINE OFFLINE      mysrvr5dr                STABLE
               OFFLINE OFFLINE      mysrvr6dr                STABLE
               OFFLINE OFFLINE      mysrvr7dr                STABLE
               OFFLINE OFFLINE      mysrvr8dr                STABLE
In the next step we will remove the old ASM listener, using the -force option to prevent errors / messages with regard to dependencies.
srvctl update listener -listener ASMNET1LSNR_ASM -asm -remove -force
I have checked again with crsctl status resource -t to make sure the old resource is gone now.
Removing the old ASM listener
In the MOS note there is a little inconsistency, because it claims that as a next step the old ASM listener should be stopped. I was able to grep for the listener (ps -ef|grep -i inherit) and saw it at OS level on the machine(s), but I was not able to stop that listener since the cluster resource was already gone and lsnrctl did not work. Solution: I noticed that when I skipped this step and stopped and started the cluster (which is mandatory in this scenario anyway), the listener was gone on all nodes.
This would have been the command to give, but it does NOT work: lsnrctl stop ASMNET1LSNR_ASM
Check configuration before restarting GI:
First command:
srvctl config listener -asmlistener
Name: ASMNET122LSNR_ASM
Type: ASM Listener
Owner: oracle
Subnet: 192.168.10.0
Home: <CRS home>
End points: TCP:1527
Listener is enabled.
Listener is individually enabled on nodes:
Listener is individually disabled on nodes:
Second command:
srvctl config asm
ASM home: <CRS home>
Password file: +VOTE/orapwASM
Backup of Password file:
ASM listener: LISTENER
ASM instance count: ALL
Cluster ASM listener: ASMNET122LSNR_ASM
Both results look great so time to move to the next step (restarting the Grid Infra structure on all nodes).
Restarting Grid infrastructure on all Nodes:
For this next step you have to become root (or sudo su -). First, and importantly, make sure that the Grid Infrastructure will not restart automatically should a cluster node reboot (disable crs), then stop the Grid Infrastructure software:
As root:
/app/grid/product/12201/grid/bin/crsctl disable crs
/app/grid/product/12201/grid/bin/crsctl stop crs
To be done on: mysrvr[1-8]dr
Checking network configuration on all nodes.
mysrvr1dr:root:/root $ ifconfig -a
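Rather than reading the full ifconfig -a output, a short loop over the interfaces of interest can confirm the expected subnets on each node; a hedged sketch using the interface names from this post:
for i in bond0 eth2 eth7; do
  echo "== $i =="
  ip -4 addr show "$i" | grep inet
done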
Starting cluster again:
As root:
/app/grid/product/12201/grid/bin/crsctl enable crs
/app/grid/product/12201/grid/bin/crsctl start crs
To be done on: mysrvr[1-8]dr
Final checks:
oifcfg getif
bond0 198.19.11.0 global public
eth0 10.217.210.0 global cluster_interconnect,asm
eth2 192.168.10.0 global cluster_interconnect,asm
eth7 192.168.11.0 global cluster_interconnect
Time to delete eth0
Since eth0 is the admin LAN, and now that the reconfiguration steps are done, it is time to get rid of eth0 (remove it from the Grid Infrastructure configuration).
oifcfg delif -global eth0/10.217.210.0
And a last check again:
oifcfg getif
bond0 198.19.11.0 global public
eth2 192.168.10.0 global cluster_interconnect,asm
eth7 192.168.11.0 global cluster_interconnect
Happy reading, and till we meet again,
Mathijs.
Install & Patch 12.2 GI in Oracle Restart
Introduction:
Where would we be without challenges. I have become a member of a project team for a new billing environment, and this team is aiming to use (and go live with) Oracle 12.2 Grid Infrastructure and Database. The information in this article will become a baseline for the installation of several Oracle environments on Linux, in a setup Oracle refers to as Oracle Restart. Next in line after that (and I love it) will be Real Application Clusters to be set up.
General Preparations 12.2 Grid Kata:
## Identifying ORACLE_BASE and layout of Grid Infrastructure.
echo $ORACLE_BASE
/app/oracle
echo $ORACLE_HOME
/app/grid/product/12.2.0.1/grid
## Identifying ORACLE_BASE and Db software
echo $ORACLE_BASE
/app/oracle
echo $ORACLE_HOME
/app/oracle/product/12.2.0.1/db
## So for the 12.2 layout which is in scope for the actions on a Restart or Rac environment:
+ASM1    /app/grid/product/12.2.0.1/grid
CRS      /app/grid/product/12.2.0.1/grid
-MGMTDB  /app/grid/product/12.2.0.1/grid
MYDB     /app/oracle/product/12.2.0.1/db
## Checking Red Hat release:
oracle@mysrvr1hr:/dev/mapper []# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.9 (Santiago)
## Oracle restart installation for 12.2 instructions to be found:
An interesting point is that in 12.2 the famous, well-known runInstaller is replaced by ./gridSetup.sh (when opening runInstaller I even ran into errors: OUI-10133 when running runInstaller in 12.2). A second point of interest is that you have to pre-create the directory where the software will be running.
## Preparations for Installation:
- On the server where you will install the Grid Infrastructure, create the directory where you want to install the software (the location you will later on call your ORACLE_HOME). In my specific case that meant: mkdir -p /app/grid/product/12.2.0.1/grid on that server.
- From Solltau: oracle@myhost:/opt/oracle/Odrive/depot/software/oracle/12c/GI []# scp linuxx64_12201_grid_home.zip oracle@mysrvr1hr:/app/grid/product/12.2.0.1/grid
- UNSET your environment variables if any on the installation box:
unset ORACLE_BASE
unset ORACLE_HOME
unset GI_HOME
unset ORA_CRS_HOME
unset TNS_ADMIN
unset ORACLE_SID
unset ORA_NLS10
echo $ORACLE_BASE etc.
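A quick way to confirm nothing is left behind is to loop over the variables; a small hedged sketch using bash indirect expansion:
for v in ORACLE_BASE ORACLE_HOME GI_HOME ORA_CRS_HOME TNS_ADMIN ORACLE_SID ORA_NLS10; do
  echo "$v=${!v}"
done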
- ## Check zip file in the destination that will also become your ORACLE_HOME for this install.
oracle@mysrvr1hr:/app/grid/product/12.2.0.1/grid []# ls -ltr
total 2924504
-rw-r--r--. 1 oracle dba 2994687209 Jan 3 16:28 linuxx64_12201_grid_home.zip
Make sure you use the future ORACLE_HOME destination to unzip the file. This is mandatory because, unlike previous installations where you can alter the software installation directory, during the 12.2 installation there will NOT be an option to choose the destination of the installation. ## Make sure you are in the directory /app/grid/product/12.2.0.1/grid and extract the zip there!
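A minimal sketch of that unzip step, assuming the zip file was copied into the future home as shown above:
cd /app/grid/product/12.2.0.1/grid
unzip -q linuxx64_12201_grid_home.zip
ls gridSetup.sh    # should now be present in the top of the home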
- ## Once the zip file is extracted, fire the script below:
./gridSetup.sh
In one of the following detailed screens make sure you use "Change Discovery Path" so the installer can discover the disks that you will be using for this installation. In my case this meant that the Linux admin colleague had set up and labelled dedicated LUNs (disks) as a preparation for my actions.
root # ls -lH /dev/mapper/ASM_*
brw-rw----. 1 oracle dba 253,  6 Dec 22 16:01 /dev/mapper/ASM_ACFS_035_001
brw-rw----. 1 oracle dba 253, 33 Dec 22 16:01 /dev/mapper/ASM_OCRM_008_001
brw-rw----. 1 oracle dba 253, 34 Dec 22 16:01 /dev/mapper/ASM_OCRM_008_002
brw-rw----. 1 oracle dba 253, 25 Dec 22 16:01 /dev/mapper/ASM_VOTE_008_001
brw-rw----. 1 oracle dba 253, 26 Dec 22 16:01 /dev/mapper/ASM_VOTE_008_002
brw-rw----. 1 oracle dba 253, 30 Dec 22 16:01 /dev/mapper/ASM_VOTE_008_003
## Once you have made all the selections needed below screen will appear:
Once you selected install you will be updated by next progress screen:
## In a separate session, ./root.sh has to be run as the root user, which will show:
mysrvr1hr:root:/app/grid/product/12.2.0.1/grid # ./root.sh
Performing root user operation.
The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME= /app/grid/product/12.2.0.1/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /app/grid/product/12.2.0.1/grid/crs/install/crsconfig_params
The log of current session can be found at:
  /app/oracle/crsdata/mysrvr1hr/crsconfig/roothas_2018-01-03_05-02-27PM.log
## logging details of root.sh:
LOCAL ADD MODE
Creating OCR keys for user 'oracle', privgrp 'dba'..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Node mysrvr1hr successfully pinned.
2018/01/03 17:02:50 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.conf'
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'mysrvr1hr'
CRS-2673: Attempting to stop 'ora.evmd' on 'mysrvr1hr'
CRS-2677: Stop of 'ora.evmd' on 'mysrvr1hr' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'mysrvr1hr' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
mysrvr1hr 2018/01/03 17:03:44 /app/grid/product/12.2.0.1/grid/cdata/mysrvr1hr/backup_20180103_170344.olr 0
2018/01/03 17:03:49 CLSRSC-327: Successfully configured Oracle Restart for a standalone server
mysrvr1hr:root:/app/grid/product/12.2.0.1/grid #
Resolving possible issues: 12.2 GI standalone : [INS-20802] Automatic Storage Management Configuration Assistant failed (Doc ID 2277224.1)
## The Installation will create the ASM instance with a default spfile. Due to company standards and due to best practice (knowing that size does matter and default settings will not do well in a heavily used environment) you should connect to the ASM instance and alter below values:
## Specific setup for asm Instance
ALTER SYSTEM SET memory_max_target=4096M SCOPE=SPFILE;
ALTER SYSTEM SET memory_target=1536M SCOPE=SPFILE;
ALTER SYSTEM SET large_pool_size=100M SCOPE=SPFILE;
ALTER SYSTEM SET shared_pool_size=512M SCOPE=BOTH;
ALTER SYSTEM SET shared_pool_reserved_size=100M SCOPE=SPFILE;
## Nothing to do with performance but mandatory due to standards.
ALTER SYSTEM SET audit_file_dest='/app/oracle/+ASM/admin/adump' SCOPE=SPFILE;
ALTER SYSTEM SET background_dump_dest='/app/oracle/diag/asm/+asm/+ASM/trace' SCOPE=BOTH;
## Company Standards as with regard to listener:
- Log destination: /app/oracle/diag/tnslsnr/mysrvr1hr/listener
- One listener per vip
## So I have added a listener with the netca tool, running it from the Grid Infrastructure home.
## /app/oracle/diag/tnslsnr/<servername>/<listenername>/trace
oracle@mysrvr1hr:/app/grid/product/12.2.0.1/grid/network/admin [+ASM]# lsnrctl status LISTENER_MYSRVR1HR
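As a hedged alternative to netca on Oracle Restart, the listener could also be registered with srvctl; the listener name and port below are assumptions to adapt to the standards above:
srvctl add listener -listener LISTENER_MYSRVR1HR -endpoints "TCP:1521"
srvctl start listener -listener LISTENER_MYSRVR1HR
srvctl config listener -listener LISTENER_MYSRVR1HR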
## Deinstallation when needed. As always you might need a way out (back again) .
Note: For upgrades from previous releases, if you want to uninstall the previous release Grid home, then perform the following steps:
- Log in as the root user.
- Manually change the permissions of the previous release Grid home (see below).
- Run the /app/grid/product/12.2.0.1/grid/deinstall/deinstall command (as oracle User).
For example, on Grid Infrastructure for a standalone server:
# chown -R oracle:dba /app/grid/product/12.2.0.1
# chmod -R 775 /app/grid/product/12.2.0.1
In this example:
/u01/app/oracle/product/11.2.0/grid is the previous release Oracle Grid Infrastructure for a standalone server home
oracle is the Oracle Grid Infrastructure installation owner user
dba is the name of the Oracle Inventory group (OINSTALL group)
For example, on Oracle Database:
# chown -R oracle:dba /app/oracle/product/12.2.0.1
# chmod -R 775 /app/oracle/product/12.2.0.1
If all is well, time to start patching the environment!
## patching : GI : p26737266_122010_Linux-x86-64.zip
## oracle@soltau2:/opt/oracle/Odrive/depot/software/oracle/patches/Linuxx86 []# scp p26737266_122010_Linux-x86-64.zip oracle@mysrvr1hr:/app/grid/product/12.2.0.1/stage
## check current situation with opatch before patching.
opatch lsinventory -detail -oh /app/grid/product/12.2.0.1/grid
This shows:
Oracle Interim Patch Installer version 12.2.0.1.6
Copyright (c) 2018, Oracle Corporation. All rights reserved.
Oracle Home       : /app/grid/product/12.2.0.1/grid
Central Inventory : /app/oraInventory
   from           : /app/grid/product/12.2.0.1/grid/oraInst.loc
OPatch version    : 12.2.0.1.6
OUI version       : 12.2.0.1.4
Log file location : /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatch/opatch2018-01-05_14-39-27PM_1.log
Lsinventory Output file location : /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatch/lsinv/lsinventory2018-01-05_14-39-27PM.txt
--------------------------------------------------------------------------------
Local Machine Information::
Hostname: mysrvr1hr.mydomain
ARU platform id: 226
ARU platform description:: Linux x86-64
Installed Top-level Products (1):
Oracle Grid Infrastructure 12c          12.2.0.1.0
There are 1 products installed in this Oracle Home.
Installed Products (99):
etc.
.
.
There are 99 products installed in this Oracle Home.
There are no Interim patches installed in this Oracle Home.
--------------------------------------------------------------------------------
OPatch succeeded.
## Use opatch to check for conflicts:
$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /app/grid/product/12.2.0.1/stage/26737266/26710464
$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /app/grid/product/12.2.0.1/stage/26737266/26925644
$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /app/grid/product/12.2.0.1/stage/26737266/26737232
$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /app/grid/product/12.2.0.1/stage/26737266/26839277
$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /app/grid/product/12.2.0.1/stage/26737266/26928563
## This did not show any conflicts
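The repeated prereq calls above lend themselves to a small loop over the sub-patches of the RU; a hedged sketch:
for p in 26710464 26925644 26737232 26839277 26928563; do
  $ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail \
    -phBaseDir /app/grid/product/12.2.0.1/stage/26737266/$p
done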
## Next step: use opatch to check the space requirements (you would not want the installation to fail due to lack of storage):
For Grid Infrastructure Home, as home user:
Create file /tmp/patch_list_gihome.txt with the following content:
cat /tmp/patch_list_gihome.txt
/app/grid/product/12.2.0.1/stage/26737266/26928563
/app/grid/product/12.2.0.1/stage/26737266/26839277
/app/grid/product/12.2.0.1/stage/26737266/26737232
/app/grid/product/12.2.0.1/stage/26737266/26925644
/app/grid/product/12.2.0.1/stage/26737266/26710464
Run the opatch command to check if enough free space is available in the Grid Infrastructure Home:
$ORACLE_HOME/OPatch/opatch prereq CheckSystemSpace -phBaseFile /tmp/patch_list_gihome.txt
## this shows
oracle@mysrvr1hr:/app/grid/product/12.2.0.1/stage [+ASM]# $ORACLE_HOME/OPatch/opatch prereq CheckSystemSpace -phBaseFile /tmp/patch_list_gihome.txt
Oracle Interim Patch Installer version 12.2.0.1.6
Copyright (c) 2018, Oracle Corporation. All rights reserved.
PREREQ session
Oracle Home       : /app/grid/product/12.2.0.1/grid
Central Inventory : /app/oraInventory
   from           : /app/grid/product/12.2.0.1/grid/oraInst.loc
OPatch version    : 12.2.0.1.6
OUI version       : 12.2.0.1.4
Log file location : /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatch/opatch2018-01-05_14-55-06PM_1.log
Invoking prereq "checksystemspace"
Prereq "checkSystemSpace" passed.
OPatch succeeded.
## To patch only the GI home:
# opatchauto apply /app/grid/product/12.2.0.1/stage/26737266 -oh /app/grid/product/12.2.0.1/grid
## failed with:
OPATCHAUTO-72046: Invalid wallet parameters.
OPATCHAUTO-72046: The wallet path or wallet password provided is not valid.
OPATCHAUTO-72046: Please provide valid wallet information.
opatchauto bootstrapping failed with error code 46.
## Thank you Mos for elaborating.
OPATCHAUTO-72046: Invalid wallet parameters (Doc ID 2150070.1)
The opatchauto command is not being run as the root user. Opatchauto for Grid PSUs should always be run as the root user.
## So as the root user :
/app/grid/product/12.2.0.1/grid/OPatch/opatchauto apply /app/grid/product/12.2.0.1/stage/26737266 -oh /app/grid/product/12.2.0.1/grid
## and it failed again !!!
mysrvr1hr:root:/root # /app/grid/product/12.2.0.1/grid/OPatch/opatchauto apply /app/grid/product/12.2.0.1/stage/26737266 -oh /app/grid/product/12.2.0.1/grid
System initialization log file is /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchautodb/systemconfig2018-01-05_03-09-09PM.log.
Session log file is /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/opatchauto2018-01-05_03-09-12PM.log
The id for this session is 5LQ1
[init:init] Executing OPatchAutoBinaryAction action on home /app/grid/product/12.2.0.1/grid
Executing OPatch prereq operations to verify patch applicability on SIHA Home........
[init:init] OPatchAutoBinaryAction action completed on home /app/grid/product/12.2.0.1/grid with failure
Execution of [OPatchAutoBinaryAction] patch action failed, check log for more details. Failures:
Patch Target : mysrvr1hr->/app/grid/product/12.2.0.1/grid Type[siha]
Details: [
---------------------------Patching Failed---------------------------------
Command execution failed during patching in home: /app/grid/product/12.2.0.1/grid, host: mysrvr1hr.
Command failed: /app/grid/product/12.2.0.1/grid/OPatch/opatchauto apply /app/grid/product/12.2.0.1/stage/26737266 -oh /app/grid/product/12.2.0.1/grid -target_type has -binary -invPtrLoc /app/grid/product/12.2.0.1/grid/oraInst.loc -persistresult /app/grid/product/12.2.0.1/grid/OPatch/auto/dbsessioninfo/sessionresult_analyze_mysrvr1hr_siha.ser -analyze -online
Command failure output:
==Following patches FAILED in analysis for apply:
Patch: /app/grid/product/12.2.0.1/stage/26737266/26925644
Log: /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-05_15-09-16PM_1.log
Reason: Failed during Analysis: CheckNApplyReport Failed, [ Prerequisite Status: FAILED, Prerequisite output:
The details are:
Prerequisite check "CheckMinimumOPatchVersion" failed.]
Failed during Analysis: CheckMinimumOPatchVersion Failed, [ Prerequisite Status: FAILED, Prerequisite output:
The details are:
The OPatch being used has version 12.2.0.1.6 while the following patch(es) require higher versions:
Patch 26710464 requires OPatch version 12.2.0.1.7.
Please download latest OPatch from My Orac ...
etc.
.
.
OPatchAuto failed.
opatchauto failed with error code 42
mysrvr1hr:root:/root #
## So I downloaded the latest OPatch version, parked it on the targeted server, and unzipped it into the Grid home:
unzip p6880880_122011_Linux-x86-64.zip -d /app/grid/product/12.2.0.1/grid
## now Opatch shows:
oracle@mysrvr1hr:/app/grid/product/12.2.0.1/opatch [+ASM]# opatch version
OPatch Version: 12.2.0.1.11
OPatch succeeded.
## Sometimes you just have to be patient to hear the lambs being silent:
## Next run as the root user :
/app/grid/product/12.2.0.1/grid/OPatch/opatchauto apply /app/grid/product/12.2.0.1/stage/26737266 -oh /app/grid/product/12.2.0.1/grid
## logfiles:
oracle@mysrvr1hr:/app/oracle/crsdata/mysrvr1hr/crsconfig
-rw-rw----. 1 oracle dba 17364 Jan 5 15:35 hapatch_2018-01-05_03-34-42PM.log
-rw-rw----. 1 oracle dba 23725 Jan 5 15:42 hapatch_2018-01-05_03-42-41PM.log
## showed
mysrvr1hr:root:/root # /app/grid/product/12.2.0.1/grid/OPatch/opatchauto apply /app/grid/product/12.2.0.1/stage/26737266 -oh /app/grid/product/12.2.0.1/grid
OPatchauto session is initiated at Fri Jan 5 15:33:54 2018
System initialization log file is /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchautodb/systemconfig2018-01-05_03-33-58PM.log.
Session log file is /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/opatchauto2018-01-05_03-34-02PM.log
The id for this session is XLE2
Executing OPatch prereq operations to verify patch applicability on home /app/grid/product/12.2.0.1/grid
Patch applicability verified successfully on home /app/grid/product/12.2.0.1/grid
Bringing down CRS service on home /app/grid/product/12.2.0.1/grid
Prepatch operation log file location: /app/oracle/crsdata/mysrvr1hr/crsconfig/hapatch_2018-01-05_03-34-42PM.log
CRS service brought down successfully on home /app/grid/product/12.2.0.1/grid
Start applying binary patch on home /app/grid/product/12.2.0.1/grid
Binary patch applied successfully on home /app/grid/product/12.2.0.1/grid
Starting CRS service on home /app/grid/product/12.2.0.1/grid
Postpatch operation log file location: /app/oracle/crsdata/mysrvr1hr/crsconfig/hapatch_2018-01-05_03-42-41PM.log
CRS service started successfully on home /app/grid/product/12.2.0.1/grid
OPatchAuto successful.
--------------------------------Summary--------------------------------
Patching is completed successfully. Please find the summary as follows:
Host:mysrvr1hr
SIHA Home:/app/grid/product/12.2.0.1/grid
Summary:
==Following patches were SUCCESSFULLY applied:
Patch: /app/grid/product/12.2.0.1/stage/26737266/26710464
Log: /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-05_15-35-04PM_1.log
Patch: /app/grid/product/12.2.0.1/stage/26737266/26737232
Log: /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-05_15-35-04PM_1.log
Patch: /app/grid/product/12.2.0.1/stage/26737266/26839277
Log: /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-05_15-35-04PM_1.log
Patch: /app/grid/product/12.2.0.1/stage/26737266/26925644
Log: /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-05_15-35-04PM_1.log
Patch: /app/grid/product/12.2.0.1/stage/26737266/26928563
Log: /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-05_15-35-04PM_1.log
OPatchauto session completed at Fri Jan 5 15:43:05 2018
Time taken to complete the session 9 minutes, 11 seconds
Happy DBA: installed 12.2 GI and patched it with RU October 2017.
Thanks for reading and till we meet again,
Mathijs.
Altering the Heartbeat in an Oracle Rac environment.
Introduction
When asked about part of the job description of a DBA, I like to take a moment, smile and reply: to serve and to protect the cluster, the databases and, of course, the data. In the matter at hand this means that this mission/assignment is all about increasing high availability and improving the functionality of the cluster interconnect between nodes, and all that for one price: doubling your cluster interconnect IPs on the various layers.
And to be honest this assignment is even more interesting because, after all, how often is there a need or a challenge to alter IP addresses once you have set up the cluster? Hmm, the only valid occasions I could think of would be an action like lifting and shifting the servers to other rooms, or, as in this case, because we simply want to improve availability.
Some days before the action was scheduled, I looked into the matter of the required dedicated IPs together with the colleagues from the team. In /etc/hosts, which was the source for this, we looked for IPs that had hb in the alias, following the naming convention that is standard at OS level. This information was used to set up the IPs at OS level. Once that task was completed, a first and very important test had to be: can you ping these new and dedicated IPs from every node that is part of the cluster? Since this is a happy-flow scenario, that was of course what happened. During the maintenance window itself the detailed steps below were performed to make these addresses known / usable on the cluster layer in the Grid Infrastructure.
Detailed Plan
In this scenario we are using a 4-node RAC cluster on Red Hat with 11.2.0.4 Grid Infrastructure. Looking at OS level, this is what we found already present in the hosts file. Even better: these IPs were all available and not in use.
grep -i hb /etc/hosts
10.124.10.75 mysrvrahr-hb1.dc.nl mysrvrahr-hb1
10.124.11.75 mysrvrahr-hb2.dc.nl mysrvrahr-hb2
10.124.10.76 mysrvrbhr-hb1.dc.nl mysrvrbhr-hb1
10.124.11.76 mysrvrbhr-hb2.dc.nl mysrvrbhr-hb2
10.124.10.77 mysrvrchr-hb1.dc.nl mysrvrchr-hb1
10.124.11.77 mysrvrchr-hb2.dc.nl mysrvrchr-hb2
10.124.10.78 mysrvrdhr-hb1.dc.nl mysrvrdhr-hb1
10.124.11.78 mysrvrdhr-hb2.dc.nl mysrvrdhr-hb2
The steps below were followed, based on a great MOS note, in order to complete the tasks that are needed to make the Grid Infrastructure (cluster) aware of the new IPs. The scenario runs through a number of steps to be well prepared, but also, of course, to be on the safe side before and during the changes on the cluster layer. It is like, hmm, paying respect and being brave but cautious.
Preparation steps:
As of 11.2 Grid Infrastructure, the private network configuration is not only stored in OCR but also in the gpnp profile. Documentation was very clear on this: If the private network is not available or its definition is incorrect, the CRSD process will not start and any subsequent changes to the OCR will be impossible.
- Therefore care needs to be taken when making modifications to the configuration of the private network.
- It is important to perform the changes in the correct order.
Note that manual modification of gpnp profile is not supported so it is best to stick to proper actions and not go into hacking mode!
So let’s take a backup of profile.xml on all cluster nodes before proceeding:
As grid user ( in my case the oracle user) , move to the correct directory( cd $GRID_HOME/gpnp/<hostname>/profiles/peer/).
cd /app/oracle/product/11.2.0/grid/gpnp/mysrvrahr/profiles/peer
cd /app/oracle/product/11.2.0/grid/gpnp/mysrvrbhr/profiles/peer
cd /app/oracle/product/11.2.0/grid/gpnp/mysrvrchr/profiles/peer
cd /app/oracle/product/11.2.0/grid/gpnp/mysrvrdhr/profiles/peer
During startup of the cluster, Oracle relies on this very important xml file for specific data like the spfile, the diskgroups and of course the IPs.
cp -p profile.xml profile.xml.bk
Ensure Oracle Clusterware is running on ALL cluster nodes and save the current status of the resources. (Better safe than sorry: make sure you know about the health of the cluster and its resources as they have been defined.) So do check the cluster and save the current status of the resources in a file as a pre-change image.
/app/oracle/product/11.2.0/grid/bin/crsctl check cluster -all
/app/oracle/product/11.2.0/grid/bin/crsctl status resource -t>/tmp/beforeNewIps.lst
As the grid user (in my case the oracle user): get the existing information, showing which interfaces are defined in the cluster.
##below you will see that the current (single) cluster interconnect is set up at 64.18.112.208
/app/oracle/product/11.2.0/grid/bin/oifcfg getif
bond1 64.18.112.208 global cluster_interconnect
bond0 195.233.190.64 global public
The command iflist will show you the network information known on the OS, listing all (or specific) interfaces and their subnets. The interfaces / subnet addresses can be identified specifically for the eth interfaces with:
/app/oracle/product/11.2.0/grid/bin/oifcfg iflist|grep -i eth|sort
eth0 64.18.32.0
eth2 10.124.10.0
eth6 10.124.11.0
or
## check interfaces / subnets in general:
/app/oracle/product/11.2.0/grid/bin/oifcfg iflist|sort
Now that we have a good picture of the status of the cluster, of the ips being used (oifcfg getif) and of the ips present on the system (oifcfg iflist), all is set to add the new cluster_interconnect information. As you can see, both the eth2 and the eth6 address are defined, and with the -global parameter the information is shared across the complete cluster on all nodes.
/app/oracle/product/11.2.0/grid/bin/oifcfg setif -global eth2/10.124.10.0:cluster_interconnect
/app/oracle/product/11.2.0/grid/bin/oifcfg setif -global eth6/10.124.11.0:cluster_interconnect
Of course there cannot be a change without verifying it, so I checked on all nodes with the command below.
/app/oracle/product/11.2.0/grid/bin/oifcfg getif
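If all went well, the output should now list the two new interfaces next to the existing ones, roughly like this (subnets taken from the iflist output above):
bond1 64.18.112.208 global cluster_interconnect
bond0 195.233.190.64 global public
eth2 10.124.10.0 global cluster_interconnect
eth6 10.124.11.0 global cluster_interconnect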
Since we are using 11gR2 Grid Infrastructure, the next steps are: shut down Oracle Clusterware on all nodes and disable Oracle Clusterware, as root.
Action is to be performed as the root user:
sudo su -
/app/oracle/product/11.2.0/grid/bin/crsctl stop crs
/app/oracle/product/11.2.0/grid/bin/crsctl disable crs
In this specific scenario my Linux brothers in arms had already made the network configuration change at OS level as was required, and that great job was visible in the oifcfg iflist command. They made sure that the new interfaces were available on all nodes after their change.
(Check that you can ping the interfaces on all nodes; script kindly provided by a Linux team member.)
for x in 10 11;do for xx in 75 76 77 78;do ping -c2 10.124.${x}.${xx}|egrep 'icmp_seq|transmitted';done;echo;done
for x in a b c d; do for xx in 1 2;do ping -c2 mysrvr${x}hr-hb$xx|egrep 'icmp_seq|transmitted';done;echo;done
Well, all went well and has been checked, so it is time to restart Oracle Clusterware and, once that has completed, to enable Oracle Clusterware again.
On all nodes in the cluster:
## as root user:
sudo su -
/app/oracle/product/11.2.0/grid/bin/crsctl start crs
Seeing is believing in this matter, so after some time check:
/app/oracle/product/11.2.0/grid/bin/crsctl check cluster -all
In the step below we are checking the status of the resources in the cluster again and adding that information to a file. This “post” operation file is then being used to compare the status of the cluster resources before and after.
/app/oracle/product/11.2.0/grid/bin/crsctl status resource -t > /tmp/afterNewIps.lst
sdiff /tmp/afterNewIps.lst /tmp/beforeNewIps.lst
This compare showed me that a 10G RAC database resource and its services needed my attention, so via the cluster commands I checked and observed their status after starting them with the srvctl command as the oracle user. Once completed I ran another check as described and, happy me, all resources in the post status file were in a similar status (online online) as in the pre status file.
as root user:
sudo su -
/app/oracle/product/11.2.0/grid/bin/crsctl enable crs
Time to wrap up this scenario. As part of housekeeping remove the old interface:
/app/oracle/product/11.2.0/grid/bin/oifcfg delif -global bond1/64.18.112.208:cluster_interconnect
Verified the environment one more time.
/app/oracle/product/11.2.0/grid/bin/oifcfg getif
The Clusterware had of course already proven itself, but I checked the databases and listeners as a last sanity check, and that completed the tasks for this time.
Time to inform Apps team that they can perform their sanity checks and start the applications again.
Happy reading and till next Time.
Mathijs
Upgrade to 12C GridInfra lessons learned
Introduction:
Not sure if it was word from a wise Dba or just from a fortune cookie (might even have been from a Pink Panther movie). It said always expect the unexpected and as an add-on success just loves preparation.
This week one of my tasks was to upgrade a 4 node Oracle Rac cluster from 11.2.0.4 to 12c (12.1.0.2.0) grid infrastructure. And even though I came well prepared (see also my other detailed blog for that: https://mathijsbruggink.com/2017/05/01/upgrading-11g-gridinfra-to-12c-in-linux/ ), several small surprises occurred which will be used as lessons learned in upcoming upgrades of the grid infrastructure. Also I would like to offer some timeline with regard to how long the upgrade process really took.
Lessons learned:
- During the preparations I needed to order extra disks for ASM storage for the Grid Infrastructure management repository (GIMR). When I started the runInstaller as a first check whether all was well prepared, I noticed that the installer software is most likely looking for a diskgroup called +OCR or +VOTING. This could be a trap if you had not extended one of them (but instead a +GRID diskgroup). So when preparing, look for either OCR or VOTING (best both if present) to add extra disks (and have some disks at spare).
- During the start of the maintenance window the Linux colleague mentioned that he would have to stop the Hyperion services. This activity took some 45 minutes of the change window. I will have to find out if this was a justified claim to stop those services, and will need to add an extra step to the pre-checks to find out about other services and daemons running on the cluster that might be impacted when doing an upgrade.
- Purpose of rootupgrade.sh: after the installation part via the runInstaller completes, the upgrade part via the runInstaller commences. rootupgrade.sh will perform the actual ASM upgrade and will configure the OLR (local registry), amongst other things.
Timelines:
Every change on a test or production environment will have to come with a plan with an estimate of how long the change will take. First and most important, of course, choose the strategy: will a rolling window be used (thus minimizing impact since at least one node will be up), thinking of a kind of batch approach where the first batch holds the first node, the second batch holds nodes 2 and 3 in my 4 node example, and a last batch holds the last node.
Start of Change Window : 20:00 CET ( 6:00 UTC ) .
According to Linux expert Hyperion services needed to be stopped before we could continue.
Start of installation: 20:45 CET.
Started the runInstaller on the first node. The software was deployed to the first node and to all other nodes in the cluster (4 node Rac).
Upgrade part of the existing 11.2.0.4 GridInfra structure:
21:30 – 21:58 on the first node (MYSRVR09hr) the rootupgrade.sh was started. (I used the manual upgrade, still a bit, hmm, unwilling to leave it all to the automated option; this means set up a root session on the first node and run: /app/grid/product/12102/grid/rootupgrade.sh).
In the runInstaller it was offered to automate and to run rootupgrade.sh in parallel on nodes 2 and 3. To me it felt better to open a separate terminal session as root on each server and run the script there in parallel.
22:06 – 22:13 on MYSRVR10hr : /app/grid/product/12102/grid/rootupgrade.sh
22:06 – 22:24 on MYSRVR11hr : /app/grid/product/12102/grid/rootupgrade.sh
On the last node MYSRVR12hr:
22:28 – 22:48 /app/grid/product/12102/grid/rootupgrade.sh
After that the installation continued with the Grid Infrastructure management repository (GIMR) database, and once completed I ran a number of sanity checks in the cluster:
22:50 – 23:55
At 23:59:59 Reported mission completed.
Happy reading and till next time,
Mathijs
Upgrading 11G GridInfra to 12C in Linux
Introduction:
With spring 2017 around, new initiatives are developed. As a preparation to start doing database upgrades to 12C it will be a mandatory step to upgrade the Clusterware (Grid Infrastructure) first, before doing the database part. So in this case very happy me that finally the time has come that one of the customers requests to upgrade a number of clusters to 12C Grid Infrastructure. In this document I will share thoughts and my plan to tackle this interesting puzzle. Since the first cluster upgrade will happen pretty soon (this week), the document might evolve with the lessons learned of that first upgrade. Happy reading in advance.
Preparations:
It could be some text from a fortune cookie, but every success just loves preparation, so in this case that will not be any different. First thing to do was to identify a scope of clusters that had to be upgraded. Together with the customer an inventory list had to be created, and in the end 10 clusters have been defined as part of the scope for this action: 8 test clusters and 2 production environments. An interesting detail is that all clusters have been patched pretty recently, all holding 11.2.0.4 Grid infrastructure, with the extra challenge that the underlying operating system comes in two flavors (being Red Hat Linux server release 5.11 (Tikanga) and 6.5 (Santiago)). I am curious in advance already to see if these different versions of Red Hat will have an influence on the steps to be performed. In the details below you will find more on the detailed preparations and actions of the upgrade.
Operating System:
One of the first steps to investigate is of course to find out if the operating system versions at hand are supported for the upgrade. Oracle support confirmed that, even though it would be recommended to upgrade the 5.11 Red Hat version first to Red Hat 7, it should work with the 5.11 version at hand. The 6.5 OS version was okay anyhow. The project decided however that an OS upgrade of the 5.11 boxes would delay things, so upgrading the OS will be done in a different project.
Storage:
Before even considering running the upgrade of the grid infrastructure, some extra time needs to be spent investigating the storage in place in the cluster for such an upgrade. Often the Oracle software is first set up locally on each box on volume group VG0, but with the out-of-place installation these days that might become a challenge if there is not enough local storage present anymore in the box. Due to standards those root disks become nearly untouchable. For my project this storage requirement has been defined as an absolute minimum, which means there will be a need for extra local storage per node, or even for SAN storage per node which will be presented as required mount points to me. If such storage would not (or no longer) be present locally, I had to request and receive additional storage for it.
/app/grid | /app/oracle | /var/opt/oracle | /tmp | San 4 lvm dbs
50GB | 70GB | 32M | 1GB | 4GB each
A short explanation for this:
/app/grid: where the 12C Grid-Infra software will be installed.
/app/oracle: for the 12C database software.
/var/opt/oracle and /tmp: required minimum space.
San 4 lvm dbs: will be set up as 4GB mountpoints, one for each instance on the local node, in order to hold logfiles.
When migrating to 12C and coming from 11G, please be informed that you might need extra storage in your OCR – VOTING diskgroup due to a new feature as well. This new repository database will have to be implemented during the upgrade. This Grid Infrastructure management repository (GIMR) database has become mandatory in Oracle GI 12.1.0.2. Data files associated with it will be created in the same diskgroup as OCR or voting. (Average growth per day per node = approximately 750 MB, so a 4 node cluster at the default retention of 3 days would lead to approximately 9 GB of storage requirement in the OCR or VOTING diskgroup.) A fortunate note is that the retention can be changed. Well, in my case this means that more ASM disks will need to be added to the specific diskgroup. At work most OCR and VOTING diskgroups are set up as the bare minimum (in normal redundancy with three disks of about 4 GB each). (Extra info on this topic: https://blogs.oracle.com/UPGRADE/entry/grid_infrastructure_management_repository_gimr)
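To put a number on that for this scenario, a quick back-of-the-envelope calculation using the growth figure quoted above:
## rough GIMR sizing estimate: 750 MB per node per day, default retention 3 days, 4 nodes
echo $(( 4 * 750 * 3 )) MB
## prints 9000 MB, so roughly 9 GB extra needed in the OCR / VOTING diskgroup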
Detailed preparations and health checks.
One of the quotes in IT sometimes is that you should not touch a well running system. Well, in this case I would like to add: but if you do, come well prepared. In this case I have put the focus on the three tools below to prove that the current system is in good shape to run the upgrade, which is also to be regarded as a health check of the environment. These preps are based on the Mos note (1579762.1) and on reading Chapter 13 in the great book “Expert Oracle Rac 12C” by Syed Jaffar Hussain, Tariq Farooq, Riyaj Shamsudeen and Kai Yu (ISBN-13 (electronic): 978-1-4302-5045-6).
- Opatch
- RACcheck: Orachk
- Runcluvfy
Opatch
Using opatch in order to make sure that the Orainventory is in good shape on all nodes in the cluster. The command issued is investigating the current grid infrastructure home:
opatch lsinventory -oh /opt/crs/product/11204/crs -detail
-oh means for the specific ORACLE_HOME.
-detail shows all details.
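Since the inventory has to be consistent on every node, it can be handy to run the same check across the cluster. A sketch, assuming passwordless ssh and reusing the 11.2.0.4 grid home and example node names from this scope:
## sketch: run the same inventory check on all nodes
for node in mysrvr23hr mysrvr24hr; do
  echo "### ${node}"
  ssh ${node} /opt/crs/product/11204/crs/OPatch/opatch lsinventory -oh /opt/crs/product/11204/crs
done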
RACcheck: Orachk
I looked on Metalink and downloaded and installed this tool on the cluster (nodes).
orachk version: 12.2.0.1.2_20161215
Following Quick start guide for this tool:
http://docs.oracle.com/cd/E68491_01/OEXUG/quick-start-guide.htm#OEXUG-GUID-CB4224DA-F389-4E9C-AB6A-C57F46A80C61
Clear information can be found in Mos:
ORAchk Upgrade Readiness Assessment (Doc ID 1457357.1)
With the tool downloaded below steps have been performed:
According to the documentation the tool needs to be copied, unpacked (and installed) in the suptools subdirectory of the cluster software installation.
scp orachk.zip oracle@mysrvr23hr:/opt/crs/product/11204/crs/suptools
scp orachk.zip oracle@mysrvr24hr:/opt/crs/product/11204/crs/suptools
Once unzipped the tool can run in two modes, a pre upgrade mode and a post upgrade mode:
./orachk -u -o pre |tee Orachk_pre_20170124.log
./orachk -u -o post |tee Orachk_post_20170124.log
Note: the tee command will also create a log file holding all the steps and progress information during run time.
Note: /opt/oracle/.orachk should be empty before the start, otherwise you get an ‘Another instance of orachk is running on ..’ message.
Runcluvfy
Working with runcluvfy is like meeting an old friend again. Yet each time it is a bit of a struggle to find the optimal syntax and parameters for your setup.
## Wrong setup was:
./runcluvfy.sh stage -pre crsinst -upgrade -n mysrvr23hr,mysrvr24hr -rolling -fixup -src_crshome /opt/crs/product/11204/crs -dest_home /app/grid/product/12102/grid -dest_version 12.1.0 -verbose
## Working version:
./runcluvfy.sh stage -pre crsinst -n mysrvr23hr,mysrvr24hr -verbose|tee runcluvfy_20170130_pre.lst
## Or:
./runcluvfy.sh stage -pre crsinst -upgrade -rolling -src_crshome /opt/crs/product/11204/crs -dest_crshome /app/grid/product/12102/grid -dest_version 12.1.0.2.0 -verbose|tee runcluvfy_20170130_preUpgrade.lst
Upgrade steps:
Now it is time to plan and set up your upgrade steps, after the confidence built during the preparation. In the upgrade multiple approaches are possible. But my goal in this is plain and simple: minimum impact on the cluster and on the databases hosted on that cluster, so I will be aiming for this scenario: rolling upgrade of ASM + Clusterware. A baseline for this is the URL below:
https://docs.oracle.com/database/121/CWLIN/procstop.htm#CWLIN10001
Working according to company standards requires using the following specific settings for $ORACLE_BASE and $ORACLE_HOME for the GI installation, and a different $ORACLE_HOME for the database software.
oracle@mysrvrhr:/home/oracle [CRS]# echo $ORACLE_BASE
/app/oracle
oracle@mysrvrhr:/home/oracle [CRS]# echo $ORACLE_HOME
/app/grid/product/12102/grid
oracle@mysrvrhr:/home/oracle [MYDB1]# echo $ORACLE_HOME
/app/oracle/product/12102/db
Below in the bullets I will go through the steps and comment where needed.
- Due to the Grid Infrastructure management repository (GIMR) database I had to add larger disks to the VOTING diskgroup to have enough storage in place (the steps on how to add the new disks and drop the old ones are too detailed for this blog; after all it is a blog and not a book 🙂 so I will have to cover that in a separate blog).
- Check /tmp because the upgrade requires at least 1GB free in /tmp. Either clean up or have /tmp extended (use the ls -lSh command).
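A quick way to see whether /tmp has the required space and which files are the biggest cleanup candidates (ls -lSh sorts by size, largest first):
df -h /tmp
ls -lSh /tmp | head -20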
- check ocr integrity by :
cluvfy comp ocr -n all -verbose
- Check backup of ocr and voting disk in the cluster:
ocrconfig -showbackup
Note: this command can be performed as the ORACLE user and will show info similar to the information below. An interesting aspect here was that I issued the command on the first node (but the automated back-ups are all on node 11hr).
oracle@mysrvr09hr:/opt/oracle [CRS]# ocrconfig -showbackup
mysrvr11hr 2017/04/21 05:20:36 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup00.ocr
mysrvr11hr 2017/04/21 01:20:29 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup01.ocr
mysrvr11hr 2017/04/20 21:20:07 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup02.ocr
mysrvr11hr 2017/04/20 01:19:42 /opt/crs/product/11204/crs/cdata/mysrvr03cl/day.ocr
mysrvr11hr 2017/04/12 17:16:11 /opt/crs/product/11204/crs/cdata/mysrvr03cl/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available
- As the root user, run a manual backup of the OCR information. Run the ocrconfig -manualbackup command on a node where the Oracle Clusterware stack is up and running to force Oracle Clusterware to perform a backup of OCR at any time, rather than wait for the automatic backup. Note: the -manualbackup option is especially useful when you want to obtain a binary backup on demand, such as before you make changes to OCR. The OLR only supports manual backups. NOTE: In 11gR2, the voting files are backed up automatically as part of OCR. Oracle recommends NOT using the dd command to backup or restore, as this can lead to loss of the voting disk.
mysrvr09hr:root:/root # cd /opt/crs/product/11204/crs/bin/
mysrvr09hr:root:/opt/crs/product/11204/crs/bin # ./ocrconfig -manualbackup
mysrvr11hr 2017/04/21 09:12:40 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup_20170421_091240.ocr
## Checking a second time will now also show a manual backup to be in place:
mysrvr09hr:root:/opt/crs/product/11204/crs/bin # ./ocrconfig -showbackup
mysrvr11hr 2017/04/21 05:20:36 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup00.ocr
mysrvr11hr 2017/04/21 01:20:29 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup01.ocr
mysrvr11hr 2017/04/20 21:20:07 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup02.ocr
mysrvr11hr 2017/04/20 01:19:42 /opt/crs/product/11204/crs/cdata/mysrvr03cl/day.ocr
mysrvr11hr 2017/04/12 17:16:11 /opt/crs/product/11204/crs/cdata/mysrvr03cl/week.ocr
mysrvr11hr 2017/04/21 09:12:40 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup_20170421_091240.ocr
The last line now shows the manual backup (since it uses the format backup_yyyymmdd_hhmmss.ocr).
- Check the location of OCR and Voting Disk (they need to be in a diskgroup).
##How: cat /etc/oracle/ocr.loc
## Shows output similar to this
## (if ocr is already mirrored in another diskgroup with normal redundancy)
#Device/file getting replaced by device +OCR
ocrconfig_loc=+VOTE
ocrmirrorconfig_loc=+OCR
##How: crsctl query css votedisk
## Will show 3 voting disks in diskgroup VOTE due to normal redundancy (and 3 disks)
## STATE File Universal Id File Name Disk group
1. ONLINE 36b26f862b9a4f54bfba3096e3d50afa (/dev/mapper/asm-vote01) [VOTE]
2. ONLINE 9d45d791c1124febbf0a093d5a185c13 (/dev/mapper/asm-vote02) [VOTE]
3. ONLINE 1b7e510a302e4f03bfdea942d55d7067 (/dev/mapper/asm-vote03) [VOTE]
Located 3 voting disk(s).
## check in ASM:
select a.name dg_name,
       a.group_number dg_number,
       a.state dg_state,
       b.disk_number d_number,
       b.name d_name,
       b.mount_status d_mount_status,
       b.header_status d_header_status,
       b.mode_status d_mode_status,
       b.state d_state,
       b.failgroup d_failgroup,
       b.path d_path
from   v$asm_diskgroup a,
       v$asm_disk b
where  a.group_number(+) = b.group_number
order by 2, 4;
- Unset environment Variables:
unset ORACLE_BASE
unset ORACLE_HOME
unset GI_HOME
unset ORA_CRS_HOME
unset TNS_ADMIN
unset ORACLE_SID
unset ORA_NLS10
- Check active crs version and software version:
## using the current CRS to document the current active - and software version
/opt/crs/product/11204/crs/bin/crsctl query crs activeversion
/opt/crs/product/11204/crs/bin/crsctl query crs softwareversion
- Performing a Standard Upgrade from an Earlier Release
## Use the following procedure to upgrade the cluster from an earlier release:
Start the installer, and select the option to upgrade an existing Oracle Clusterware and Oracle ASM installation. On the node selection page, select all nodes. Select installation options as prompted. Note: Oracle recommends that you configure root script automation, so that the rootupgrade.sh script can be run automatically during the upgrade.
Run the root scripts, either automatically or manually:
- Running root scripts automatically: TIP: If you have configured root script automation, then use the pause between batches to relocate services from the nodes running the previous release to the new release. Comment Mathijs: I have not decided yet on this automation step. In the documentation read as prep for the upgrade you see the option to create multiple batches: like batch 1 starting node, batch 2 all but the last node, batch 3 last node. I will use the automated way for one cluster and the manual (old school) method mentioned below on another cluster.
- Running root scripts manually: If you have not configured root script automation, then when prompted, run the rootupgrade.sh script on each node in the cluster that you want to upgrade. If you run the root scripts manually, then run the script on the local node first. The script shuts down the earlier release installation, replaces it with the new Oracle Clusterware release, and starts the new Oracle Clusterware installation. After the script completes successfully, you can run the script in parallel on all nodes except for one, which you select as the last node. When the script has run successfully on all the nodes except the last node, run the script on the last node.
After running the rootupgrade.sh script on the last node in the cluster: if you are upgrading from a release earlier than Oracle Grid Infrastructure 11g Release 2 (11.2.0.2), and left the check box labeled ASMCA checked, which is the default, then Oracle Automatic Storage Management Configuration Assistant (ASMCA) runs automatically, and the Oracle Grid Infrastructure upgrade is complete. If you unchecked the box during the interview stage of the upgrade, then ASMCA is not run automatically. If an earlier release of Oracle Automatic Storage Management (Oracle ASM) is installed, then the installer starts ASMCA to upgrade Oracle ASM to 12c Release 1 (12.1). You can choose to upgrade Oracle ASM at this time, or upgrade it later. Oracle recommends that you upgrade Oracle ASM at the same time that you upgrade Oracle Clusterware. Until Oracle ASM is upgraded, Oracle Databases that use Oracle ASM cannot be created, and the Oracle ASM management tools in the Oracle Grid Infrastructure 12c Release 1 (12.1) home (for example, srvctl) do not work.
Note: Because the Oracle Grid Infrastructure home is in a different location than the former Oracle Clusterware and Oracle ASM homes, update any scripts or applications that use utilities, libraries, or other files that reside in the Oracle Clusterware and Oracle ASM homes.
- Check active crs version and software version:
/opt/crs/product/11204/crs/bin/crsctl query crs activeversion /opt/crs/product/11204/crs/bin/crsctl query crs softwareversion
- Post upgrade checks:
ps -ef|grep d.bin should show the daemons started from the 12C home.
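Next to the daemon check, a few more quick sanity commands can confirm the new release is really active. A short sketch, reusing commands shown earlier in this post, with the paths now pointing to the 12c home:
ps -ef | grep d.bin | grep -v grep
/app/grid/product/12102/grid/bin/crsctl check cluster -all
/app/grid/product/12102/grid/bin/crsctl query crs activeversion
/app/grid/product/12102/grid/bin/crsctl query crs softwareversion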
Thoughts on Rollback:
Of course each migration will only be as good as its preparation. But still your plan should at least hold the steps for a rollback in case you do not make it to a successfully completed task. Below you will find the steps in general.
- On all remote nodes, use the command syntax Grid_home/crs/install/rootcrs.sh -downgrade to stop the 12c Release 1 (12.1) stack.
- On the local node, use the command syntax Grid_home/crs/install/rootcrs.sh -downgrade -lastnode
- On any of the cluster member nodes where the rootupgrade.sh script has run successfully:
cd /u01/app/12.1.0/grid/oui/bin
- On any of the cluster member nodes where the rootupgrade script has run successfully, in the old ORACLE_HOME (the earlier Oracle Clusterware installation):
$ cd /opt/crs/product/11204/crs/oui/bin/
$ ./runInstaller -nowait -waitforcompletion -ignoreSysPrereqs -updateNodeList -silent CRS=true ORACLE_HOME=/u01/app/crs
Then start the Oracle Clusterware stack manually.
As always thank you for taking an interest in my blog. Happy reading and till the next time.
Mathijs
Transport Tablespace as Mig to 11.2.0.4 with rman.
Introduction:
For one of the projects the question came in to investigate and set up an 11.2.0.4 Real Application Cluster database, with the extra challenge that the migration had to be done cross-platform from Oracle 10.2.0.3 on the Solaris platform to 11.2.0.4.0 on Linux. From the application provider came the suggestion to investigate a backup – restore scenario with an upgrade on the new server (Linux environment). Due to the fact that the source environment was 10.2.0.3 on Solaris, and due to the fact that we were heading towards a Rac cluster environment on Linux, that suggestion was the first that was sent to the dustbin.
Normal export / import was the second scenario that was explored. Of course this is a valid scenario, but given the fact that the database was more than 1.x TB it is not exactly the most favorite way to bring the data across. But with scripting and using multiple par-files, and/or with moving partitioned data across in waves, it would be a fair plan B.
From reading though, I had put my mind to the use of transportable tablespaces as the way forward with this challenging question.
Requirements:
As preparation for the job I requested to have a Nas filesystem mounted between the source server (MySunServer), holding the 10G database, and the target server (MyLinuxcluster). This Nas filesystem would hold the datapumps to be created, and hold the scripts and parfiles / config files, as was suggested based on Mos Note (1389592.1). The Nas filesystem was read-writable from both servers. The perl scripts that come with the note will support the transport of the tablespaces, but also help in the conversion from big endian to little endian, and as a bonus in my case will do the copy into ASM.
Due to the layout of the database in the source environment Rman was chosen as the best way forward with the scenario.
As a preparation an 11.2.0.4 Rac database was set up on the target cluster. This database only holds the normal tablespaces and a small temporary tablespace for the users. (In a TTS solution the names of the data tablespaces that come across to the new environment may not already exist in the new environment.) All data and application users have been pre-created on the new environment with a new default user tablespace.
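Pre-creating such an application owner only takes a few statements. A hypothetical sketch (user, password and tablespace names are placeholders; it assumes OMF with db_create_file_dest pointing to an ASM diskgroup, and the tablespaces that will be transported must of course not be created here):
## hypothetical sketch: pre-create an application owner with a new default tablespace on the target
sqlplus / as sysdba <<EOF
CREATE TABLESPACE NEW_USERS_TS DATAFILE SIZE 1G AUTOEXTEND ON;
CREATE USER app_owner IDENTIFIED BY "change_me" DEFAULT TABLESPACE NEW_USERS_TS TEMPORARY TABLESPACE TEMP;
GRANT CREATE SESSION TO app_owner;
EOF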
Details & Comments
Configuration file for the Perl scripts:
This is a file that is part of the zip file from the Mos note. It needs to be set up to match your specific needs. I will only show the settings I have used, with their comments:
xtt.properties:
## Reduce Transportable Tablespace Downtime using Incremental Backups
## (Doc ID 1389592.1)
## Properties file for xttdriver.pl
## See documentation below and My Oracle Support Note 1389592.1 for details.
## Tablespaces to transport
## Specify tablespace names in CAPITAL letters.
tablespaces=MYDB_DATA,MYDB_EUC_DATA,MYDB_EUC_INDEX,MYDB_INDEX,MYTS,USERS
##tablespaces=MYCOMPTTS
## Source database platform ID
## platformid
## Source database platform id, obtained from V$DATABASE.PLATFORM_ID
platformid=2
## srclink
## Database link in the destination database that refers to the source
## database. Datafiles will be transferred over this database link using
## dbms_file_transfer.
srclink=TTSLINK
## Location where datafile copies are created during the “-p prepare” step.
## This location must have sufficient free space to hold copies of all
## datafiles being transported.
dfcopydir=/mycomp_mig_db_2_linux/mybox/rman
## backupformat
## Location where incremental backups are created.
backupformat=/mycomp_mig_db_2_linux/mybox/rman
## Destination system file locations
## stageondest
## Location where datafile copies are placed by the user when they are
## transferred manually from the source system. This location must have
## sufficient free space to hold copies of all datafiles being transported.
stageondest=/mycomp_mig_db_2_linux/mybox/rman
# storageondest
## This parameter is used only when Prepare phase method is RMAN backup.
## Location where the converted datafile copies will be written during the
## "-c conversion of datafiles" step. This is the final location of the
## datafiles where they will be used by the destination database.
storageondest=+MYDBP_DATA01/mydbp/datafile
## backupondest
## Location where converted incremental backups on the destination system
## will be written during the "-r roll forward datafiles" step.
## NOTE: If this is set to an ASM location then define properties
## asm_home and asm_sid below. If this is set to a file system
## location, then comment out asm_home and asm_sid below
backupondest=+MYDBP_DATA01/mydbp/datafile
## asm_home, asm_sid
## Grid home and SID for the ASM instance that runs on the destination
asm_home=/opt/crs/product/11204/crs
asm_sid=+ASM1
## Parallel parameters
parallel=8
## rollparallel
## Defines the level of parallelism for the -r roll forward operation.
## If undefined, default value is 0 (serial roll forward).
rollparallel=2
## getfileparallel
## Defines the level of parallelism for the -G operation
getfileparallel=4
## desttmpdir
## This should be defined to same directory as TMPDIR for getting the
## temporary files. The incremental backups will be copied to directory pointed
## by stageondest parameter.
desttmpdir=/mycomp_mig_db_2_linux/MYDBP/scripts
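One prerequisite that is easy to overlook: the srclink entry above expects a working database link in the destination database pointing back to the source, because the datafiles are pulled over it with dbms_file_transfer. A minimal, hypothetical example (the connecting user, password and TNS alias are placeholders):
## hypothetical sketch on the destination database; credentials and TNS alias are placeholders
sqlplus / as sysdba <<EOF
CREATE DATABASE LINK TTSLINK CONNECT TO system IDENTIFIED BY "change_me" USING 'MYSOURCEDB';
-- quick sanity test of the link
SELECT name FROM v\$database@TTSLINK;
EOF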
Below in a Table format you will see the steps performed with comments.
The steps are classified as:
- I for Initial steps – activities
- P for Preparation
- R for Roll Forward activities
- T for Transport activities
Server column shows where the action needs to be done.
Step | Server | What needs 2 b done |
I1.3 | Source | Identify the tablespace(s) in the source database that will be transported ( Application owner needs to support with schema owner information) : |
tablespaces=MYDB_DATA,MYDB_EUC_DATA,MYDB_EUC_INDEX,
MYDB_INDEX,MYTS,USERS |
I1.5 | Source + Target | In my case the project offered an nfs filesystem which I could use: /mycomp_mig_db_2_linux |
I1.6 | Source | Together with the Mos note came this zip file: unzip rman-xttconvert.zip. |
I1.7 | Source | Tailor the extracted file xtt.properties file on the source system to match your environment. |
I1.8 | Target | As the oracle software owner copy all xttconvert scripts and the modified xtt.properties file to the destination system. This was not needed since we used the nas filesystem. |
P1.9 | Source + Target | On both environments set up this:
export TMPDIR=/mycomp_mig_db_2_linux/MYDBP/scripts |
P2B.1 | Source | perl xttdriver.pl -p |
Note: do not use $ORACLE_HOME/perl/bin/perl; this did not work. | ||
P2B.2 | Source | Copy files to destination. N/A since we use NFS |
P2B3 | Target | On the destination system, logged in as the oracle user with the environment (ORACLE_HOME and ORACLE_SID environment variables) pointing to the destination database, copy the rmanconvert.cmd file created in step 2B.1 from the source system and run the convert datafiles step as follows: |
[oracle@dest]$ scp oracle@source:/home/oracle/xtt/rmanconvert.cmd /home/oracle/xtt N/A since we use NFS. | ||
perl xttdriver.pl -c | ||
R3.1 | Source | On the source system, logged in as the oracle user with the environment (ORACLE_HOME and ORACLE_SID environment variables) pointing to the source database, run the create incremental step as follows: |
perl xttdriver.pl -i | ||
R3.3 | Target | |
Since we are using Nas shared filesystem no need to copy with scp between source and target. | ||
perl xttdriver.pl -r | ||
R3.4 | Source | On the source system, logged in as the oracle user with the environment (ORACLE_HOME and ORACLE_SID environment variables) pointing to the source database, run the determine new FROM_SCN step as follows: |
perl xttdriver.pl -s | ||
R3.5 | Source | 1. If you need to bring the files at the destination database closer in sync with the production system, then repeat the Roll Forward phase, starting with step 3.1. |
2. If the files at the destination database are as close as desired to the source database, then proceed to the Transport phase. | ||
T4.0 | Source | As found in note: Alter Tablespace Read Only Hanging When There Are Active TX In Any Tablespace (Doc ID 554832.1). A restart of the database is required to have no active transactions; alternatively do this during off hours. Actually, during a first test with one dedicated tablespace holding only one object it took more than 7 hrs. Oracle seems to look and wait for ALL active transactions, not only the ones that would impact the object in the test tablespace I worked with. |
T4.1 | Source | On the source system, logged in as the oracle user with the environment (ORACLE_HOME and ORACLE_SID environment variables) pointing to the source database, make the tablespaces being transported READ ONLY. |
alter tablespace MYDB_DATA read only; | ||
alter tablespace MYDB_EUC_DATA read only; | ||
alter tablespace MYDB_EUC_INDEX read only; | ||
alter tablespace MYDB_INDEX read only; | ||
alter tablespace MYTS read only; | ||
alter tablespace USERS read only; | ||
T4.2 | Source | Repeat steps 3.1 through 3.3 one last time to create, transfer, convert, and apply the final incremental backup to the destination datafiles. |
perl xttdriver.pl -i | ||
T4.2 | Target | [oracle@dest]$ scp oracle@source:/home/oracle/xtt/xttplan.txt /home/oracle/xtt |
[oracle@dest]$ scp oracle@source:/home/oracle/xtt/tsbkupmap.txt /home/oracle/xtt | ||
perl xttdriver.pl -r | ||
.. | ||
T4.3 | Target | On the destination system, logged in as the oracle user with the environment (ORACLE_HOME and ORACLE_SID environment variables) pointing to the destination database, run the generate Data Pump TTS command step as follows: |
perl xttdriver.pl -e | ||
The generate Data Pump TTS command step creates a sample Data Pump network_link transportable import command in the file xttplugin.txt. It will hold list of all the TTS you have configured and all its transport_datafiles in details. | ||
Example of that generated file : cat xttplugin.txt | ||
impdp directory=MYDB_XTT_DIR logfile=tts_imp.log \ | ||
network_link=TTSLINK.PROD.NL transport_full_check=no \ | ||
transport_tablespaces=MYCOMPTTS ,A,B,C\ | ||
transport_datafiles=’+MYDBP_DATA01/mycomptts_152.xtf’ | ||
Note: in our example, once edited, we chmodded xttplugin.txt to 744 and ran it as a script. | ||
T4.3 | Source | After the object metadata being transported has been extracted from the source database, the tablespaces in the source database may be made READ WRITE again, if desired. |
T4.4 | Target | At this step, the transported data is READ ONLY in the destination database. Perform application specific validation to verify the transported data. |
Also, run RMAN to check for physical and logical block corruption by running VALIDATE TABLESPACE as follows: | ||
In rman: | ||
validate tablespace MYDB_DATA, MYDB_EUC_DATA, MYDB_EUC_INDEX, MYDB_INDEX, MYTS, USERS check logical; | ||
T4.5 | Target | alter tablespace MYDB_DATA read write; |
alter tablespace MYDB_EUC_DATA read write; | ||
alter tablespace MYDB_EUC_INDEX read write; | ||
alter tablespace MYDB_INDEX read write; | ||
alter tablespace MYTS read write; | ||
alter tablespace USERS read write; | ||
T5 | Source + Target | Cleanup of NFS filesystem. |
Put the source Db in restricted mode as a fallback for a couple of days after the go-live, then put it to tape and decommission it. |
Adding Vip Address and Goldengate to Grid Infra structure 11.2.0.3
Introduction:
Earlier this week preparations were started to add the Goldengate software to the Grid Infrastructure 11.2.0.3 of the Billing environment on production. As part of that scenario I also had to add a Vip address that is to be used by the Goldengate software as part of high(er) availability. In my concept the Goldengate daemons run on one node only by default. During a node crash (of course not wanted nor desired), or as a way to load balance work on the cluster, the Vip address and the Goldengate software need to stop and restart on the other node. Below you will find a working example as part of the preparations I have performed. Some comments have been added to the specific steps.
Commands will be typed in italic in this blog.
Details
## The first step will be adding the vip address 10.97.242.40 oradb-gg-localp1.dc-lasvegas.us to the Grid Infra (GI). Note: the IP address and the description have been defined in the DNS. Once I got feedback that the address was added I was able to perform an nslookup. Of course it was not possible yet to ping the ip, because we first have to add it to the cluster, as is done here.
## As root:
/opt/crs/product/11203/crs/bin/appvipcfg create -network=1 -ip=10.97.242.40 -vipname=oradb-gg-localp1.dc-lasvegas.us -user=root
## Once that is in place , grant permissions to Oracle user to work with the vip address:
(As root, allow the Oracle Grid infrastructure software owner (e.g. Oracle) to run the script to start the VIP.)
/opt/crs/product/11203/crs/bin/crsctl setperm resource oradb-gg-localp1.dc-lasvegas.us -u user:oracle:r-x
## Now it is time to start the Vip:
## As Oracle, start the VIP:
/opt/crs/product/11203/crs/bin/crsctl start resource oradb-gg-localp1.dc-lasvegas.us
##Check our activities:
## As Oracle:
/opt/crs/product/11203/crs/bin/crsctl status resource oradb-gg-localp1.dc-lasvegas.us -p
## In my setup Goldengate is defined to be able to run on either node one (usapb1hr) or node 2 (usapb2hr) in my four node cluster. And since I want to make sure it only runs on those two servers, I set the placement to restricted.
## As root:
/opt/crs/product/11203/crs/bin/crsctl modify resource oradb-gg-localp1.dc-lasvegas.us -attr "HOSTING_MEMBERS=usapb1hr usapb2hr"
/opt/crs/product/11203/crs/bin/crsctl modify resource oradb-gg-localp1.dc-lasvegas.us -attr "PLACEMENT=restricted"
## As always the taste of the crème brûlée is in the details, so let's check:
## As Oracle:
/opt/crs/product/11203/crs/bin/crsctl status resource oradb-gg-localp1.dc-lasvegas.us -p
## Great, that worked. Now let's relocate the Vip to the other node as a test:
## As Oracle:
/opt/crs/product/11203/crs/bin/crsctl relocate resource oradb-gg-localp1.dc-lasvegas.us
## Completed the action with a smile, because it worked as planned.
## As always the taste of the crème brûlée is in the details, so let's check:
## As Oracle:
/opt/crs/product/11203/crs/bin/crsctl status resource oradb-gg-localp1.dc-lasvegas.us -p
## As part of making sure that the setup from scratch was the same on all machines (we had the same solution in the Pre Prod environment), let us first remove the existing resource for Goldengate and then add it to the GI again.
/opt/crs/product/11203/crs/bin/crsctl delete resource myGoldengate
## As Oracle (the white paper was very specific about that; I performed it as root the first time, ending up with the wrong primary group in the ACL, which I checked in the end). So stick to the plan! And do this as ORACLE. Add the resource to the GI, put it in a relationship to the Vip address that was created in the GI earlier, AND inform the cluster about the action script that is to be used during a relocate – server boot – node crash. (This script is in my case a shell script holding conditions like stop, start, status etc. and the corresponding Goldengate commands that are to be used by the GI:
/opt/crs/product/11203/crs/bin/crsctl add resource myGoldengate -type cluster_resource -attr "ACTION_SCRIPT=/opt/crs/product/11203/crs/crs/public/gg.local.active, CHECK_INTERVAL=30, START_DEPENDENCIES='hard(oradb-gg-localp1.dc-lasvegas.us) pullup(oradb-gg-localp1.dc-lasvegas.us)', STOP_DEPENDENCIES='hard(oradb-gg-localp1.dc-lasvegas.us)'"
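The action script referenced above (gg.local.active in my case) is not shipped by Oracle; it is a small shell script that translates the cluster actions into GoldenGate manager commands. A heavily trimmed, hypothetical sketch of what such a script can look like (the GoldenGate home and the check logic are assumptions; only the manager is handled here):
#!/bin/bash
## hypothetical action script sketch; GG_HOME is an assumption, adjust to your installation
GG_HOME=/opt/goldengate
case "$1" in
start)
  ${GG_HOME}/ggsci <<EOF
START MGR
EOF
  exit 0
  ;;
stop|clean)
  ${GG_HOME}/ggsci <<EOF
STOP MGR!
EOF
  exit 0
  ;;
check)
  ## exit 0 when the GoldenGate manager process is running, 1 otherwise
  pgrep -f "${GG_HOME}.*mgr.prm" > /dev/null && exit 0
  exit 1
  ;;
esac
exit 0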
## Altering hosting members and placement again (by default only one node is part of hosting_members, and placement=balanced by default).
## As root:
/opt/crs/product/11203/crs/bin/crsctl modify resource myGoldengate -attr "HOSTING_MEMBERS=usapb1hr usapb2hr"
/opt/crs/product/11203/crs/bin/crsctl modify resource myGoldengate -attr "PLACEMENT=restricted"
## so in the end you should check it with this:
/opt/crs/product/11203/crs/crs/public [CRS]# crsctl status resource myGoldengate -p
## Time to set the permissions on myGoldengate (altering ownership to the myGoldengate user, which is my OS user for this).
### As root:
/opt/crs/product/11203/crs/bin/crsctl setperm resource myGoldengate -o myGoldengate
## Needed and sometimes forgotten: make sure that the oracle user (who is also the owner of the Grid infra software on these boxes) is allowed to work with the resource.
###As root, allow oracle to run the script to start the goldengate_app application.
/opt/crs/product/11203/crs/bin/crsctl setperm resource myGoldengate -u user:oracle:r-x
Wrap-up:
All preparations are now in place. During an already scheduled maintenance window the following steps will be performed to bring this scenario to a HA solution for Goldengate (a rough sketch of the matching commands is shown right after the list).
- Stop the Goldengate software daemons (at the moment stopped and started by hand).
- Start the Goldengate resource via the Grid Infra (giving it control of status and activities).
- Perform checks that Goldengate is starting its activities.
- Perform a relocate of the Goldengate resource via the Grid Infra to the other node.
- Perform checks that Goldengate is starting its activities.
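A rough sketch of the commands behind those bullets, reusing the resource name and the crsctl path from this post (treat it as an outline, not a tested runbook):
## as oracle, after the Goldengate daemons have been stopped manually
/opt/crs/product/11203/crs/bin/crsctl start resource myGoldengate
/opt/crs/product/11203/crs/bin/crsctl status resource myGoldengate
## relocate test to the other hosting member (the -f may be needed because of the hard dependency on the Vip)
/opt/crs/product/11203/crs/bin/crsctl relocate resource myGoldengate -f
/opt/crs/product/11203/crs/bin/crsctl status resource myGoldengate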
As an old quote states: success just loves preparation. With these preparations in place I feel confident for the maintenance window to put this solution live.
As always, happy reading and till next time,
Mathijs
When PRCD-1027 and PRCD-1229 is spoiling your rainy day
Introduction:
More than one year ago I had set up an Oracle Restart environment with Grid Infra, ASM and databases all in 11.2.0.2.0, since that was a requirement from the vendor at first. Once the server had been handed over to production I got the request that it should also host EMC based clones, and those clones were 11.2.0.3.0. That meant I had to upgrade both the Grid infrastructure and the database software, and of course the databases as well.
So I geared up, did an upgrade of the GI and the Rdbms software, and of course of the local databases in place. After that the Emc clones were added and everything looked fine.
Until ……….
Error Messages after Server reboot:
Well, until the server got rebooted. After that server reboot a first sign that things were wrong was that the databases did not start via the grid infrastructure, which was not expected!
So there I was again, ready to solve another puzzle, and of course with people waiting for the DBs to come online so they could work.
## First clue:
I checked the Resource ( the database ) in the cluster with: crsctl status resource …. –p
Much to my surprise that showed the wrong Oracle home (it was 11.2.0.2.0, the initial Oracle home before the upgrade). But I was so sure that I had upgraded the database. What did I miss? Even more strange was that the cluster agent kept altering my oratab for this specific database back to the old Oracle home (and it would almost stick its tongue out at me, telling #line has been added by agent).
## Second clue
When I altered the oratab to show the correct Oracle home, I could start the database via sqlplus, which was indeed my second clue.
After a big face-palm it became clear to me that the clusterware did not hold the correct information about that Oracle home.
## Will srvctl modify do the Job:
srvctl modify database -d mydb -o /opt/oracle/product/11203_ee_64/db
##output:
PRCD-1027 : Failed to retrieve database mydb
PRCD-1229 : An attempt to access configuration of database migoms was rejected because its version 11.2.0.2.0 differs from the program version 11.2.0.3.0. Instead run the program from /opt/oracle/product/11202_ee_64/db
Well, that was not expected. Especially since the other clue was that the db could be started as an 11.2.0.3 db when the Oracle environment was set properly in the oratab.
##Solution:
First I tried :
srvctl modify database -d mydb -o /opt/oracle/product/11203_ee_64/db
but that is wrong as we already saw in this post.
Hmm, then I thought of an expression in German: once you start doing it the right way, things will start to work for you.
Plain and simple this is what I have to do making things right again:
srvctl upgrade database -d mydb -o /opt/oracle/product/11203_ee_64/db
After that I started mydb via the cluster and she is happy now.
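To double-check that the cluster registry now really carries the new home, a quick verification like this helps (a sketch, using the database name from this post):
## sketch: verify the registered Oracle home and the resource state after the srvctl upgrade command
srvctl config database -d mydb | grep -i "oracle home"
srvctl status database -d mydb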
## Bottom-line ( aka lesson learned ).
If you upgrade your databases on an Oracle Restart / Rac cluster environment, make it part of your upgrade plan to upgrade the information in the cluster layer for that specific database as well.
As always,
Happy Reading and till we meet again.
Mathijs