Altering the Heartbeat in an Oracle RAC environment.


When asked about part of the job description of a DBA, I like to take a moment, smile and reply: to serve and to protect the cluster, the databases and, of course, the data. In the matter at hand this means that the mission/assignment is all about increasing high availability and improving the functionality of the cluster interconnect between the nodes, and all that for one price: doubling your cluster interconnect IPs on the various layers.

And to be honest this assignment is even more interesting because, after all, how often is there a need (or a challenge) to alter IP addresses once you have set up the cluster? The only valid occasions I could think of would be an action like lifting and shifting the servers to other rooms, or, as in this case, because we simply want to improve availability.

Some days before the action was scheduled, I looked into the matter of the required dedicated IPs together with colleagues from the team. In /etc/hosts, being the source for that, we looked for IPs which had hb in the alias of the naming convention, which is according to the standards on OS level. This information was used to set up the IPs on the OS level. Once that task was completed, a first and very important test had to be: can you ping these new and dedicated IPs from every node that is part of the cluster? Since this is a happy flow scenario, that was of course what happened. During the maintenance window itself the detailed steps below were performed to make these addresses known / usable on the cluster layer in the Grid Infrastructure.

Detailed Plan

In this scenario we are using a 4 node RAC cluster on Red Hat with Grid Infrastructure. Looking at the OS level, this is what we found already present in the hosts file. Even better, these IPs were all available and not in use.

grep -i hb /etc/hosts
 • mysrvrahr-hb1
 • mysrvrahr-hb2
 • mysrvrbhr-hb1
 • mysrvrbhr-hb2
 • mysrvrchr-hb1
 • mysrvrchr-hb2
 • mysrvrdhr-hb1
 • mysrvrdhr-hb2

The steps below have been followed, based on a great MOS note, in order to complete the tasks needed to make the Grid Infrastructure (cluster) aware of the new IPs. The scenario runs through a number of steps to be well prepared, but of course also to be on the safe side before and during the changes on the cluster layer. It is like paying respect and being brave but cautious.

Preparation steps:
As of 11.2 Grid Infrastructure, the private network configuration is not only stored in OCR but also in the gpnp profile. Documentation was very clear on this: If the private network is not available or its definition is incorrect, the CRSD process will not start and any subsequent changes to the OCR will be impossible.

  • Therefore care needs to be taken when making modifications to the configuration of the private network.
  • It is important to perform the changes in the correct order.

Note that manual modification of gpnp profile is not supported so it is best to stick to proper actions and not go into hacking mode!

So let’s take a backup of profile.xml on all cluster nodes before proceeding:

As the grid user (in my case the oracle user), move to the correct directory (cd $GRID_HOME/gpnp/<hostname>/profiles/peer/).

cd /app/oracle/product/11.2.0/grid/gpnp/mysrvrahr/profiles/peer 
cd /app/oracle/product/11.2.0/grid/gpnp/mysrvrbhr/profiles/peer 
cd /app/oracle/product/11.2.0/grid/gpnp/mysrvrchr/profiles/peer 
cd /app/oracle/product/11.2.0/grid/gpnp/mysrvrdhr/profiles/peer 

During startup of the cluster Oracle relies on this very important xml file
for specific data like the spfile, the diskgroups and of course the IPs.

cp -p profile.xml profile.xml.bk
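
For a 4 node cluster a small loop saves some typing; a minimal sketch, assuming passwordless ssh between the nodes as the oracle (grid) user and the node names and paths used above:

## minimal sketch: back up profile.xml on every node in one go
for node in mysrvrahr mysrvrbhr mysrvrchr mysrvrdhr; do
  ssh oracle@${node} "cd /app/oracle/product/11.2.0/grid/gpnp/${node}/profiles/peer && cp -p profile.xml profile.xml.bk"
done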

Ensure Oracle Clusterware is running on ALL nodes in the cluster and save the current status of the resources (better safe than sorry, and to make sure you know about the health of the cluster and its resources as they have been defined). So do check the cluster and save the current status of the resources in a file as a pre-change image.

/app/oracle/product/11.2.0/grid/bin/crsctl check cluster -all 
/app/oracle/product/11.2.0/grid/bin/crsctl status resource -t > /tmp/beforeNewIps.lst

As the grid user (in my case the oracle user): get the existing information,

showing which interfaces are defined in the cluster.
## Below you will see that the current (single) cluster interconnect is set up on bond1:

/app/oracle/product/11.2.0/grid/bin/oifcfg getif
bond1  global  cluster_interconnect
bond0  global  public

The command iflist will show you the network information known at OS level, showing all (or specific) defined IPs. The interfaces / subnet addresses can be identified with the command below, here for the eth interfaces specifically:

/app/oracle/product/11.2.0/grid/bin/oifcfg iflist|grep -i eth|sort
eth0
eth2
eth6


## check  interfaces / subnets in general:
 /app/oracle/product/11.2.0/grid/bin/oifcfg iflist|sort

Since we now have a good picture of the status of the cluster, and since we know more about the IPs being used (oifcfg getif) and about the IPs present on the system (oifcfg iflist), everything is set to add the new cluster_interconnect information. As you can see, both the eth2 and the eth6 addresses are defined, and with the -global parameter the information is shared across the complete cluster on all nodes.

/app/oracle/product/11.2.0/grid/bin/oifcfg setif -global eth2/ 
/app/oracle/product/11.2.0/grid/bin/oifcfg setif -global eth6/
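
For completeness: oifcfg setif also accepts the subnet and the interface type after the slash. A sketch of the full syntax, with <subnet> as a placeholder for the actual interconnect subnet (not shown in this post), would be:

## sketch of the full setif syntax (replace <subnet> with the real interconnect subnet)
/app/oracle/product/11.2.0/grid/bin/oifcfg setif -global eth2/<subnet>:cluster_interconnect
/app/oracle/product/11.2.0/grid/bin/oifcfg setif -global eth6/<subnet>:cluster_interconnect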

Of course there cannot be a change without verifying it, so I checked on all nodes with the command below.

/app/oracle/product/11.2.0/grid/bin/oifcfg getif  

Since we are using 11gR2 Grid Infrastructure, the steps below are to be followed now: shut down Oracle Clusterware on all nodes and disable Oracle Clusterware, as root.

Action is to be performed as the root user: 
sudo su -  
./app/oracle/product/11.2.0/grid/bin/crsctl stop crs 
./app/oracle/product/11.2.0/grid/bin/crsctl disable crs

In this specific scenario my Linux brothers in arms had already made the required network configuration change at OS level, and that great job was visible in the oifcfg iflist output. They made sure that the new interfaces were available on all nodes after their change.

(Check by pinging the interfaces on all nodes with the script kindly provided by a Linux team member.)
for x in 10 11;do for xx in 75 76 77 78;do ping -c2 10.124.${x}.${xx}|egrep 'icmp_seq|transmitted';done;echo;done 
for x in a b c d; do for xx in 1 2;do ping -c2 mysrvr${x}hr-hb$xx|egrep 'icmp_seq|transmitted';done;echo;done 

Well, all went well and has been checked, so it is time to restart Oracle Clusterware and, once completed, enable Oracle Clusterware again.

On all nodes in the cluster:

## as root user: 
sudo su -  
/app/oracle/product/11.2.0/grid/bin/crsctl start crs

Seeing is believing in this matter, so after some time check:

/app/oracle/product/11.2.0/grid/bin/crsctl check cluster -all 

In the step below we check the status of the resources in the cluster again and write that information to a file. This "post" operation file is then used to compare the status of the cluster resources before and after.

/app/oracle/product/11.2.0/grid/bin/crsctl status resource -t > /tmp/afterNewIps.lst
sdiff /tmp/afterNewIps.lst /tmp/beforeNewIps.lst

This compare showed me that a 10g RAC database resource and its services needed my attention, so via the cluster commands I checked and observed their status after starting them with the srvctl command as the oracle user. Once completed, I ran another check as described and, happy me, all resources in the post status file were in the same status (ONLINE ONLINE) as in the pre status file.
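
For reference, a minimal sketch of those srvctl start commands (the database and service names are illustrative, not the real ones):

## as the oracle user; database and service names are examples only
srvctl start database -d mydb10g
srvctl start service -d mydb10g -s mydb10g_srv
srvctl status database -d mydb10g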

as root user: 
sudo su -  
/app/oracle/product/11.2.0/grid/bin/crsctl enable crs

Time to wrap up this scenario. As part of housekeeping remove the old interface:

/app/oracle/product/11.2.0/grid/bin/oifcfg delif -global bond1/

Verified the environment one more time.

/app/oracle/product/11.2.0/grid/bin/oifcfg getif

The Clusterware had of course already proven itself, but I checked the databases and listeners as a last sanity check, and that completed the tasks for this time.

Time to inform Apps team that they can perform their sanity checks and start the applications again.

Happy reading and till next Time.


Upgrade to 12C GridInfra lessons learned


Not sure if it was a word from a wise DBA or just from a fortune cookie (might even have been from a Pink Panther movie). It said: always expect the unexpected, and as an add-on, success just loves preparation.

This week one of my tasks was to upgrade a 4 node Oracle RAC cluster from 11g to 12c Grid Infrastructure. And even though I came well prepared (see also the detailed other blog for that), several small surprises occurred which will be used as lessons learned in upcoming upgrades of the Grid Infrastructure. I would also like to offer some timeline with regard to how long the upgrade process really took.

Lessons learned:

  • During the preparations I needed to order extra disks for ASM storage for the Grid Infrastructure management repository (GIMR). When I started the runInstaller as a first check whether all was well prepared, I noticed that the installer software is most likely looking for a diskgroup called +OCR or +VOTING. This could be a trap if you had not extended one of them (but instead a +GRID diskgroup). So when preparing, look for either OCR or VOTING (best both if present) to add extra disks (and have some disks spare); a sketch of such a free-space check follows after this list.
  • During the start of the maintenance window the Linux colleague mentioned that he would have to stop the Hyperion services. This activity took some 45 minutes of the change window. I will have to find out whether stopping those services was a justified claim, and will need to add an extra step to the pre-checks to find out about other services and daemons running on the cluster that might be impacted when doing an upgrade.
  • Purpose of the rootupgrade script: after the installation part via the runInstaller completes and the upgrade part commences, the rootupgrade script performs the actual ASM upgrade and configures the OLR (local registry), amongst other things.
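
As referenced in the first bullet, a minimal sketch of that diskgroup free-space check (run as the grid owner with the ASM environment set; diskgroup names as mentioned above):

## check free space in the candidate diskgroups before ordering / adding extra ASM disks
asmcmd lsdg | egrep -i 'OCR|VOTING|GRID'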


Every change on a test or production environment has to come with a plan with regard to the estimated time the change will take. First and most important, of course, choose the strategy: will a rolling window be used (thus minimizing impact since at least one node will be up), thinking about a kind of batch approach where the first batch holds the first node, the second batch holds nodes 2 and 3 in my 4 node example, and a last batch holds the last node?

Start of Change Window : 20:00 CET ( 6:00 UTC ) .
According to the Linux expert, the Hyperion services needed to be stopped before we could continue.

Start of installation: 20:45 CET.
Started the runInstaller on the first node. Software was deployed to the first node and to all other nodes in the cluster (4 node RAC).

Upgrade part of the existing Grid Infrastructure:
21:30 – 21:58: on the first node (MYSRVR09hr) the rootupgrade script was started (I used the manual upgrade, still a bit unwilling to leave it all to the automated option); this means setting up a root session on the first node and running the rootupgrade script from the new 12c Grid home under /app/grid.

In the runInstaller it was offered to automate this and to run the rootupgrade script in parallel on nodes 2 and 3. To me it felt better to open a terminal session as root on each server and run the script in parallel myself.

22:06 – 22:13 on MYSRVR10hr: ran the rootupgrade script from the new Grid home.
22:06 – 22:24 on MYSRVR11hr: ran the rootupgrade script from the new Grid home.

On the last node MYSRVR12hr:
22:28 – 22:48: ran the rootupgrade script from the new Grid home.

After that the installer continued with the Grid Infrastructure management repository (GIMR) database, and once completed I ran a number of sanity checks in the cluster:

22:50 – 23:55

At 23:59:59: reported mission completed.

Happy reading and till next time,




Upgrading 11G GridInfra to 12C in Linux


With spring 2017 around, new initiatives are being developed. As a preparation to start doing database upgrades to 12c, it is a mandatory step to upgrade the Clusterware (Grid Infrastructure) first, before doing the database part. So in this case, very happy me, the time has finally come that one of the customers requests to upgrade a number of clusters to 12c Grid Infrastructure. In this document I will share thoughts and my plan to tackle this interesting puzzle. Since the first cluster upgrade will happen pretty soon (this week) the document might evolve with the lessons learned from that first upgrade. Happy reading in advance.


It could be the text of a fortune cookie, but every success just loves preparation, and in this case that is no different. The first thing to do was to identify the scope of clusters that had to be upgraded. Together with the customer an inventory list was created, and in the end 10 clusters were defined as part of the scope for this action: 8 test clusters and 2 production environments. An interesting detail is that all clusters have been patched pretty recently, all holding Grid Infrastructure, with the extra challenge that the underlying operating system comes in two flavors (being Red Hat Linux Server release 5.11 (Tikanga) and 6.5 (Santiago)). I am curious in advance to see whether these different versions of Red Hat will have an influence on the steps to be performed. Below you will find more details on the preparations and actions of the upgrade.

Operating System:

One of the first steps is of course to find out whether the operating system versions at hand are supported for the upgrade. Oracle Support confirmed that, even though it would be recommended to upgrade the 5.11 Red Hat version to Red Hat 7 first, the upgrade should work with the 5.11 version at hand. The 6.5 OS version was okay anyhow. The project decided however that an OS upgrade of the 5.11 boxes would delay things, so upgrading the OS will be done in a different project.


Before even considering running the upgrade of the Grid Infrastructure, some extra time needs to be spent investigating the storage in place in the cluster for such an upgrade. Often the Oracle software is first set up locally on each box on volume group VG0, but with the out-of-place installation these days that might become a challenge if there is not enough local storage present anymore in the box. Due to standards those root disks become nearly untouchable. For my project this storage requirement has been defined as an absolute minimum, which means there will be a need for extra local storage per node, or even for SAN storage per node, which will be presented to me as the required mount points. If such storage is not (or no longer) present locally, I have to request and receive additional storage for it.

/app/grid         50GB
/app/oracle       70GB
/var/opt/oracle   32M
/tmp              1GB
San 4 lvm dbs     4GB per mount point (see below)

A short explanation of this:

/app/grid : 12C Grid-Infra software will be installed.
/app/oracle: For the 12C Database software.
/var/opt/oracle and /tmp: required minimum space.
San 4 lvm dbs: SAN storage will be set up as 4GB mount points, one for each instance on the local node, in order to hold logfiles.

When migrating to 12c and coming from 11g, please be informed that you might need extra storage in your OCR / VOTING diskgroup due to a new feature as well. This new repository database has to be implemented during the upgrade. The Grid Infrastructure management repository (GIMR) database has become mandatory in Oracle GI 12.1.0.2, and the data files associated with it will be created in the same diskgroup as OCR or voting. (Average growth per day per node is approximately 750 MB, so a 4 node cluster at the default retention of 3 days leads to approximately 9 GB of storage requirement in the OCR or VOTING diskgroup.) A fortunate note is that the retention can be changed. In my case this means that more ASM disks will need to be added to the specific diskgroup; at work most OCR and VOTING diskgroups are set up as a bare minimum (normal redundancy with three disks of about 4 GB each).
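
A quick back-of-the-envelope check of that sizing claim (figures as stated above, purely illustrative):

## ~750 MB per day per node, 4 nodes, default retention of 3 days
NODES=4; MB_PER_DAY_PER_NODE=750; RETENTION_DAYS=3
echo "approx. GIMR space needed: $(( NODES * MB_PER_DAY_PER_NODE * RETENTION_DAYS )) MB"   # ~9000 MB, i.e. ~9 GB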

Detailed preparations and health checks.

One of the quotes in IT is that you should not touch a well running system. In this case I would like to add: but if you do, come well prepared. I have put the focus on the three tools below to prove that the current system is in good shape to run the upgrade, which can also be regarded as a health check of the environment. These preps are based on the MOS note (1579762.1) and on reading Chapter 13 in the great book "Expert Oracle RAC 12c" by Syed Jaffar Hussain, Tariq Farooq, Riyaj Shamsudeen and Kai Yu (ISBN-13 (electronic): 978-1-4302-5045-6).

  • Opatch
  • RACcheck: Orachk
  • Runcluvfy


Using opatch to make sure that the Oracle inventory is in good shape on all nodes in the cluster. The command issued investigates the current Grid Infrastructure home:

opatch lsinventory -oh /opt/crs/product/11204/crs -detail

-oh points to the specific ORACLE_HOME.

-detail shows all details.
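
To run the same inventory check on every node in one go, a minimal sketch (assuming passwordless ssh as the oracle user and the node names used further below):

## sketch: opatch lsinventory against the current Grid home on each node
for node in mysrvr23hr mysrvr24hr; do
  echo "### ${node}"
  ssh oracle@${node} /opt/crs/product/11204/crs/OPatch/opatch lsinventory -oh /opt/crs/product/11204/crs
done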

RACcheck: Orachk

I looked on Metalink and downloaded and installed this tool on the cluster nodes.

orachk Version

Following the quick start guide for this tool:

Clear information can be found in MOS:

ORAchk Upgrade Readiness Assessment (Doc ID 1457357.1)

With the tool downloaded, the steps below have been performed:

According to the documentation the tool needs to be copied, unpacked (and installed) in the suptools subdirectory of the cluster software installation.

scp oracle@mysrvr23hr:/opt/crs/product/11204/crs/suptools
scp oracle@mysrvr24hr:/opt/crs/product/11204/crs/suptools

Once unzipped the tool can run in two modes, a pre upgrade mode and a post upgrade mode:

./orachk -u -o pre |tee Orachk_pre_20170124.log
./orachk -u -o post |tee Orachk_post_20170124.log

Note: the tee command will also create a log file holding all the steps and progress information during run time.
Note: /opt/oracle/.orachk should be empty before the start, otherwise you will get an 'Another instance of orachk is running on ...' message.
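
A small pre-flight sketch before kicking off a run (paths as mentioned in the notes above; the directory may not exist yet on a fresh install):

## .orachk working directory should be empty and no other orachk run should be active
ls -A /opt/oracle/.orachk
ps -ef | grep -i orachk | grep -v grep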


Working with runcluvfy is like meeting an old friend again. Yet each time it is a bit of a struggle to find the optimal syntax and parameters to be used for your setup.

## Wrong setup was:
./runcluvfy.sh stage -pre crsinst -upgrade -n mysrvr23hr,mysrvr24hr -rolling -fixup -src_crshome /opt/crs/product/11204/crs -dest_home /app/grid/product/12102/grid -dest_version 12.1.0 -verbose
## Working version:
./runcluvfy.sh stage -pre crsinst -n mysrvr23hr,mysrvr24hr -verbose|tee runcluvfy_20170130_pre.lst
./runcluvfy.sh stage -pre crsinst -upgrade -rolling -src_crshome /opt/crs/product/11204/crs -dest_crshome /app/grid/product/12102/grid -dest_version 12.1.0.2.0 -verbose|tee runcluvfy_20170130_preUpgrade.lst

Upgrade steps:

Now it is time to plan and set up your upgrade steps, after the confidence built during the preparation. For the upgrade multiple approaches are possible, but my goal here is plain and simple: minimum impact on the cluster and on the databases hosted on that cluster, so I will be aiming for this scenario: rolling upgrade of ASM + Clusterware. A baseline for this is the URL below:

Working according to company standards requires using the following specific settings for $ORACLE_BASE and $ORACLE_HOME for the GI installation, and a different $ORACLE_HOME for the database software.

oracle@mysrvrhr:/home/oracle [CRS]# echo $ORACLE_BASE
oracle@mysrvrhr:/home/oracle [CRS]# echo $ORACLE_HOME

oracle@mysrvrhr:/home/oracle [MYDB1]# echo $ORACLE_HOME

The bullets below go through the steps, with comments where needed.

  • Due to the Grid Infrastructure management repository (GIMR) database I had to add larger disks to the VOTING diskgroup to have enough storage in place (the steps on how to add the new disks and drop the old ones are too detailed for this blog; after all it is a blog and not a book 🙂 so I will have to write about that in a separate blog).
  • Check /tmp because the upgrade requires at least 1GB free in /tmp. Either clean up or have /tmp extended (use the ls -lSh command to find the largest files); see the quick check below.
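A quick sketch of that /tmp check (standard Linux tools, nothing cluster specific):
df -h /tmp
ls -lSh /tmp | head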
  • Check OCR integrity with:
cluvfy comp ocr -n all -verbose
  • Check the backups of OCR and the voting disk in the cluster:
    ocrconfig -showbackup

Note: this command can be performed as the ORACLE user and will show info similar to the information below. An interesting aspect here was that I issued the command on the first node (but the automated backups are all on node 11hr).

oracle@mysrvr09hr:/opt/oracle [CRS]# ocrconfig -showbackup
mysrvr11hr 2017/04/21 05:20:36 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup00.ocr
mysrvr11hr 2017/04/21 01:20:29 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup01.ocr
mysrvr11hr 2017/04/20 21:20:07 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup02.ocr
mysrvr11hr 2017/04/20 01:19:42 /opt/crs/product/11204/crs/cdata/mysrvr03cl/day.ocr
mysrvr11hr 2017/04/12 17:16:11 /opt/crs/product/11204/crs/cdata/mysrvr03cl/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available
  • As the root user, run a manual backup of the OCR information. Run the ocrconfig -manualbackup command on a node where the Oracle Clusterware stack is up and running to force Oracle Clusterware to perform a backup of OCR at any time, rather than wait for the automatic backup. Note: the -manualbackup option is especially useful when you want to obtain a binary backup on demand, such as before you make changes to OCR. The OLR only supports manual backups. NOTE: in 11gR2, the voting files are backed up automatically as part of OCR. Oracle recommends NOT to use the dd command to backup or restore, as this can lead to loss of the voting disk.
mysrvr09hr:root:/root # cd /opt/crs/product/11204/crs/bin/
mysrvr09hr:root:/opt/crs/product/11204/crs/bin # ./ocrconfig -manualbackup
mysrvr11hr 2017/04/21 09:12:40 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup_20170421_091240.ocr

## Checking a second time will now also show a manual backup to be in place:
mysrvr09hr:root:/opt/crs/product/11204/crs/bin # ./ocrconfig -showbackup
mysrvr11hr 2017/04/21 05:20:36 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup00.ocr
mysrvr11hr 2017/04/21 01:20:29 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup01.ocr
mysrvr11hr 2017/04/20 21:20:07 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup02.ocr
mysrvr11hr 2017/04/20 01:19:42 /opt/crs/product/11204/crs/cdata/mysrvr03cl/day.ocr
mysrvr11hr 2017/04/12 17:16:11 /opt/crs/product/11204/crs/cdata/mysrvr03cl/week.ocr
mysrvr11hr 2017/04/21 09:12:40 /opt/crs/product/11204/crs/cdata/mysrvr03cl/backup_20170421_091240.ocr

The last line now shows the manual backup
(since it shows the format backup_yyyymmdd_hhmmss.ocr).
  • Check the location of OCR and the voting disk (they need to be in a diskgroup):
cat /etc/oracle/ocr.loc
## Shows output similar to this
## (if OCR is already mirrored in another diskgroup with normal redundancy)
# Device/file getting replaced by device +OCR



crsctl query css votedisk

## Will show 3 voting disks in diskgroup VOTE due to normal redundancy (and 3 disks)
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
 1. ONLINE 36b26f862b9a4f54bfba3096e3d50afa (/dev/mapper/asm-vote01) [VOTE]
 2. ONLINE 9d45d791c1124febbf0a093d5a185c13 (/dev/mapper/asm-vote02) [VOTE]
 3. ONLINE 1b7e510a302e4f03bfdea942d55d7067 (/dev/mapper/asm-vote03) [VOTE]
Located 3 voting disk(s).
## check in ASM:
select a.name dg_name,
       a.group_number dg_number,
       a.state dg_state,
       b.disk_number d_number,
       b.name d_name,
       b.state d_state,
       b.path d_path
from   v$asm_diskgroup a,
       v$asm_disk b
where  a.group_number = b.group_number
order by 2, 4;
  • Unset environment Variables:
unset GI_HOME 
unset ORA_NLS10
  • Check active crs version and software version:
## using the current CRS to document current active - and software version
/opt/crs/product/11204/crs/bin/crsctl query crs activeversion
/opt/crs/product/11204/crs/bin/crsctl query crs softwareversion
  • Performing a Standard Upgrade from an Earlier Release
## Use the following procedure to upgrade the cluster from an earlier release:
Start the installer, and select the option to upgrade an existing Oracle Clusterware and Oracle ASM installation.
On the node selection page, select all nodes.
Select installation options as prompted. 
Note: Oracle recommends that you configure root script automation,
so that the rootupgrade script can be run automatically during the upgrade.
Run the root scripts, either automatically or manually:

Running root scripts automatically:
TIP: If you have configured root script automation, 
then use the pause between batches to relocate services from the nodes running the previous release to the new release.
Comment Mathijs: I have not decided yet on this automation step. 
In the documentation read as prep for the upgrade you see the option to create multiple batches:
like batch 1 starting node, 
batch 2 all but last node,
batch 3 last node. 
I will use the automated way for one cluster and then use the manual (old school) method mentioned below on another cluster.

Running root scripts manually:
If you have not configured root script automation, then when prompted, 
run the script on each node in the cluster that you want to upgrade.

If you run root scripts manually, then run the script on the local node first. 
The script shuts down the earlier release installation, replaces it with the new Oracle Clusterware release, and starts the new Oracle Clusterware installation.
After the script completes successfully, you can run the script in parallel on all nodes except for one, which you select as the last node. 
When the script is run successfully on all the nodes except the last node, run the script on the last node.
After running the rootupgrade script on the last node in the cluster, if you are upgrading from a release earlier than Oracle Grid Infrastructure 11g Release 2 (11.2),
and left the check box labeled ASMCA checked, which is the default, then Oracle Automatic Storage Management Configuration Assistant ASMCA runs automatically, 
and the Oracle Grid Infrastructure upgrade is complete. 
If you unchecked the box during the interview stage of the upgrade, then ASMCA is not run automatically.

If an earlier release of Oracle Automatic Storage Management (Oracle ASM) is installed, then the installer starts ASMCA to upgrade Oracle ASM to 12c Release 1 (12.1). 
You can choose to upgrade Oracle ASM at this time, or upgrade it later.
Oracle recommends that you upgrade Oracle ASM at the same time that you upgrade Oracle Clusterware. 
Until Oracle ASM is upgraded, Oracle Databases that use Oracle ASM cannot be created and the Oracle ASM management tools in the Oracle Grid Infrastructure 12c Release 1 (12.1) home (for example, srvctl) do not work.

Because the Oracle Grid Infrastructure home is in a different location than the former Oracle Clusterware and Oracle ASM homes, 
update any scripts or applications that use utilities, libraries, or other files that reside in the Oracle Clusterware and Oracle ASM homes.
  • Check active crs version and software version:
/opt/crs/product/11204/crs/bin/crsctl query crs activeversion
/opt/crs/product/11204/crs/bin/crsctl query crs softwareversion
  • Post upgrade checks:
 ps -ef|grep d.bin should show daemons started from 12C.
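
A minimal post-upgrade sanity sketch, assuming the new 12c Grid home is /app/grid/product/12102/grid as used in the runcluvfy commands above:

## post-upgrade sanity checks from the new Grid home
/app/grid/product/12102/grid/bin/crsctl check cluster -all
/app/grid/product/12102/grid/bin/crsctl query crs activeversion
ps -ef | grep d.bin | grep -v grep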

Thoughts on Rollback:

Of course each migration will only be as good as its preparation. Still, your plan should at least hold the steps for a rollback in case you do not make it to a successfully completed task. Below you will find the steps in general terms.

On all remote nodes, run the root downgrade script from Grid_home/crs/install with the -downgrade option to stop the 12c Release 1 (12.1) stack.
On the local node, run the same script with -downgrade -lastnode.
Then, on any of the cluster member nodes where the script has run successfully:

cd /u01/app/12.1.0/grid/oui/bin
./runInstaller -nowait -waitforcompletion -ignoreSysPrereqs -updateNodeList
-silent CRS=false ORACLE_HOME=/u01/app/12.1.0/grid

On any of the cluster member nodes where the rootupgrade script has run successfully, in the old ORACLE_HOME (the earlier Oracle Clusterware installation):
$ cd /opt/crs/product/11204/crs/oui/bin/
$ ./runInstaller -nowait -waitforcompletion -ignoreSysPrereqs -updateNodeList -silent CRS=true ORACLE_HOME=/u01/app/crs

Start the Oracle Clusterware stack manually:
On each node, start Oracle Clusterware from the earlier release Oracle Clusterware home:
/opt/crs/product/11204/crs/bin/crsctl start crs

As always thank you for taking an interest in my blog. Happy reading and till the next time.




Transportable Tablespaces as migration method to 11.2.0.4 with RMAN.


For one of the projects the question came in to investigate and set up a Real Application Clusters database, with the extra challenge that the migration had to be done cross-platform, from Oracle 10g on the Solaris platform to 11.2.0.4 on Linux. From the application provider came the suggestion to investigate a backup – restore scenario with an upgrade on the new (Linux) server. Due to the fact that the source environment was 10.2.0.3 on Solaris, and because we were heading towards a RAC cluster environment on 11.2.0.4 on Linux, that suggestion was the first to be sent to the dustbin.

A normal export / import was the second scenario that was explored. Of course this is a valid scenario, but given the fact that the database was more than 1.x TB, not exactly the most favorite way to bring the data across. Still, with scripting, using multiple parfiles and/or moving partitioned data across in waves, it would be a fair plan B.

From reading up on the subject I had put my mind to the use of transportable tablespaces as the way forward with this challenging question.


As preparation for the job I requested to have an NAS filesystem mounted between the source server (MySunServer), holding the 10g database, and the target server (MyLinuxcluster). This NAS filesystem would hold the datapump dumps to be created, as well as the scripts and parfiles / config files, as was suggested in MOS Note 1389592.1. The NAS filesystem was read-writable from both servers. The perl scripts that come with the note support the transport of the tablespaces, but also help with the conversion from big endian to little endian, and as a bonus will in my case do the copy into ASM.
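
For reference, the endian formats of the source and target platforms can be checked in either database; a minimal sketch (run as sysdba):

## list platform ids and endian formats known to the database
sqlplus -s / as sysdba <<'EOF'
select platform_id, platform_name, endian_format
from   v$transportable_platform
order  by platform_id;
EOF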

Due to the layout of the database in the source environment  Rman was chosen as the best way forward with the scenario.

As a preparation an 11.2.0.4 RAC database was set up on the target cluster. This database only holds the normal tablespaces and a small temporary tablespace for the users (in a TTS solution the names of the data tablespaces that come across may not already exist in the new environment). All data / application users have been pre-created in the new environment with a new default user tablespace; a sketch of such a pre-create follows below.
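
A minimal sketch of such a pre-create (user name, password and tablespace name are purely illustrative, not the real ones):

## run as sysdba on the target database; names are examples only
sqlplus -s / as sysdba <<'EOF'
create user app_owner identified by "ChangeMe_1" default tablespace users_new temporary tablespace temp;
grant create session to app_owner;
EOF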

Details & Comments

Configuration file for the Perl scripts:

This is a file (xtt.properties) that is part of the zip file from the MOS note. It needs to be set up to match your specific needs. I will only show the settings I have used, with their comments:
## Reduce Transportable Tablespace Downtime using Incremental Backups
## (Doc ID 1389592.1)

## Properties file for xttdriver.pl

## See documentation below and My Oracle Support Note 1389592.1 for details.
## Tablespaces to transport

## Specify tablespace names in CAPITAL letters.

## Source database platform ID

## platformid

## Source database platform id, obtained from V$DATABASE.PLATFORM_ID


## srclink

## Database link in the destination database that refers to the source

## database. Datafiles will be transferred over this database link using
## dbms_file_transfer.

## Location where datafile copies are created during the “-p prepare” step.

## This location must have sufficient free space to hold copies of all
## datafiles being transported.


## backupformat

## Location where incremental backups are created.


## Destination system file locations

## stageondest

## Location where datafile copies are placed by the user when they are

## transferred manually from the source system. This location must have
## sufficient free space to hold copies of all datafiles being transported.


# storageondest

## This parameter is used only when Prepare phase method is RMAN backup.

## Location where the converted datafile copies will be written during the

## "-c conversion of datafiles" step. This is the final location of the
## datafiles where they will be used by the destination database.
## backupondest

## Location where converted incremental backups on the destination system

## will be written during the "-r roll forward datafiles" step.

## NOTE: If this is set to an ASM location then define properties

##      asm_home and asm_sid below. If this is set to a file system
##       location, then comment out asm_home and asm_sid below

## asm_home, asm_sid

## Grid home and SID for the ASM instance that runs on the destination


## Parallel parameters


## rollparallel

## Defines the level of parallelism for the -r roll forward operation.

## If undefined, default value is 0 (serial roll forward).

## getfileparallel

## Defines the level of parallelism for the -G operation


## desttmpdir

## This should be defined to same directory as TMPDIR for getting the

## temporary files. The incremental backups will be copied to directory pointed
## by stageondest parameter.


Below, in a table format, you will see the steps performed, with comments.

Steps are qualified as follows:

  • I for Initial steps – activities
  • P for Preparation
  • R for Roll Forward activities
  • T for Transport activities

Server column shows where the action needs to be done.

Step | Server | What needs to be done
I1.3 Source Identify the tablespace(s) in the source database that will be transported (the application owner needs to support this with schema owner information):


I1.5 Source + Target In my case the project offered an NFS filesystem which I could use: /mycomp_mig_db_2_linux
I1.6 Source Together with the MOS note came a zip file holding the scripts: unzip it.
I1.7 Source Tailor the extracted xtt.properties file on the source system to match your environment.
I1.8 Target As the oracle software owner, copy all xttconvert scripts and the modified xtt.properties file to the destination system. This was not needed since we used the NAS filesystem.
P1.9 Source + Target On both environments set this up:

export TMPDIR=/mycomp_mig_db_2_linux/MYDBP/scripts

P2B.1 Source perl xttdriver.pl -p
Note: do not use $ORACLE_HOME/perl/bin/perl, this did not work.
P2B.2 Source Copy files to destination. N/A since we use NFS
P2B.3 Target On the destination system, logged in as the oracle user with the environment (ORACLE_HOME and ORACLE_SID environment variables) pointing to the destination database, copy the rmanconvert.cmd file created in step 2B.1 from the source system and run the convert datafiles step as follows:
[oracle@dest]$ scp oracle@source:/home/oracle/xtt/rmanconvert.cmd /home/oracle/xtt (N/A since we use NFS)
perl xttdriver.pl -c
R3.1 Source On the source system, logged in as the oracle user with the environment (ORACLE_HOME and ORACLE_SID environment variables) pointing to the source database, run the create incremental step as follows:
perl xttdriver.pl -i
R3.3 Target [oracle@dest]$ scp oracle@source:/home/oracle/xtt/xttplan.txt /home/oracle/xtt
[oracle@dest]$ scp oracle@source:/home/oracle/xtt/tsbkupmap.txt /home/oracle/xtt
Since we are using an NAS shared filesystem there is no need to copy with scp between source and target.
perl xttdriver.pl -r
R3.4 Source On the source system, logged in as the oracle user with the environment (ORACLE_HOME and ORACLE_SID environment variables) pointing to the source database, run the determine new FROM_SCN step as follows:
perl xttdriver.pl -s
R3.5 Source 1.     If you need to bring the files at the destination database closer in sync with the production system, then repeat the Roll Forward phase, starting with step 3.1.
2.     If the files at the destination database are as close as desired to the source database, then proceed to the Transport phase.
T4.0 Source As found in the note: Alter Tablespace Read Only Hanging When There Are Active TX In Any Tablespace (Doc ID 554832.1). A restart of the database is required to have no active transactions; an alternative is to do this during off hours. Actually, during a first test with one dedicated tablespace with only one object it took more than 7 hrs. Oracle seems to look at and wait for ALL active transactions, not only the ones that would impact the object in the test tablespace I worked with.
T4.1 Source On the source system, logged in as the oracle user with the environment (ORACLE_HOME and ORACLE_SID environment variables) pointing to the source database, make the tablespaces being transported READ ONLY.
alter tablespace MYDB_DATA read only;
alter tablespace MYDB_EUC_DATA read only;
alter tablespace MYDB_EUC_INDEX read only;
alter tablespace MYDB_INDEX read only;
alter tablespace MYTS read only;
alter tablespace USERS read only;
T4.2 Source Repeat steps 3.1 through 3.3 one last time to create, transfer, convert, and apply the final incremental backup to the destination datafiles.
perl xttdriver.pl -i
T4.2 Target [oracle@dest]$ scp oracle@source:/home/oracle/xtt/xttplan.txt /home/oracle/xtt
[oracle@dest]$ scp oracle@source:/home/oracle/xtt/tsbkupmap.txt /home/oracle/xtt
perl xttdriver.pl -r
T4.3 Target On the destination system, logged in as the oracle user with the environment (ORACLE_HOME and ORACLE_SID environment variables) pointing to the destination database, run the generate Data Pump TTS command step as follows:
perl xttdriver.pl -e
The generate Data Pump TTS command step creates a sample Data Pump network_link transportable import command in the file xttplugin.txt. It will hold a list of all the tablespaces you have configured and all their transport_datafiles in detail.
Example of that generated file: cat xttplugin.txt
impdp directory=MYDB_XTT_DIR logfile=tts_imp.log \
network_link=TTSLINK.PROD.NL transport_full_check=no \
transport_tablespaces=MYCOMPTTS,A,B,C \
Note: in our example, once edited, we chmodded xttplugin.txt to 744 and ran it as a script.
T4.3 Source After the object metadata being transported has been extracted from the source database, the tablespaces in the source database may be made READ WRITE again, if desired.
T4.4 Target At this step, the transported data is READ ONLY in the destination database.  Perform application specific validation to verify the transported data.
Also, run RMAN to check for physical and logical block corruption by running VALIDATE TABLESPACE as follows:
In rman:
validate tablespace MYDB_DATA, MYDB_EUC_DATA, MYDB_EUC_INDEX, MYDB_INDEX, MYTS, USERS check logical;
T4.5 Target alter tablespace MYDB_DATA read write;
alter tablespace MYDB_EUC_DATA read write;
alter tablespace MYDB_EUC_INDEX read write;
alter tablespace MYDB_INDEX read write;
alter tablespace MYTS read write;
alter tablespace USERS read write;
T5 Source + Target Cleanup of the NFS filesystem.
Put the source DB in restricted mode as a fallback after the go-live for a couple of days, then put it to tape and decommission it (a minimal sketch of the restricted-mode step follows below).
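
A minimal sketch of that restricted-mode step on the source (run as sysdba during the fallback window):

## bounce the source database into restricted session so only dba sessions can connect
sqlplus / as sysdba <<'EOF'
shutdown immediate
startup restrict
EOF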

Adding a VIP Address and Goldengate to the Grid Infrastructure


Earlier this week preparations were started to add the Goldengate software to the Grid Infrastructure of the Billing environment on production. As part of that scenario I also had to add a VIP address that is to be used by the Goldengate software as part of high(er) availability. In my concept the Goldengate daemons are running on one node only by default. During a node crash (of course not wanted nor desired), or as a way to load balance work on the cluster, the VIP address and the Goldengate software need to stop and restart on the other node. Below you will find a working example as part of the preparations I have performed. Some comments have been added to the specific steps.

Commands will be typed in italic in this blog.


## The first step will be adding the VIP address to the Grid Infra (GI). Note: the IP address and the description have been defined in the DNS. Once I got feedback that the address was added I was able to perform an nslookup. Of course it was not possible yet to ping the IP, because we first have to add it to the cluster, as is done here.

## As root:
/opt/crs/product/11203/crs/bin/appvipcfg create -network=1 -ip= -user=root

## Once that is in place, grant permissions to the oracle user to work with the VIP address:
## (As root, allow the Oracle Grid Infrastructure software owner (e.g. oracle) to run the script to start the VIP.)

/opt/crs/product/11203/crs/bin/crsctl setperm resource -u user:oracle:r-x

## Now it is time to start  the Vip:
## As Oracle, start the VIP:
/opt/crs/product/11203/crs/bin/crsctl start resource

##Check our activities:
## As Oracle:

/opt/crs/product/11203/crs/bin/crsctl status resource -p

## In my setup Goldengate is defined to be able to run on either node 1 (usapb1hr) or node 2 (usapb2hr) in my four node cluster. And since I want to make sure it only runs on those two servers, I set the placement to restricted.
## As root:

/opt/crs/product/11203/crs/bin/crsctl modify resource  -attr "HOSTING_MEMBERS=usapb1hr usapb2hr"
/opt/crs/product/11203/crs/bin/crsctl modify resource  -attr "PLACEMENT=restricted"

## As always the taste of the crème brûlée is in the details, so let's check:
## As Oracle:
/opt/crs/product/11203/crs/bin/crsctl status resource -p

## Great, that worked. Now let's relocate the VIP to the other node as a test:
## As Oracle:

/opt/crs/product/11203/crs/bin/crsctl relocate resource

## Completed the action with a smile, because it worked as planned.

## As always the taste of the crème brûlée is in the details, so let's check:
## As Oracle:
/opt/crs/product/11203/crs/bin/crsctl status resource -p

## As part of making sure that the setup from scratch was the same on all machines (we had the same solution in the pre-prod environment), let us first remove the existing resource for Goldengate and then add it to the GI again.

/opt/crs/product/11203/crs/bin/crsctl delete resource myGoldengate

## As oracle (the white paper was very specific about that; I performed it as root the first time, ending up with the wrong primary group in the ACL, which I corrected in the end). So stick to the plan and do this as ORACLE. Add the resource to the GI, put it in a relationship to the VIP address that has been created in the GI earlier, AND inform the cluster about the action script that is to be used during a relocate / server boot / node crash. (This script is in my case a shell script holding conditions like stop, start, status etc. and the corresponding Goldengate commands that are to be used by the GI.)

/opt/crs/product/11203/crs/bin/crsctl add resource myGoldengate -type cluster_resource -attr "ACTION_SCRIPT=/opt/crs/product/11203/crs/crs/public/, CHECK_INTERVAL=30, START_DEPENDENCIES='hard( pullup(', STOP_DEPENDENCIES='hard('"

## Altering hosting members and placement again (by default only one node is part of hosting_members and placement=balanced).

## As root:
/opt/crs/product/11203/crs/bin/crsctl modify resource  myGoldengate -attr "HOSTING_MEMBERS=usapb1hr usapb2hr"

/opt/crs/product/11203/crs/bin/crsctl modify resource  myGoldengate -attr "PLACEMENT=restricted"

## so in the end you should check it with this:

/opt/crs/product/11203/crs/crs/public [CRS]# crsctl status resource myGoldengate -p

## Time to set the permissions on myGoldengate (altering ownership to the myGoldengate user, which is my OS user for this).
### As root:
/opt/crs/product/11203/crs/bin/crsctl setperm resource myGoldengate -o myGoldengate

## Needed and sometimes forgotten: make sure that the oracle user (who is also the owner of the Grid Infra software on these boxes) is allowed to work with the resource.
### As root, allow oracle to run the script to start the goldengate_app application.
/opt/crs/product/11203/crs/bin/crsctl setperm resource myGoldengate -u user:oracle:r-x



All preparations are now in place. During an already scheduled maintenance window the following steps will be performed to bring this scenario to a HA solution for Goldengate:

  • Stop the Goldengate software daemons (at the moment stopped and started by hand).
  • Start the Goldengate resource via the Grid Infra (giving her control of status and activities).
  • Perform checks that Goldengate is starting its activities.
  • Perform a relocate of the Goldengate resource via the Grid Infra to the other node (see the sketch below).
  • Perform checks that Goldengate is starting its activities.
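
As referenced in the relocate step, a minimal sketch of that test for the maintenance window (resource name myGoldengate as used in this post; run as the oracle user):

## check, relocate and check again
/opt/crs/product/11203/crs/bin/crsctl status resource myGoldengate
/opt/crs/product/11203/crs/bin/crsctl relocate resource myGoldengate
/opt/crs/product/11203/crs/bin/crsctl status resource myGoldengate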

As an old quote states: success just loves preparation. With these preparations in place I feel confident about the maintenance window to put this solution live.


As always, happy reading and till next time,



When PRCD-1027 and PRCD-1229 are spoiling your rainy day


More than one year ago I had set up an Oracle Restart environment with Grid Infra, ASM and databases all in 11.2.0.2, since that was a requirement from the vendor at first. Once the server had been handed over to production I got the request that it should also host EMC based clones, and those clones were 11.2.0.3. That meant I had to upgrade both the Grid Infrastructure and the database software, and of course the databases as well.

So I geared up, did an upgrade of the GI and the RDBMS software and of course of the local databases in place. After that the EMC clones were added and everything looked fine.

Until ……….

Error Messages after Server reboot:

Well, until the server got rebooted. After that server reboot, a first sign that things were wrong was that the databases did not start via the Grid Infrastructure, which was not expected!

So there I was again, ready to solve another puzzle, and of course with people waiting for the DBs to come online so they could work.

## First clue:

I checked the resource (the database) in the cluster with: crsctl status resource …. -p

Much to my surprise that showed the wrong Oracle home (it was the initial Oracle home from before the upgrade). But I was so sure that I had upgraded the database... What did I miss? Even more strange was that the cluster agent kept altering my oratab for the specific database to have the old Oracle home (and it would almost stick out its tongue at me, telling: # line has been added by agent).

## Second clue

When I altered the oratab to show the correct Oracle home I could start the database via sqlplus, which was indeed my second clue.

After a big face-palm it became clear to me that the cluster did not have the correct information in the clusterware about that Oracle home.

## Will srvctl modify do the Job:

srvctl modify database -d mydb -o /opt/oracle/product/11203_ee_64/db


PRCD-1027 : Failed to retrieve database mydb

PRCD-1229 : An attempt to access configuration of database migoms was rejected because its version differs from the program version. Instead run the program from /opt/oracle/product/11202_ee_64/db.

Well, that was not expected. Especially since the other clue was that the db could be started as an 11.2.0.3 db when the oracle environment was set properly in the oratab.


First I tried :

srvctl modify database -d mydb -o /opt/oracle/product/11203_ee_64/db

but that is wrong as we already saw in this post.

Hmm, then I thought of an expression in German: once you start doing it the right way, things will start to work for you.

Plain and simple, this is what I had to do to make things right again:

srvctl upgrade database -d  mydb -o /opt/oracle/product/11203_ee_64/db

After that I started mydb via the cluster and she is happy now.
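
To double-check afterwards what the clusterware now has registered for the database, a quick sketch:

## shows, amongst other things, the Oracle home registered for the database
srvctl config database -d mydb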

## Bottom line (aka lesson learned).

If you upgrade your databases on an Oracle Restart / RAC cluster environment, make it part of your upgrade plan to upgrade the information about that specific database in the cluster layer as well (srvctl upgrade database).

As always,

Happy Reading and till we meet again.


When your Restart / Rac Node screams : [ohasd(1749)]CRS-0004:logging terminated


During one of those Mondays where a hotline (ticket) service was provided, the team received a ticket that all instances of an Oracle Restart (11.2.0.3) environment were down. I assisted in the troubleshooting, which is always fun because it gives a bit of a Sherlock Holmes feeling, investigating the issues at hand. After logging on to the box I saw that the ASM instance was up, but a "crsctl check has" showed that no communication with the ohasd daemon could be established. So the puzzle became a bit more interesting at that moment.


  • Environment is an Oracle Restart environment on Linux.
  • Grid Infra with databases running on it.
  • Grid Infra has been installed in: /opt/crs/product/11203/crs/
  • The environment went down during the weekend (fortunately it was a test environment with a standard office hours SLA).

It was time to take a peek in the alert log of the cluster on that node. Since my installation was done in /opt/crs/product/11203/crs/, that meant I had to look for the log file in the subdirectory /opt/crs/product/11203/crs/log/mysrvr01/ (where mysrvr01 is the name of the node in the cluster I was working on) to see the alert file. In there I found this, showing what was going on:

2014-10-18 10:47:37.060:
[ohasd(1749)]CRS-0004:logging terminated for the process. log file: “/opt/crs/product/11203/crs/log/mysrvr01/ohasd/ohasd.log”
2014-10-18 10:47:37.128:
[ohasd(1749)]CRS-10000:CLSU-00100: Operating System function: mkdir failed with error data: 28
CLSU-00101: Operating System error message: No space left on device
CLSU-00103: error location: authprep6
CLSU-00104: additional error information: failed to make dir /opt/crs/product/11203/crs/auth/ohasd/mysrvr01/A4107671
2014-10-18 10:47:37.146:
[ohasd(1749)]CRS-10000:CLSU-00100: Operating System function: mkdir failed with error data: 28
CLSU-00101: Operating System error message: No space left on device
CLSU-00103: error location: authprep6
CLSU-00104: additional error information: failed to make dir /opt/crs/product/11203/crs/auth/ohasd/mysrvr01/A5139099

With this as a clue, dear Watson, it was a lot easier to proceed. I started hunting for large numbers of files in the /opt/crs/product/11203/crs/ filesystem and found a flood of XML files that had not been erased for ages. Which made me realize that a clean-up job in cron would come in handy for this as well.
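
A minimal sketch of such a clean-up check (the 30 day threshold is an assumption; test with -print before ever adding -delete, and verify which subdirectory actually holds the offending files):

## list xml files older than 30 days under the Grid Infrastructure home
find /opt/crs/product/11203/crs -name "*.xml" -mtime +30 -print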

After removing the XML files and checking the mount point again, I logged in as root and first issued a crsctl start has.

After some time I noticed in the alert log of the node:
2014-10-20 08:26:50.847:
[/opt/crs/product/11203/crs/bin/oraagent.bin(32461)]CRS-5818:Aborted command ‘check’ for resource ‘ora.DATA1.dg’. Details at (:CRSAGF00113:) {0:0:2} in /opt/crs/product/11203/crs/log/mysrvr01/agent/ohasd/oraagent_oracle/oraagent_oracle.log.
2014-10-20 08:26:50.847:
[/opt/crs/product/11203/crs/bin/oraagent.bin(32461)]CRS-5818:Aborted command ‘check’ for resource ‘ora.DUMMY_DATA.dg’. Details at (:CRSAGF00113:) {0:0:2} in /opt/crs/product/11203/crs/log/mysrvr01/agent/ohasd/oraagent_oracle/oraagent_oracle.log.
2014-10-20 08:26:50.847:
[/opt/crs/product/11203/crs/bin/oraagent.bin(32461)]CRS-5818:Aborted command ‘check’ for resource ‘ora.FRA1.dg’. Details at (:CRSAGF00113:) {0:0:2} in /opt/crs/product/11203/crs/log/mysrvr01/agent/ohasd/oraagent_oracle/oraagent_oracle.log.
2014-10-20 08:27:52.189:
[evmd(4105)]CRS-1401:EVMD started on node mysrvr01.
2014-10-20 08:27:53.685:
[cssd(4193)]CRS-1713:CSSD daemon is started in local-only mode
2014-10-20 08:27:55.581:
[ohasd(32421)]CRS-2767:Resource state recovery not attempted for ‘ora.diskmon’ as its target state is OFFLINE
2014-10-20 08:28:02.233:
[cssd(4193)]CRS-1601:CSSD Reconfiguration complete. Active nodes are mysrvr01 .

When I checked again 🙂 ah, good news: not only was the ASM instance running, but all instances present on the box were up as well. Another puzzle solved.

Happy reading and till next time,