Reconfigure RAC 12.2 Grid Infrastructure network

Summary.

In 12.2 Grid Infrastructure Oracle has made Flex ASM the default ASM configuration. This blog focuses on reconfiguring the network components of an Oracle RAC 12.2 Grid Infrastructure: the interconnect, the public network, or setting an interface to do-not-use, whenever that improves the situation at hand. Read it carefully and in full before performing the steps on one of your clusters. The baseline for this action is a document on MOS: How to Modify Private Network Information in Oracle Clusterware (Doc ID 283684.1).

Details:

As with any change, when going through input, processing and output it is important to have a clear picture of the situation as it is. So a first and mandatory step is to check with the oifcfg getif command how things are before starting the changes.

The command shows the network interfaces known to the RAC cluster, similar to the output below:

oracle@mysrvr1dr:/app/oracle/stage/27468969 [+ASM1]# oifcfg getif

bond0  198.19.11.0  global  public
eth0  10.217.210.0  global  cluster_interconnect,asm
eth2  192.168.10.0  global  cluster_interconnect
eth7  192.168.11.0  global  cluster_interconnect

Here bond0 is used as public; eth0 currently carries both the cluster interconnect and ASM traffic; eth2 and eth7 are dedicated to the interconnect. Eth0 is defined as the admin LAN for various activities. In this setup the cluster is unstable and nodes are being evicted. Time to perform steps to stabilize it.
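To keep an eye on which interface currently carries the asm role, the getif output can be filtered. This is only a minimal sketch: the helper name asm_interfaces is hypothetical, and it assumes the four-column layout (name, subnet, scope, roles) shown above.

```shell
# Hypothetical helper: print the names of interfaces whose role list contains "asm".
# Input: `oifcfg getif`-style lines on stdin (name, subnet, scope, roles).
asm_interfaces() {
  awk 'NF >= 4 && $4 ~ /(^|,)asm(,|$)/ { print $1 }'
}

# Typical use on a cluster node (not run here):
#   oifcfg getif | asm_interfaces
```

With the output shown above this would report eth0 only, which is exactly the situation that has to change.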

From the MOS note, the relevant part is Case IV: Changing private network interface name, subnet or netmask, for 12c Oracle Clusterware with Flex ASM.

Precaution: take a backup of profile.xml on each node.

Take a backup of profile.xml on all cluster nodes before proceeding, as the grid user, that is the user that installed the Grid Infrastructure (in this scenario the oracle user):
Command:

$ cd $GRID_HOME/gpnp/<hostname>/profiles/peer/
 $ cp -p profile.xml profile.xml.bk
cd /app/grid/product/12201/grid/gpnp/mysrvr1dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr2dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr3dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr4dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr5dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr6dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr7dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr8dr/profiles/peer
cp -p profile.xml profile.xml.bk
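Typing the same two commands eight times invites mistakes, so the list can also be generated. The sketch below only prints the commands and executes nothing; the function name is hypothetical, and the node names and Grid home are taken from the listing above. Each printed line still has to be run on its own node as the grid user.

```shell
# Hypothetical helper: print the profile.xml backup command for nodes
# mysrvr1dr .. mysrvr8dr. It only prints; nothing is executed.
GRID_HOME=/app/grid/product/12201/grid   # path taken from the listing above

print_profile_backups() {
  i=1
  while [ "$i" -le 8 ]; do
    node="mysrvr${i}dr"
    echo "cd ${GRID_HOME}/gpnp/${node}/profiles/peer && cp -p profile.xml profile.xml.bk"
    i=$((i + 1))
  done
}

print_profile_backups
```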

Altering the interconnect:

One of the interconnects should be altered to make sure that the ASM listener is able to communicate using that interface too. In this scenario eth2 was used for that. When doing this, take note of the subnet (here 192.168.10.0), since it will be needed to configure a new ASM listener.

oifcfg setif -global eth2/192.168.10.0:cluster_interconnect,asm
oifcfg setif -global eth7/192.168.11.0:cluster_interconnect

Now eth2 shows that it is set up for interconnect and asm (only one interconnect should be set up to combine cluster_interconnect and asm).

peer [+ASM1]# oifcfg getif

bond0  198.19.11.0  global  public
eth0  10.217.210.0  global  cluster_interconnect,asm
eth2  192.168.10.0  global  cluster_interconnect,asm
eth7  192.168.11.0  global  cluster_interconnect

With this information checked and in place it is time to set up a new listener for ASM, since the original ASM listener created during the installation used eth0, and eth0 will be removed from the cluster configuration in the steps below.

The existing listener ASMNET1LSNR will be replaced by a new one, ASMNET122LSNR.

srvctl add listener -asmlistener -l ASMNET122LSNR -subnet 192.168.10.0
(as mentioned this is the eth2 interface that we are going to use).

As always, seeing is believing: use crsctl status resource -t to see details similar to below. The new ASM listener has been created as a resource and at this point in time it is OFFLINE OFFLINE on all nodes in the cluster:

--------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------
ora.ASMNET122LSNR_ASM.lsnr
               OFFLINE OFFLINE      mysrvr1dr                 STABLE
               OFFLINE OFFLINE      mysrvr2dr                 STABLE
               OFFLINE OFFLINE      mysrvr3dr                 STABLE
               OFFLINE OFFLINE      mysrvr4dr                 STABLE
               OFFLINE OFFLINE      mysrvr5dr                 STABLE
               OFFLINE OFFLINE      mysrvr6dr                 STABLE
               OFFLINE OFFLINE      mysrvr7dr                 STABLE
               OFFLINE OFFLINE      mysrvr8dr                 STABLE
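On an eight-node cluster it is easy to overlook one line, so the listing can be filtered per resource and state. A minimal sketch, assuming the local-resource layout shown above (Target, State, Server, State details); the helper name servers_in_state is hypothetical, and cluster resources, which carry an extra leading index column, are not handled.

```shell
# Hypothetical helper: list servers where a local resource has the given Target/State.
# Input: `crsctl status resource -t` output on stdin.
servers_in_state() {  # usage: servers_in_state RESOURCE TARGET STATE
  awk -v res="$1" -v tgt="$2" -v st="$3" '
    /^ora\./ { current = $1; next }                      # resource name line
    current == res && $1 == tgt && $2 == st { print $3 } # matching server line
  '
}

# Typical use (not run here):
#   crsctl status resource -t | servers_in_state ora.ASMNET122LSNR_ASM.lsnr OFFLINE OFFLINE
```

At this stage the call should return all eight node names; after the cluster restart the same check with ONLINE ONLINE should do so.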

In the next step we will remove the old ASM listener, using the -force option to prevent error messages with regard to dependencies.

srvctl update listener -listener ASMNET1LSNR_ASM -asm -remove -force

I have checked again with crsctl status resource -t to make sure the old resource is gone now.

Removing the old ASM listener

The MOS note is slightly inconsistent here: it states that as a next step the old ASM listener should be stopped. I could still find the listener process at OS level on the machines (ps -ef|grep -i inherit), but I was not able to stop it, because the cluster resource was already gone and lsnrctl did not work. Solution: when I skipped this step and stopped and started the cluster (which is mandatory in this scenario anyway), the listener was gone on all nodes.

This is the command the note asks for, which did NOT work in my case:
lsnrctl stop ASMNET1LSNR_ASM

Check configuration before restarting GI:

First command:
srvctl config listener -asmlistener

Name: ASMNET122LSNR_ASM
Type: ASM Listener
Owner: oracle
Subnet: 192.168.10.0
Home: <CRS home>
End points: TCP:1527
Listener is enabled.
Listener is individually enabled on nodes:
Listener is individually disabled on nodes:

Second Command:
srvctl config asm

ASM home: <CRS home>
Password file: +VOTE/orapwASM
Backup of Password file:
ASM listener: LISTENER
ASM instance count: ALL
Cluster ASM listener: ASMNET122LSNR_ASM

Both results look good, so time to move to the next step: restarting the Grid Infrastructure on all nodes.

Restarting Grid infrastructure on all Nodes:

For the next steps you have to become root (or sudo su -). First and most importantly, make sure that the Grid Infrastructure does not restart automatically should a cluster node reboot (disable crs), then stop the Grid Infrastructure software:

As root

/app/grid/product/12201/grid/bin/crsctl disable crs
/app/grid/product/12201/grid/bin/crsctl stop crs
To be done on: mysrvr[1-8]dr

Checking network configuration on all nodes.

mysrvr1dr:root:/root $ ifconfig -a

Starting cluster again:

As root

/app/grid/product/12201/grid/bin/crsctl enable crs
/app/grid/product/12201/grid/bin/crsctl start crs

To be done on: mysrvr[1-8]dr
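The order within each phase matters: disable before stop, enable before start. A small sketch that prints the commands per phase, so they can be pasted into a root session on each node; the function name is hypothetical, the crsctl path is the one used above, and nothing is executed.

```shell
# Hypothetical helper: print the root commands for one phase of the restart.
# "stop"  -> disable autostart first, then stop the stack
# "start" -> re-enable autostart, then start the stack
CRSCTL=/app/grid/product/12201/grid/bin/crsctl   # path from the steps above

crs_phase_cmds() {
  case "$1" in
    stop)  printf '%s\n' "$CRSCTL disable crs" "$CRSCTL stop crs" ;;
    start) printf '%s\n' "$CRSCTL enable crs" "$CRSCTL start crs" ;;
    *)     echo "usage: crs_phase_cmds stop|start" >&2; return 1 ;;
  esac
}

crs_phase_cmds stop
crs_phase_cmds start
```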

Final checks:

oifcfg getif

bond0  198.19.11.0  global  public
eth0  10.217.210.0  global  cluster_interconnect,asm
eth2  192.168.10.0  global  cluster_interconnect,asm
eth7  192.168.11.0  global  cluster_interconnect

Time to delete eth0

Since eth0 is the admin LAN and our reconfiguration steps are done, it is time to get rid of eth0 (remove it from the Grid Infrastructure).

oifcfg delif -global eth0/10.217.210.0 

And a last check again:

oifcfg getif

bond0  198.19.11.0  global  public
eth2  192.168.10.0  global  cluster_interconnect,asm
eth7  192.168.11.0  global  cluster_interconnect

Happy reading, and till we meet again,

Mathijs.

 

 

Some things happened (while installing 12.2 GI in a Rac cluster).

Summary:

One of the fine people in the Oracle community once shared a thought about repairing a bike once and then repeating the same plan of approach N times. The lesson learned from that scenario is that it is best to see each bike as a new challenge which deserves a new and fresh approach. In this blog I will describe a number of things I came across when setting up Grid Infrastructure 12.2 with the January 2018 PSU on a multi-node cluster.

Details – Things to look after:

  • Locating the log files of the installation can make a world of difference. Make sure you know where they are and keep them tailed during the whole install. In my case the directory holding the subdirectories and log files was, for example: /app/oraInventory/logs/GridSetupActions2018-04-26_09-39-53AM.
  • In the past you had one destination to unzip your Software.zip, and the runInstaller would ask for an installation location during setup. With 12.2 (and in Oracle 18 Grid Infrastructure) that is no longer the case. Create the subdirectory where the software is supposed to be installed and unzip your files there as a first step.
  • runInstaller is no more… To start the installation process you have to run this command: ./gridSetup.sh
  • When installing, as in my case, on Red Hat Linux 7.4 with a patched kernel you might come across: ACFS-9154: Loading 'oracleoks.ko' driver. > modprobe: ERROR: could not insert 'oracleoks': Unknown symbol in module, or unknown parameter (see dmesg) > ACFS-9109: oracleoks.ko driver failed to load. > ACFS-9178: Return code = USM_FAIL > ACFS-9177: Return from 'ld usm drvs' > ACFS-9428: Failed to load ADVM/ACFS drivers. A system reboot is recommended. You can solve that by running gridSetup.sh with a parameter that applies the patch first and then starts the setup: ./gridSetup.sh -applyPSU /app/grid/product/12201/grid/27100009. In other words, the PSU is applied to the software tree first and only then does gridSetup start its setup.
  • The screens during setup have changed. In my case I selected this one (see screenshot), which also brought me Flex ASM as the default in 12.2.

 

[Screenshot: 2018-05-06_074150]

When installing GI as a standalone cluster, the follow-up screens ask you to add the nodes of your cluster either as a Hub or as a Leaf, thus differentiating which nodes should have a dedicated ASM instance (Hub) and which nodes will communicate remotely with one of the Hub ASM instances. After the install I learned that in 12.2, by default, 3 ASM instances are created no matter how many nodes there are in your cluster.

  • Scan listener: as a preparation, make sure that the colleagues from the Linux team have added the 3 IPs for your cluster to the DNS, and try an nslookup before installing. During installation, when you add the cluster name (here presented as mycluster), the installer will also show the scan name (and most likely you will have to alter it anyhow to match the information in DNS). The scan name (in DNS: mycluster-scan.prod.nl) needs to resolve to 3 IP addresses.
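The nslookup check can be turned into a quick count. This is a sketch under assumptions: the helper name is hypothetical, and it expects the common Linux nslookup layout in which the DNS server line carries a port suffix (for example "Address: 192.0.2.53#53") while every answer line is "Address: <ip>" without a '#'.

```shell
# Hypothetical helper: count the answer addresses in nslookup output on stdin.
# Lines like "Address: 192.0.2.53#53" describe the DNS server and are skipped.
count_scan_addresses() {
  awk '/^Address/ && !/#/ { n++ } END { print n + 0 }'
}

# Typical use (not run here):
#   nslookup mycluster-scan.prod.nl | count_scan_addresses    # should print 3
```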

[Screenshot: 2018-05-06_075326]

  • In the cluster node screen you add all the nodes of your cluster. In this case I intended to set up each node as a Hub (thus expecting that there would be 8 ASM instances in place too, which was not the case, but that is elaborated in another topic).
  • On this screen you add the nodes using the Add button.
  • On this screen you can set up SSH connectivity between all the nodes. Neither the various blogs on the web nor the documentation made clear to me what the preferred way to do this is. I had the tool set up SSH connectivity between all nodes and I was happy with the result.
  • Once completed, press Next and the tool will show something like "validating node readiness".

[Screenshot: 2018-05-06_080405]

  • In the Specify Network Interface Usage screen:
  • Best practice / lessons learned: make sure you have consulted with the Linux team about the interfaces. In my specific case eth0 is the admin LAN, so you should set it to Do Not Use. Eth2 and eth7 are the private interconnects. Make sure that only one of them has the option Private, ASM. (In a Flex ASM cluster ASM needs a way to communicate via its dedicated listener. Since by default you will have only one ASM listener, make sure only one of the private interconnects uses this combination of private and ASM.)

[Screenshot: 2018-05-06_082605]

Note: this installation was implemented on new hardware, coming from Dell. During the install we found out that in their original setup the Dell systems use a range of IPs (169.*) which is also used by the HAIP addresses of Oracle. And even though the colleagues of the Linux team might grumble, it is mandatory that the 169.* range is not in use! In the first setup the Dell systems had 169.* enabled for their iDRAC interface. These IPs have been disabled.
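A quick pre-install check for this can be sketched as follows. The helper name is hypothetical; it tests for the link-local range 169.254.0.0/16, which is the part of 169.* that HAIP takes its addresses from. The commented loop assumes the iproute2 `ip` command is available on the host.

```shell
# Hypothetical helper: succeed if an IPv4 address lies in 169.254.0.0/16,
# the link-local range that Oracle HAIP uses for its addresses.
is_link_local() {
  case "$1" in
    169.254.*.*) return 0 ;;
    *)           return 1 ;;
  esac
}

# Typical use before the installation (not run here): flag link-local
# addresses that are already configured on the host.
#   ip -o -4 addr show | awk '{ sub(/\/.*/, "", $4); print $2, $4 }' |
#   while read -r dev addr; do
#     is_link_local "$addr" && echo "WARNING: $dev uses $addr"
#   done
```

Once the cluster is up, the HAIP addresses themselves will show up in this range, so the check is only meaningful before Grid Infrastructure is installed.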

  • For the setup of the two diskgroups (one for the OCR and voting disks and one for the GIMR, the Grid Infrastructure Management Repository), make sure that the Linux admins have delivered ASM disks. In my case I got 2 times 3 disks, so I could set up both diskgroups with normal redundancy.
  • On the Summary screen, pay extra attention to make sure that all the cluster nodes that you intend to include in your soon-to-be cluster are showing (Hub nodes: this should show all the nodes). If this is not the case you can select Edit (which will rerun all steps as of Cluster node information).

Well that is all for now .. To be continued in a galaxy near you …

As always happy reading and till we meet again.

Mathijs

 

 

Installing 12.2 GI on Red Hat Linux 7.4

Introduction.

One of the Oracle new features books (I believe it was 11g) stated that the only constant is change. With my recent experience of installing 12.2 Grid Infrastructure as a fine example, I could not agree more. But as always, if everything with regard to Oracle would be easy…

For a project, new installations need to be done. In this specific case that means that I will work with Red Hat Linux 7.4 together with Oracle 12.2, both Grid Infrastructure and the 12.2 Oracle RDBMS, including the latest PSU (at the moment January 2018). As mentioned in my other post, the setup for Grid Infrastructure has changed: no more runInstaller, and the Oracle-provided zip is unzipped in the destination which will hold your software (so no more staging area).

For one of the platforms it was mandatory to set up 12.2 GI (and database) on a single server, which Oracle refers to as Oracle Restart.

Details.

Now this is where the challenging, even scary part comes in. Once the installation was on its way, at approximately 80% of the progress the famous second window popped up where you are asked to start a second session, become root and run the script root.sh. Well, this is what happened on the way to that theater:

CRS-4664: Node mysrvr successfully pinned.

2018/01/30 09:19:28 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'

2018/01/30 09:21:03 CLSRSC-400: A system reboot is required to continue installing.

The command '/app/grid/product/12.2.0.1/grid/perl/bin/perl -I/app/grid/product/12.2.0.1/grid/perl/lib -I/app/grid/product/12.2.0.1/grid/crs/install /app/grid/product/12.2.0.1/grid/crs/install/roothas.pl ' execution failed

I searched the web and it was almost a relief that more bloggers had already come across this phenomenon. As always when reading the scenarios that others have followed to solve the issue, the question remains whether they apply to your specific situation. It was interesting to identify that this issue is caused by the ACFS drivers when the kernel used in Red Hat 7.4 (and also in 7.3) is higher than the expected (default) one, thus causing root.sh to fail. And as a spoiler alert: the suggested reboot did not help in my specific case. Consulted with The Oracle and this came back:

==============================
>  ACFS-9154: Loading 'oracleoks.ko' driver.
>  modprobe: ERROR: could not insert 'oracleoks': Unknown symbol in module, or unknown parameter (see dmesg)
>  ACFS-9109: oracleoks.ko driver failed to load.
>  ACFS-9178: Return code = USM_FAIL
>  ACFS-9177: Return from 'ld usm drvs'
>  ACFS-9428: Failed to load ADVM/ACFS drivers. A system reboot is recommended.
==============================

Work Around – Solution:

Either apply a one-off patch:

25078431 – is for 7.3.
26247490 – is for 7.4.

or

Even better, but also a new approach: apply the January 2018 PSU before installation.

Both scenarios could be interesting, but given the fact that I need the January 2018 PSU as a baseline anyhow, this is how the scenario worked:

1 Installation did not complete, so I did not perform a normal de-install.

oracle@mysrvr:/app/grid/product/12.2.0.1 []# cd grid
oracle@mysrvr:/app/grid/product/12.2.0.1/grid []# rm -rf *

2 Unzipped to /app/grid/product/12.2.0.1/grid.

unzip /app/grid/product/12.2.0.1/linuxx64_12201_grid_home.zip -d /app/grid/product/12.2.0.1/grid

3 The latest version of OPatch is needed, make sure you download it and have it in place.

unzip /app/grid/product/12.2.0.1/p6880880_122010_Linux-x86-64.zip -d /app/grid/product/12.2.0.1/grid

4 This is the January 2018 PSU for GI: p27100009_122010_Linux-x86-64.zip.

unzip /app/grid/product/12.2.0.1/p27100009_122010_Linux-x86-64.zip -d /app/grid/product/12.2.0.1/patch

5 Check for correct Opatch.

oracle@mysrvr:/app/grid/product/12.2.0.1/grid []# cd OPatch
oracle@mysrvr:/app/grid/product/12.2.0.1/grid/OPatch []# opatch version
OPatch Version: 12.2.0.1.11
OPatch succeeded.
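Whether the reported OPatch version is new enough depends on the minimum stated in the README of the patch, so that value has to be looked up per PSU. The comparison itself can be sketched as below; the function name is hypothetical and the sketch assumes `sort -V` (GNU coreutils) is available on the host.

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on `sort -V` (GNU coreutils) for version-aware ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Typical use (not run here); the minimum comes from the patch README:
#   current=$(opatch version | awk '/OPatch Version/ { print $3 }')
#   version_ge "$current" "<minimum from README>" || echo "OPatch too old"
```

A plain string comparison would get this wrong, since 12.2.0.1.11 sorts before 12.2.0.1.6 alphabetically.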

6 Time to run this command; it will first patch the software tree and then start the setup:

cd ..
./gridSetup.sh -applyPSU /app/grid/product/12.2.0.1/patch/27100009
After some minutes, right before gridSetup started, this showed:
Successfully applied the patch.
The log can be found at: /app/oraInventory/logs/GridSetupActions2018-01-30_02-34-09PM/installerPatchActions_2018-01-30_02-34-09PM.log

Then the grid installer started ….

7 And a popup screen asked to have root run root.sh:

mysrvr:root:/app/grid/product/12.2.0.1/grid $ ./root.sh
This time a much more positive output came in:

Performing root user operation.
 
The following environment variables are set as:
 ORACLE_OWNER= oracle
 ORACLE_HOME= /app/grid/product/12.2.0.1/grid
 
Enter the full pathname of the local bin directory: [/usr/local/bin]:
 
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
 
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /app/grid/product/12.2.0.1/grid/crs/install/crsconfig_params
The log of current session can be found at:
 /app/oracle/crsdata/mysrvr/crsconfig/roothas_2018-01-30_03-03-44PM.log
 
LOCAL ADD MODE
Creating OCR keys for user 'oracle', privgrp 'dba'..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Node mysrvr successfully pinned.
2018/01/30 15:04:19 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'mysrvr'
CRS-2673: Attempting to stop 'ora.evmd' on 'mysrvr'
CRS-2677: Stop of 'ora.evmd' on 'mysrvr' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'mysrvr' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
 
mysrvr 2018/01/30 15:06:13 /app/grid/product/12.2.0.1/grid/cdata/mysrvr/backup_20180130_150613.olr 2960767134
2018/01/30 15:06:14 CLSRSC-327: Successfully configured Oracle Restart for a standalone server

And once again a happy me.

Dedication: I would like to dedicate this post to the colleagues from Oracle ACS support for their great and swift help. Always a pleasure to work together.

As always, happy reading and till we meet again,

Mathijs

 

Install & Patch 12.2 GI in Oracle Restart

Introduction:

Where would we be without challenges? I have become a member of a project team for a new billing environment, and this team is aiming to use (and go live with) Oracle 12.2 Grid Infrastructure and Database. The information in this article will become a baseline for the installation of several Oracle environments on Linux, each on a single server. Oracle refers to this as Oracle Restart. Next in line after that (and I love it) will be Real Application Clusters to be set up.

General Preparations 12.2 Grid Kata:

## Identifying ORACLE_BASE and layout of Grid Infrastructure.

echo $ORACLE_BASE
/app/oracle

echo $ORACLE_HOME
/app/grid/product/12.2.0.1/grid

## Identifying ORACLE_BASE  and Db software
echo $ORACLE_BASE
/app/oracle

echo $ORACLE_HOME
/app/oracle/product/12.2.0.1/db

## So for the 12.2 layout which is in scope for the actions on a Restart or Rac environment:

+ASM1     /app/grid/product/12.2.0.1/grid          
CRS       /app/grid/product/12.2.0.1/grid           
-MGMTDB   /app/grid/product/12.2.0.1/grid          
MYDB     /app/oracle/product/12.2.0.1/db

## Checking Red Hat release:

oracle@mysrvr1hr:/dev/mapper []# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.9 (Santiago)

## Oracle restart installation for 12.2  instructions to be found:

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/ladbi/installing-oracle-grid-infrastructure-for-a-standalone-server-with-a-new-database-installation.html#GUID-0B1CEE8C-C893-46AA-8A6A-7B5FAAEC72B3

An interesting point is that in 12.2 the famous, well-known runInstaller is replaced by ./gridSetup.sh (running runInstaller in 12.2 even ran into error OUI-10133). A second point of interest is that you have to pre-create the directory where the software will be running.

[Screenshot: OUI-10133 error]

## Preparations  for Installation:

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/ladbi/installing-oracle-grid-infrastructure-for-a-standalone-server-with-a-new-database-installation.html#GUID-0B1CEE8C-C893-46AA-8A6A-7B5FAAEC72B3

  1. On the server where you will install the Grid Infrastructure, create the directory where you want to install the software (the location you will later on call your ORACLE_HOME). In my specific case that meant running mkdir -p /app/grid/product/12.2.0.1/grid on that server.
  2. From Solltau: oracle@myhost:/opt/oracle/Odrive/depot/software/oracle/12c/GI []# scp linuxx64_12201_grid_home.zip oracle@mysrvr1hr:/app/grid/product/12.2.0.1/grid
  3. Unset your environment variables, if any, on the installation box:
unset ORACLE_BASE 
unset ORACLE_HOME 
unset GI_HOME 
unset ORA_CRS_HOME 
unset TNS_ADMIN
unset ORACLE_SID
unset ORA_NLS10

echo $ORACLE_BASE etc.

  4. ## Check the zip file in the destination that will also become your ORACLE_HOME for this install.
oracle@mysrvr1hr:/app/grid/product/12.2.0.1/grid []# ls -ltr
total 2924504
-rw-r--r--. 1 oracle dba 2994687209 Jan 3 16:28 linuxx64_12201_grid_home.zip

Make sure you unzip the file in the future ORACLE_HOME destination. This is mandatory because, compared to previous installations where you could alter the software installation directory, during the 12.2 installation there will NOT be an option to choose the destination for this installation. ## Make sure you are in the directory /app/grid/product/12.2.0.1/grid and extract the zip there!

  5. ## Once the zip file is extracted, fire the script below:
./gridSetup.sh
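The environment cleanup from step 3 is easy to get half right when typed by hand. As a minimal sketch (the function name is hypothetical, the variable list mirrors step 3), it can be wrapped in a function that also reports anything still set. It has to run in the same shell session that will start gridSetup.sh, not in a subshell.

```shell
# Hypothetical helper: unset the Oracle-related variables from step 3 and
# print any that are still set afterwards. No output means a clean environment.
clear_oracle_env() {
  for v in ORACLE_BASE ORACLE_HOME GI_HOME ORA_CRS_HOME TNS_ADMIN ORACLE_SID ORA_NLS10; do
    unset "$v"
  done
  env | grep -E '^(ORACLE_BASE|ORACLE_HOME|GI_HOME|ORA_CRS_HOME|TNS_ADMIN|ORACLE_SID|ORA_NLS10)=' || true
}

clear_oracle_env
```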

In one of the following detailed screens make sure you use "Change Discovery Path" to discover the disks that you will be using for this installation. In my case this meant that the Linux admin colleague had set up and labelled dedicated LUNs (disks) as a preparation for my actions.

root # ls -lH /dev/mapper/ASM_*
brw-rw----. 1 oracle dba 253,  6 Dec 22 16:01 /dev/mapper/ASM_ACFS_035_001
brw-rw----. 1 oracle dba 253, 33 Dec 22 16:01 /dev/mapper/ASM_OCRM_008_001
brw-rw----. 1 oracle dba 253, 34 Dec 22 16:01 /dev/mapper/ASM_OCRM_008_002

brw-rw----. 1 oracle dba 253, 25 Dec 22 16:01 /dev/mapper/ASM_VOTE_008_001
brw-rw----. 1 oracle dba 253, 26 Dec 22 16:01 /dev/mapper/ASM_VOTE_008_002
brw-rw----. 1 oracle dba 253, 30 Dec 22 16:01 /dev/mapper/ASM_VOTE_008_003

 

## Once you have made all the selections needed, the summary screen below will appear:

[Screenshot: Grid summary]

Once you select Install, the progress screen keeps you updated:

[Screenshot: Grid progress]

 

## In a separate session, ./root.sh has to be run as the root user, which will show:

mysrvr1hr:root:/app/grid/product/12.2.0.1/grid # ./root.sh

Performing root user operation.
The following environment variables are set as:

ORACLE_OWNER= oracle
ORACLE_HOME=  /app/grid/product/12.2.0.1/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /app/grid/product/12.2.0.1/grid/crs/install/crsconfig_params
The log of current session can be found at:
/app/oracle/crsdata/mysrvr1hr/crsconfig/roothas_2018-01-03_05-02-27PM.log

## logging details of root.sh :
LOCAL ADD MODE
Creating OCR keys for user 'oracle', privgrp 'dba'..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.

CRS-4664: Node mysrvr1hr successfully pinned.
2018/01/03 17:02:50 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.conf'
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'mysrvr1hr'
CRS-2673: Attempting to stop 'ora.evmd' on 'mysrvr1hr'
CRS-2677: Stop of 'ora.evmd' on 'mysrvr1hr' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'mysrvr1hr' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.

mysrvr1hr     2018/01/03 17:03:44     /app/grid/product/12.2.0.1/grid/cdata/mysrvr1hr/backup_20180103_170344.olr     0
2018/01/03 17:03:49 CLSRSC-327: Successfully configured Oracle Restart for a standalone server
mysrvr1hr:root:/app/grid/product/12.2.0.1/grid #

 

Resolving possible issues:  12.2 GI standalone : [INS-20802] Automatic Storage Management Configuration Assistant failed (Doc ID 2277224.1)

[Screenshot: grid completing]

 

## The installation will create the ASM instance with a default spfile. Due to company standards and best practice (knowing that size does matter and that default settings will not do well in a heavily used environment), you should connect to the ASM instance and alter the values below:

## Specific setup for asm Instance

ALTER SYSTEM SET memory_max_target=4096M SCOPE=SPFILE;
ALTER SYSTEM SET memory_target=1536M SCOPE=SPFILE;
ALTER SYSTEM SET large_pool_size=100M SCOPE=SPFILE;
ALTER SYSTEM SET shared_pool_size=512M SCOPE=BOTH;
ALTER SYSTEM SET shared_pool_reserved_size=100M SCOPE=SPFILE;

## Nothing to do with performance but mandatory due to Standards.
ALTER SYSTEM SET audit_file_dest='/app/oracle/+ASM/admin/adump' SCOPE=SPFILE;
ALTER SYSTEM SET background_dump_dest='/app/oracle/diag/asm/+asm/+ASM/trace' SCOPE=BOTH;

 

## Company Standards as with regard to  listener:

  • Log destination: /app/oracle/diag/tnslsnr/mysrvr1hr/listener
  • One listener per vip

## So I have added a listener with the netca tool, running from the Grid Infrastructure home.

## /app/oracle/diag/tnslsnr/<servername>/<listenername>/trace

oracle@mysrvr1hr:/app/grid/product/12.2.0.1/grid/network/admin [+ASM]# lsnrctl status LISTENER_MYSRVR1HR

## Deinstallation when needed. As always you might need a way out (back again) .

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/ladbi/deinstalling-previous-release-grid-home.html#GUID-D84F94CE-251C-4D3B-9D6E-F584FB30F5B3

Note: For upgrades from previous releases, if you want to uninstall the previous release Grid home, then perform the following steps:

  1. Log in as the root user.
  2. Manually change the permissions of the previous release Grid home (see below).
  3. Run the /app/grid/product/12.2.0.1/grid/deinstall/deinstall command (as oracle User).
For example, on Grid Infrastructure for a standalone server:

# chown -R oracle:dba /app/grid/product/12.2.0.1
# chmod -R 775 /app/grid/product/12.2.0.1

In this example:
/u01/app/oracle/product/11.2.0/grid is the previous release Oracle Grid Infrastructure for a standalone server home

oracle is the Oracle Grid Infrastructure installation owner user
dba is the name of the Oracle Inventory group (OINSTALL group)

For example, on Oracle Database:

# chown -R oracle:dba /app/oracle/product/12.2.0.1
# chmod -R 775 /app/oracle/product/12.2.0.1

 

If all is well, time to start patching the environment!

## patching : GI : p26737266_122010_Linux-x86-64.zip

## oracle@soltau2:/opt/oracle/Odrive/depot/software/oracle/patches/Linuxx86 []# scp p26737266_122010_Linux-x86-64.zip oracle@mysrvr1hr:/app/grid/product/12.2.0.1/stage

## check current situation with opatch before patching.

opatch lsinventory -detail -oh /app/grid/product/12.2.0.1/grid

This shows:

Oracle Interim Patch Installer version 12.2.0.1.6
Copyright (c) 2018, Oracle Corporation.  All rights reserved.

Oracle Home       : /app/grid/product/12.2.0.1/grid
Central Inventory : /app/oraInventory
from           : /app/grid/product/12.2.0.1/grid/oraInst.loc

OPatch version    : 12.2.0.1.6
OUI version       : 12.2.0.1.4

Log file location : /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatch/opatch2018-01-05_14-39-27PM_1.log
Lsinventory Output file location : /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatch/lsinv/lsinventory2018-01-05_14-39-27PM.txt
--------------------------------------------------------------------------------
Local Machine Information::
Hostname: mysrvr1hr.mydomain 

ARU platform id: 226
ARU platform description:: Linux x86-64
Installed Top-level Products (1):
Oracle Grid Infrastructure 12c                                       12.2.0.1.0
There are 1 products installed in this Oracle Home.

Installed Products (99):
etc.
.
.
There are 99 products installed in this Oracle Home.
There are no Interim patches installed in this Oracle Home.
--------------------------------------------------------------------------------
OPatch succeeded.

## Use opatch to  check for conflicts:

$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /app/grid/product/12.2.0.1/stage/26737266/26710464
$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /app/grid/product/12.2.0.1/stage/26737266/26925644
$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /app/grid/product/12.2.0.1/stage/26737266/26737232
$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /app/grid/product/12.2.0.1/stage/26737266/26839277
$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /app/grid/product/12.2.0.1/stage/26737266/26928563

## This did not show any conflicts

## Next step will be: use opatch to check the space requirements (you would not want the installation to fail due to a lack of storage):

For Grid Infrastructure Home, as home user:

Create file /tmp/patch_list_gihome.txt with the following content:
cat /tmp/patch_list_gihome.txt

/app/grid/product/12.2.0.1/stage/26737266/26928563
/app/grid/product/12.2.0.1/stage/26737266/26839277
/app/grid/product/12.2.0.1/stage/26737266/26737232
/app/grid/product/12.2.0.1/stage/26737266/26925644
/app/grid/product/12.2.0.1/stage/26737266/26710464
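
Typing such a list by hand is easy to get wrong, so a small helper could build it from the unzipped patch directory. Below is a minimal sketch, not part of the Mos procedure: the function name make_patch_list is made up, and it assumes that every sub-patch of the RU is a directory with a purely numeric name directly under the stage directory.

```shell
# Write one line per sub-patch directory found under STAGE_DIR into LIST_FILE.
# Assumption: sub-patches are the directories whose names are purely numeric.
make_patch_list() {
    stage_dir=$1
    list_file=$2
    : > "$list_file"
    for patch_dir in "$stage_dir"/*/; do
        patch_dir=${patch_dir%/}
        # Skip anything that is not a numeric directory name (README, automation, ...).
        case ${patch_dir##*/} in
            ''|*[!0-9]*) continue ;;
        esac
        printf '%s\n' "$patch_dir" >> "$list_file"
    done
}
```

Called as make_patch_list /app/grid/product/12.2.0.1/stage/26737266 /tmp/patch_list_gihome.txt it should write one full path per sub-patch, in sorted order.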

 

Run the opatch command to check if enough free space is available in the Grid Infrastructure Home:

$ORACLE_HOME/OPatch/opatch prereq CheckSystemSpace -phBaseFile /tmp/patch_list_gihome.txt

## this shows

oracle@mysrvr1hr:/app/grid/product/12.2.0.1/stage [+ASM]# $ORACLE_HOME/OPatch/opatch prereq CheckSystemSpace -phBaseFile /tmp/patch_list_gihome.txt
Oracle Interim Patch Installer version 12.2.0.1.6
Copyright (c) 2018, Oracle Corporation.  All rights reserved.

PREREQ session

Oracle Home       : /app/grid/product/12.2.0.1/grid
Central Inventory : /app/oraInventory
from           : /app/grid/product/12.2.0.1/grid/oraInst.loc

OPatch version    : 12.2.0.1.6
OUI version       : 12.2.0.1.4

Log file location : /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatch/opatch2018-01-05_14-55-06PM_1.log
Invoking prereq "checksystemspace"

Prereq "checkSystemSpace" passed.
OPatch succeeded.


 

## To patch only the GI home:

# opatchauto apply /app/grid/product/12.2.0.1/stage/26737266 -oh /app/grid/product/12.2.0.1/grid

## failed with:

OPATCHAUTO-72046: Invalid wallet parameters.
OPATCHAUTO-72046: The wallet path or wallet password provided is not valid.
OPATCHAUTO-72046: Please provide valid wallet information.

opatchauto bootstrapping failed with error code 46.

## Thank you Mos for elaborating.

 

OPATCHAUTO-72046: Invalid wallet parameters (Doc ID 2150070.1)

 

Cause according to the note: the opatchauto command was not run as the root user. Opatchauto for Grid PSUs should always be run as the root user.

## So as the root user :

/app/grid/product/12.2.0.1/grid/OPatch/opatchauto apply /app/grid/product/12.2.0.1/stage/26737266 -oh /app/grid/product/12.2.0.1/grid

## and it failed again !!!

mysrvr1hr:root:/root # /app/grid/product/12.2.0.1/grid/OPatch/opatchauto apply /app/grid/product/12.2.0.1/stage/26737266 -oh /app/grid/product/12.2.0.1/grid

System initialization log file is /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchautodb/systemconfig2018-01-05_03-09-09PM.log.
Session log file is /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/opatchauto2018-01-05_03-09-12PM.log

The id for this session is 5LQ1
[init:init] Executing OPatchAutoBinaryAction action on home /app/grid/product/12.2.0.1/grid
Executing OPatch prereq operations to verify patch applicability on SIHA Home........
[init:init] OPatchAutoBinaryAction action completed on home /app/grid/product/12.2.0.1/grid with failure
Execution of [OPatchAutoBinaryAction] patch action failed, check log for more details. Failures:
Patch Target : mysrvr1hr->/app/grid/product/12.2.0.1/grid Type[siha]

Details: [
---------------------------Patching Failed---------------------------------
Command execution failed during patching in home: /app/grid/product/12.2.0.1/grid, host: mysrvr1hr.
Command failed:  /app/grid/product/12.2.0.1/grid/OPatch/opatchauto  apply /app/grid/product/12.2.0.1/stage/26737266 -oh /app/grid/product/12.2.0.1/grid -target_type has -binary -invPtrLoc /app/grid/product/12.2.0.1/grid/oraInst.loc -persistresult /app/grid/product/12.2.0.1/grid/OPatch/auto/dbsessioninfo/sessionresult_analyze_mysrvr1hr_siha.ser -analyze -online
Command failure output:
==Following patches FAILED in analysis for apply:

Patch: /app/grid/product/12.2.0.1/stage/26737266/26925644
Log: /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-05_15-09-16PM_1.log

Reason: Failed during Analysis: CheckNApplyReport Failed, [ Prerequisite Status: FAILED, Prerequisite output:
The details are:
Prerequisite check "CheckMinimumOPatchVersion" failed.]
Failed during Analysis: CheckMinimumOPatchVersion Failed, [ Prerequisite Status: FAILED, Prerequisite output:

The details are:
The OPatch being used has version 12.2.0.1.6 while the following patch(es) require higher versions: 

Patch 26710464 requires OPatch version 12.2.0.1.7.

Please download latest OPatch from My Orac ...

etc.
.
.

OPatchAuto failed.

opatchauto failed with error code 42

mysrvr1hr:root:/root #

 

## So I downloaded the latest OPatch version (patch 6880880), parked the zip on the targeted server and unzipped it into the Grid home so that it replaces the existing OPatch directory:

unzip p6880880_122011_Linux-x86-64.zip -d /app/grid/product/12.2.0.1/grid

## now  Opatch shows:

oracle@mysrvr1hr:/app/grid/product/12.2.0.1/opatch [+ASM]# opatch version
OPatch Version: 12.2.0.1.11

OPatch succeeded.
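
The second failure above came down to a version comparison: the patch required OPatch 12.2.0.1.7 while 12.2.0.1.6 was installed. A check like that could be done up front. Below is a minimal sketch; version_ge is a hypothetical helper name, and the commented usage assumes the "OPatch Version: x" output line shown above.

```shell
# Return success when version $1 is greater than or equal to version $2.
# Versions are dot-separated integers such as 12.2.0.1.11.
version_ge() {
    have=$1
    need=$2
    while [ -n "$have" ] || [ -n "$need" ]; do
        h=${have%%.*}
        n=${need%%.*}
        h=${h:-0}
        n=${n:-0}
        [ "$h" -gt "$n" ] && return 0
        [ "$h" -lt "$n" ] && return 1
        # Drop the field just compared; stop when no dot is left.
        case $have in *.*) have=${have#*.} ;; *) have= ;; esac
        case $need in *.*) need=${need#*.} ;; *) need= ;; esac
    done
    return 0
}

# Usage sketch (not executed here):
#   current=$("$ORACLE_HOME/OPatch/opatch" version | awk '/OPatch Version/ {print $3}')
#   version_ge "$current" 12.2.0.1.7 || echo "OPatch $current is too old"
```

A field by field numeric compare is used because a plain string compare would rank 12.2.0.1.11 below 12.2.0.1.7.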

## Sometimes you just have to be patient to hear the lambs being silent:

## Next run as the root user :

/app/grid/product/12.2.0.1/grid/OPatch/opatchauto apply /app/grid/product/12.2.0.1/stage/26737266 -oh /app/grid/product/12.2.0.1/grid

 

## logfiles:

oracle@mysrvr1hr:/app/oracle/crsdata/mysrvr1hr/crsconfig

-rw-rw----. 1 oracle dba  17364 Jan  5 15:35 hapatch_2018-01-05_03-34-42PM.log

-rw-rw----. 1 oracle dba  23725 Jan  5 15:42 hapatch_2018-01-05_03-42-41PM.log

##showed
mysrvr1hr:root:/root # /app/grid/product/12.2.0.1/grid/OPatch/opatchauto apply /app/grid/product/12.2.0.1/stage/26737266 -oh /app/grid/product/12.2.0.1/grid
OPatchauto session is initiated at Fri Jan  5 15:33:54 2018
System initialization log file is /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchautodb/systemconfig2018-01-05_03-33-58PM.log.
Session log file is /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/opatchauto2018-01-05_03-34-02PM.log

The id for this session is XLE2
Executing OPatch prereq operations to verify patch applicability on home /app/grid/product/12.2.0.1/grid
Patch applicability verified successfully on home /app/grid/product/12.2.0.1/grid

Bringing down CRS service on home /app/grid/product/12.2.0.1/grid
Prepatch operation log file location: /app/oracle/crsdata/mysrvr1hr/crsconfig/hapatch_2018-01-05_03-34-42PM.log
CRS service brought down successfully on home /app/grid/product/12.2.0.1/grid

Start applying binary patch on home /app/grid/product/12.2.0.1/grid
Binary patch applied successfully on home /app/grid/product/12.2.0.1/grid

Starting CRS service on home /app/grid/product/12.2.0.1/grid
Postpatch operation log file location: /app/oracle/crsdata/mysrvr1hr/crsconfig/hapatch_2018-01-05_03-42-41PM.log
CRS service started successfully on home /app/grid/product/12.2.0.1/grid

OPatchAuto successful.

--------------------------------Summary--------------------------------
Patching is completed successfully. Please find the summary as follows:
Host:mysrvr1hr
SIHA Home:/app/grid/product/12.2.0.1/grid

Summary:
==Following patches were SUCCESSFULLY applied:
Patch: /app/grid/product/12.2.0.1/stage/26737266/26710464
Log: /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-05_15-35-04PM_1.log

Patch: /app/grid/product/12.2.0.1/stage/26737266/26737232
Log: /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-05_15-35-04PM_1.log

Patch: /app/grid/product/12.2.0.1/stage/26737266/26839277
Log: /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-05_15-35-04PM_1.log

Patch: /app/grid/product/12.2.0.1/stage/26737266/26925644
Log: /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-05_15-35-04PM_1.log

Patch: /app/grid/product/12.2.0.1/stage/26737266/26928563
Log: /app/grid/product/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-05_15-35-04PM_1.log

OPatchauto session completed at Fri Jan  5 15:43:05 2018
Time taken to complete the session 9 minutes, 11 seconds


 

Happy Dba: installed 12.2 GI and patched it with the October 2017 RU.

Thanks for reading and till we meet again,

Mathijs.

Character-set change in an Oracle Database

Introduction

Recently a request came in to investigate the options for changing the character set of a number of core databases. The goal would be to move from we8iso8859p15 to al32utf8. Below are some thoughts and things to remember if you are really heading in that direction.

A management summary of this article would be: yes, it is technically possible to move the character set of a database from we8iso8859p15 to al32utf8. But this can only be achieved after careful checking first. If the checks show that actions are needed, you will need a team with application and datamodel knowledge to look into the reports that are generated during the inventory part of the scenario. With the knowledge gained from those reports, these people will have to fix any issues in the source database, after which another analysis (scan) has to run before doing the conversion. Oh, and of course: test, test, test, and have a backup in place before doing this on your live production boxes. Oracle software is trying to protect you by allowing a maximum of 7 days between the analysis and the conversion, which makes sense given that the data will continue to grow and change during and after the first analysis.

Details

When setting up a new database there is of course a lot to be considered, such as the sizing of the database (at the start and, let's say, after 12 months) and the users and their privileges. But there is one extra, very important topic to think about well (thinking about it twice would be even better), and that is the character set. Plain and simple: if the character set is not chosen correctly, converting it at a later point in time to the one desired might be an issue (if possible at all). As a rule of thumb one could even say that if the character set is wrong, the data present in the database needs to be extracted (with tools like datapump) and the database needs to be recreated in the new character set, after which the data can be inserted again. Of course this should only be done in production after testing such a scenario on a pre-production environment and after careful analysis of the reports that check the data.

There is one exception to the above scenario in which you always have to recreate the database to get to the desired (new) character set. If the new character set is a superset of the current one (it holds the same characters, and at the same location in the character table), then a conversion is possible. It is of course still necessary to do a proper analysis of the data present in the database before starting such a conversion: plan a scenario in pre-production to do the conversion and analyse the results thoroughly, then run the same scenario on production (run the inventory, do an impact analysis, make a backup, do the conversion and do a lot of post checks once that conversion has completed).
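
To illustrate why that analysis matters even when the target is a superset (my own illustration, not taken from the Mos notes): the euro sign is a single byte (0xA4) in ISO 8859-15 but three bytes in UTF-8, so a column defined with byte semantics that was filled to the brim may no longer fit after conversion. A minimal sketch that just counts the bytes of both representations:

```shell
# Count the bytes of the euro sign in both encodings.
# Octal escapes are used so the script itself stays plain ASCII.
euro_iso=$(printf '\244' | wc -c | tr -d ' ')            # ISO 8859-15: 0xA4
euro_utf8=$(printf '\342\202\254' | wc -c | tr -d ' ')   # UTF-8: 0xE2 0x82 0xAC
echo "WE8ISO8859P15: $euro_iso byte, AL32UTF8: $euro_utf8 bytes"
```

The scan reports are where such expanding values should show up, which is why they need to be read by people who know the data model.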

Before Oracle 12 two tools were needed to do the conversion. The csscan tool does the inventory and presents the consequences of such a conversion in three files. If data would be lost, you have to fix those issues manually and rerun csscan. Once that has been done, the csalter tool needs to run in a restricted database (in case of a Rac only one instance up and all others down, and only one session logged in as sys). Of course this means that the csalter tool requires downtime of your environment.

In Oracle 12 csscan and csalter are no longer supported and you will have to look for the dmu tool.

Csscan and Csalter

Preparations:

### This Mos note has been used  as a baseline for the csscan:

Installing and configuring Csscan in 10g and 11g (Database Character Set Scanner) (Doc ID 745809.1)

Changing the NLS_CHARACTERSET to AL32UTF8 / UTF8 (Unicode) in 8i, 9i , 10g and 11g (Doc ID 260192.1)

Csscan Output Explained (Doc ID 444701.1)

As a best practice the notes mention making a copy of the script before altering it.

### In the existing ORACLE_HOME (for example 11g):
cp $ORACLE_HOME/rdbms/admin/csminst.sql $ORACLE_HOME/rdbms/admin/csminst.sql.old

### As a best practice, alter the script so that the csmig user gets an existing tablespace in the database with enough room as its default tablespace. (Hint: you might even consider setting the designated tablespace to autoextend until the analyses have completed.)

alter user csmig default tablespace nomonitor quota unlimited on nomonitor;

### Then run the adapted script:

sqlplus /nolog
conn / as sysdba
set TERMOUT ON
set ECHO ON
spool csminst.log
@?/rdbms/admin/csminst.sql

### Make sure you perform below changes to the user:

sqlplus /nolog
conn / as sysdba;
alter user csmig identified by ces10;
alter user csmig account unlock ;
grant create session to CSMIG;
exit;

###  All set for the analyses so on the Operating System level run:

export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH
csscan SYSTEM FULL=y TOCHAR=AL32UTF8 ARRAY=2048000 PROCESS=32 CAPTURE=N LOG=$ORACLE_SID.log

 

### Important to  know about csscan and csalter is that they are desupported in 12c.

https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=549806046730391&id=1418321.1&_afrWindowMode=0&_adf.ctrl-state=sm7b61jd7_375

For a scenario in 11G ( or before ) :

http://dbaonline.org/newsletter/Newsletter_200908.pdf

If your source database is big (one of the environments where I had to run it was approximately 8 TB), the csscan (11g environment) will take time; in my case it ran for some 13 hours. If you would like to keep an eye on it, v$session_longops will be your friend:

sqlplus / as sysdba
SET pages 1000
SELECT target,
TO_CHAR(start_time,'HH24:MI:SS - DD-MM-YY'),
time_remaining,
sofar,
totalwork,
sid,
serial#,
opname
FROM v$session_longops
WHERE sid IN
(SELECT sid FROM v$session WHERE upper(program) LIKE 'CSSCAN%')
AND sofar < totalwork
ORDER BY start_time
/

DMU tool

 

### Dmu tool :

https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=549848776806800&id=1272374.1&_afrWindowMode=0&_adf.ctrl-state=sm7b61jd7_429

## Known issues with dmu:

https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=549881125542061&id=2018250.1&_afrWindowMode=0&_adf.ctrl-state=sm7b61jd7_483

### Requirements for the DMU tool in Oracle (dmu can be used as of 11.2.0.3 and up)

### This package needs to be run in the database you need to analyse:

@?/rdbms/admin/prvtdumi.plb

### Make sure you have a password file in place. My first attempt to connect to the database in the dmu tool failed: I kept getting invalid username or password. After some time I realized that a password file might be mandatory. If a password file is not present, this is the way to create one:

orapwd file=$ORACLE_HOME/dbs/orapw$ORACLE_SID password=manager entries=16 force=y

### If a password file seems to be in place, check in the database with the query below (you should at least see sys listed):

SQL> select * FROM v$pwfile_users;

### DMU tool will need a user with sysdba privileges .

### Before starting need to have set up environment.

export JAVA_HOME=$ORACLE_HOME/jdk
$ORACLE_HOME/dmu/dmu.sh

 

As always, a pleasure to share thoughts and approaches. Happy reading,

Mathijs.

To Tfa or not to Tfa

Introduction

Writing a few notes on the Tfa tool on Oracle Real Application Clusters to share my current experiences on this topic. When you create a Tar @ Oracle for an issue on one of your clusters, it might very well be that Support asks you to upload Tfa reports. In that case question number one would be: do I have that tool installed in the software stack of the clusterware? The first comfort is that as of Oracle 11.2.0.4 this tool is present as part of the standard cluster stack. But .. (sometimes I wonder why there always has to be a but). In my specific case the Grid Infrastructure is 11.2.0.3, so first I had to check if the tool was installed as an add-on to the cluster, and second of course I had to check if it was up and running, if it was able to survive a node reboot, etc.

Below you will find some notes on the how and why of  this tool.

On the why you should use it part:

Oracle Trace File Analyzer White Paper

Okay, that does make a lot of sense, since the tool helps us not only to gather various kinds of information from the log files on all the cluster nodes, but is also able to filter that information so it only holds data from, let's say, 4 hours before the incident till 1 hour after the incident.

On the how to part:

As mentioned, since the cluster is using Grid Infrastructure 11.2.0.3 it was necessary to check if the tool was installed and running. The details below mostly concern checking if the tool is there and running. Should you need more on the installation process: Oracle Grid Infrastructure Trace File Analyzer Installation

Tfa summary of commands which all have to run as the mighty ROOT:

## checking actions:
cd /opt/crs/product/tfa/bin/
./tfactl print actions
## When all is running and you need to collect data as ROOT:
./tfactl diagcollect -all -node all
## which hosts are being monitored:
./tfactl print hosts
## Information with regard to the configuration:
ls -ltr /etc/init.d/init.tfa

-rwxr-xr-x 1 root root 11724 Aug 9 2016 /etc/init.d/init.tfa
## Starting the Tfa Daemon(s):
MYSRVR01hr:root:/root # /etc/init.d/init.tfa start
or 
MYSRVR01hr:root:/opt/crs/product/tfa/bin # ./tfactl start
## enable – disable autostart at node reboot: 
MYSRVR01hr:root:/opt/crs/product/tfa/bin # ./tfactl enable
## show status: 
MYSRVR01hr:root:/opt/crs/product/tfa/bin # ./tfactl print status
## show config:
MYSRVR01hr:root:/opt/crs/product/tfa/bin # ./tfactl print config
##show directories:
MYSRVR01hr:root:/opt/crs/product/tfa/bin # ./tfactl print directories
## collect data for example on all nodes:
# ./tfactl diagcollect -all -node all
or
./tfactl diagcollect -from "MMM/dd/yyyy hh:mm:ss" -to "MMM/dd/yyyy hh:mm:ss"
./tfactl diagcollect -from "Jun/30/2017 09:15:00" -to "Jun/30/2017 12:00:00"
### Examples starting tfa 
MYSRVR01hr:root:/root # /etc/init.d/init.tfa start
Starting TFA..
Waiting up to 100 seconds for TFA to be started..
. . . . . 
. . . . . 
Successfully started TFA Process..
. . . . . 
TFA Started and listening for commands
## Example enabling the start of the daemons at server / node reboot:
MYSRVR01hr:root:/opt/crs/product/tfa/bin # ./tfactl enable
TFA autostart has been enabled ..
## Example status
MYSRVR01hr:root:/opt/crs/product/tfa/bin # ./tfactl print status

.--------------------------------------------------------------------------------------------------.
| Host       | Status of TFA | PID   | Port | Version    | Build ID             | Inventory Status |
+------------+---------------+-------+------+------------+----------------------+------------------+
| MYSRVR01hr | RUNNING       | 3608  | 5000 | 12.1.2.5.0 | 12125020150615061618 | COMPLETE         |
| MYSRVR02hr | RUNNING       | 20057 | 5000 | 12.1.2.5.0 | 12125020150615061618 | COMPLETE         |
'------------+---------------+-------+------+------------+----------------------+------------------'
## Example showing current configuration:

MYSRVR01hr:root:/opt/crs/product/tfa/bin # ./tfactl print config
.----------------------------------------------------------.
| MYSRVR02hr                                               |
+---------------------------------------------+------------+
| Configuration Parameter                     | Value      |
+---------------------------------------------+------------+
| TFA version                                 | 12.1.2.5.0 |
| Automatic diagnostic collection             | OFF        |
| Trimming of files during diagcollection     | ON         |
| Repository current size (MB)                | 466        |
| Repository maximum size (MB)                | 1271       |
| Inventory Trace level                       | 1          |
| Collection Trace level                      | 1          |
| Scan Trace level                            | 1          |
| Other Trace level                           | 1          |
| Max Size of TFA Log (MB)                    | 50         |
| Max Number of TFA Logs                      | 10         |
| Max Size of Core File (MB)                  | 20         |
| Max Collection Size of Core Files (MB)      | 200        |
| Automatic Purging                           | ON         |
| Minimum Age of Collections to Purge (Hours) | 12         |
'---------------------------------------------+------------'

.----------------------------------------------------------.
| MYSRVR01hr                                               |
+---------------------------------------------+------------+
| Configuration Parameter                     | Value      |
+---------------------------------------------+------------+
| TFA version                                 | 12.1.2.5.0 |
| Automatic diagnostic collection             | OFF        |
| Trimming of files during diagcollection     | ON         |
| Repository current size (MB)                | 473        |
| Repository maximum size (MB)                | 1170       |
| Inventory Trace level                       | 1          |
| Collection Trace level                      | 1          |
| Scan Trace level                            | 1          |
| Other Trace level                           | 1          |
| Max Size of TFA Log (MB)                    | 50         |
| Max Number of TFA Logs                      | 10         |
| Max Size of Core File (MB)                  | 20         |
| Max Collection Size of Core Files (MB)      | 200        |
| Automatic Purging                           | ON         |
| Minimum Age of Collections to Purge (Hours) | 12         |
'---------------------------------------------+------------'
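
The -from / -to form of diagcollect is handy for the "4 hours before till 1 hour after the incident" window mentioned earlier. Below is a minimal sketch that prints such a command for a given incident time. The helper name diagcollect_window and its defaults are made up for this example, and it assumes GNU date as found on Linux.

```shell
# Print a tfactl diagcollect command covering a window around an incident.
# Assumes GNU date (Linux); helper name and defaults are illustrative only.
diagcollect_window() {
    incident=$1            # e.g. "2017-06-30 10:30:00"
    before_h=${2:-4}       # hours before the incident
    after_h=${3:-1}        # hours after the incident
    # Go via epoch seconds so that "- 4 hours" is never misread as a time zone.
    epoch=$(date -d "$incident" +%s) || return 1
    fmt='+%b/%d/%Y %H:%M:%S'
    from=$(LC_ALL=C date -d "@$((epoch - before_h * 3600))" "$fmt")
    to=$(LC_ALL=C date -d "@$((epoch + after_h * 3600))" "$fmt")
    printf './tfactl diagcollect -from "%s" -to "%s"\n' "$from" "$to"
}
```

The function only prints the command, so it can be reviewed first and then run as root from the tfa bin directory.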

Happy reading,

Mathijs

 

Altering the Heartbeat in Oracle Rac environment.

Introduction

When asked for a part of the job description of a dba, I like to take a moment, smile and reply: to serve and to protect the cluster, the databases and of course the data. In the matter at hand this means that the mission is all about increasing high availability and improving the functionality of the cluster interconnect between the nodes, and all that for one price: doubling your cluster interconnect ips on the various layers.

And to be honest, this assignment is even more interesting because, after all, how often is there a need or a challenge to alter IP addresses once you have set up the cluster? Hmm, the only valid reasons I could think of would be an action like lifting and shifting the servers to other rooms or, as in this case, simply wanting to improve availability.

Some days before the action was scheduled, I looked into the matter of the required dedicated ips together with the colleagues from the team. In /etc/hosts, being the source for that, we looked for ips which had hb in the alias, which is the naming convention according to the standards on OS level. This information was used to set up the ips on the OS level. Once that task was completed, a first and very important test had to be: can you ping these new and dedicated ips from every node which is part of the cluster? And since this is a happy flow scenario, that was of course the case. During the maintenance window itself the detailed steps below were performed to make these addresses known and usable on the cluster layer in the Grid Infrastructure.

Detailed Plan

In this scenario we are using a 4 node Rac cluster on RH with 11.2.0.4 Grid Infrastructure. Looking on the OS level, this is what we found already present in the hosts file. And even better, these ips were all available and not in use.

grep -i hb /etc/hosts
10.124.10.75 mysrvrahr-hb1.dc.nl mysrvrahr-hb1
10.124.11.75 mysrvrahr-hb2.dc.nl mysrvrahr-hb2
10.124.10.76 mysrvrbhr-hb1.dc.nl mysrvrbhr-hb1
10.124.11.76 mysrvrbhr-hb2.dc.nl mysrvrbhr-hb2
10.124.10.77 mysrvrchr-hb1.dc.nl mysrvrchr-hb1
10.124.11.77 mysrvrchr-hb2.dc.nl mysrvrchr-hb2
10.124.10.78 mysrvrdhr-hb1.dc.nl mysrvrdhr-hb1
10.124.11.78 mysrvrdhr-hb2.dc.nl mysrvrdhr-hb2

The steps below have been followed based on a great Mos note in order to complete the tasks that are needed to make the Grid Infrastructure (cluster) aware of the new ips. The scenario runs through a number of steps to be well prepared, but of course also to be on the safe side before and during the changes on the cluster layer. It is like, hmm, paying respect and being brave but cautious.

Preparation steps:
As of 11.2 Grid Infrastructure, the private network configuration is not only stored in OCR but also in the gpnp profile. Documentation was very clear on this: If the private network is not available or its definition is incorrect, the CRSD process will not start and any subsequent changes to the OCR will be impossible.

  • Therefore care needs to be taken when making modifications to the configuration of the private network.
  • It is important to perform the changes in the correct order.

Note that manual modification of gpnp profile is not supported so it is best to stick to proper actions and not go into hacking mode!

So let’s take a backup of profile.xml on all cluster nodes before proceeding:

As grid user (in my case the oracle user), move to the correct directory (cd $GRID_HOME/gpnp/<hostname>/profiles/peer/).

cd /app/oracle/product/11.2.0/grid/gpnp/mysrvrahr/profiles/peer 
cd /app/oracle/product/11.2.0/grid/gpnp/mysrvrbhr/profiles/peer 
cd /app/oracle/product/11.2.0/grid/gpnp/mysrvrchr/profiles/peer 
cd /app/oracle/product/11.2.0/grid/gpnp/mysrvrdhr/profiles/peer 

During startup of the cluster Oracle relies on this very important xml file for specific data like spfile / diskgroups and of course ips.

cp -p profile.xml profile.xml.bk
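
A plain profile.xml.bk gets overwritten when the procedure is repeated. As a variation, here is a sketch that puts a timestamp in the backup name. backup_file is a hypothetical helper and the naming pattern is my own assumption, not something the Mos note prescribes.

```shell
# Copy a file to <name>.bk_<timestamp>, preserving mode and times,
# and print the name of the backup that was created.
backup_file() {
    src=$1
    stamp=$(date +%Y%m%d_%H%M%S)
    dest="$src.bk_$stamp"
    cp -p "$src" "$dest" && printf '%s\n' "$dest"
}
```

Run in each node's peer directory as backup_file profile.xml, this should leave the original untouched and report the name of the copy.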

Ensure Oracle Clusterware is running on ALL nodes in the cluster and save the current status of the resources. Better safe than sorry: make sure you know about the health of the cluster and its resources as they have been defined, so do check the cluster and save the current status of the resources in a file as a pre-change image.

/app/oracle/product/11.2.0/grid/bin/crsctl check cluster -all 
/app/oracle/product/11.2.0/grid/bin/crsctl status resource -t>/tmp/beforeNewIps.lst

As grid user (in my case the oracle user), get the existing information showing which interfaces are defined in the cluster.
## below you will see that the current (single) cluster interconnect is set up at 64.18.112.208

/app/oracle/product/11.2.0/grid/bin/oifcfg getif
bond1 64.18.112.208  global  cluster_interconnect
bond0  195.233.190.64  global  public

The command oifcfg iflist shows the network information known on the OS, listing all (or specific) subnets and interfaces. The interfaces / subnet addresses can be identified with the command below, here for the eth interfaces specifically:

/app/oracle/product/11.2.0/grid/bin/oifcfg iflist|grep -i eth|sort 
64.18.32.0 eth0
10.124.10.0 eth2
10.124.11.0 eth6

or

## check  interfaces / subnets in general:
 /app/oracle/product/11.2.0/grid/bin/oifcfg iflist|sort

Since we now have a good picture of the status of the cluster, and since we know more about the ips being used (oifcfg getif) and about the ips present on the system (oifcfg iflist), all things are set to add the new cluster_interconnect information. As you can see below, both the eth2 and the eth6 address are defined. With the -global parameter the information is shared in the complete cluster on all nodes.

/app/oracle/product/11.2.0/grid/bin/oifcfg setif -global eth2/10.124.10.0:cluster_interconnect 
/app/oracle/product/11.2.0/grid/bin/oifcfg setif -global eth6/10.124.11.0:cluster_interconnect

Of course there cannot be a change without verifying it. So I checked on all nodes with the command below.

/app/oracle/product/11.2.0/grid/bin/oifcfg getif  
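
Checking four nodes by eye works, but the check could also be scripted. Below is a minimal sketch that reads oifcfg getif output on stdin and tests for one interface / subnet pair. has_interconnect is a hypothetical helper name, and the column layout (name, subnet, scope, type) is assumed from the getif output shown earlier.

```shell
# Succeed when the oifcfg getif output on stdin lists the given interface
# and subnet as a global cluster_interconnect.
has_interconnect() {
    awk -v ifname="$1" -v subnet="$2" '
        $1 == ifname && $2 == subnet && $3 == "global" && $4 ~ /cluster_interconnect/ { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Usage sketch (not executed here):
#   /app/oracle/product/11.2.0/grid/bin/oifcfg getif | has_interconnect eth2 10.124.10.0 \
#       && echo "eth2 registered"
```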

Since we are using 11gR2 Grid Infrastructure, the steps below are to be followed now: shut down Oracle Clusterware on all nodes and disable Oracle Clusterware, as root.

Action is to be performed as the root user: 
sudo su -  
/app/oracle/product/11.2.0/grid/bin/crsctl stop crs 
/app/oracle/product/11.2.0/grid/bin/crsctl disable crs

In this specific scenario my Linux brothers in arms had already made the network configuration change at OS level as was required and that great job was seen in the oifcfg iflist command. They made sure that the new interfaces were available on all nodes after their change.

(check to ping the interfaces on all nodes with script kindly provided by Linux team member). 
for x in 10 11;do for xx in 75 76 77 78;do ping -c2 10.124.${x}.${xx}|egrep 'icmp_seq|transmitted';done;echo;done 
for x in a b c d; do for xx in 1 2;do ping -c2 mysrvr${x}hr-hb$xx|egrep 'icmp_seq|transmitted';done;echo;done 

Well, all went well and has been checked, so it is time to restart Oracle Clusterware and, once completed, enable Oracle Clusterware again.

On all nodes in the cluster:

## as root user: 
sudo su -  
/app/oracle/product/11.2.0/grid/bin/crsctl start crs

Seeing = believing in this matter, so after some time check:

/app/oracle/product/11.2.0/grid/bin/crsctl check cluster -all 

In the step below we are checking the status of the resources in the cluster again and adding that information to  a file. This “post” operation file is then being used to compare the status of the cluster resources before and after.

/app/oracle/product/11.2.0/grid/bin/crsctl status resource -t>/tmp/afterNewIps.lst 
sdiff /tmp/afterNewIps.lst /tmp/beforeNewIps.lst
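
On a cluster with many resources the side by side output gets long. As an alternative, a small wrapper could print only the lines that changed. This is a minimal sketch; compare_status is a made-up helper name, and it simply relies on plain diff.

```shell
# Show only the resource lines that differ between two status snapshots.
# Returns 0 when the snapshots are identical, 1 when they differ.
compare_status() {
    before=$1
    after=$2
    if diff "$before" "$after" > /dev/null; then
        echo "No differences between $before and $after"
        return 0
    fi
    # Lines starting with "<" come from the before file, ">" from the after file.
    diff "$before" "$after" | grep '^[<>]'
    return 1
}
```

For the files used here that would be compare_status /tmp/beforeNewIps.lst /tmp/afterNewIps.lst, where every line shown is a resource that deserves a second look.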

This compare showed me that a 10G RAC database resource and its services needed my attention, so via the cluster commands I checked and observed their status after starting them with the srvctl command as the oracle user. Once completed I ran another check as described and, ah happy me, all resources in the post status file were in a similar status (online online) as in the pre status file.

as root user: 
sudo su -  
/app/oracle/product/11.2.0/grid/bin/crsctl enable crs

Time to wrap up this scenario. As part of housekeeping remove the old interface:

/app/oracle/product/11.2.0/grid/bin/oifcfg delif -global bond1/64.18.112.208

Verified the environment one more time.

/app/oracle/product/11.2.0/grid/bin/oifcfg getif

Clusterware had of course already proven itself, but I checked the databases and listeners as a last sanity check, and that completed the tasks for this time.

Time to inform Apps team that they can perform their sanity checks and start the applications again.

Happy reading and till next Time.

Mathijs