Reconfigure Rac 12.2 Gridinfra network

Summary.

In 12.2 Grid Infrastructure Oracle has made Flex ASM the default ASM setup. This blog focuses on reconfiguring the network components of an Oracle RAC 12.2 Grid Infrastructure, such as the interconnect and the public interface, or setting an interface to do-not-use, whenever that applies or improves the situation at hand. Read it carefully and in full before performing these steps on one of your clusters. The baseline for this action is a document on MOS: How to Modify Private Network Information in Oracle Clusterware (Doc ID 283684.1).

Details:

As with any change, when going through input – processing – output it is important to have a clear picture of the situation as-is. So a first and mandatory step is to check with the oifcfg getif command how things look before starting the changes:

Entering that command shows information about the network interfaces known to the RAC cluster, similar to the output below:

oracle@mysrvr1dr:/app/oracle/stage/27468969 [+ASM1]# oifcfg getif

bond0  198.19.11.0  global  public
eth0  10.217.210.0  global  cluster_interconnect,asm
eth2  192.168.10.0  global  cluster_interconnect
eth7  192.168.11.0  global  cluster_interconnect
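
As an optional cross-check (not part of the MOS note), oifcfg can also list the interfaces the operating system actually presents, including their subnets and netmasks, so you can compare them against the cluster definition shown above:

oifcfg iflist -p -n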

Here bond0 is used as the public interface, eth0 at the moment carries both the cluster interconnect and ASM traffic, and eth2 and eth7 are dedicated to the interconnect. Eth0 is in fact defined as the admin LAN for various activities. In this setup the cluster is unstable and nodes are being evicted, so it is time to perform the steps to stabilize it.

From the MOS note we follow Case IV: Changing private network interface name, subnet or netmask, for 12c Oracle Clusterware with Flex ASM.

Precaution: take a backup of profile.xml on each node.

Take a backup of profile.xml on all cluster nodes before proceeding, as the grid user. In this specific case that is the user that installed the Grid Infrastructure (in this scenario the oracle user):
Command:

$ cd $GRID_HOME/gpnp/<hostname>/profiles/peer/
 $ cp -p profile.xml profile.xml.bk
cd /app/grid/product/12201/grid/gpnp/mysrvr1dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr2dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr3dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr4dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr5dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr6dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr7dr/profiles/peer
cp -p profile.xml profile.xml.bk

cd /app/grid/product/12201/grid/gpnp/mysrvr8dr/profiles/peer
cp -p profile.xml profile.xml.bk
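
Instead of logging on to every node one by one, the same backup can be taken from a single session with a small loop. This is only a sketch and assumes passwordless ssh is configured for the installation owner (here oracle); if that is not the case, simply run the copy on each node as shown above.

GRID_HOME=/app/grid/product/12201/grid
for node in mysrvr{1..8}dr; do
  ssh oracle@${node} "cd ${GRID_HOME}/gpnp/${node}/profiles/peer && cp -p profile.xml profile.xml.bk"
done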

Altering the interconnect:

One of the interconnects has to be altered so that the ASM listener is able to communicate over that interface too. In this scenario eth2 was used. When doing this, take note of the subnet (here 192.168.10.0), since it will be needed to configure the new ASM listener.

oifcfg setif -global eth2/192.168.10.0:cluster_interconnect,asm
oifcfg setif -global eth7/192.168.11.0:cluster_interconnect

Now eth2 shows that it is set up for both the interconnect and ASM (only one interconnect should be set up to combine cluster_interconnect and asm).

peer [+ASM1]# oifcfg getif

bond0  198.19.11.0  global  public
eth0  10.217.210.0  global  cluster_interconnect,asm
eth2  192.168.10.0  global  cluster_interconnect,asm
eth7  192.168.11.0  global  cluster_interconnect

With this information checked and in place it is time to set up a new listener for ASM, since the original ASM listener was created on eth0 during the installation and eth0 will be removed from the cluster configuration in the steps below:

The existing listener ASMNET1LSNR will be replaced by a new one, ASMNET122LSNR.

srvctl add listener -asmlistener -l ASMNET122LSNR -subnet 192.168.10.0
(as mentioned, 192.168.10.0 is the subnet of the eth2 interface that we are going to use).

As always, seeing is believing: use crsctl status resource -t to see details similar to the output below. The new ASM listener is created as a resource and is in OFFLINE/OFFLINE state on all nodes of the cluster at this point in time:

--------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------
ora.ASMNET122LSNR_ASM.lsnr
               OFFLINE OFFLINE      mysrvr1dr                 STABLE
               OFFLINE OFFLINE      mysrvr2dr                 STABLE
               OFFLINE OFFLINE      mysrvr3dr                 STABLE
               OFFLINE OFFLINE      mysrvr4dr                 STABLE
               OFFLINE OFFLINE      mysrvr5dr                 STABLE
               OFFLINE OFFLINE      mysrvr6dr                 STABLE
               OFFLINE OFFLINE      mysrvr7dr                 STABLE
               OFFLINE OFFLINE      mysrvr8dr                 STABLE
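
If you prefer not to scan the full listing, you can also query just the new resource directly; the resource name below is the one shown in the output above:

crsctl status resource ora.ASMNET122LSNR_ASM.lsnr -t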

In the next step we remove the old ASM listener, using the -force option to prevent errors about dependencies.

srvctl update listener -listener ASMNET1LSNR_ASM -asm -remove -force

I checked again with crsctl status resource -t to make sure the old resource is now gone.

Stopping the old ASM listener process

In the MOS note there is a small inconsistency, because it states that as a next step the old ASM listener should be stopped. I could still see the listener process at OS level on the machines (ps -ef | grep -i inherit), but I was not able to stop it, since the cluster resource was already gone and lsnrctl did not work. Solution: when I skipped this step and stopped and started the cluster, which is mandatory in this scenario anyway, the listener process was gone on all nodes.

According to the note this command should have been given, but it is NOT working here:
lsnrctl stop ASMNET1LSNR_ASM
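
To see for yourself whether the old listener process is still hanging around at OS level, a simple process check per node is enough. This is just a sketch that greps for the old listener name used in this cluster:

ps -ef | grep -i tnslsnr | grep -i ASMNET1LSNR_ASM | grep -v grep

In my case this process disappeared on all nodes after the full cluster restart described below.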

Check configuration before restarting GI:

First command:
srvctl config listener -asmlistener

Name: ASMNET122LSNR_ASM
Type: ASM Listener
Owner: oracle
Subnet: 192.168.10.0
Home: <CRS home>
End points: TCP:1527
Listener is enabled.
Listener is individually enabled on nodes:
Listener is individually disabled on nodes:

Second Command:
srvctl config asm

ASM home: <CRS home>
Password file: +VOTE/orapwASM
Backup of Password file:
ASM listener: LISTENER
ASM instance count: ALL
Cluster ASM listener: ASMNET122LSNR_ASM

Both results look good, so it is time to move to the next step: restarting the Grid Infrastructure on all nodes.

Restarting Grid infrastructure on all Nodes:

For this step you have to become root (or sudo su -). First, and importantly, make sure that the Grid Infrastructure does not restart automatically should a cluster node reboot (disable crs), then stop the Grid Infrastructure software:

As root

/app/grid/product/12201/grid/bin/crsctl disable crs
/app/grid/product/12201/grid/bin/crsctl stop crs
To be done on: mysrvr[1-8]dr
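
If direct ssh as root is allowed in your environment (an assumption about your setup), the disable and stop can be driven from one session with a loop like the sketch below; otherwise simply run the two crsctl commands on each node in turn.

GRID_HOME=/app/grid/product/12201/grid
for node in mysrvr{1..8}dr; do
  ssh root@${node} "${GRID_HOME}/bin/crsctl disable crs && ${GRID_HOME}/bin/crsctl stop crs"
done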

Checking network configuration on all nodes.

mysrvr1dr:root:/root $ ifconfig -a
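
A more compact alternative on Red Hat 7 is to filter the address listing on the interfaces this cluster uses, so you can quickly verify on every node that the configured subnets are still in place:

ip -o -4 addr show | egrep 'bond0|eth0|eth2|eth7'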

Starting cluster again:

As root

/app/grid/product/12201/grid/bin/crsctl enable crs
/app/grid/product/12201/grid/bin/crsctl start crs

To be done on: mysrvr[1-8]dr

Final checks:

oifcfg getif

bond0  198.19.11.0  global  public
eth0  10.217.210.0  global  cluster_interconnect,asm
eth2  192.168.10.0  global  cluster_interconnect,asm
eth7  192.168.11.0  global  cluster_interconnect

Time to delete eth0

Since eth0 is the admin LAN, and the reconfiguration steps are done, it is time to get rid of eth0 (remove it from the Grid Infrastructure configuration).

oifcfg delif -global eth0/10.217.210.0 

And a last check again:

oifcfg getif

bond0  198.19.11.0  global  public
eth2  192.168.10.0  global  cluster_interconnect,asm
eth7  192.168.11.0  global  cluster_interconnect
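
As an extra verification, not part of the MOS note, you can also ask the ASM instances themselves which interconnect addresses they are using. The query below is a sketch, to be run with the environment of a local ASM instance set (for example +ASM1); keep in mind that running instances only pick up interconnect changes after they have been restarted:

sqlplus -s "/ as sysasm" <<'EOF'
set lines 200 pages 100
col name format a10
col ip_address format a18
select inst_id, name, ip_address, is_public, source
  from gv$cluster_interconnects
 order by inst_id, name;
EOF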

Happy reading, and till we meet again,

Mathijs.

Some things happened (while installing 12.2 GI in a Rac cluster).

Summary:

One of the fine people in the Oracle community once shared a story about repairing a bike and then repeating that same plan of approach N times. The lesson learned from that scenario is that it is best to treat each bike as a new challenge that deserves a fresh approach. In this blog I will describe a number of things I came across when setting up Grid Infrastructure 12.2 with the January 2018 PSU on a multi-node cluster.

Details – Things to look after:

  • Locating the log files of the installation can make a world of difference. Make sure you know where they are and have them tailed during the entire install. In my case the subdirectories and log files were found under a directory like /app/oraInventory/logs/GridSetupActions2018-04-26_09-39-53AM.
  • In the past you had one location to unzip your software zip, and during installation the runInstaller would ask for the installation location. With 12.2 (and with Oracle 18 Grid Infrastructure) that is no longer the case: as a first step, create the directory where the software is supposed to be installed and unzip your files there.
  • runInstaller is no more. To start the installation process you now have to use this command: ./gridSetup.sh
  • When installing, as in my case, on Red Hat Linux 7.4 with a patched kernel you might come across: ACFS-9154: Loading ‘oracleoks.ko’ driver. > modprobe: ERROR: could not insert ‘oracleoks’: Unknown symbol in module, or unknown parameter (see dmesg) > ACFS-9109: oracleoks.ko driver failed to load. > ACFS-9178: Return code = USM_FAIL > ACFS-9177: Return from ‘ld usm drvs’ > ACFS-9428: Failed to load ADVM/ACFS drivers. A system reboot is recommended. You can solve that by running gridSetup.sh with a parameter that installs the patch(es) first: ./gridSetup.sh -applyPSU /app/grid/product/12201/grid/27100009. In other words, the PSU patch needs to be applied first and only then can gridSetup start its setup (a sketch of this flow follows at the end of this list).
  • The screens during setup have changed. In my case I selected the option shown in the screenshot further below, which also gave me Flex ASM as the default in 12.2.
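
For clarity, here is a minimal sketch of the unzip-in-place plus -applyPSU flow mentioned in the list above. The target home and patch number are the ones used in this post; the name of the downloaded zip file is an assumption and should be replaced by whatever is in your staging area.

GRID_HOME=/app/grid/product/12201/grid
mkdir -p ${GRID_HOME}
cd ${GRID_HOME}
unzip -q /app/oracle/stage/linuxx64_12201_grid_home.zip
./gridSetup.sh -applyPSU ${GRID_HOME}/27100009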

 

[Screenshot: 2018-05-06_074150]

When installing GI as a standalone cluster, the follow-up screens ask you to add the nodes of your cluster either as a Hub or as a Leaf node, thus differentiating which nodes get a dedicated ASM instance (Hub) and which nodes communicate remotely with one of the Hub ASM instances. After the install I learned that in 12.2, by default, 3 ASM instances are created no matter how many nodes there are in your cluster.

  • Scan listener: as a preparation, make sure that the colleagues from the Linux team have added the 3 IPs for your cluster scan to DNS, and try an nslookup before installing (a quick check is sketched right below). During installation, when you enter the cluster name (here presented as mycluster), the installation tool also shows a scan name, which you will most likely have to change to match the information in DNS (in this example mycluster-scan.prod.nl, which needs to resolve to 3 IP addresses).
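
As a minimal pre-install check (the scan name is the one from this example), the scan name should already resolve to three addresses before you start gridSetup.sh:

nslookup mycluster-scan.prod.nl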

[Screenshot: 2018-05-06_075326]

  • In the cluster node screen you add all the nodes of your cluster. In this case I intended to set up each node as a Hub, and thus expected that there would be 8 ASM instances in place too (which was not the case, as elaborated above).
  • On this screen you add the nodes using the add button.
  • On this screen you can also set up SSH connectivity between all the nodes. Neither the various blogs on the web nor the documentation made it clear to me what the preferred way to do this is; I had the tool set up the SSH connectivity between all nodes and was happy with the result.
  • Once completed press next and the tool will show something like “validating node readiness”.

[Screenshot: 2018-05-06_080405]

  • In the specify Network Interface usage screen:
  • Best practice / lesson learned: make sure you have consulted the Linux team about the interfaces. In my specific case ETH0 is the admin LAN, so it should be set to Do Not Use. Eth2 and eth7 are the private interconnects; make sure that only one of them gets the option Private, ASM. (In a Flex ASM cluster ASM needs a way to communicate via its dedicated listener, and since by default there is only one ASM listener, only one of the private interconnects should use this combination of Private and ASM.) A quick post-install check is sketched after this list.
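
Once the cluster is up you can verify this choice with the same tooling used in the first part of this post; the expected result is that exactly one interface carries the cluster_interconnect,asm combination:

oifcfg getif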

[Screenshot: 2018-05-06_082605]

Note: this installation was implemented on new hardware from Dell. During the install we found out that in the original setup the Dell systems used a range of IPs (169.*) that is also used by the HAIPs of Oracle. Even if the Linux colleagues might grumble, it is mandatory that the 169.* range is not in use! In the first setup the Dell systems had the 169.* range enabled for their iDRAC interface; these IPs have been disabled.

  • For the setup of the two diskgroups (one for the OCR and voting disks and one for the GIMR, the Grid Infrastructure Management Repository), make sure that the Linux admins have delivered the ASM disks. In my case I got 2 times 3 disks, so I could set up normal redundancy diskgroups for both.
  • On the Summary screen, pay extra attention and make sure that all the cluster nodes you intend to include in your soon-to-be cluster are showing (Hub nodes: this should list all the nodes). If this is not the case you can select Edit, which will rerun all steps as of the cluster node information screen. A few post-install sanity checks are sketched after this list.
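
After the installation has finished, a handful of standard clusterware commands give a quick sanity check of the result; the list below is only a suggestion:

olsnodes -n -s -a
crsctl query css votedisk
srvctl status asm -detail

The first shows every node with its number, status and Hub/Leaf role, the second lists the voting disks and their diskgroup, and the third shows on which nodes an ASM instance is actually running (three by default with Flex ASM, as mentioned above).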

Well that is all for now .. To be continued in a galaxy near you …

As always happy reading and till we meet again.

Mathijs