Introduction:
This week I was part of the debate again: do we or don’t we relink after a major activity such as a Linux kernel upgrade? I was asked to do the relink after the Linux upgrade on a RAC cluster. So, as always, I thought it would be wise to make notes during the day as a plan to be performed during the night. In this blog you will find the steps I performed on a two-node RAC cluster with 11.2.0.4 Grid Infrastructure and two Oracle software trees holding the 11.2.0.4 RDBMS and the 11.1 RDBMS.
With regard to relinking, the discussion in the team had been along the lines of: 1) we might break things by relinking, and 2) we don’t have the resources to do that for every server. My recommendation is to follow Oracle in this and relink the Grid Infrastructure right after the OS has been upgraded. If something breaks during the upgrade and the relink that follows, at least you know where it came from and can deal with it from there. Whereas if you do not relink your software right after such a major change to the OS, you might still be hit in the dark in the upcoming weeks, and you would then need to figure out what caused it.
You can even debate whether it is necessary to stop resources such as listeners and databases gracefully before shutting down the cluster, or whether it is enough to perform a checkpoint in your database and just shut down the CRS. I have used both approaches and have never had issues so far. But I can imagine that heavily used, busy systems might prefer the graceful shutdown before shutting down GI.
Below you will find my steps. As always, happy reading and till we meet again,
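For the quick variant (checkpoint, then stop the CRS), a minimal sketch could look like the block below. It is not the home-made cSpfile.ksh used later in the plan, which is not shown in this post. The function names `checkpoint_sql` and `run_checkpoint` are made up for illustration, and the sketch assumes that ORACLE_HOME and ORACLE_SID are already set for the instance and that OS authentication as sysdba works.

```shell
#!/bin/sh
# Sketch only: flush the instance right before the clusterware is stopped.
# checkpoint_sql and run_checkpoint are hypothetical helper names.

# Emit the SQL statements for a checkpoint followed by a log file switch.
checkpoint_sql() {
    cat <<'EOF'
alter system checkpoint;
alter system switch logfile;
exit
EOF
}

# Feed the statements to sqlplus in silent mode.
# Assumes ORACLE_HOME and ORACLE_SID point at the instance to flush.
run_checkpoint() {
    checkpoint_sql | "$ORACLE_HOME/bin/sqlplus" -s "/ as sysdba"
}
```

Sourcing the block only defines the two functions; nothing touches a database until `run_checkpoint` is called for each instance on the node.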
Mathijs.
Detailed Plan:
mysrvrar / mysrvrbr | Steps 1 – 11 will be performed on both nodes in my cluster, in sequential order with some delay in between, to make sure no cluster panic occurs. | |
1 | crsctl status resource -t > /tmp/BeforeWork.lst | Check your cluster so you can compare it with what it looks like after the relinking. It is a good idea to write the output to a file. I often end up on clusters that I do not work with on a daily basis, so I tend to make this overview before I start working on the cluster. |
2 | cSpfile.ksh | This is a home-made script that performs several activities: it creates an spfile, does a checkpoint, and does a log file switch right before the cluster node is shut down. |
3 | emctl stop agent | |
4 | srvctl stop home -o $ORACLE_HOME -s /tmp/statusRDBMS -n mysrvrar | This stops all resources that were started from the 11.2.0.4 home and keeps a record of them in the file /tmp/statusRDBMS. This will be convenient when starting them again. |
5 | ||
6 | srvctl stop instance -d MYDBCM -i MYDBCM1 | This is a shared cluster, so some customers require the 11.2.0.4 software and some the 11.1 software. The 11.1 databases have to be stopped individually. |
srvctl stop instance -d MYDBCMAC -i MYDBCMAC1 | ||
7 | srvctl stop listener -n mysrvrar -l listener_MYDBCM1 | It is common to have a listener per database, so I stop the 11.1 listeners in a proper way as well. |
srvctl stop listener -n mysrvrar -l listener_MYDBCMAC1 | ||
8 | As root: | Dealing with the cluster means you have to log on as root, or perform sudo su - from the oracle user to become root, to perform the tasks needed to stop the clusterware on the cluster node. |
9 | cd /opt/crs/product/11204/crs/bin | |
10 | ./crsctl disable crs | During this maintenance Linux will be patched and rebooted several times, so I was asked to make sure that the Grid Infrastructure does not start at each reboot until we are ready. |
11 | ./crsctl stop crs | Last step of the preparation for the Linux guys to patch the machines: shutting down the Grid Infrastructure. Time to take a 2-hour sleep. |
Time to relink the software on the two nodes | Starting the relink on the first node, performing steps 12 and following. I will complete all steps needed on the first node and see to it that the Grid Infrastructure is started before moving on to the second node. | |
12 | CHECK IF CRS IS DOWN, otherwise REPEAT step 11 | After returning to the cluster, still check whether CRS is down, because it is better to be safe than sorry. |
13 | As root: | In order to relink the Grid Infra you have to become the root user again. |
14 | cd /opt/crs/product/11204/crs/bin | as root |
15 | cd /opt/crs/product/11204/crs/crs/install | |
16 | perl rootcrs.pl -unlock | Earlier this night the GI was shut down for the Linux patching. When you run perl rootcrs.pl -unlock it will try to shut down the GI, so in my case I got a message that the system was not able to stop the CRS. |
17 | As the Grid Infrastructure for a cluster owner: | This was a bit tricky: the owner of the Grid Infra in my case is oracle, so don’t try this as root. It is better to open a second window as oracle for the steps below. |
18 | export ORACLE_HOME=/opt/crs/product/11204/crs | As the Oracle user. |
cd /opt/crs/product/11204/crs/bin | As the Oracle user. | |
19 | relink | Relink will also write a relink log which you can tail. |
20 | [Step 1] Log into the UNIX system as the Oracle software owner: | Once the GI software has been relinked it is time to relink the Oracle homes (in my case an 11.1 and an 11.2 software tree). In my case I logged on as the oracle user. |
21 | [STEP 2] Verify that your $ORACLE_HOME is set correctly: | |
22 | For all Oracle Versions and Platforms, perform this basic environment check first: | |
export ORACLE_HOME=/opt/oracle/product/11204_ee_64/db | Oracle 11.2.0.4 | |
export ORACLE_HOME=/opt/oracle/product/111_ee_64/db | Oracle 11.1 | |
cd $ORACLE_HOME | ||
pwd | Check the environment. | |
23 | [Step 3] Verify and/or Configure the UNIX Environment for proper relinking: | |
Set LD_LIBRARY_PATH to include $ORACLE_HOME/lib | LD_LIBRARY_PATH needs to be in place, so when relinking both Oracle versions make sure you set the environment correctly for each home. | |
export LD_LIBRARY_PATH=/opt/oracle/product/11204_ee_64/db/lib | ||
echo $LD_LIBRARY_PATH | ||
export LD_LIBRARY_PATH=/opt/oracle/product/111_ee_64/db/lib | ||
echo $LD_LIBRARY_PATH | ||
24 | [Step 4] For all Oracle Versions and UNIX Platforms: | |
Verify that you performed Step 2 correctly: | Check, check and check again. | |
env | grep -i LD_ ….make sure that you see the correct absolute path for $ORACLE_HOME in the variable definitions. | ||
25 | [Step 5] For all Oracle Versions and UNIX Platforms: | |
Verify umask is set correctly: | ||
umask | This must return 022. If it does not, set umask to 022. | |
umask 022 | ||
umask | ||
26 | [Step 6] Run the OS Commands to Relink Oracle: | |
Important Notes: | ||
* Before relinking Oracle, shut down both the database and the listener. | ||
* The following commands will output a lot of text to your session window. To capture this output for upload to support, redirect the output to a file. | ||
* If relinking a client installation, it’s expected that some aspects of the following commands will fail if the components were not originally installed. | ||
27 | For all UNIX platforms: | |
Oracle 8.1.X, 9.X.X, 10.X.X or 11.X.X | ||
————————————- | ||
$ORACLE_HOME/bin/relink all | Oracle 11.1 | |
$ORACLE_HOME/bin/relink | Oracle 11.2 | |
writing relink log to: /opt/oracle/product/11204_ee_64/db/install/relink.log | ||
28 | How to Tell if Relinking Was Successful: | If relinking was successful, the make command will eventually return to the OS prompt without an error. There will NOT be a ‘Relinking Successful’ type message. I performed a tail on the log files in a second window while relink was running and did not see any issues. As the note says: wait for the prompt to return (with no comments or messages) and you are good to go. |
29 | As root again: | Since I am relinking both the GI and the RDBMS, I have moved this step (starting the GI again) to after the RDBMS relinking has finished, because of course the environment (databases, listeners) has to be down during the relink of the RDBMS! |
30 | cd /opt/crs/product/11204/crs/crs/install/ | |
31 | perl rootcrs.pl -patch | This perl rootcrs.pl -patch will also start the cluster on this node again. NOTE: we had the issue that this was hanging on the first node. It turned out that the second node was up and running after all (my Linux colleague had issued the crsctl disable crs from an old, inactive clusterware home that was still present on the box). So in this specific scenario I stopped CRS again on the second node, after which the script continued on the first node. |
32 | crsctl enable crs | If you have used disable crs, enable it again so that the GI will start after a node reboot. |
33 | As Oracle | |
emctl start agent | Agent was already running so no manual action needed. | |
34 | srvctl start home -o $ORACLE_HOME -s /tmp/statusRDBMS -n mysrvrar | This will start all resources that were started from the 11.2.0.4 home. The resources were saved earlier in the /tmp/statusRDBMS file. |
35 | srvctl start instance -d MYDBCM -i MYDBCM1 | Starting the 11.1 Resources. |
srvctl start instance -d MYDBCMAC -i MYDBCMAC1 | ||
36 | srvctl start listener -n mysrvrar -l listener_MYDBCM1 | Starting the 11.1 Resources. |
srvctl start listener -n mysrvrar -l listener_MYDBCMAC1 | ||
37 | As Oracle User on the second node once it is relinked: | |
38 | srvctl start instance -d MYDBCM -i MYDBCM2 | Starting the 11.1 Resources. |
srvctl start instance -d MYDBCMAC -i MYDBCMAC2 | ||
39 | srvctl start listener -n mysrvrbr -l listener_REQMOD2 | Starting the 11.1 Resources. |
srvctl start listener -n mysrvrbr -l listener_MYDBCM2 | ||
srvctl start home -o $ORACLE_HOME -s /tmp/statusRDBMS -n mysrvrbr | ||
crsctl status resource -t | Check your cluster again and compare the result with the status before. Hopefully all resources will appear ONLINE ONLINE, or at least show the situation as it was before. There might be an extra activity if you are using services that were relocated during the action. In that case you will have to relocate them back to their original location. |
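
The before/after comparison from step 1 and the final check can be scripted. Below is a minimal sketch, assuming the final status is also written to a file. The function name `compare_status` and the file name /tmp/AfterWork.lst are made up for illustration; only /tmp/BeforeWork.lst comes from the plan above.

```shell
#!/bin/sh
# Sketch only: compare two saved "crsctl status resource -t" outputs.
# compare_status is a hypothetical helper name.
# Returns 0 when the files are identical, 1 when they differ,
# and 2 when one of the files cannot be read.
compare_status() {
    before="$1"
    after="$2"
    [ -r "$before" ] && [ -r "$after" ] || {
        echo "missing status file" >&2
        return 2
    }
    if diff "$before" "$after" >/dev/null 2>&1; then
        echo "cluster status unchanged"
        return 0
    fi
    echo "cluster status differs:"
    diff "$before" "$after"
    return 1
}

# Example use (file name for the second snapshot is an assumption):
#   crsctl status resource -t > /tmp/AfterWork.lst
#   compare_status /tmp/BeforeWork.lst /tmp/AfterWork.lst
```

Any lines that show up in the diff are the resources to look at first, for example services that were relocated during the action.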