Are we or Are we not going to relink Oracle software after A Linux Upgrade

Introduction

Recently i have talked to  my managers with this one big topic on my mind . The Linux department is rapidly patching their servers to higher levels of Linux and they ask for support during and after them patching the OS. From management side I needed clear directive  about Oracle team following those activities closely. With Oracle support  I have started to gather information about the should or should not  relink the software . Experience will tell which way was the right one.  Cause of course there is also risks involved when relinking the Oracle  software . Examples from past seem to tell that relink brought trouble with it too.

As a central note i was given  Mos Note Is It Necessary To Relink Oracle Following OS Upgrade? (Doc ID 444595.1)

In there you see two great references :

Note 883299.1 Oracle 11gR2 Relink New Feature

and

Note 220970.1 RAC: Frequently Asked Questions
Note 284785.1 How to Check Whether Oracle Binary/Instance is RAC Enabled and Relink Oracle Binary in RAC

From the rac note ) 220970.1  this is a interesting quote as with regard to  the Grid infra structure environment:

Do I need to relink the Oracle Clusterware / Grid Infrastructure home after an OS upgrade?

Using Oracle Clusterware 10g and 11.1, Oracle Clusterware binaries cannot be relinked. However, the client shared libraries, which are part of the home can be relinked, in most cases there should not be a need to relink them. See Note:743649.1 for more information. 
 
 Using Oracle Grid Infrastructure 11.2 and higher, there are some executables in the Grid home that can and should be relinked after an OS upgrade. The following steps describe how to relink an Oracle Grid Infrastructure for Clusters home: 
 
 As root:
 
 # cd Grid_home/crs/install
 # perl rootcrs.pl -unlock
 
 As the grid infrastructure for a cluster owner:
 
 $ export ORACLE_HOME=Grid_home
 $ Grid_home/bin/relink
 
 As root again:
 
 # cd Grid_home/crs/install
 # perl rootcrs.pl -patch 
 
 Note: If using Oracle Grid Infrastructure for Standalone Environments (Oracle Restart), see the Oracle Documentation for more information: http://docs.oracle.com/cd/E11882_01/install.112/e24321/oraclerestart.htm#BABJCHIJ

So should you relink ? Well frankly I would follow recommends  of Oracle in this matter  with the idea that it needs to be done on low priority boxes like test or preprod first before touching production . And should you decide not to relink due to experiences in the past , well i would recommend to be standby once the servers are patched Os wise and the grid infra structure  and databases are starting .

 

happy reading,

 

Mathijs

Cloning your 11.2 Oracle home from an existing server to a new server

Introduction:

At the moment I am reading a great book on 12c (Pro Oracle Database 12c administration) by Darl Kuhn (http://www.apress.com/9781430257288) . And I really like his style and elaborations on topics . While i was  reading it is like being held a mirror. Indeed most times I would also use the runInstaller interactively to run the installation of the Oracle binaries and yes would even frown if I would not be allowed to use a xterm to do installations. Well Darl you gave me an inspiration for set up of environment i need shortly.  So this is a big applause for him already!

Details:

For a data migration environment i am asked to set up Oracle on a Linux server identical to the production servers that are part of the scope of the project.  After installation I will be working with BCVS to bring copies of a number of databases to this box. The production servers are 11.2.2.0 with  one-off patches and a psu patch ..  And as an important add-on information i have installed the Grid Infrastructure software already on the new box so i have Oracle Restart with an ASM instance running. ( that also means that  i have an OraInst location and an existing Oracle Inventory in place too). So which way would i head.

  1. Install Oracle binaries from scratch , and apply all the patches
  2. Clone  the Rdbms software from a production box.

Since i got time i investigated both options. But in the end i loved the scenario 2 below because I always had done the normal interactive way installation:

First $ORACLE_HOME was checked:

$ echo $ORACLE_HOME
/opt/oracle/product/112_ee_64/db

Next step was to move to the ORACLE_HOME

$ cd $ORACLE_HOME

Almost there. Moved back  ( up ) one level ( so I was in /opt/oracle/product/112_ee_64 directory).

$ cd ..

Was aiming  for the tar utility to copy everything in the subdirectory db (and below)  in the  /opt/oracle/product/112_ee_64/.

Note if you get error messages of files not being included ( happened to me too make sure you do a nohup  tar -cvf myorahome.tar>Mylogfile.lst) Most likely you either ran out of space or did not have enough privileges to run the tar for all files ( so i ran  tar as root in the end).

$ tar -cvf myorahome.tar db
  • tar -cvf is the command  to run with the options -cvf
  • myorahome.tar will be holding all the files in all the subdirectories below the /opt/oracle/product/112_ee_64/
  • db is the subdirectory ( part of my ORACLE_HOME) which I need to be added to the tar file.

That created for a full-sized Rdbms Installation with all patches included a Tarred file  of about  5G. After that  myorahome.tar file was copied to the new server with scp.

The tar file was copied to the already existing /opt/oracle/product/112_ee_64 on the new machine. There the tar file was extracted and the extract created the subdirectory db again ( and all others below 2) as part of the extract:

$ cd /opt/oracle/product/112_ee_64
Note: make sure you have plenty space on the new server ( better check that before extraction.
Next, extract the files:
$ tar -xvf myorahome.tar

Next step was needed to include, to register this cloned installation in the Oracle Inventory. In my case I used the runInstaller inside the new Oracle home. First to make sure that the new Oracle Home was not known in the Inventory by detaching it:

./runInstaller -detachHome ORACLE_HOME="/opt/oracle/product/112_ee_64/db"

Then I added the cloned software to the Inventory:

./runInstaller -silent -ignoreSysPrereqs -attachHome ORACLE_HOME="/opt/oracle/product/112_ee_64/db" ORACLE_HOME_NAME="OraDb11g_home1" LOCAL_NODE="" CLUSTER_NODES="

But of course the proof of the pudding is in tasting it once this had finished:

So  it was needed to check with Opatch if the Orainventory now was aware about all installed patches.

export PATH=/opt/oracle/product/112_ee_64/db/OPatch:$PATH

I checked  opatch was the correct one:

which opatch

Then I checked the patches with the command below:

opach lsinventory -oh /opt/oracle/product/112_ee_64/db

And it showed all the correct patches . So indeed I had cloned the Oracle software in a correct way to this machine . Yah ! And a  big thank you to Darl  for a great book which pointed me in the right direction.

Happy reading,

Mathijs

The dreaded ORA-12154: TNS:could not resolve the connect identifier specified.

Introduction

Actually  I wanted to start this one with  … and a funny thing happened on the way to the circus , but lets save that one for another occasion okay ? Last week  I got a mail from one of the users  who tried to connect  via ezconnect to an existing database from new client that i had set up for him.  AS always a challenge to see what is going on and of course it takes time to find out the real deal.

Details:

## This case occurred in a 11.2 environment on one of the test boxes . They were trying to connect via ezconnect to one of the existing databases which failed:

 [10:30:23] [ INFO] SQL Runner: Starting runing script on database SCOTT/SCOTT@MYSRVR1:1521/MYDB1
[10:30:26] [ INFO] INPUT> ERROR:
[10:30:26] [ INFO] INPUT> ORA-12154: TNS:could not resolve the connect identifier specified
[10:30:26] [ INFO] INPUT>
[10:30:26] [ INFO] INPUT>
[10:30:26] [ INFO] INPUT> SP2-0306: Invalid option.
[10:30:26] [ INFO] INPUT> Usage: CONN[ECT] [{logon|/|proxy} [AS {SYSDBA|SYSOPER|SYSASM}] [edition=value]]
[10:30:26] [ INFO] INPUT> where  ::= [/][@<connect_identifier>]
[10:30:26] [ INFO] INPUT>        ::= [][/][@<connect_identifier>]
[10:30:26] [ INFO] INPUT> ERROR:
[10:30:26] [ INFO] INPUT> ORA-12162: TNS:net service name is incorrectly specified
[10:30:26] [ INFO] INPUT>
[10:30:26] [ INFO] INPUT>

##However a  Tnsping  was working correctly

tnsping MYDB1
TNS Ping Utility for Linux: Version 11.2.0.3.0 - Production on 10-FEB-2014 10:31:14
 Copyright (c) 1997, 2011, Oracle.  All rights reserved.
 Used parameter files:
/opt/SP/STORAGE/TNS_ADMIN/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION=(ENABLE=BROKEN)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=MYSRVR1.vfnl.dc-ratingen.de)(PORT = 1521)))(CONNECT_DATA=(SERVICE_NAME=MYDB1.test.nl)))
OK (130 msec)

## Since this was a 11.2 test environment I was able to play a bit with the environment. What puzzled me most  was that  a different database showed both the short service name and the fully qualified name of the service ( with the domain name in it ) in the listener. And when added the short  service name  MYDB1  next to the qualified service name  MYDB1.test.nl  listener would still not pick up both  services , even when I read  that pmon was supposed to register the services automatically  in frequent intervals (60 seconds) .

Added some entries to my tnsnames.ora and started testing . And indeed the full service name worked  and the short service refused to !  Performed the test that I bounced the database ( again this was test so not that much harm done) with no better effect . Even the restart of listener  did not bring the solution.

So it was clear that  I needed to see what was different between the two environments  since I had one other database automatically registered in the listener  with both the services I was looking for.

Bottom-line  after investigation is it works now after the restart of the database and setting some parts different.  🙂   Lets check .

Oh and I performed three  actions in the database  to make it work. And yes the database and not the listener cause  the 11.2 environment lets the database register the services automatically with a listener  (well as long as one plays by the rules):

##First I  added the short service name to the database ( this is not a Rac environment so I did not set up a service in the clusterware using srvctl  ( and ok  I admit it i tried and clusterware replied that you cannot add a service with the same name as the database).

T his is how my services look now:

SQL> show parameter  service

NAME                                                      TYPE VALUE
------------------------------------ ----------- ------------------------------
service_names                                      string               MYDB1.test.nl, MYDB1

##Made sure that if the local listener is in place it is pointing to the correct listener

## Just wanted to make sure that  the correct listener would be used so in this test i added both my listeners:

SQL>show parameter  listener
NAME                                                      TYPE VALUE
------------------------------------ ----------- ------------------------------
listener_networks                                string
local_listener                                         string               (DESCRIPTION=(ADDRESS_LIST=(AD
                                                                                          DRESS=(PROTOCOL=TCP)(HOST=195.
                                                                                          233.124.139)(PORT=1522))(ADDRE
                                                                                          SS=(PROTOCOL=TCP)(HOST=195.233
                                                                                          .124.139)(PORT=1521))))
remote_listener                                    string

## And yes the  Domain was set:

SQL> show parameter domain
 NAME                                                      TYPE VALUE
------------------------------------ ----------- ------------------------------
db_domain                                             string               test.nl

## The environment that was working did not have set the db_domain parameter so in this test I removed it too:  It is now

SQL> show parameter domain
 NAME                                                      TYPE VALUE
------------------------------------ ----------- ------------------------------
db_domain                                             string

Now the listener ,  it is showing both services :

 Service "MYDB1" has 1 instance(s).
  Instance "MYDB1", status READY, has 6 handler(s) for this service...
Service "MYDB1.test.nl" has 1 instance(s).
  Instance "MYDB1", status READY, has 6 handler(s) for this service...

I tested it myself with two entries in tnsnames ( short name and long one and both work

MYDB1MBK1 =
   (DESCRIPTION =
     (ADDRESS_LIST =
       (ADDRESS = (PROTOCOL = TCP)(HOST = MYSRVR1.vfnl.dc-ratingen.de)(PORT = 1522))
     )
     (CONNECT_DATA =
       (SERVICE_NAME = MYDB1.test.nl)
     )
   )

 MYDB1MBK2 =
   (DESCRIPTION =
     (ADDRESS_LIST =
       (ADDRESS = (PROTOCOL = TCP)(HOST = MYSRVR1.vfnl.dc-ratingen.de)(PORT = 1522))
     )
     (CONNECT_DATA =
       (SERVICE_NAME = MYDB1)
     )

This workaround worked so I informed customer to test it.  Will have to test on another environment the effect if i keep db_domain ( and of course db_unique_name) and leave the services to blank.

Happy reading ,

Mathijs

When Rman shows errors like RMAN-20220 and RMAN-06004

Introduction:

On one of the product ion databases we are using BCVS  (Business Continuity Volume) ) in the Emc boxes to make  Rman backups.  Basically done by putting  the source (production) Database into begin backup mode then split the mirror. After the split the source environment is put to end backup mode and the Bcv is being mounted on a different server to make the Rman level backups.  It is also important to mention the we are using an Rman catalog  database where every source database present has its own Schema with the catalog for that database only.

On the Source database  the archives are put to tape.

For a great explain of the concept I recommend following blog  by one of my Heroes:  Martin Bach:

http://martincarstenbach.wordpress.com/2011/05/24/offloading-production-backups-to-a-different-storage-array/

In my environment  i got alarmed that the backup of the control file on the backup server was no longer  running. So it was time again to gear up and go out there and investigate .

Details:

On the backup server we use a tailored script to do the level backup , and in that script aso a backup of the control file  is included .  When I was looking at the log files I see that this Rman (bcv)  level backup is  doing two things :

  • Produce a level backup which is successful ,
  • After that  a copy of the control file is registered in the catalog and is put to  tape . That part is failing all the time now ( even though  the log files show the  backup as success) .

In the log files on the backup server I see after a successful level backup  following error:

 RMAN> run {
2> debug off;
3> allocate channel d1 type disk;
4> catalog controlfilecopy '/opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck';
5> release channel d1;
6> allocate channel t1  type  'SBT_TAPE';
7> send 'NSR_ENV=(NSR_SERVER=MYNSR_SERVER,NSR_CLIENT=MYDB_SERVER, NSR_DATA_VOLUME_POOL=REP#0032#G02#LTO3, NSR_GROUP=DAT_0032_G02_
LTO3_TSR_local_01, NSR_SAVESET_BROWSE="0032 Days", NSR_SAVESET_RETENTION="0032 Days")';
8> backup
9> format 'ctrl_level1_%d_201402012208_3605724822_%p_%U'
10> controlfilecopy '/opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck';
11> release channel t1;
12> }
Debugging turned off

allocated channel: d1
channel d1: sid=593 devtype=DISK

cataloged control file copy
control file copy filename=/opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck recid=1280 stamp=83
8423230

released channel: d1

allocated channel: t1
channel t1: sid=593 devtype=SBT_TAPE
channel t1: NMO v5.0.0.0

sent command to channel: t1

Starting backup at 01-FEB-14
released channel: t1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 02/01/2014 23:20:35
RMAN-06004: ORACLE error from recovery catalog database: RMAN-20220: control file copy not found in the recovery catalog
RMAN-06090: error while looking up control file copy: /opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck

At  first moment I don’t see it why this is failing so it is under investigation but it already feels like a nice puzzle.

When I checked the production side  to get a complete  overview  that was a shock to me. Because in the  log files of the Archive backups i found a clue what was going on but i also noticed that the backups where failing . In the log files I saw:

...
released channel: d1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of crosscheck command at 02/02/2014 23:23:30
RMAN-06004: ORACLE error from recovery catalog database: RMAN-20011: target database incarnation is not current in recovery catalog

Hmm wait a minute did we just see a valuable clue to  solve the case ?

So After that i started investigating after surfing the web. After connecting to the catalog ( schema) in the Rman database I started following query:

SQL> SELECT name, DBID, RESETLOGS_TIME FROM rc_database;
NAME           DBID RESETLOGS_TIME
-------- ---------- -------------------
MYDB1   3605724822 31.01.2014 07:05:51

That was making me frown  because in the catalog it was known that there had been a resetlog ???

Then I checked the incarnation:

SQL> SELECT dbid, name, dbinc_key, resetlogs_change#, resetlogs_time FROM rc_database_incarnation;
DBID NAME      DBINC_KEY RESETLOGS_CHANGE# RESETLOGS_TIME
---------- -------- ---------- ----------------- -------------------
3605724822 MYDB1            2        7.9895E+12 21.07.2009 17:18:42
3605724822 MYDB1           73                 1 04.04.2008 15:11:18
3605724822 MYDB1           74           2461875 19.05.2008 16:28:20
3605724822 MYDB1      2553841        1.0587E+13 31.01.2014 07:05:51

Not good at  all  Cause apparently the Catalog  had other information (  showing  a reset logs )  then the production database  ( which had its last reset logs  somewhere back in 2008 ) . And that was  also  the information I saw when I queried the production database ( v$database view is showing  information about a reset logs in

RESETLOGS_CHANGE# NUMBER System change number (SCN) at open resetlogs
RESETLOGS_TIME DATE Timestamp of open resetlogs

In Rman environment   the command list incarnation only returned an empty  line and a prompt which was unexpected too.

After giving it some thought and consulting a colleague I decided to stick to the easy scenario where:

  • I performed an export of the Rman schema for that Database on the Rman Catalog database server and  I will keep that export the upcoming  4 weeks for when an old restore would be needed.
  • I dropped the Rman  user  for that specific database in the Rman catalog database ( drop user  rman_Mydb1 cascade).
  • I registered the database again as a new Rman_schema in the Rman catalog database .

## After that it was time to check things>

rman TARGET  / RCVCAT rman_MYDB1/*****@RMAN
Recovery Manager: Release 10.2.0.3.0 - Production on Mon Feb 3 17:15:36 2014
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
connected to target database: MYDB1 (DBID=3605724822)
connected to recovery catalog database
RMAN> list incarnation;
List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       73      MYDB1   3605724822       PARENT  1          04.04.2008 15:11:18
1       74      MYDB1   3605724822       PARENT  2461875    19.05.2008 16:28:20
1       2       MYDB1   3605724822       CURRENT 7989501317585 21.07.2009 17:18:42

Looked  OK to me. After that  I ran an Archive backup and i checked the result of the scheduled Rman level backup the very next day.   Once again the concept worked as a charme  so happy dba again.

Happy reading,

Mathijs