When RMAN shows errors like RMAN-20220 and RMAN-06004

Introduction:

On one of the production databases we use BCVs (Business Continuance Volumes) in the EMC arrays to make RMAN backups. In short: the source (production) database is put into begin backup mode, then the mirror is split. After the split, the source database is put back into end backup mode, and the BCV is mounted on a different server to make the RMAN level backups. It is also important to mention that we use an RMAN catalog database in which every source database has its own schema, holding the catalog for that database only.
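To make the order of steps explicit, here is a minimal Python sketch of that sequence. The functions `run_sql`, `split_mirror` and `start_offhost_backup` are hypothetical placeholders for whatever tooling a site uses (a SQL*Plus call, the EMC split command, a job on the backup server); only the `ALTER DATABASE BEGIN BACKUP` and `ALTER DATABASE END BACKUP` statements are real Oracle syntax. The `try`/`finally` is an assumption of this sketch, not necessarily how our script does it: it makes sure the source database leaves backup mode even when the split fails.

```python
from typing import Callable, List


def bcv_split_backup(run_sql: Callable[[str], None],
                     split_mirror: Callable[[], None],
                     start_offhost_backup: Callable[[], None]) -> None:
    """Hypothetical orchestration of the BCV split; all callables are placeholders."""
    run_sql("ALTER DATABASE BEGIN BACKUP")
    try:
        # The BCV now receives a copy of the datafiles while they are in backup mode.
        split_mirror()
    finally:
        # Take the source out of backup mode, even if the split raised an error.
        run_sql("ALTER DATABASE END BACKUP")
    # Only reached when the split succeeded: mount the BCV elsewhere and run RMAN there.
    start_offhost_backup()


# Usage: record the steps instead of executing them.
steps: List[str] = []
bcv_split_backup(steps.append,
                 lambda: steps.append("split"),
                 lambda: steps.append("rman level backup on backup server"))
print(steps)
```

Recording the calls like this is also a cheap way to check that a wrapper never leaves the production database stuck in backup mode.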

On the source database, the archived redo logs are backed up to tape.

For a great explanation of the concept, I recommend the following blog post by one of my heroes, Martin Bach:

http://martincarstenbach.wordpress.com/2011/05/24/offloading-production-backups-to-a-different-storage-array/

In my environment I was alerted that the backup of the control file on the backup server was no longer running. So it was time to gear up again and go out there to investigate.

Details:

On the backup server we use a tailored script to do the level backup, and that script also includes a backup of the control file. Looking at the log files, I see that this RMAN (BCV) level backup does two things:

  • It produces a level backup, which is successful.
  • After that, a copy of the control file is registered in the catalog and backed up to tape. That part is now failing every time (even though the log files show the backup as successful).

In the log files on the backup server, I see the following error after the successful level backup:

RMAN> run {
2> debug off;
3> allocate channel d1 type disk;
4> catalog controlfilecopy '/opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck';
5> release channel d1;
6> allocate channel t1  type  'SBT_TAPE';
7> send 'NSR_ENV=(NSR_SERVER=MYNSR_SERVER,NSR_CLIENT=MYDB_SERVER, NSR_DATA_VOLUME_POOL=REP#0032#G02#LTO3, NSR_GROUP=DAT_0032_G02_LTO3_TSR_local_01, NSR_SAVESET_BROWSE="0032 Days", NSR_SAVESET_RETENTION="0032 Days")';
8> backup
9> format 'ctrl_level1_%d_201402012208_3605724822_%p_%U'
10> controlfilecopy '/opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck';
11> release channel t1;
12> }
Debugging turned off

allocated channel: d1
channel d1: sid=593 devtype=DISK

cataloged control file copy
control file copy filename=/opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck recid=1280 stamp=838423230

released channel: d1

allocated channel: t1
channel t1: sid=593 devtype=SBT_TAPE
channel t1: NMO v5.0.0.0

sent command to channel: t1

Starting backup at 01-FEB-14
released channel: t1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 02/01/2014 23:20:35
RMAN-06004: ORACLE error from recovery catalog database: RMAN-20220: control file copy not found in the recovery catalog
RMAN-06090: error while looking up control file copy: /opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck
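Because the wrapper log reported success while RMAN itself printed an error stack, it can help to check the raw RMAN output directly. Below is a minimal Python sketch, assuming the RMAN log is available as plain text; `rman_errors` is a hypothetical helper, not part of any Oracle tooling. It collects the distinct RMAN-/ORA- codes in order of appearance and skips RMAN-00571 and RMAN-00569, which only frame the error stack.

```python
import re

# Error codes look like "RMAN-06004" or "ORA-01234": a prefix plus five digits.
ERROR_CODE = re.compile(r"\b(RMAN|ORA)-(\d{5})\b")
# Banner lines around the stack ("ERROR MESSAGE STACK FOLLOWS") carry no detail.
BANNER_CODES = {"RMAN-00571", "RMAN-00569"}


def rman_errors(log_text):
    """Return the distinct, non-banner error codes found in an RMAN log."""
    found = []
    for match in ERROR_CODE.finditer(log_text):
        code = match.group(0)
        if code not in BANNER_CODES and code not in found:
            found.append(code)
    return found


# Excerpt of the failing control file backup shown above.
log = """Starting backup at 01-FEB-14
released channel: t1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 02/01/2014 23:20:35
RMAN-06004: ORACLE error from recovery catalog database: RMAN-20220: control file copy not found in the recovery catalog
RMAN-06090: error while looking up control file copy: /opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck
"""

errors = rman_errors(log)
print("FAILED" if errors else "OK", errors)
```

A wrapper script could treat any non-empty result as a failed run instead of trusting its own "success" message.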

At first I could not see why this was failing, so I started investigating; it already felt like a nice puzzle.

When I checked the production side to get a complete overview, I was in for a shock: the log files of the archive backups held a clue to what was going on, but they also showed that those backups were failing. In the log files I saw:

...
released channel: d1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of crosscheck command at 02/02/2014 23:23:30
RMAN-06004: ORACLE error from recovery catalog database: RMAN-20011: target database incarnation is not current in recovery catalog

Hmm, wait a minute: did we just see a valuable clue to solve the case?

After some searching on the web, I connected to the catalog schema in the RMAN catalog database and ran the following query:

SQL> SELECT name, DBID, RESETLOGS_TIME FROM rc_database;
NAME           DBID RESETLOGS_TIME
-------- ---------- -------------------
MYDB1   3605724822 31.01.2014 07:05:51

That made me frown, because according to the catalog there had been a resetlogs on 31.01.2014!

Then I checked the incarnation:

SQL> SELECT dbid, name, dbinc_key, resetlogs_change#, resetlogs_time FROM rc_database_incarnation;
DBID NAME      DBINC_KEY RESETLOGS_CHANGE# RESETLOGS_TIME
---------- -------- ---------- ----------------- -------------------
3605724822 MYDB1            2        7.9895E+12 21.07.2009 17:18:42
3605724822 MYDB1           73                 1 04.04.2008 15:11:18
3605724822 MYDB1           74           2461875 19.05.2008 16:28:20
3605724822 MYDB1      2553841        1.0587E+13 31.01.2014 07:05:51

Not good at all, because apparently the catalog had different information (showing a resetlogs) than the production database, whose last resetlogs dated from years earlier (21.07.2009, matching incarnation 2 above). I confirmed this by querying V$DATABASE on production, which shows the last resetlogs in these columns:

RESETLOGS_CHANGE#  NUMBER  System change number (SCN) at open resetlogs
RESETLOGS_TIME     DATE    Timestamp of open resetlogs
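To spell out the reasoning behind RMAN-20011, the Python sketch below compares the catalog incarnations from the query above with the resetlogs time of the target database. It is a simplification under stated assumptions: it treats the catalog incarnation with the most recent resetlogs time as the "current" one, and it takes the target's resetlogs time (21.07.2009 17:18:42) from the CURRENT incarnation that `list incarnation` shows after re-registration. `check_incarnation` is a hypothetical helper. It matches on resetlogs time only, because some change numbers were displayed in truncated E-notation.

```python
from datetime import datetime

FMT = "%d.%m.%Y %H:%M:%S"  # date format used in the query output above


def parse(ts):
    return datetime.strptime(ts, FMT)


def check_incarnation(catalog_rows, target_resetlogs_time):
    """Return (ok, catalog_current_key, target_key).

    catalog_rows: (dbinc_key, resetlogs_time) pairs from rc_database_incarnation.
    Simplifying assumption: the catalog's current incarnation is the row with
    the most recent resetlogs time.
    """
    current_key, current_time = max(catalog_rows, key=lambda row: row[1])
    # Which catalog incarnation (if any) matches the target's last resetlogs?
    target_key = next(
        (key for key, ts in catalog_rows if ts == target_resetlogs_time), None
    )
    return current_time == target_resetlogs_time, current_key, target_key


catalog = [
    (2, parse("21.07.2009 17:18:42")),
    (73, parse("04.04.2008 15:11:18")),
    (74, parse("19.05.2008 16:28:20")),
    (2553841, parse("31.01.2014 07:05:51")),
]
target = parse("21.07.2009 17:18:42")

ok, current_key, target_key = check_incarnation(catalog, target)
print(ok, current_key, target_key)  # False 2553841 2
```

The mismatch (catalog points at incarnation 2553841, the target lives in incarnation 2) is the situation RMAN-20011 describes.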

In RMAN, the LIST INCARNATION command returned only an empty line and a prompt, which was unexpected too.

After giving it some thought and consulting a colleague, I decided to go for the easy scenario:

  • I exported the RMAN schema for that database on the RMAN catalog database server; I will keep that export for the upcoming 4 weeks in case an older restore is needed.
  • I dropped the RMAN user for that specific database in the RMAN catalog database (drop user rman_Mydb1 cascade).
  • I registered the database again in a new RMAN schema in the RMAN catalog database.

After that, it was time to check things:

rman TARGET  / RCVCAT rman_MYDB1/*****@RMAN
Recovery Manager: Release 10.2.0.3.0 - Production on Mon Feb 3 17:15:36 2014
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
connected to target database: MYDB1 (DBID=3605724822)
connected to recovery catalog database
RMAN> list incarnation;
List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN     Reset Time
------- ------- -------- ---------------- ------- ------------- -------------------
1       73      MYDB1    3605724822       PARENT  1             04.04.2008 15:11:18
1       74      MYDB1    3605724822       PARENT  2461875       19.05.2008 16:28:20
1       2       MYDB1    3605724822       CURRENT 7989501317585 21.07.2009 17:18:42

That looked OK to me. After that I ran an archive backup, and the very next day I checked the result of the scheduled RMAN level backup. Once again the concept worked like a charm, so I was a happy DBA again.

Happy reading,

Mathijs
