Introduction:
On one of the production databases we use BCVs (Business Continuance Volumes) on the EMC boxes to make RMAN backups. Basically, the source (production) database is put into begin backup mode and then the mirror is split. After the split the source database is put into end backup mode and the BCV is mounted on a different server to make the RMAN level backups. It is also important to mention that we use an RMAN catalog database in which every source database has its own schema, holding the catalog for that database only.
On the source database the archived redo logs are backed up to tape.
For a great explanation of the concept I recommend the following blog by one of my heroes, Martin Bach:
In my environment I got alerted that the backup of the control file on the backup server was no longer running. So it was time again to gear up, go out there and investigate.
Details:
On the backup server we use a tailored script to do the level backup, and that script also includes a backup of the control file. Looking at the log files I saw that this RMAN (BCV) level backup does two things:
- It produces a level backup, which is successful.
- After that a copy of the control file is registered in the catalog and put to tape. That part is now failing every time (even though the log files show the backup as a success).
In the log files on the backup server I see the following error after a successful level backup:
    RMAN> run {
    2> debug off;
    3> allocate channel d1 type disk;
    4> catalog controlfilecopy '/opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck';
    5> release channel d1;
    6> allocate channel t1 type 'SBT_TAPE';
    7> send 'NSR_ENV=(NSR_SERVER=MYNSR_SERVER,NSR_CLIENT=MYDB_SERVER, NSR_DATA_VOLUME_POOL=REP#0032#G02#LTO3, NSR_GROUP=DAT_0032_G02_LTO3_TSR_local_01, NSR_SAVESET_BROWSE="0032 Days", NSR_SAVESET_RETENTION="0032 Days")';
    8> backup
    9> format 'ctrl_level1_%d_201402012208_3605724822_%p_%U'
    10> controlfilecopy '/opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck';
    11> release channel t1;
    12> }

    Debugging turned off

    allocated channel: d1
    channel d1: sid=593 devtype=DISK

    cataloged control file copy
    control file copy filename=/opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck recid=1280 stamp=838423230

    released channel: d1

    allocated channel: t1
    channel t1: sid=593 devtype=SBT_TAPE
    channel t1: NMO v5.0.0.0

    sent command to channel: t1

    Starting backup at 01-FEB-14
    released channel: t1
    RMAN-00571: ===========================================================
    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
    RMAN-00571: ===========================================================
    RMAN-03002: failure of backup command at 02/01/2014 23:20:35
    RMAN-06004: ORACLE error from recovery catalog database: RMAN-20220: control file copy not found in the recovery catalog
    RMAN-06090: error while looking up control file copy: /opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck
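Since the wrapper reported success while the log clearly contains an error stack, a small check on the log itself might have raised the alarm earlier. The following is only a minimal sketch, not the script we actually run: the function names `rman_log_failed` and `rman_log_errors` are made up for illustration, and the check simply assumes that every RMAN error stack is introduced by the RMAN-00569 banner, as in the log above.

```shell
# Hypothetical helpers (not part of our real backup script).

# Succeeds (exit status 0) when the RMAN log contains an error stack.
# Assumption: an error stack always starts with the RMAN-00569 banner.
rman_log_failed() {
    # $1 = path to the RMAN log file
    grep -q 'RMAN-00569' "$1"
}

# Prints the RMAN-/ORA- message lines, without the banner lines.
rman_log_errors() {
    # $1 = path to the RMAN log file
    grep -E '^(RMAN|ORA)-[0-9]+:' "$1" | grep -Ev '^RMAN-005(69|71):'
}

# Possible use at the end of a wrapper script:
#   if rman_log_failed "$LOGFILE"; then
#       rman_log_errors "$LOGFILE"
#       exit 1
#   fi
```

Checking for the banner rather than for any RMAN- message is a deliberate choice here, so that plain warnings in an otherwise successful run do not count as a failure.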
At first I did not see why this was failing, so it was under investigation, but it already felt like a nice puzzle.
When I checked the production side to get a complete overview, I got a shock: the log files of the archive backups held a clue about what was going on, but they also showed that those backups were failing. In the log files I saw:
    ...
    released channel: d1
    RMAN-00571: ===========================================================
    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
    RMAN-00571: ===========================================================
    RMAN-03002: failure of crosscheck command at 02/02/2014 23:23:30
    RMAN-06004: ORACLE error from recovery catalog database: RMAN-20011: target database incarnation is not current in recovery catalog
Hmm, wait a minute, did we just see a valuable clue to solve the case?
So after surfing the web I started investigating. After connecting to the catalog (schema) in the RMAN database I ran the following query:
    SQL> SELECT name, DBID, RESETLOGS_TIME FROM rc_database;

    NAME     DBID       RESETLOGS_TIME
    -------- ---------- -------------------
    MYDB1    3605724822 31.01.2014 07:05:51
That made me frown, because according to the catalog there had been a resetlogs on 31.01.2014???
Then I checked the incarnation:
    SQL> SELECT dbid, name, dbinc_key, resetlogs_change#, resetlogs_time FROM rc_database_incarnation;

          DBID NAME      DBINC_KEY RESETLOGS_CHANGE# RESETLOGS_TIME
    ---------- -------- ---------- ----------------- -------------------
    3605724822 MYDB1             2        7.9895E+12 21.07.2009 17:18:42
    3605724822 MYDB1            73                 1 04.04.2008 15:11:18
    3605724822 MYDB1            74           2461875 19.05.2008 16:28:20
    3605724822 MYDB1       2553841        1.0587E+13 31.01.2014 07:05:51
Not good at all, because apparently the catalog had different information (showing a resetlogs on 31.01.2014) than the production database (which had its last resetlogs back in 2009). That is also the information I saw when I queried the production database; the v$database view shows information about a resetlogs in these columns:
RESETLOGS_CHANGE# | NUMBER | System change number (SCN) at open resetlogs
RESETLOGS_TIME | DATE | Timestamp of open resetlogs
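To compare both sides without the scientific notation that SQL*Plus shows for large SCNs (7.9895E+12), the two lookups could be written with TO_CHAR. This is a hedged sketch only: the function names are invented for illustration, and the functions merely print the SQL text, which you would then feed to sqlplus yourself (the first on the target as a DBA user, the second as the catalog owner).

```shell
# Hypothetical helpers: they only print SQL, nothing is executed.
# The quoted 'EOF' keeps the shell from expanding the $ in v$database.

# Query for the target (production) database.
target_resetlogs_sql() {
    cat <<'EOF'
SELECT dbid, name,
       TO_CHAR(resetlogs_change#) AS resetlogs_scn,
       TO_CHAR(resetlogs_time, 'DD.MM.YYYY HH24:MI:SS') AS resetlogs_time
FROM   v$database;
EOF
}

# Same columns as recorded in the recovery catalog schema.
catalog_resetlogs_sql() {
    cat <<'EOF'
SELECT dbid, name,
       TO_CHAR(resetlogs_change#) AS resetlogs_scn,
       TO_CHAR(resetlogs_time, 'DD.MM.YYYY HH24:MI:SS') AS resetlogs_time
FROM   rc_database;
EOF
}
```

If the two result rows differ, as they did here, the catalog and the target disagree about the current incarnation.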
In the RMAN environment the command list incarnation returned only an empty line and a prompt, which was unexpected too.
After giving it some thought and consulting a colleague I decided to stick to the easy scenario:
- I exported the RMAN schema for that database on the RMAN catalog database server; I will keep that export for the coming 4 weeks in case an old restore is needed.
- I dropped the RMAN user for that specific database in the RMAN catalog database (drop user rman_Mydb1 cascade).
- I registered the database again in a new RMAN schema in the RMAN catalog database.
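The three steps above could be written down roughly as follows. Treat this as a dry-run sketch under assumptions, not as the exact commands I typed: the function name `print_catalog_rebuild_plan` and the tablespace argument are hypothetical, the use of the classic `exp` utility is an assumption (the steps only say "export"), and recreating the user plus `create catalog` is implied by, but not spelled out in, the last step. The function only prints the plan; nothing is executed.

```shell
# Hypothetical dry-run helper: prints the commands, runs none of them.
print_catalog_rebuild_plan() {
    schema=$1   # catalog owner for this database, e.g. rman_MYDB1
    catdb=$2    # TNS alias of the catalog database, e.g. RMAN
    tbs=$3      # tablespace for the catalog schema (assumed name)
    cat <<EOF
# 1. Keep a copy of the old catalog schema (on the catalog server;
#    exp prompts for the password)
exp ${schema}@${catdb} owner=${schema} file=${schema}.dmp log=${schema}_exp.log

# 2. In sqlplus, as a DBA user on ${catdb}: drop and recreate the owner
drop user ${schema} cascade;
create user ${schema} identified by "<password>" default tablespace ${tbs} quota unlimited on ${tbs};
grant recovery_catalog_owner to ${schema};

# 3. In rman (rman target / catalog ${schema}@${catdb}), on the source server
create catalog;
register database;
list incarnation;
EOF
}
```

For example, `print_catalog_rebuild_plan rman_MYDB1 RMAN rman_ts` prints the plan for this database so it can be reviewed before anything is dropped.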
After that it was time to check things:
    rman TARGET / RCVCAT rman_MYDB1/*****@RMAN

    Recovery Manager: Release 10.2.0.3.0 - Production on Mon Feb 3 17:15:36 2014

    Copyright (c) 1982, 2005, Oracle. All rights reserved.

    connected to target database: MYDB1 (DBID=3605724822)
    connected to recovery catalog database

    RMAN> list incarnation;

    List of Database Incarnations
    DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN     Reset Time
    ------- ------- -------- ---------------- --- ---------- ----------
    1       73      MYDB1    3605724822       PARENT  1             04.04.2008 15:11:18
    1       74      MYDB1    3605724822       PARENT  2461875       19.05.2008 16:28:20
    1       2       MYDB1    3605724822       CURRENT 7989501317585 21.07.2009 17:18:42
Looked OK to me. After that I ran an archive backup, and I checked the result of the scheduled RMAN level backup the very next day. Once again the concept worked like a charm, so happy DBA again.
Happy reading,
Mathijs