The good, the bad and the corrupted (ORA-01578, ORA-26040, ORA-01110)

Introduction:

 

At the moment I am part of a team that will upgrade a billing environment from 10.2.0.5 to 11.2.0.4 on Solaris. After a restore and point-in-time recovery on a preproduction environment the team upgraded all databases using the manual upgrade method (still not that confident that the upgrades would run as smoothly with the dbua…). As always, after the upgrade there is a period of hyper care in which the alert files should be monitored even more closely for funky issues. This blog gives you some more details on the funky stuff we came across and how we analyzed it. The one thing that did make me frown during this exercise, however, was the fact that the various tools did not necessarily all show the same results.

Conclusions:

When upgrading important environments a DBA should really have eyes and ears everywhere before performing that upgrade, because once upgraded it will be a tough call to prove that something was already broken in the old environment or that it was caused by the upgrade. So I would really recommend scrutinizing your environment like someone studying a creature under a microscope before upgrading: question broken jobs, invalid objects and, as I learned the hard way during this upgrade, objects created with the nologging option (check DBA_TABLES, DBA_INDEXES, DBA_LOBS, DBA_TAB_PARTITIONS, DBA_LOB_PARTITIONS, DBA_TAB_SUBPARTITIONS, etc.; LOGGING='NO' indicates NOLOGGING). In my case a number of objects had been created with the nologging option and indeed only after upgrading I had to fight my way back to find out what was going on and which objects were involved (in my case it was a number of staging tables for a specific user, where a drop and recreate plus exp/imp dealt with it). I did recommend enabling force logging for the complete database (but was told that the customer wants to be able to create objects with the nologging option for performance reasons).
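As an illustration, a minimal sketch of such a pre-upgrade check (only the table and index views are shown here; the partition, subpartition and lob views mentioned above can be queried the same way, and the owner filter is just an example for this environment):

-- list application segments created with the nologging option
select owner, table_name from dba_tables  where logging = 'NO' and owner not in ('SYS','SYSTEM');
select owner, index_name from dba_indexes where logging = 'NO' and owner not in ('SYS','SYSTEM');
-- check whether the database itself already forces logging
select force_logging from v$database;
-- and, if the customer would allow it, rule out future nologging operations at database level
alter database force logging;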

As I mentioned in the blog, in the end I was plain lucky that this was caused by staging tables, which made the scenario easier since we could simply drop and recreate them. Once again I learned something (and found proof again that success just loves preparation). Of course this is a slightly different scenario than the one we will run in production, since there we will run the upgrade directly, but we would still need to bear in mind the objects created with the nologging option in production if one of the databases in scope would need a restore because the upgrade broke the database. Well, in this case all databases were upgraded with success, so all DBAs are happy and confident for the production migration which is scheduled.

Details:

SERVER: mysrvr

Database: OPTMYDB

Timeline:

  • All databases of the billing environment have been restored to the new OPT server called mysrvr.
  • We have upgraded all databases from 10.2.0.5 to 11.2.0.4 on Solaris.
  • After opening the database OPTMYDB and running the statistics jobs, it started throwing error messages in the alert file about block corruptions, as seen in Example 1 below. Yesterday, together with a colleague from the Adba team, the gather statistics issue was resolved and three objects were identified as having corrupt blocks (see Example 2). These three objects have been dropped and recreated and there were no more corruption messages in the alert file.
  • After that I was asked by my manager to run a dbv for the full database, checking all datafiles (a sketch of such a dbv call follows after this list).
  • I have also used rman in parallel to find possible corruption; that run is still in progress, but the first output is in Example 3.
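For reference, this is roughly what such a dbv call looks like for a single datafile (the file name is the one from the ORA-01110 message in Example 1 below; the 8k blocksize and the logfile name are assumptions):

dbv file=/db/MYDB/data/tbsdata_e/tbsdip01_4m_14.dbf blocksize=8192 logfile=dbv_tbsdip01_4m_14.log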

Example 1

## In the alert file this was the information we came across that set off the first alarm bells:

Mon Aug 04 22:00:43 2014

DBMS_STATS: GATHER_STATS_JOB encountered errors.  Check the trace file.

Errors in file /opt/oracle/OPTMYDB/diag/rdbms/MYDB/OPTMYDB/trace/OPTMYDB_j002_3640.trc:

ORA-20011: Approximate NDV failed: ORA-29913: error in executing ODCIEXTTABLEOPEN callout

ORA-29400: data cartridge error

error opening file /home/tmcon/oracle/UNMATCHED_SUBS_3640.log

Errors in file /opt/oracle/OPTMYDB/diag/rdbms/MYDB/OPTMYDB/trace/OPTMYDB_j002_3640.trc  (incident=8665):

ORA-01578: ORACLE data block corrupted (file # 620, block # 391690)

ORA-01110: data file 620: '/db/MYDB/data/tbsdata_e/tbsdip01_4m_14.dbf'

ORA-26040: Data block was loaded using the NOLOGGING option

Incident details in: /opt/oracle/OPTMYDB/diag/rdbms/MYDB/OPTMYDB/incident/incdir_8665/OPTMYDB_j002_3640_i8665.trc

 

Example 2

I have grepped the ORA-01578 messages from the alert file and added them to a table to get an overview of the datafiles and blocks having the issues. And since the alert file is a growing object, this is what I got over the time interval since starting up the database as an 11.2.0.4 environment.

grep ORA-01578 alert_OPTMYDB.log AFN BL
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946

 

## On MOS I retrieved a note that offered more information on the object located around the corrupted block:

select file_id AFN, relative_fno, tablespace_name
from dba_data_files
where relative_fno=&RFN;
Enter value for rfn: 620
old   3: where relative_fno=&RFN
new   3: where relative_fno=620
       AFN RELATIVE_FNO TABLESPACE_NAME
———- ———— ——————————
      3689          620 UNDO_TBS_01
       620          620 TBSDIP01_4M
##  Used this statement to find the object that was on that spot in the tablespace:
select *
from dba_extents
where file_id = &AFN
and &BL between block_id AND block_id + blocks - 1;
-- run below first with AFN 620 / BL 391690, then with AFN 620 / BL 392202
SQL> select *
from dba_extents
where file_id = &AFN
and &BL between block_id AND block_id + blocks - 1;
  2    3    4  Enter value for afn: 620
old   3: where file_id = &AFN
new   3: where file_id = 620
Enter value for bl: 391690
old   4: and &BL between block_id AND block_id + blocks - 1
new   4: and 391690 between block_id AND block_id + blocks – 1
OWNER
——————————
SEGMENT_NAME
——————————————————————————–
PARTITION_NAME                 SEGMENT_TYPE       TABLESPACE_NAME
—————————— —————— ——————————
EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
———- ———- ———- ———- ———- ————
SCOTT
SCOTT_BPS_05062014000_NOBP
                               TABLE              TBSDIP01_4M
         0        620     391689    4194304        512          620
SQL> select *
from dba_extents
where file_id = &AFN
and &BL between block_id AND block_id + blocks - 1;
  2    3    4  Enter value for afn: 620
old   3: where file_id = &AFN
new   3: where file_id = 620
Enter value for bl: 392202
old   4: and &BL between block_id AND block_id + blocks - 1
new   4: and 392202 between block_id AND block_id + blocks – 1
OWNER
——————————
SEGMENT_NAME
——————————————————————————–
PARTITION_NAME                 SEGMENT_TYPE       TABLESPACE_NAME
—————————— —————— ——————————
EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
———- ———- ———- ———- ———- ————
SCOTT
SCOTT_BPS_05062014000_NORUEB
                               TABLE              TBSDIP01_4M
         0        620     392201    4194304        512          620
SQL> select file_id AFN, relative_fno, tablespace_name
from dba_data_files
where relative_fno=&RFN;
  2    3  Enter value for rfn: 661
old   3: where relative_fno=&RFN
new   3: where relative_fno=661
       AFN RELATIVE_FNO TABLESPACE_NAME
———- ———— ——————————
       661          661 TBSDIP01_4M
select *
from dba_extents
where file_id = &AFN
and &BL between block_id AND block_id + blocks - 1;
-- run with AFN 661 and BL 423946, output:
OWNER
——————————
SEGMENT_NAME
——————————————————————————–
PARTITION_NAME                 SEGMENT_TYPE       TABLESPACE_NAME
—————————— —————— ——————————
EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
———- ———- ———- ———- ———- ————
SCOTT
SCOTT_SII_05062014000R_BAN
                               TABLE              TBSDIP01_4M
         0        661     423945    4194304        512          661

 

Example 3

On MOS I also retrieved a note that offered RMAN as a tool to find logical corruption. Since I am curious by nature, I of course had to use that goodie too!

 

RMAN> run {
allocate channel d1 type disk;
allocate channel d2 type disk;
allocate channel d3 type disk;
allocate channel d4 type disk;
backup validate check logical database;
}

 

## Running this in rman will put rows, if any, in V$DATABASE_BLOCK_CORRUPTION. When I asked for more details on this view I was told that it should return 0 rows when rerunning after the corruption has been dealt with. I restarted the rman process after some time (so maybe, since the first run did not complete, it did not wrap up properly, but at the second run I noticed there were still rows present).

SQL> select * from V$DATABASE_BLOCK_CORRUPTION;

 

FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO

———- ———- ———- —————— ———

235      34825         92         1.3760E+13 NOLOGGING

235     323083        510         1.3760E+13 NOLOGGING

548     315914        833         1.3760E+13 NOLOGGING

648     148489        512         1.3760E+13 NOLOGGING

 

## Well, the good thing was that this showed in one view what most likely was the cause of the issue: objects being created with the nologging option, thus invalidating a 100% recovery after a restore (and recovery of a number of archives) followed by the upgrade to 11G.

From Oracle support I got these as good notes, as it turned out that the corruption was caused by nologging operations:

The Gains and Pains of Nologging Operations in a Data Guard Environment (Doc ID 290161.1)

ORA-1578 / ORA-26040 Corrupt blocks by NOLOGGING – Error explanation and solution (Doc ID 794505.1)

A quote from this Note really made me both frown and giggle:

…… Begin Quote

Note that the data inside the affected blocks is not salvageable. Methods like “Media Recovery” or “RMAN blockrecover” will not fix the problem unless the data file was backed up after the NOLOGGING operation was registered in the Redo Log.

…… END  Quote

I also learned that the way Oracle marks that a block was invalidated due to NOLOGGING is by updating most of the bytes in that block with 0xff, but only if that “invalidate” redo is applied in a recovery. If the associated redo/archived log file is used to RECOVER the data files (which was the case in this preproduction environment), Oracle invalidates such blocks and the errors ORA-26040 and ORA-1578 are reported by SQL statements on the next reads of those blocks. So in this case the recovery was done, but the blocks that had been touched by NOLOGGING operations were marked soft corrupt during the recovery, as the redo records carrying the flag “this block was nologging” were applied to them.

http://docs.oracle.com/cd/E11882_01/backup.112/e10642/osadvsce.htm#BRADV90047

The recommendation is: always perform a backup after NOLOGGING operations, do not touch those blocks again, and use that backup for recovery.
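A practical way to act on that recommendation is to check which datafiles have been touched by nologging operations since their last backup; V$DATAFILE keeps track of this, so a small sketch like the one below shows where a fresh backup is due (rman offers report unrecoverable; for the same purpose):

select file#,
       unrecoverable_change#,
       to_char(unrecoverable_time,'DD-MON-YYYY HH24:MI:SS') as unrecoverable_time
from   v$datafile
where  unrecoverable_time is not null
order  by unrecoverable_time;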

 

As always  happy reading and till we meet again,

 

Mathijs

 

Is your backup strategy safe (a story of BCV – Snap restores with EMC)

Introduction:

As a dba there should always be some room to improve your backup and recovery scenarios. This blog is about a judgment day scenario: losing a 3TB database in full while your bosses are asking what you will do and when the database will be back on its feet. In such scenarios I do tend to debate when you would really lose the full database with a need to restore it. Proper answers might be: maybe during an upgrade to a newer Oracle release, or when you lose the server where the database is running (however, given the fact that a lot of sites use SAN, even that would offer other options than a restore of such a big database, so not that likely). In my experience it is much more common to come across scenarios where you restore single datafiles or tablespaces. But OK, for the argument’s sake, and to prove we have a scenario covered where we lose the full database (meaning all data files and control files), let us go into more detail on that.

Details:

On my most important databases this has been the setup in the past when implementing the database(s) in ASM, to give us a safe feeling.

  • Every database has three disk groups called SID_DATA, SID_FRA and SID_REDO.
  • Of course database files are in SID_DATA, control files and redo log group members are in SID_DATA and SID_REDO, and the archives are put into SID_FRA.
  • Fortunately we are using EMC technology, both Snap and BCV. We use them to set up clone/reporting environments and pre-production, and to do our backups.

With regard to the backups an interesting debate arose this week, which triggered this blog. In the setup of the Snap and BCV it has always been considered that there have to be two of them for each database: one holding the SID_DATA disk group (meaning the database files) and one Snap or BCV holding SID_FRA and SID_REDO. When a backup is triggered, the following actions are performed on the production side (a minimal sketch of the database part of this sequence follows after the list):

  • The database is put into begin backup mode,
  • In the database a copy of the controlfile is pushed to the SID_FRA disk group,
  • In the EMC tools the mirror is broken,
  • In the database an end backup is issued,
  • On the backup server the database is mounted with the copied version of the controlfile and a level backup is run with rman (both level 0 and level 1, indeed also using block change tracking),
  • After a backup the snap / bcv remains mounted on the backup server so that, in case of a true disaster, it can be used to restore the full database from disk,
  • On the production server the archives are saved to tape directly in intervals.
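Roughly, the database side of that sequence looks like the sketch below (the diskgroup and file name used for the controlfile copy are just examples following the SID_FRA naming; the actual mirror split in between is done with the EMC tooling and is not shown):

alter database begin backup;
alter database backup controlfile to '+MYDBP1_FRA/mydbp1/controlfile_copy_for_backup.ctl' reuse;
-- the EMC Snap / BCV mirror of SID_DATA is split here by the storage tooling
alter database end backup;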

 

So far so good, right? We have it all covered (or not?)…

If you take a closer look at a judgment day disaster-recovery request, where the customer says we need a restore of the FULL database (of 3TB), would we be happy when seeing this?

If we check the redo logs and control files we will see this:

1* select a.GROUP#,MEMBER,BYTES/1024/1024 from v$logfile a, v$log b where a.group#= b.group#  order by group#;

 

GROUP# MEMBER                                                                                                                                   BYTES/1024/1024

———- ——————————————————————————– —————

1 +MYDBP1_REDO/mydbp1/onlinelog/group_1.258.772722547                                                                       2048

1 +MYDBP1_DATA/mydbp1/onlinelog/group_1.262.772722575                                                                       2048

2 +MYDBP1_REDO/mydbp1/onlinelog/group_2.259.772723069                                                                       2048

2 +MYDBP1_DATA/mydbp1/onlinelog/group_2.263.772723099                                                                       2048

3 +MYDBP1_REDO/mydbp1/onlinelog/group_3.260.772723879                                                                       2048

3 +MYDBP1_DATA/mydbp1/onlinelog/group_3.266.772723909                                                                       2048

4 +MYDBP1_REDO/mydbp1/onlinelog/group_4.261.772723247                                                                       2048

4 +MYDBP1_DATA/mydbp1/onlinelog/group_4.267.772723277                                                                       2048

5 +MYDBP1_REDO/mydbp1/onlinelog/group_5.262.772723627                                                                       2048

5 +MYDBP1_DATA/mydbp1/onlinelog/group_5.534.772723657                                                                       2048

6 +MYDBP1_REDO/mydbp1/onlinelog/group_6.263.772722935                                                                       2048

6 +MYDBP1_DATA/mydbp1/onlinelog/group_6.535.772722963                                                                       2048

 

12 rows selected.

 

## Control files are in :

SQL> show parameter control_files

 

NAME                                                      TYPE VALUE

———————————— ———– ——————————

control_files                                          string               +MYDBP1_REDO/mydbp1/controlfile/current.257.771667625, +MYDBP1_DATA/mydbp1/controlfile/current.261.771667625

 

So if all hell breaks loose and you are asked to restore the backup from the Snap or BCV, what will happen?

The first scenario was: well, it is all taken care of since we have two bcv – snaps, remember? So the dba would contact the storage colleague to restore the SID_DATA snap (sync it back to production) and after that we would do a recovery using the archives and the online redo logs which are still present on the production box. And we feel very good about this because we thought the _FRA and _REDO should not be synced back to production… right? (actually the proper answer is: wrong).

As we were discussing this scenario, and after looking at the locations in v$logfile, we did notice a weak spot in that idea. If we restore SID_DATA from the BCV, that would also mean that the first member of every online redo log group would be wiped out, since we have a redo member in the SID_DATA diskgroup (Oracle does not like online redo logs that are not in sync). And we would also overwrite a current control file, since one copy lives in that same diskgroup (Oracle does not like out-of-sync control files either). Of course such a scenario will only be needed between dusk and dawn after 4 nights of being on call, with everybody shouting that the issue needs to be fixed as soon as possible, so there would be little time to investigate and set up a scenario.

Looking at the scenario at hand, when being asked to restore asap from the BCV, losing one of the control files could easily be fixed by starting the database with only the remaining control file: since we did not touch the _FRA and _REDO diskgroups (we did not restore the second BCV!! lucky us) we would still have a current control file there, and after doing our restore and recover activities we could use rman to copy the controlfile again to a second and third place and amend the spfile.
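In practice that repair would look something like the sketch below (the controlfile name is the surviving copy from the show parameter output above; the rest is a hedged outline of the idea, not the exact commands we would run on the night itself):

startup nomount
-- point the instance at the surviving, current controlfile only
alter system set control_files='+MYDBP1_REDO/mydbp1/controlfile/current.257.771667625' scope=spfile;
shutdown immediate
startup mount
-- recover the restored datafiles using the archives and the untouched online redo logs
recover database;
alter database open;
-- afterwards put a second location back into control_files, mount again and let rman
-- write the controlfile to all listed locations:
-- RMAN> restore controlfile from '+MYDBP1_REDO/mydbp1/controlfile/current.257.771667625';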

The online redo log member, however, would hurt badly if it really happened that one online redo log member per group got overwritten (since we restore from the SID_DATA SNAP – BCV) as the fastest possible way to get back on our feet from disk once the full database is destroyed. In a lab test I could perhaps consider a scenario of mounting the database and dropping the first member from each group, but not on a highly important production database like the one in this scenario.

So that means we need to make sure that when the SID_DATA snap/bcv is used (resynced to production) to restore from disk, it only holds datafiles and, OK, a control file (and not also an online redo log member, as happens at the moment).

In this case I will set up new online redo log groups and I will place them in the SID_FRA and SID_REDO diskgroups. Then I will do log switches and in the end drop the groups that had members in both _DATA and _REDO. After that I would be much more relaxed about my judgment day restore request to recover the FULL 3 TB database from the snap – bcv at maximum speed. It is not that much work to carry out and I am sure I will feel a lot more confident that even a scenario where a 3TB database needs to be restored from a BCV can be done quite quickly and with fewer risks.
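A rough sketch of that rework (the group numbers are examples, the 2048m size matches the existing groups, and MYDBP1_FRA is the assumed name of the _FRA diskgroup for this database):

-- new groups with members only in the _REDO and _FRA diskgroups
alter database add logfile group 7 ('+MYDBP1_REDO','+MYDBP1_FRA') size 2048m;
alter database add logfile group 8 ('+MYDBP1_REDO','+MYDBP1_FRA') size 2048m;
-- switch until none of the old groups is CURRENT or ACTIVE any more
alter system switch logfile;
alter system checkpoint;
select group#, status from v$log;
-- then drop the old groups that still have a member in the _DATA diskgroup, one by one
alter database drop logfile group 1;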

 

As Always  happy reading and till we meet again,

 

Mathijs

 

Rman Archive backup failing with RMAN-03002, RMAN-06059, ORA-19625 in Rac

Introduction:

As a member of the team, every so many days I am part of the hot line, taking care of tickets and indeed being the (hopefully friendly) voice in your ear when you call the hot line number of the Oracle team at the company. One of the tickets drew my attention (hmm, maybe because it was about RMAN, because for years I have loved working with RMAN). An archive backup had failed various times, so I was asked to investigate and fix it. This blog will show you the analysis and the steps I performed to fix this issue.

Details:

At work we run a shell script to perform both level and archive backups, and in the logfile that was produced for the archive backup this was my first clue:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 07/11/2014 07:27:45
RMAN-06059: expected archived log not found, loss of archived log compromises recoverability
ORA-19625: error identifying file /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_47588_2.arc
ORA-27037: unable to get file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

### So my interpretation was that the specific archive was not present in the ASM diskgroup, but was present locally on the other node of the RAC in the $ORACLE_HOME/dbs directory. My investigation on the three nodes showed me:

## On first node :

SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled <<-- wrong, because the archives are supposed to be in the +MYDB2_FRA1 diskgroup as a ground rule.
Oldest online log sequence     51210
Next log sequence to archive   51213
Current log sequence           51213

On the second node :

SQL> archive log list
Database log mode                 Archive Mode
Automatic archival                  Enabled
Archive destination                 MYDB2_DATA1  <<-- wrong, because the + is missing.
Oldest online log sequence     47585
Next log sequence to archive   47588
Current log sequence             47588

## on the 3rd node:

SQL> archive log list
Database log mode                 Archive Mode
Automatic archival                  Enabled
Archive destination                 MYDB2_DATA1 <<-- wrong, because the + is missing.
Oldest online log sequence     52542
Next log sequence to archive   52545
Current log sequence             52545

So after that my conclusion was that, due to the wrong archive destination on two out of three nodes in my rac, the archives were being written to a local filesystem on those two boxes, and on top of that into the $ORACLE_HOME/dbs directory. Well, that was not good!!

## So the first step to be performed was to correct the archive destination:

ALTER SYSTEM SET log_archive_dest_1='LOCATION=+MYDB2_FRA1' SCOPE=BOTH SID='MYDB21';
ALTER SYSTEM SET log_archive_dest_1='LOCATION=+MYDB2_FRA1' SCOPE=BOTH SID='MYDB22';
ALTER SYSTEM SET log_archive_dest_1='LOCATION=+MYDB2_FRA1' SCOPE=BOTH SID='MYDB23';

## Once that was corrected I performed a couple of log switches to see archives being born in the +MYDB2_FRA1 diskgroup, which was the case, so already a bit of a happy me.
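For a rac a handy way to do that from one node is the statement below, since archive log current forces all enabled threads to archive (a plain switch logfile only acts on the local instance):

alter system archive log current;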

## Since the missing + in the archive destination had caused Oracle to create the archives
## in $ORACLE_HOME/dbs,
## I had to copy the archives present there to the ASM disk group, in asmcmd, into the correct sub-folder (the creation dates in Linux showed me the archives were from the last 2 days, so I knew where to put them in ASM):

cp /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_47523_2.arc .
cp /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_47525_2.arc .
cp /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_47526_2.arc .

## and I did the same on the other server.

oracle@mysrvr:/opt/oracle/product/11202_ee_64/db/dbs/ [MYDB23]# ls -ltr *.arc .

cp /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_52480_3.arc .
cp /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_52481_3.arc .
cp /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_52482_3.arc .
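For clarity, the cp commands above were run inside asmcmd after changing into the date folder of the diskgroup, roughly like this (the folder is the same one used in the catalog commands further below):

asmcmd
ASMCMD> cd +MYDB2_FRA1/MYDB2/ARCHIVELOG/2014_07_10
ASMCMD> cp /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_47523_2.arc .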

## The next step was to find out how to register these archives in rman after they had been moved to the ASM diskgroup. And again fellow bloggers did not let me down, so after spending some time surfing Google I came up with this:

I connected with rman to both the target and the catalog and performed the following, which was nice since I only had to point to the directory where all the archives were now located in ASM after my copy action:

catalog start with '+MYDB2_FRA1/MYDB2/ARCHIVELOG/2014_07_10/';
catalog start with '+MYDB2_FRA1/MYDB2/ARCHIVELOG/2014_07_11/';
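Before rerunning the backup it does not hurt to double-check in rman that the freshly cataloged archives are now known to it; something along these lines (the two-day window matches the age of the copied files):

list archivelog from time 'sysdate-2';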

## Once that had been done I ran an archive backup with success, and I was happy that I had solved another puzzle.

 

As always Happy reading  and till next time,

 

Mathijs

 

 

Rman Catalog Crosscheck and Delete Obsolete, an Approach

Introduction

When you are or will be working with Rman as your backup solution you will have to ask yourself several questions. One of the elementary ones is whether or not you will be utilizing an Rman catalog, or whether you will put your faith in the existing multiplexed controlfiles. On the web I see debates going on about the use of a catalog: the idea that you will not need a catalog since you are not using stored scripts, that you might get timeouts simply because working with a catalog can mean long periods without communication (and you have those, hmm, let’s say aggressive firewalls in place that will snap at such so-it-seems idle sessions), etc. Well, this post is about what you should do if you decide to have a catalog for rman after all, or if a catalog is in place due to company, hmm, history – habits – or standards.

What you should consider having in place (of course it always depends on your settings and configuration) is a maintenance job for your Rman catalog for the specific database (target) you have registered there. With some explanation below I would like to show you what was done for that maintenance part, by showing the script we used for it.

Details:

#!/bin/ksh
#set -vx
PROG=$(basename $0)

if [ -z "$1" ]; then
echo "ERROR - wrong start command"
echo "EXAMPLE - $PROG <SID>"
exit 1
fi

## Script will take the Instance that has been registered as an input parameter
export ORACLE_SID=$1

## Finding the ORATAB since it will help to set Oracle Home
## ${OS} is expected to be set in the profile; derive it here if it is not
export OS=${OS:-$(uname -s)}
if [ "${OS}" = "Linux" ];then
export ORATAB=/var/opt/oracle/oratab
else
export ORATAB=/etc/oratab
fi
ORATABLINE=`grep $ORACLE_SID $ORATAB`
export ORACLE_HOME=`echo $ORATABLINE | cut -f2 -d:`

## exporting several Parameters among which logging path and logfilename with a timestamp in it
export LOGPATH=/opt/oracle/scripts/log
export CURRENT_TIMESTAMP=`date +%Y%m%d_%H%M%S`   # Format: YYYYMMDD_HHMISS   e.g.: 20110907_150455
export LOGFILE=${ORACLE_SID}_crosscheck_${CURRENT_TIMESTAMP}.log
export NLS_DATE_FORMAT="DD-MON-YYYY HH24:MI:SS"
export ORA_NLS32=${ORACLE_HOME}/ocommon/nls/admin/data
export ORA_NLS33=${ORACLE_HOME}/ocommon/nls/admin/data
export NLS_LANG=American_America.WE8ISO8859P15

## Connect to rman and your catalog  and start logging (msglog)  to the specified logdir and logfile.
## then allocate a channel. Note: if you make backups to TAPE as we do, you HAVE to allocate a maintenance channel of type SBT_TAPE and, in our case, you HAVE to mention the specific Networker server and Networker client.
## then you perform the crosscheck backup, which will check in the Networker database whether each backup is still available. If it is not, the backup will be tagged expired in the Rman catalog.
## once the crosscheck is finished we wanted to see a result, so we did a list backup.
## if you are sure that what you see is correct, delete noprompt expired will erase the no longer existing backups from the rman catalog. MAYBE best to run the script first without this; always better to be safe than sorry.
## After delete another Listing is performed as a record of the post  delete information in the catalog.

${ORACLE_HOME}/bin/rman TARGET  / RCVCAT rman_${ORACLE_SID}/${ORACLE_SID}@RMAN MSGLOG ${LOGPATH}\/${LOGFILE} << EOF
allocate channel for maintenance device type 'SBT_TAPE';
send 'NSR_ENV=(NSR_SERVER=adebckus-nl,NSR_CLIENT=adesnouh)';
crosscheck backup;
list backup of database summary;
delete noprompt expired backup;
list backup of database summary;
release channel;
exit
EOF
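Assuming the script above is saved as, say, rman_crosscheck.ksh in the scripts directory (the name, path and instance name below are just examples), it is started with the registered instance as its single argument, either by hand or from cron:

## by hand
/opt/oracle/scripts/rman_crosscheck.ksh MYDB1
## or weekly from cron
00 07 * * 0 /opt/oracle/scripts/rman_crosscheck.ksh MYDB1 > /dev/null 2>&1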

Happy Reading and best  of Luck,

Mathijs

Rman Recovery of a Rac Database

Introduction:

As part of a QA process I am restoring a Rac database. Rman, a backup and recovery tool that has been around for some time now (I started working with it as of Oracle 8), will be used for it. Even though I am aware of debates that this could be done much more easily via the GUI, I am still tempted to use the command line (and scripts) for it. This is the general information about the environment where this will happen today: the 2-instance Rac database is on Linux servers on two nodes. Backups are performed directly to tape using Rman; we are using Networker for that. The environment I am working with is 11.2.0.3 with ASM. As part of sharing information, below are the steps I performed, with some added comments.
