Renaming datafiles in Oracle 11gR2 ASM

Introduction.

As part of their housekeeping, DBAs have always been creating tablespaces for users and applications. During the lifecycle of an application, objects are created, deleted, copied into temporary copies, and so on. In the end the storage usage of a tablespace might look like a Swiss cheese. Even though Oracle is smart enough to reuse the "holes", sometimes it is good housekeeping to reorganize a tablespace by moving / recreating its objects in a new tablespace. Once such a scenario has completed, the old tablespace (aka the Swiss cheese) will be empty, so it could be tossed away. Or not, if the tablespace name is mandatory because vendor scripts use a hardcoded tablespace name. Even though the DBA might not like such 'fixed' names, it is good to know that Oracle 11g (my version is 11.2.0.3 on Linux) offers the option to rename a tablespace. But what will happen to the datafiles that are part of that tablespace?

The scenario below is based on a great post I found on the web by Frits Hoogland: https://fritshoogland.wordpress.com/2012/07/23/rename-oracle-managed-file-omf-datafiles-in-asm/ .

In his post Frits offers two scenarios for this action. Personally I favor the first one since it holds less risk: Oracle keeps track of the copied file and the old file, and drops the old file during the alter database rename file action, whereas the second scenario means you will have to do some follow-up work in asmcmd, since the original (old) file remains in place (in RMAN a copy and a switch to that copy is done).
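For reference, that second scenario boils down to roughly the following. This is only a minimal sketch, using the TEST2 tablespace and datafile number 5 from the test scenario further down; note that after the switch the original OMF stays behind in the diskgroup and has to be removed manually with asmcmd rm.

RMAN> sql 'alter tablespace test2 offline';
RMAN> backup as copy datafile 5 format '+DATA';
RMAN> switch datafile 5 to copy;
RMAN> recover datafile 5;
RMAN> sql 'alter tablespace test2 online';

The rename scenario used in this post avoids that manual cleanup, which is why I prefer it.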

And as you can see, after all these years (his post was written in 2012) it is still great input for the current scenario. Good posts are like good wine: they get better as they age.

Scenario:

Often on the web it is stated that you should not believe something just because it is printed. That is what I did here too: to get a look and feel of the procedure, let's run a test scenario first.

## Creating a tablespace in an ASM diskgroup (which will create the datafile as an OMF).

SQL> create tablespace test1 datafile '+DATA' size 100m;

Let's see what the datafile for this tablespace looks like in the DATA diskgroup:

SQL> select * from dba_data_files where tablespace_name = 'TEST1';

FILE_NAME                                FILE_ID TABLESPACE_NAME      BYTES BLOCKS STATUS    RELATIVE_FNO AUT MAXBYTES MAXBLOCKS INCREMENT_BY USER_BYTES USER_BLOCKS ONLINE_
---------------------------------------- ------- --------------- --------- ------ --------- ------------ --- -------- --------- ------------ ---------- ----------- -------
+DATA/mydb1/datafile/test1.736.935381317       5 TEST1           104857600  12800 AVAILABLE            5 NO         0         0            0  103809024       12672 ONLINE

 

Ok so we see that the datafile  is +DATA/mydb1/datafile/test1.736.935381317  at the moment.

In  the next step we will rename the tablespace:

SQL> alter tablespace test1 rename to test2;

Time to check whether something has changed under the hood now that we have renamed the tablespace:

SQL> select * from dba_data_files where tablespace_name = 'TEST2';

FILE_NAME                                FILE_ID TABLESPACE_NAME      BYTES BLOCKS STATUS    RELATIVE_FNO AUT MAXBYTES MAXBLOCKS INCREMENT_BY USER_BYTES USER_BLOCKS ONLINE_
---------------------------------------- ------- --------------- --------- ------ --------- ------------ --- -------- --------- ------------ ---------- ----------- -------
+DATA/mydb1/datafile/test1.736.935381317       5 TEST2           104857600  12800 AVAILABLE            5 NO         0         0            0  103809024       12672 ONLINE

Not much of a surprise: the datafile is still +DATA/mydb1/datafile/test1.736.935381317, so if we feel that there should be a 1:1 relation between the datafile name and the tablespace name, we need to go the extra mile to make it right.

In my first SQL*Plus session I issued this command:

SQL> alter tablespace TEST2 offline;

I am used to working in multiple sessions on my Linux boxes, so I opened a second window, started RMAN and gave this command to copy the old file within the diskgroup. This will create another OMF for this datafile in the diskgroup +DATA.

rman target /

RMAN> copy datafile '+DATA/mydb1/datafile/test1.736.935381317' to '+DATA';

Output of this action looked like this:

Starting backup at 08.02.2017 04:13:58

using target database control file instead of recovery catalog

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=923 device type=DISK

channel ORA_DISK_1: starting datafile copy

input datafile file number=00005 name=+DATA/mydb1/datafile/test1.736.935381317

output file name=+DATA/mydb1/datafile/test2.733.935381641 tag=TAG20170208T041400 RECID=3 STAMP=935381644

channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:07

Finished backup at 08.02.2017 04:14:07

Starting Control File and SPFILE Autobackup at 08.02.2017 04:14:08

piece handle=/opt/oracle/product/11203_ee_64/db/dbs/MYDB1_c-2786503255-20170208-00 comment=NONE

Finished Control File and SPFILE Autobackup at 08.02.2017 04:14:15

With the output file name (the newly created file) +DATA/mydb1/datafile/test2.733.935381641 I moved back to my first screen and issued this command in SQL*Plus:

SQL> alter database rename file '+DATA/mydb1/datafile/test1.736.935381317' to '+DATA/mydb1/datafile/test2.733.935381641';

After that, the last step is to take the tablespace online again:

SQL> alter tablespace test2 online;

Let’s check the datafile for the tablespace again now.

SQL> select * from dba_data_files where tablespace_name = 'TEST2';

FILE_NAME                                FILE_ID TABLESPACE_NAME      BYTES BLOCKS STATUS    RELATIVE_FNO AUT MAXBYTES MAXBLOCKS INCREMENT_BY USER_BYTES USER_BLOCKS ONLINE_
---------------------------------------- ------- --------------- --------- ------ --------- ------------ --- -------- --------- ------------ ---------- ----------- -------
+DATA/mydb1/datafile/test2.733.935381641       5 TEST2           104857600  12800 AVAILABLE            5 NO         0         0            0  103809024       12672 ONLINE

Now it is time for the real deal.

First let's find out which datafiles are part of our tablespace POOL_DATA (which was renamed from POOL_DATA_REORG).

SQL> select file_id,file_name from dba_data_files where tablespace_name = 'POOL_DATA';

FILE_ID FILE_NAME

---------------------------------------------

       41 +DATA/mydb1/datafile/pool_data_reorg.780.934704787

       42 +DATA/mydb1/datafile/pool_data_reorg.783.934705039

       43 +DATA/mydb1/datafile/pool_data_reorg.658.934705043

       44 +DATA/mydb1/datafile/pool_data_reorg.735.934705271

       45 +DATA/mydb1/datafile/pool_data_reorg.734.934705525

       46 +DATA/mydb1/datafile/pool_data_reorg.732.934705785

       47 +DATA/mydb1/datafile/pool_data_reorg.731.934706037

       48 +DATA/mydb1/datafile/pool_data_reorg.730.934706289

       49 +DATA/mydb1/datafile/pool_data_reorg.729.934706505


Time to take the tablespace offline, after agreeing with the vendor DBA on a maintenance window to perform these actions:

SQL> alter tablespace POOL_DATA offline;

Opened a second screen and started an RMAN session.

rman target /

RMAN> copy datafile '+DATA/mydb1/datafile/pool_data_reorg.780.934704787' to '+DATA';

This showed the following output in RMAN:

Starting backup at 08.02.2017 04:35:57

using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile copy

input datafile file number=00041 name=+DATA/mydb1/datafile/pool_data_reorg.780.934704787

output file name=+DATA/mydb1/datafile/pool_data.736.935382959 tag=TAG20170208T043558 RECID=4 STAMP=935383461

channel ORA_DISK_1: datafile copy complete, elapsed time: 00:08:25

Finished backup at 08.02.2017 04:44:23

Starting Control File and SPFILE Autobackup at 08.02.2017 04:44:24

piece handle=/opt/oracle/product/11203_ee_64/db/dbs/MYDB1_c-2786503255-20170208-03 comment=NONE

Finished Control File and SPFILE Autobackup at 08.02.2017 04:44:31

In SQL*Plus I issued the rename action using the above information:

SQL> alter database rename file '+DATA/mydb1/datafile/pool_data_reorg.780.934704787' to '+DATA/mydb1/datafile/pool_data.736.935382959';

And with the idea that checking is believing, in the alert log I saw this entry:

Alter database rename file '+DATA/mydb1/datafile/pool_data_reorg.780.934704787' to '+DATA/mydb1/datafile/pool_data.736.935382959'

Deleted Oracle managed file +DATA/mydb1/datafile/pool_data_reorg.780.934704787

Completed: alter database rename file '+DATA/mydb1/datafile/pool_data_reorg.780.934704787' to '+DATA/mydb1/datafile/pool_data.736.935382959'

I like what I see! Working this way, performing the rename automatically deletes the old file in ASM for us!

If this were the only datafile to be moved, the next step would be to take the tablespace online again (in this case also as an extra check that we did not break anything):

SQL> alter tablespace POOL_DATA online;

Now let us check  the datafile(s) for the tablespace again:

SQL> select file_id,file_name from dba_data_files where tablespace_name = 'POOL_DATA';

   FILE_ID FILE_NAME

---------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

       41 +DATA/mydb1/datafile/pool_data.736.935382959

       42 +DATA/mydb1/datafile/pool_data_reorg.783.934705039

       43 +DATA/mydb1/datafile/pool_data_reorg.658.934705043

       44 +DATA/mydb1/datafile/pool_data_reorg.735.934705271

       45 +DATA/mydb1/datafile/pool_data_reorg.734.934705525

       46 +DATA/mydb1/datafile/pool_data_reorg.732.934705785

       47 +DATA/mydb1/datafile/pool_data_reorg.731.934706037

       48 +DATA/mydb1/datafile/pool_data_reorg.730.934706289

       49 +DATA/mydb1/datafile/pool_data_reorg.729.934706505

Ah, that looks like one down (one correct, eight more to go).

SQL> alter tablespace POOL_DATA offline;
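With eight more files to go, the copy commands do not have to be typed by hand. A query along these lines can generate them from the data dictionary (a minimal sketch; the matching rename commands still have to be built afterwards from the output file names that RMAN reports):

SQL> select 'copy datafile ''' || file_name || ''' to ''+DATA'';' as rman_copy_cmd
     from dba_data_files
     where tablespace_name = 'POOL_DATA'
     and file_name like '%pool_data_reorg%';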

I proceeded with the remaining eight datafiles, and when doing the rename step it was good to see the information below in the alert log:

Wed Feb 08 06:00:34 2017

alter database rename file '+DATA/mydb1/datafile/pool_data_reorg.783.934705039' to '+DATA/mydb1/datafile/pool_data.780.935384353'

Deleted Oracle managed file +DATA/mydb1/datafile/pool_data_reorg.783.934705039

Completed: alter database rename file '+DATA/mydb1/datafile/pool_data_reorg.783.934705039' to '+DATA/mydb1/datafile/pool_data.780.935384353'

alter database rename file '+DATA/mydb1/datafile/pool_data_reorg.658.934705043' to '+DATA/mydb1/datafile/pool_data.755.935385209'

Deleted Oracle managed file +DATA/mydb1/datafile/pool_data_reorg.658.934705043

Completed: alter database rename file '+DATA/mydb1/datafile/pool_data_reorg.658.934705043' to '+DATA/mydb1/datafile/pool_data.755.935385209'

alter database rename file '+DATA/mydb1/datafile/pool_data_reorg.735.934705271' to '+DATA/mydb1/datafile/pool_data.778.935384371'

Wed Feb 08 06:00:48 2017

Deleted Oracle managed file +DATA/mydb1/datafile/pool_data_reorg.735.934705271

Completed: alter database rename file '+DATA/mydb1/datafile/pool_data_reorg.735.934705271' to '+DATA/mydb1/datafile/pool_data.778.935384371'

alter database rename file '+DATA/mydb1/datafile/pool_data_reorg.734.934705525' to '+DATA/mydb1/datafile/pool_data.781.935385215'

Deleted Oracle managed file +DATA/mydb1/datafile/pool_data_reorg.734.934705525

Completed: alter database rename file '+DATA/mydb1/datafile/pool_data_reorg.734.934705525' to '+DATA/mydb1/datafile/pool_data.781.935385215'

alter database rename file '+DATA/mydb1/datafile/pool_data_reorg.730.934706289' to '+DATA/mydb1/datafile/pool_data.762.935386865'

Deleted Oracle managed file +DATA/mydb1/datafile/pool_data_reorg.730.934706289

Completed: alter database rename file '+DATA/mydb1/datafile/pool_data_reorg.730.934706289' to '+DATA/mydb1/datafile/pool_data.762.935386865'

alter database rename file '+DATA/mydb1/datafile/pool_data_reorg.729.934706505' to '+DATA/mydb1/datafile/pool_data.772.935386887'

Deleted Oracle managed file +DATA/mydb1/datafile/pool_data_reorg.729.934706505

Completed: alter database rename file '+DATA/mydb1/datafile/pool_data_reorg.729.934706505' to '+DATA/mydb1/datafile/pool_data.772.935386887'

Wed Feb 08 06:01:30 2017

alter database rename file '+DATA/mydb1/datafile/pool_data_reorg.732.934705785' to '+DATA/mydb1/datafile/pool_data.744.935386243'

Deleted Oracle managed file +DATA/mydb1/datafile/pool_data_reorg.732.934705785

Completed: alter database rename file '+DATA/mydb1/datafile/pool_data_reorg.732.934705785' to '+DATA/mydb1/datafile/pool_data.744.935386243'

All looked well so time to wrap up.

SQL> alter tablespace POOL_DATA online;

SQL> alter database backup controlfile to trace;

SQL> select file_id,file_name from dba_data_files where tablespace_name = 'POOL_DATA';

   FILE_ID FILE_NAME

---------- ----------------------------------------------

                41 +DATA/mydb1/datafile/pool_data.736.935382959

                42 +DATA/mydb1/datafile/pool_data.780.935384353

                43 +DATA/mydb1/datafile/pool_data.755.935385209

                44 +DATA/mydb1/datafile/pool_data.778.935384371

                45 +DATA/mydb1/datafile/pool_data.781.935385215

                46 +DATA/mydb1/datafile/pool_data.744.935386243

                47 +DATA/mydb1/datafile/pool_data.732.935388425

                48 +DATA/mydb1/datafile/pool_data.762.935386865

                49 +DATA/mydb1/datafile/pool_data.772.935386887

 

Final check:

SQL> select TABLESPACE_NAME,status from dba_tablespaces order by 1 ;

TABLESPACE_NAME              STATUS

------------------------------ ---------

POOL_DATA                   ONLINE

POOL_IX                     ONLINE

SYSAUX                      ONLINE

SYSTEM                      ONLINE

TEMP                        ONLINE

TEST2                       ONLINE

TOOLS                       ONLINE

UNDOTBS1                    ONLINE





As always, I hope you had some happy reading with this procedure.

 

Mathijs.

 

 

 

The good the bad and the corrupted (ORA-01578, ORA-26040, ORA-01110)

Introduction:

 

At the moment I am part of a team that is upgrading a billing environment from 10.2.0.5 to 11.2.0.4 on Solaris. After a restore and point-in-time recovery on a preproduction environment, the team upgraded all databases using the manual upgrade method (still not that confident that the upgrades would run as smoothly with the DBUA). As always, after the upgrade there is a period of hyper care in which the alert files should be monitored even more closely for funky issues. This blog gives some more details on the funky stuff we came across and how we analyzed it. The one thing that did make me frown during this exercise, however, was the fact that the various tools did not necessarily all show the same results.

Conclusions:

When upgrading important environments a DBA should really have eyes and ears everywhere before performing that upgrade, because once upgraded it will be a tough call to prove whether something was already broken in the old environment or was caused by the upgrade. So I would really recommend scrutinizing your environment like a creature under a microscope before upgrading: question broken jobs, invalid objects and, as I learned the hard way during this upgrade, objects created with the NOLOGGING option (DBA_TABLES, DBA_INDEXES, DBA_LOBS, DBA_TAB_PARTITIONS, DBA_LOB_PARTITIONS, DBA_TAB_SUBPARTITIONS, etc.; LOGGING='NO' indicates NOLOGGING).

In my case a number of objects had been created with the NOLOGGING option, and indeed only after upgrading did I have to fight my way back to find out what was going on and which objects were involved (in my case it was a number of staging tables for a specific user, where a drop and recreate plus exp/imp dealt with it). I did recommend enabling force logging for the complete database (but was told that the customer wants to be able to create objects with the NOLOGGING option for performance reasons).
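As a pre-upgrade check along those lines, a query like the following lists segments created with the NOLOGGING option. This is a minimal sketch covering tables and indexes only; the other views mentioned above (DBA_LOBS, the partition views, etc.) can be added in the same way:

SQL> select owner, table_name as object_name, 'TABLE' as object_type
     from dba_tables
     where logging = 'NO'
     union all
     select owner, index_name, 'INDEX'
     from dba_indexes
     where logging = 'NO'
     order by 1, 3, 2;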

As I mentioned, in the end I was plain lucky that this was caused by staging tables, which made the scenario easier since we could simply drop and recreate them. Once again I learned something (and found proof again that success just loves preparation). Of course this is a slightly different scenario from the one we will run in production, since there we will run the upgrade directly, but we still need to bear in mind the objects created with the NOLOGGING option in production in case one of the databases in scope would need a restore should the upgrade break the database. Well, in this case all databases were upgraded with success, so all DBAs are happy and confident for the production migration which is scheduled.

Details:

SERVER: mysrvr

Database: OPTMYDB

Timeline:

  • All databases of the billing environment have been restored to the new OPT server called mysrvr.
  • We have upgraded all databases from 10.2.0.5 to 11.2.0.4 on Solaris.
  • After opening the database OPTMYDB and running statistics jobs, she started throwing error messages in the alert file about block corruptions, as seen in Example 1 below. Yesterday, together with a colleague from the Adba team, the gather-statistics issue was resolved and three objects were identified as having corrupt blocks (see Example 2). These three objects have been dropped and recreated, and there were no more corruption messages in the alert file.
  • After that I was asked by my manager to run a dbv check for the full database, covering all datafiles (a sketch of the invocation follows this list).
  • I also used RMAN in parallel to find possible corruption; that run is still ongoing, but the first output is in Example 3.
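The dbv run mentioned in the list is done per datafile. For the file named in Example 1 below, the call would look roughly like this (the blocksize of 8192 is an assumption and has to match the block size of the tablespace):

dbv file=/db/MYDB/data/tbsdata_e/tbsdip01_4m_14.dbf blocksize=8192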

Example 1

## In the alert file this was the information we came across, setting off the first alarm bells:

Mon Aug 04 22:00:43 2014

DBMS_STATS: GATHER_STATS_JOB encountered errors.  Check the trace file.

Errors in file /opt/oracle/OPTMYDB/diag/rdbms/MYDB/OPTMYDB/trace/OPTMYDB_j002_3640.trc:

ORA-20011: Approximate NDV failed: ORA-29913: error in executing ODCIEXTTABLEOPEN callout

ORA-29400: data cartridge error

error opening file /home/tmcon/oracle/UNMATCHED_SUBS_3640.log

Errors in file /opt/oracle/OPTMYDB/diag/rdbms/MYDB/OPTMYDB/trace/OPTMYDB_j002_3640.trc  (incident=8665):

ORA-01578: ORACLE data block corrupted (file # 620, block # 391690)

ORA-01110: data file 620: '/db/MYDB/data/tbsdata_e/tbsdip01_4m_14.dbf'

ORA-26040: Data block was loaded using the NOLOGGING option

Incident details in: /opt/oracle/OPTMYDB/diag/rdbms/MYDB/OPTMYDB/incident/incdir_8665/OPTMYDB_j002_3640_i8665.trc

 

Example2

I have grepped the ORA-01578 errors from the alert file and added them to a table to get an overview of the datafiles and blocks having the issues. And since the alert file is a growing object, this is what I got over the time interval since starting up the database as an 11.2.0.4 environment.

grep ORA-01578 alert_OPTMYDB.log AFN BL
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946

 

## On MOS I retrieved a note that shows how to find the object located around a corrupted block:

select file_id AFN, relative_fno, tablespace_name
from dba_data_files
where relative_fno=&RFN;
Enter value for rfn: 620
old   3: where relative_fno=&RFN
new   3: where relative_fno=620
       AFN RELATIVE_FNO TABLESPACE_NAME
———- ———— ——————————
      3689          620 UNDO_TBS_01
       620          620 TBSDIP01_4M
##  Used this statement to find the object that was at that spot in the tablespace (run once for each AFN / BL combination found above):
select *
from dba_extents
where file_id = &AFN
and &BL between block_id AND block_id + blocks - 1;
SQL> select *
from dba_extents
where file_id = &AFN
and &BL between block_id AND block_id + blocks - 1;
  2    3    4  Enter value for afn: 620
old   3: where file_id = &AFN
new   3: where file_id = 620
Enter value for bl: 391690
old   4: and &BL between block_id AND block_id + blocks - 1
new   4: and 391690 between block_id AND block_id + blocks – 1
OWNER
——————————
SEGMENT_NAME
——————————————————————————–
PARTITION_NAME                 SEGMENT_TYPE       TABLESPACE_NAME
—————————— —————— ——————————
EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
———- ———- ———- ———- ———- ————
SCOTT
SCOTT_BPS_05062014000_NOBP
                               TABLE              TBSDIP01_4M
         0        620     391689    4194304        512          620
SQL> select *
from dba_extents
where file_id = &AFN
and &BL between block_id AND block_id + blocks - 1;
  2    3    4  Enter value for afn: 620
old   3: where file_id = &AFN
new   3: where file_id = 620
Enter value for bl: 392202
old   4: and &BL between block_id AND block_id + blocks - 1
new   4: and 392202 between block_id AND block_id + blocks – 1
OWNER
——————————
SEGMENT_NAME
——————————————————————————–
PARTITION_NAME                 SEGMENT_TYPE       TABLESPACE_NAME
—————————— —————— ——————————
EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
———- ———- ———- ———- ———- ————
SCOTT
SCOTT_BPS_05062014000_NORUEB
                               TABLE              TBSDIP01_4M
         0        620     392201    4194304        512          620
SQL> select file_id AFN, relative_fno, tablespace_name
from dba_data_files
where relative_fno=&RFN;
  2    3  Enter value for rfn: 661
old   3: where relative_fno=&RFN
new   3: where relative_fno=661
       AFN RELATIVE_FNO TABLESPACE_NAME
———- ———— ——————————
       661          661 TBSDIP01_4M
select *
from dba_extents
where file_id = &AFN                                  -- 661
and &BL between block_id AND block_id + blocks - 1;   -- 423946
OWNER
——————————
SEGMENT_NAME
——————————————————————————–
PARTITION_NAME                 SEGMENT_TYPE       TABLESPACE_NAME
—————————— —————— ——————————
EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
———- ———- ———- ———- ———- ————
SCOTT
SCOTT_SII_05062014000R_BAN
                               TABLE              TBSDIP01_4M
         0        661     423945    4194304        512          661

 

Example3

On MOS I also retrieved a note that offered RMAN as a tool to find logical corruption. Since I am curious by nature, of course I had to use that goodie too!

 

RMAN> run {
allocate channel d1 type disk;
allocate channel d2 type disk;
allocate channel d3 type disk;
allocate channel d4 type disk;
backup validate check logical database;
}

 

##  Running this in RMAN will put rows, if any, in v$database_block_corruption. When I asked for more details on this view, I was told that it should return 0 rows when rerunning the validation. I restarted the RMAN process after some time (so maybe, since it did not complete, it did not wrap up properly), but at the second run I noticed rows were still present there.

SQL> select * from V$DATABASE_BLOCK_CORRUPTION;

 

FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO

---------- ---------- ---------- ------------------ ---------

235      34825         92         1.3760E+13 NOLOGGING

235     323083        510         1.3760E+13 NOLOGGING

548     315914        833         1.3760E+13 NOLOGGING

648     148489        512         1.3760E+13 NOLOGGING

 

## Well, the good thing was that this showed in one view what most likely was the cause of the issue: objects being created with the NOLOGGING option, thus invalidating a 100% recovery after a restore (and recovery of a number of archives) and the upgrade to 11g after that.

From Oracle support I got these notes, as it turned out that the corruption was caused by NOLOGGING operations:

The Gains and Pains of Nologging Operations in a Data Guard Environment (Doc ID 290161.1)

ORA-1578 / ORA-26040 Corrupt blocks by NOLOGGING – Error explanation and solution (Doc ID 794505.1)

A quote from this Note really made me both frown and giggle:

…… Begin Quote

Note that the data inside the affected blocks is not salvageable. Methods like “Media Recovery” or “RMAN blockrecover” will not fix the problem unless the data file was backed up after the NOLOGGING operation was registered in the Redo Log.

…… END  Quote

I also learned that the way Oracle identifies that a block was invalidated earlier due to NOLOGGING is by updating most of the bytes in that block with 0xff, but only when that "invalidate" redo is applied during a recovery. If the associated redo/archived log files are used to RECOVER the data files (which was the case in this preproduction environment), Oracle invalidates such blocks, and the errors ORA-26040 along with ORA-1578 are reported by SQL statements on the next reads of those blocks. So in this case the recovery was done, but the blocks touched by NOLOGGING operations were marked soft corrupt during the recovery, as redo records flagged "this block was nologging" were applied to them.

http://docs.oracle.com/cd/E11882_01/backup.112/e10642/osadvsce.htm#BRADV90047

The recommendation is: always perform a backup after NOLOGGING operations, don't touch those blocks again, and use that backup for recovery.
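And if, unlike in my case, the customer does allow it, putting the whole database in force logging mode prevents this class of soft corruption after a restore and recovery. A minimal sketch of the check and the change:

SQL> select force_logging from v$database;
SQL> alter database force logging;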

 

As always  happy reading and till we meet again,

 

Mathijs

 

Is your backup strategy safe ( a story of BCV – Snap restores with EMC)

Introduction:

As a DBA there should always be some room to improve your backup and recovery scenarios. This blog is about a judgment-day scenario: losing a 3 TB database in full, and your bosses asking what you will do and when the database will be back on its feet. In such scenarios I do tend to debate when you would really lose the full database with a need to restore it; plausible answers might be during an upgrade to a newer Oracle release, or when you lose the server where the database is running (however, given the fact that a lot of sites use SAN, that would even offer other options than a restore of such a big database, so it is not that likely). In my experience it is much more common to come across scenarios where you restore single datafiles or tablespaces. But OK, for argument's sake, and to prove we have a scenario where we lose the full database (meaning all data files and control files), let us go into more detail.

Details:

On my most important databases this has been the setup in the past when implementing the database(s) in ASM, to give us a safe feeling:

  • Every database has three disk groups, called SID_DATA, SID_FRA and SID_REDO.
  • Database files are in SID_DATA, control files and redo log group members are in SID_DATA and SID_REDO, and the archives are put in SID_FRA.
  • Fortunately we are using EMC technology, both Snap and BCV. We use them to set up clone / reporting environments and pre-production, and we do our backups with them.

With regard to the backups, an interesting debate arose this week which triggered this blog. In the setup of the Snap and BCV it has always been considered that there have to be two of them for each database: one holding the SID_DATA disk group (i.e. the database files) and one Snap or BCV holding SID_FRA and SID_REDO. When a backup is triggered, the following actions are performed on the production side (a sketch of the commands follows the list):

  • The database is brought into begin backup mode,
  • In the database a copy of the controlfile is pushed to the SID_FRA disk group,
  • In the EMC tooling the mirror is broken,
  • In the database an end backup is issued,
  • On the backup server the database is mounted with the copied version of the controlfile and a level backup is run with RMAN (both level 0 and level 1, indeed also using block change tracking),
  • After a backup the Snap / BCV remains mounted on the backup server, so that in case of a true disaster it can be used to restore the full database from disk,
  • On the production server the archives are saved to tape directly, in intervals.
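On the database side those steps boil down to something like the sketch below (assuming the FRA diskgroup follows the SID_FRA convention and is called MYDBP1_FRA here; the controlfile copy name is just an example, and the actual split of the mirror is done by the storage tooling, not by SQL):

SQL> alter database begin backup;
SQL> alter database backup controlfile to '+MYDBP1_FRA/backup_controlfile_mydbp1.ctl' reuse;
-- the storage team / EMC tooling splits the Snap / BCV mirror at this point
SQL> alter database end backup;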

 

So far so good, right? We have it all covered (or not?).

If you take a closer look at a judgment-day disaster-recovery request, where the customer says we need a restore of the FULL database (of 3 TB), would we be happy when seeing this?

If we check the redo logs and control files we will see this:

1* select a.GROUP#,MEMBER,BYTES/1024/1024 from v$logfile a, v$log b where a.group#= b.group#  order by group#;

 

GROUP# MEMBER                                                                                                                                   BYTES/1024/1024

———- ——————————————————————————– —————

1 +MYDBP1_REDO/mydbp1/onlinelog/group_1.258.772722547                                                                       2048

1 +MYDBP1_DATA/mydbp1/onlinelog/group_1.262.772722575                                                                       2048

2 +MYDBP1_REDO/mydbp1/onlinelog/group_2.259.772723069                                                                       2048

2 +MYDBP1_DATA/mydbp1/onlinelog/group_2.263.772723099                                                                       2048

3 +MYDBP1_REDO/mydbp1/onlinelog/group_3.260.772723879                                                                       2048

3 +MYDBP1_DATA/mydbp1/onlinelog/group_3.266.772723909                                                                       2048

4 +MYDBP1_REDO/mydbp1/onlinelog/group_4.261.772723247                                                                       2048

4 +MYDBP1_DATA/mydbp1/onlinelog/group_4.267.772723277                                                                       2048

5 +MYDBP1_REDO/mydbp1/onlinelog/group_5.262.772723627                                                                       2048

5 +MYDBP1_DATA/mydbp1/onlinelog/group_5.534.772723657                                                                       2048

6 +MYDBP1_REDO/mydbp1/onlinelog/group_6.263.772722935                                                                       2048

6 +MYDBP1_DATA/mydbp1/onlinelog/group_6.535.772722963                                                                       2048

 

12 rows selected.

 

## Control files are in :

SQL> show parameter control_files

 

NAME                                                      TYPE VALUE

———————————— ———– ——————————

control_files                                          string               +MYDBP1_REDO/mydbp1/controlfile/current.257.771667625, +MYDBP1_DATA/mydbp1/controlfile/current.261.771667625

 

So if all hell breaks loose and you are asked to restore the backup from the Snap or BCV, what will happen?

The first scenario was: well, it is all taken care of, since we have two BCV / Snaps, remember? The DBA would contact the storage colleague to restore the SID_DATA snap (sync it back to production), and after that we would do a recovery using the archives and the online redo logs which are still present on the production box. And we feel very good about this, because we made sure the _FRA and _REDO diskgroups are not synced back to production... right? (Actually, the proper answer is: wrong.)

As we were discussing this scenario, and after looking at the locations in v$logfile, we noticed a weak spot in that idea. If we restore SID_DATA from the BCV, that would also mean that the first member of every online redo log group would be wiped out, since we have a redo member in the SID_DATA diskgroup (Oracle does not like online redo logs not being in sync). And we would also lose a current control file, since that too is in the same diskgroup (out-of-sync control files are not liked by Oracle either). Of course such a scenario will only be needed between dusk and dawn, after four nights of being on call, with everybody shouting that this issue needs to be fixed as soon as possible, so there would be little time to investigate and set up a scenario.

Looking at the scenario at hand, when being asked to restore asap from the BCV: losing one of the control files could easily be fixed by starting the database with one control file (since we did not touch the _FRA and _REDO diskgroups (we did not restore the second BCV, lucky us!), that still offers us a current control file, and after doing our restore and recover activities we could use RMAN to copy the controlfile again to a second and third place and amend the spfile).

The online redo member, however, would hurt badly if one online redo log member really did get overwritten (since we restore from the SID_DATA Snap / BCV as the fastest possible way back on our feet from disk once the full database is destroyed). In a lab test I could perhaps consider a scenario of mounting the database and dropping the first member from each group, but not on a highly important production database like in this scenario.

So that means we need to make sure that when a SID_DATA Snap / BCV is used (resynced to production) to restore from disk, it should only hold datafiles and, OK, a control file (and not also an online redo log member, as happens at the moment).

In this case I will go and set up new online redo log groups, and I will put their members in the SID_FRA and SID_REDO diskgroups. Then I will do log switches and in the end drop the groups that had members in both _DATA and _REDO (a sketch follows below). After that I will be much more relaxed about my judgment-day restore request to recover the FULL 3 TB database from the Snap / BCV at maximum speed. It is not that much work to carry out, and I am sure I will feel a lot more confident that even a scenario where a 3 TB database needs to be restored from a BCV can be done quite quickly and with fewer risks.
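In commands, that relocation could look roughly like the sketch below (group numbers, sizes and the MYDBP1_FRA diskgroup name are assumptions; a group can only be dropped once it is INACTIVE, hence the switch and checkpoint, and the drop is repeated for every old group):

SQL> alter database add logfile group 7 ('+MYDBP1_REDO','+MYDBP1_FRA') size 2048M;
SQL> alter database add logfile group 8 ('+MYDBP1_REDO','+MYDBP1_FRA') size 2048M;
SQL> alter system switch logfile;
SQL> alter system checkpoint;
SQL> select group#, status from v$log;
SQL> alter database drop logfile group 1;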

 

As Always  happy reading and till we meet again,

 

Mathijs

 

Rman Archive backup failing with RMAN-03002, RMAN-06059, ORA-19625 in Rac

Introduction:

As a member of the team, every so many days I am part of the hotline, taking care of tickets and indeed being the (hopefully friendly) voice in your ear when you call the hotline number of the Oracle team at the company. One of the tickets drew my attention (hmm, maybe because it was about RMAN, because for years I have loved to work with RMAN). An archive backup had failed various times, so I was asked to investigate and fix it. This blog will show you the analysis and the steps I performed to fix this issue.

Details:

At work we run a shell script to make both level and archive backups, and in the logfile that was produced for the archive backup this was my first clue:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 07/11/2014 07:27:45
RMAN-06059: expected archived log not found, loss of archived log compromises recoverability
ORA-19625: error identifying file /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_47588_2.arc
ORA-27037: unable to get file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

###    So my interpretation was that the specific archive was not present in the ASM diskgroup, but was present locally on another node of the RAC, in the $ORACLE_HOME/dbs directory. My investigation on the three nodes showed me:

## On first node :

SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled  <<-- wrong, because the archives are supposed to be in the +MYDB2_FRA1 diskgroup as a ground rule.
Oldest online log sequence     51210
Next log sequence to archive   51213
Current log sequence           51213

On the second node :

SQL> archive log list
Database log mode                 Archive Mode
Automatic archival                  Enabled
Archive destination                 MYDB2_DATA1  <<-- wrong, because the + is missing.
Oldest online log sequence     47585
Next log sequence to archive   47588
Current log sequence             47588

## on the 3rd node:

SQL> archive log list
Database log mode                 Archive Mode
Automatic archival                  Enabled
Archive destination                 MYDB2_DATA1  <<-- wrong, because the + is missing.
Oldest online log sequence     52542
Next log sequence to archive   52545
Current log sequence             52545

So after that my conclusion was that, due to the wrong archive destination on two out of three nodes in my RAC, the archives were being written to a local filesystem on those two boxes, and on top of that in the $ORACLE_HOME/dbs directory. Well, that was not good!

## So the first step to be performed was to correct the archive destination to:

ALTER SYSTEM SET log_archive_dest_1='LOCATION=+MYDB2_FRA1' SCOPE=BOTH SID='MYDB21';
ALTER SYSTEM SET log_archive_dest_1='LOCATION=+MYDB2_FRA1' SCOPE=BOTH SID='MYDB2';
ALTER SYSTEM SET log_archive_dest_1='LOCATION=+MYDB2_FRA1' SCOPE=BOTH SID='MYDB23';

## Once that was corrected I performed a couple of log switches to see archives being born in the +MYDB2_FRA1 diskgroup, which was the case, so already a slightly happier me.
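The switches and the check were nothing fancy, roughly like this (a sketch; in RAC this is done per instance, or with alter system archive log current):

SQL> alter system switch logfile;
SQL> archive log list
SQL> select name from v$archived_log where completion_time > sysdate - 1/24;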

## Since the missing + in the archive destination had caused Oracle to create the archives in $ORACLE_HOME/dbs,
## I had to copy the archives present there into the correct sub-folder of the ASM diskgroup using asmcmd
## (the creation dates in Linux showed the archives were from the last two days, so I knew where to put them in ASM):

cp /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_47523_2.arc .
cp /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_47525_2.arc .
cp /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_47526_2.arc .

## and I did the same on the other server.

oracle@mysrvr:/opt/oracle/product/11202_ee_64/db/dbs/ [MYDB23]# ls -ltr *.arc .

cp /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_52480_3.arc .
cp /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_52481_3.arc .
cp /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_52482_3.arc .

## The next step was to find out how to register these archives in RMAN after they had been moved to the ASM diskgroup. And again fellow bloggers did not let me down, so after spending some time surfing Google I came up with this:

I connected with RMAN to both the target and the catalog, and performed the following, which was nice since I only had to point to the directories where all the archives were now located in ASM after my copy action:

catalog start with '+MYDB2_FRA1/MYDB2/ARCHIVELOG/2014_07_10/';
catalog start with '+MYDB2_FRA1/MYDB2/ARCHIVELOG/2014_07_11/';

## Once that had been done I ran an archive backup with success, and was happy that I had solved another puzzle.

 

As always Happy reading  and till next time,

 

Mathijs

 

 

When Rman shows errors like RMAN-20220 and RMAN-06004

Introduction:

On one of the production databases we are using BCVs (Business Continuity Volumes) in the EMC boxes to make RMAN backups. Basically this is done by putting the source (production) database into begin backup mode and then splitting the mirror. After the split, the source environment is put into end backup mode and the BCV is mounted on a different server to make the RMAN level backups. It is also important to mention that we are using an RMAN catalog database, where every source database present has its own schema holding the catalog for that database only.

On the Source database  the archives are put to tape.

For a great explanation of the concept I recommend the following blog by one of my heroes, Martin Bach:

http://martincarstenbach.wordpress.com/2011/05/24/offloading-production-backups-to-a-different-storage-array/

In my environment I was alarmed that the backup of the control file on the backup server was no longer running. So it was time again to gear up, go out there and investigate.

Details:

On the backup server we use a tailored script to do the level backup, and in that script a backup of the control file is also included. When I looked at the log files I saw that this RMAN (BCV) level backup does two things:

  • Produce a level backup, which is successful,
  • After that, a copy of the control file is registered in the catalog and put to tape. That part is failing all the time now (even though the log files show the backup as a success).

In the log files on the backup server I see the following error after a successful level backup:

 RMAN> run {
2> debug off;
3> allocate channel d1 type disk;
4> catalog controlfilecopy '/opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck';
5> release channel d1;
6> allocate channel t1  type  'SBT_TAPE';
7> send 'NSR_ENV=(NSR_SERVER=MYNSR_SERVER,NSR_CLIENT=MYDB_SERVER, NSR_DATA_VOLUME_POOL=REP#0032#G02#LTO3, NSR_GROUP=DAT_0032_G02_
LTO3_TSR_local_01, NSR_SAVESET_BROWSE="0032 Days", NSR_SAVESET_RETENTION="0032 Days")';
8> backup
9> format 'ctrl_level1_%d_201402012208_3605724822_%p_%U'
10> controlfilecopy '/opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck';
11> release channel t1;
12> }
Debugging turned off

allocated channel: d1
channel d1: sid=593 devtype=DISK

cataloged control file copy
control file copy filename=/opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck recid=1280 stamp=83
8423230

released channel: d1

allocated channel: t1
channel t1: sid=593 devtype=SBT_TAPE
channel t1: NMO v5.0.0.0

sent command to channel: t1

Starting backup at 01-FEB-14
released channel: t1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 02/01/2014 23:20:35
RMAN-06004: ORACLE error from recovery catalog database: RMAN-20220: control file copy not found in the recovery catalog
RMAN-06090: error while looking up control file copy: /opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck

At first glance I don't see why this is failing, so it is under investigation, but it already feels like a nice puzzle.

When I checked the production side to get a complete overview, that was a shock to me, because in the log files of the archive backups I found a clue to what was going on, but I also noticed that those backups were failing. In the log files I saw:

...
released channel: d1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of crosscheck command at 02/02/2014 23:23:30
RMAN-06004: ORACLE error from recovery catalog database: RMAN-20011: target database incarnation is not current in recovery catalog

Hmm, wait a minute: did we just see a valuable clue to solve the case?

So after that I started investigating, after surfing the web. After connecting to the catalog (schema) in the RMAN database I ran the following query:

SQL> SELECT name, DBID, RESETLOGS_TIME FROM rc_database;
NAME           DBID RESETLOGS_TIME
-------- ---------- -------------------
MYDB1   3605724822 31.01.2014 07:05:51

That made me frown, because according to the catalog there had been a recent resetlogs???

Then I checked the incarnation:

SQL> SELECT dbid, name, dbinc_key, resetlogs_change#, resetlogs_time FROM rc_database_incarnation;
DBID NAME      DBINC_KEY RESETLOGS_CHANGE# RESETLOGS_TIME
---------- -------- ---------- ----------------- -------------------
3605724822 MYDB1            2        7.9895E+12 21.07.2009 17:18:42
3605724822 MYDB1           73                 1 04.04.2008 15:11:18
3605724822 MYDB1           74           2461875 19.05.2008 16:28:20
3605724822 MYDB1      2553841        1.0587E+13 31.01.2014 07:05:51

Not good at all, because apparently the catalog had different information (showing a resetlogs in 2014) than the production database (which had its last resetlogs back in 2009). And that was also the information I saw when I queried the production database (the v$database view shows the resetlogs information in these columns):

RESETLOGS_CHANGE# NUMBER System change number (SCN) at open resetlogs
RESETLOGS_TIME DATE Timestamp of open resetlogs
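That check on the production side is a simple query (a minimal sketch):

SQL> select resetlogs_change#, resetlogs_time from v$database;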

In the RMAN environment the command list incarnation only returned an empty line and a prompt, which was unexpected too.

After giving it some thought and consulting a colleague, I decided to stick to the easy scenario below (a sketch of the commands follows the list):

  • I performed an export of the RMAN schema for that database on the RMAN catalog database server, and I will keep that export for the upcoming 4 weeks in case an old restore would be needed.
  • I dropped the RMAN user for that specific database in the RMAN catalog database (drop user rman_Mydb1 cascade).
  • I registered the database again under a new RMAN schema in the RMAN catalog database.
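In commands, that scenario looks roughly like the sketch below. The dump file names, tablespaces and passwords are placeholders, and the create user / grant / create catalog steps are only needed because the old schema was dropped:

exp system/***** owner=rman_mydb1 file=rman_mydb1_before_drop.dmp log=rman_mydb1_before_drop.log

SQL> drop user rman_mydb1 cascade;
SQL> create user rman_mydb1 identified by ***** default tablespace users temporary tablespace temp quota unlimited on users;
SQL> grant recovery_catalog_owner to rman_mydb1;

rman TARGET / RCVCAT rman_mydb1/*****@RMAN
RMAN> create catalog;
RMAN> register database;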

## After that it was time to check things:

rman TARGET  / RCVCAT rman_MYDB1/*****@RMAN
Recovery Manager: Release 10.2.0.3.0 - Production on Mon Feb 3 17:15:36 2014
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
connected to target database: MYDB1 (DBID=3605724822)
connected to recovery catalog database
RMAN> list incarnation;
List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       73      MYDB1   3605724822       PARENT  1          04.04.2008 15:11:18
1       74      MYDB1   3605724822       PARENT  2461875    19.05.2008 16:28:20
1       2       MYDB1   3605724822       CURRENT 7989501317585 21.07.2009 17:18:42

It looked OK to me. After that I ran an archive backup, and I checked the result of the scheduled RMAN level backup the very next day. Once again the concept worked like a charm, so a happy DBA again.

Happy reading,

Mathijs