The good, the bad and the corrupted (ORA-01578, ORA-26040, ORA-01110)

Introduction:

 

At the moment I am part of a team that is upgrading a billing environment from 10.2.0.5 to 11.2.0.4 on Solaris. After a restore and point-in-time recovery on a preproduction environment, the team upgraded all databases using the manual upgrade method (still not that confident that the upgrades would run as smoothly with the dbua). As always, after the upgrade there is a period of hyper care in which the alert files should be monitored even more closely for funky issues. This blog gives you some more details on the funky stuff we came across and how we analyzed it. The one thing that did make me frown during this exercise, however, was the fact that various tools showed various results (not necessarily all showing the same results).

Conclusions:

When upgrading important environments a DBA should really have eyes and ears everywhere before performing that upgrade, because once upgraded it will be a tough call to prove whether something was already broken in the old environment or was caused by the upgrade. So I would really recommend scrutinizing your environment, like someone studying a creature under a microscope, before upgrading: question broken jobs, invalid objects and, as I learned the hard way during this upgrade, objects created with the NOLOGGING option (DBA_TABLES, DBA_INDEXES, DBA_LOBS, DBA_TAB_PARTITIONS, DBA_LOB_PARTITIONS, DBA_TAB_SUBPARTITIONS, etc.; LOGGING='NO' indicates NOLOGGING). In my case a number of objects had been created with the NOLOGGING option, and indeed only after upgrading did I have to fight my way back to find out what was going on and which objects were involved (in my case it was a number of staging tables for a specific user, where a drop and recreate plus exp/imp dealt with it). I did recommend enabling force logging for the complete database, but was told that the customer wants to be able to create objects with the NOLOGGING option for performance reasons.
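For that pre-upgrade scrutiny, a dictionary query along these lines lists the segments that carry the NOLOGGING attribute. This is only a sketch covering the three base views; extend it with the partition-level views mentioned above, and widen or narrow the owner filter to suit your environment:

```sql
-- Sketch: list non-logged tables, indexes and LOBs outside the Oracle-maintained schemas.
-- LOGGING = 'NO' indicates NOLOGGING.
SELECT owner, table_name AS object_name, 'TABLE' AS object_type
  FROM dba_tables  WHERE logging = 'NO' AND owner NOT IN ('SYS','SYSTEM')
UNION ALL
SELECT owner, index_name, 'INDEX'
  FROM dba_indexes WHERE logging = 'NO' AND owner NOT IN ('SYS','SYSTEM')
UNION ALL
SELECT owner, table_name || '.' || column_name, 'LOB'
  FROM dba_lobs    WHERE logging = 'NO' AND owner NOT IN ('SYS','SYSTEM')
 ORDER BY 1, 3, 2;
```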

As I mentioned, in the end I was plain lucky that this was caused by staging tables, which made the scenario easier since we could drop and recreate them. Once again I learned something (and found proof again that success just loves preparation). Of course this is a slightly different scenario than the one we will run in production, since there we will run the upgrade directly; still, we would need to bear in mind the objects created with the NOLOGGING option if one of the databases in scope needed a restore because the upgrade broke the database. Well, in this case all databases were upgraded successfully, so all DBAs are happy and confident for the production migration, which is scheduled.

Details:

SERVER: mysrvr

Database: OPTMYDB

Timeline:

  • All databases of the billing environment have been restored to the new OPT server called mysrvr.
  • We have upgraded all databases from 10.2.0.5 to 11.2.0.4 on Solaris.
  • After opening the database OPTMYDB and running statistics jobs, it started throwing error messages in the alert file about block corruptions, as seen in Example 1 below. Yesterday, together with a colleague from the Adba team, the gather-statistics issue was resolved and three objects were identified as having corrupt blocks (see Example 2). These three objects have been dropped and recreated, and there were no more corruption messages in the alert file.
  • After that my manager asked me to run a dbv check on the full database, covering all datafiles.
  • I also used RMAN in parallel to find possible corruption; that run is still in progress, but the first output is in Example 3:

Example 1

## In the alert file this was the information we came across that set off the first alarm bells:

Mon Aug 04 22:00:43 2014

DBMS_STATS: GATHER_STATS_JOB encountered errors.  Check the trace file.

Errors in file /opt/oracle/OPTMYDB/diag/rdbms/MYDB/OPTMYDB/trace/OPTMYDB_j002_3640.trc:

ORA-20011: Approximate NDV failed: ORA-29913: error in executing ODCIEXTTABLEOPEN callout

ORA-29400: data cartridge error

error opening file /home/tmcon/oracle/UNMATCHED_SUBS_3640.log

Errors in file /opt/oracle/OPTMYDB/diag/rdbms/MYDB/OPTMYDB/trace/OPTMYDB_j002_3640.trc  (incident=8665):

ORA-01578: ORACLE data block corrupted (file # 620, block # 391690)

ORA-01110: data file 620: '/db/MYDB/data/tbsdata_e/tbsdip01_4m_14.dbf'

ORA-26040: Data block was loaded using the NOLOGGING option

Incident details in: /opt/oracle/OPTMYDB/diag/rdbms/MYDB/OPTMYDB/incident/incdir_8665/OPTMYDB_j002_3640_i8665.trc

 

Example2

I have grepped the ORA-01578 errors from the alert file and added them to a table to get an overview of the datafiles and blocks having the issues. And since the alert file is a growing object, this is what I got over the time interval since starting up the database as an 11.2.0.4 environment.

grep ORA-01578 alert_OPTMYDB.log
                                                                    AFN     BL
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
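To tabulate those repeated alert-file entries without hand-editing, a small script can do the grep-and-count in one go. This is a minimal sketch (file handling left out; the regex assumes the 11.2 alert-line format shown above):

```python
import re
from collections import Counter

# Pattern matching ORA-01578 lines as they appear in the alert file
PATTERN = re.compile(
    r"ORA-01578: ORACLE data block corrupted \(file # (\d+), block # (\d+)\)"
)

def tabulate(lines):
    """Count hits per distinct (absolute file number, block number) pair."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            counts[(int(m.group(1)), int(m.group(2)))] += 1
    return counts

# Example with a few of the lines from the alert file above
sample = [
    "ORA-01578: ORACLE data block corrupted (file # 620, block # 391690)",
    "ORA-01578: ORACLE data block corrupted (file # 620, block # 391690)",
    "ORA-01578: ORACLE data block corrupted (file # 661, block # 423946)",
]
for (afn, bl), n in sorted(tabulate(sample).items()):
    print(afn, bl, n)
```

Feeding it the full alert file (e.g. the lines of alert_OPTMYDB.log) would reproduce the AFN/BL overview above, with a hit count per block.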

 

## On MOS I retrieved a note that offered more information on the object located around the corrupted block:

select file_id AFN, relative_fno, tablespace_name
from dba_data_files
where relative_fno=&RFN;
Enter value for rfn: 620
old   3: where relative_fno=&RFN
new   3: where relative_fno=620
       AFN RELATIVE_FNO TABLESPACE_NAME
———- ———— ——————————
      3689          620 UNDO_TBS_01
       620          620 TBSDIP01_4M
## I used this statement to find the object at that spot in the tablespace:
select *
from dba_extents
where file_id = &AFN
and &BL between block_id AND block_id + blocks - 1;
SQL> select *
from dba_extents
where file_id = &AFN
and &BL between block_id AND block_id + blocks - 1;
  2    3    4  Enter value for afn: 620
old   3: where file_id = &AFN
new   3: where file_id = 620
Enter value for bl: 391690
old   4: and &BL between block_id AND block_id + blocks - 1
new   4: and 391690 between block_id AND block_id + blocks - 1
OWNER
——————————
SEGMENT_NAME
——————————————————————————–
PARTITION_NAME                 SEGMENT_TYPE       TABLESPACE_NAME
—————————— —————— ——————————
EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
———- ———- ———- ———- ———- ————
SCOTT
SCOTT_BPS_05062014000_NOBP
                               TABLE              TBSDIP01_4M
         0        620     391689    4194304        512          620
SQL> select *
from dba_extents
where file_id = &AFN
and &BL between block_id AND block_id + blocks - 1;
  2    3    4  Enter value for afn: 620
old   3: where file_id = &AFN
new   3: where file_id = 620
Enter value for bl: 392202
old   4: and &BL between block_id AND block_id + blocks - 1
new   4: and 392202 between block_id AND block_id + blocks - 1
OWNER
——————————
SEGMENT_NAME
——————————————————————————–
PARTITION_NAME                 SEGMENT_TYPE       TABLESPACE_NAME
—————————— —————— ——————————
EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
———- ———- ———- ———- ———- ————
SCOTT
SCOTT_BPS_05062014000_NORUEB
                               TABLE              TBSDIP01_4M
         0        620     392201    4194304        512          620
SQL> select file_id AFN, relative_fno, tablespace_name
from dba_data_files
where relative_fno=&RFN;
  2    3  Enter value for rfn: 661
old   3: where relative_fno=&RFN
new   3: where relative_fno=661
       AFN RELATIVE_FNO TABLESPACE_NAME
———- ———— ——————————
       661          661 TBSDIP01_4M
select *
from dba_extents
where file_id = &AFN
and &BL between block_id AND block_id + blocks - 1;  (values entered: AFN 661, BL 423946)
OWNER
——————————
SEGMENT_NAME
——————————————————————————–
PARTITION_NAME                 SEGMENT_TYPE       TABLESPACE_NAME
—————————— —————— ——————————
EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
———- ———- ———- ———- ———- ————
SCOTT
SCOTT_SII_05062014000R_BAN
                               TABLE              TBSDIP01_4M
         0        661     423945    4194304        512          661

 

Example3

On MOS I also retrieved a note that offered RMAN as a tool to find logical corruption. Since I am curious by nature, of course I had to use that goodie too!

 

RMAN> run {
allocate channel d1 type disk;
allocate channel d2 type disk;
allocate channel d3 type disk;
allocate channel d4 type disk;
backup validate check logical database;
}

 

## Running this in RMAN will put rows, if any, into V$DATABASE_BLOCK_CORRUPTION. When I asked for more details on this view, I was told that it should return 0 rows on a rerun. I restarted the RMAN process after some time (so maybe, since the first run did not complete, it did not wrap up properly), but on the second run I noticed rows were still present there.

SQL> select * from V$DATABASE_BLOCK_CORRUPTION;

 

FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO

———- ———- ———- —————— ———

235      34825         92         1.3760E+13 NOLOGGING

235     323083        510         1.3760E+13 NOLOGGING

548     315914        833         1.3760E+13 NOLOGGING

648     148489        512         1.3760E+13 NOLOGGING

 

## Well, the good thing was that this showed in one view what most likely was the cause of the issue: objects created with the NOLOGGING option, thus invalidating a 100% recovery after a restore (and recovery of a number of archives) and the upgrade to 11g after that.
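Instead of looking up each block by hand in DBA_EXTENTS as in Example 2, the rows RMAN writes into V$DATABASE_BLOCK_CORRUPTION can be mapped to segment names in one pass. This join is a sketch based on the 11.2 dictionary columns; a range reported in the view can span more than one extent, so duplicates per segment are possible:

```sql
-- Sketch: resolve each corrupt block range to its owning segment.
SELECT c.file#, c.block#, c.blocks, c.corruption_type,
       e.owner, e.segment_name, e.segment_type, e.tablespace_name
  FROM v$database_block_corruption c,
       dba_extents e
 WHERE e.file_id = c.file#
   AND c.block# BETWEEN e.block_id AND e.block_id + e.blocks - 1;
```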

From Oracle support I got the following notes, as it turned out that the corruption was caused by NOLOGGING operations:

The Gains and Pains of Nologging Operations in a Data Guard Environment (Doc ID 290161.1)

ORA-1578 / ORA-26040 Corrupt blocks by NOLOGGING – Error explanation and solution (Doc ID 794505.1)

A quote from the latter note really made me both frown and giggle:

…… Begin Quote

Note that the data inside the affected blocks is not salvageable. Methods like “Media Recovery” or “RMAN blockrecover” will not fix the problem unless the data file was backed up after the NOLOGGING operation was registered in the Redo Log.

…… END  Quote

I also learned that the way Oracle identifies that a block was previously invalidated due to NOLOGGING is by updating most of the bytes in that block with 0xff, but only if that "invalidate" redo is applied in a recovery. If the associated redo/archived log file is used to RECOVER the data files (which was the case in this preproduction environment), Oracle invalidates such blocks, and the errors ORA-26040 and ORA-1578 are reported by SQL statements on the next block reads. So in this case the recovery was done, but blocks that had been touched by NOLOGGING operations were marked soft corrupt during the recovery, as the redo entries flagged "this block was nologging" were applied to them.

http://docs.oracle.com/cd/E11882_01/backup.112/e10642/osadvsce.htm#BRADV90047

The recommendation is: always perform a backup after NOLOGGING operations, do not touch those blocks again, and use that backup for recovery.
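If the customer had accepted the recommendation from the conclusions, the alternative to "backup after every NOLOGGING operation" is to suppress NOLOGGING database-wide, at the cost of the redo-free bulk loads the customer wanted to keep:

```sql
-- Force every operation to generate redo, so restored and recovered files stay clean:
ALTER DATABASE FORCE LOGGING;

-- Verify the setting:
SELECT force_logging FROM v$database;
```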

 

As always, happy reading and till we meet again,

 

Mathijs

 
