Introduction:
At the moment I am part of a team that is upgrading a billing environment from 10.2.0.5 to 11.2.0.4 on Solaris. After a restore and point-in-time recovery on a preproduction environment, the team upgraded all databases using the manual upgrade method (still not that confident that the upgrades would run as smoothly with the DBUA). As always, after the upgrade there is a period of hyper care in which the alert files should be monitored even more closely for funky issues. This blog gives you some more details on the funky stuff we came across and how we analyzed it. The one thing that did make me frown during this exercise, however, was the fact that the various tools showed various results (not necessarily all the same results).
Conclusions:
When upgrading important environments, a DBA should really have eyes and ears everywhere before performing that upgrade, because once upgraded it will be a tough call to prove whether something was already broken in the old environment or was caused by the upgrade. So I would really recommend scrutinizing your environment like someone studying a creature under a microscope before upgrading: question broken jobs, invalid objects and, as I learned the hard way during this upgrade, objects created with the NOLOGGING option (in DBA_TABLES, DBA_INDEXES, DBA_LOBS, DBA_TAB_PARTITIONS, DBA_LOB_PARTITIONS, DBA_TAB_SUBPARTITIONS, etc., LOGGING='NO' indicates NOLOGGING). In my case a number of objects had been created with the NOLOGGING option, and indeed only after upgrading did I have to fight my way back to find out what was going on and which objects were involved (in my case it was a number of staging tables for a specific user, where a drop and recreate plus exp/imp dealt with it). I did recommend enabling FORCE LOGGING for the complete database (but was told that the customer wants to be able to create objects with the NOLOGGING option for performance reasons).
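As a sketch of the kind of pre-upgrade check meant here (the LOGGING and TEMPORARY columns are documented for these dictionary views; extend the UNION with the partition-level views as needed):

```sql
-- List permanent tables and indexes created with the NOLOGGING option;
-- their blocks would come back soft corrupt after a restore/recovery.
SELECT owner, table_name AS object_name, 'TABLE' AS object_type
  FROM dba_tables
 WHERE logging = 'NO'
   AND temporary = 'N'
UNION ALL
SELECT owner, index_name, 'INDEX'
  FROM dba_indexes
 WHERE logging = 'NO'
   AND temporary = 'N'
 ORDER BY 1, 3, 2;
```

Running this before the upgrade (or before taking the backup used for the restore) gives you a list of candidate objects to discuss with the application owners up front.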
As I mentioned in the blog, in the end I was plain lucky that this was caused by staging tables, which made the scenario easier since we could drop and recreate them. Once again I learned something (and found proof again that success just loves preparation). Of course this is a slightly different scenario than the one we will run in production, since there we will run the upgrade directly, but we would still need to bear in mind the objects in production created with the NOLOGGING option in case one of the databases in scope needed a restore because the upgrade broke it. Well, in this case all databases were upgraded successfully, so all DBAs are happy and confident for the production migration, which is already scheduled.
Details:
SERVER: mysrvr
Database: OPTMYDB
Timeline:
- All databases of the billing environment have been restored to the new OPT server called mysrvr.
- We have upgraded all databases from 10.2.0.5 to 11.2.0.4 on Solaris.
- After opening the database OPTMYDB and running statistics jobs, it started throwing error messages in the alert log about block corruption, as seen in Example 1 below. Yesterday, together with a colleague from the Adba team, the gather-statistics issue was resolved and three objects were identified as having corrupt blocks (see Example 2). These three objects were dropped and recreated, and there were no more corruption messages in the alert file.
- After that I was asked by my manager to run a dbv for the full database, checking all datafiles.
- I also used RMAN in parallel to find possible corruption; that run is still in progress, but the first output is in Example 3.
Example 1
## In the alert file this was the information we came across that set off the first alarm bells:
Mon Aug 04 22:00:43 2014
DBMS_STATS: GATHER_STATS_JOB encountered errors. Check the trace file.
Errors in file /opt/oracle/OPTMYDB/diag/rdbms/MYDB/OPTMYDB/trace/OPTMYDB_j002_3640.trc:
ORA-20011: Approximate NDV failed: ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
error opening file /home/tmcon/oracle/UNMATCHED_SUBS_3640.log
Errors in file /opt/oracle/OPTMYDB/diag/rdbms/MYDB/OPTMYDB/trace/OPTMYDB_j002_3640.trc (incident=8665):
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690)
ORA-01110: data file 620: '/db/MYDB/data/tbsdata_e/tbsdip01_4m_14.dbf'
ORA-26040: Data block was loaded using the NOLOGGING option
Incident details in: /opt/oracle/OPTMYDB/diag/rdbms/MYDB/OPTMYDB/incident/incdir_8665/OPTMYDB_j002_3640_i8665.trc
Example2
I grepped the ORA-01578 entries from the alert file and added them to a table to get an overview of the datafiles and blocks having issues. Since the alert file is a growing object, this is what I got over the time interval since starting up the database as an 11.2.0.4 environment.
grep ORA-01578 alert_OPTMYDB.log | AFN | BL |
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) | 620 | 391690 |
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) | 620 | 391690 |
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) | 620 | 391690 |
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) | 620 | 391690 |
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) | 620 | 391690 |
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) | 620 | 391690 |
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) | 620 | 391690 |
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) | 620 | 392202 |
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) | 620 | 392202 |
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) | 620 | 392202 |
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) | 620 | 392202 |
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) | 620 | 392202 |
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) | 620 | 392202 |
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) | 620 | 392202 |
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) | 661 | 423946 |
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) | 661 | 423946 |
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) | 661 | 423946 |
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) | 661 | 423946 |
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) | 661 | 423946 |
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) | 661 | 423946 |
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) | 661 | 423946 |
## On MOS I retrieved a note that offered more information on the object located around a corrupted block. First map the file number from the error to its tablespace:

SQL> select file_id AFN, relative_fno, tablespace_name
  2  from dba_data_files
  3  where relative_fno=&RFN;
Enter value for rfn: 620
old   3: where relative_fno=&RFN
new   3: where relative_fno=620

       AFN RELATIVE_FNO TABLESPACE_NAME
---------- ------------ ------------------------------
      3689          620 UNDO_TBS_01
       620          620 TBSDIP01_4M

## Used this statement to find the object that was on that spot in the tablespace, once for each AFN/BL combination from the overview above:

SQL> select *
  2  from dba_extents
  3  where file_id = &AFN
  4  and &BL between block_id and block_id + blocks - 1;
Enter value for afn: 620
old   3: where file_id = &AFN
new   3: where file_id = 620
Enter value for bl: 391690
old   4: and &BL between block_id and block_id + blocks - 1
new   4: and 391690 between block_id and block_id + blocks - 1

OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
PARTITION_NAME                 SEGMENT_TYPE       TABLESPACE_NAME
------------------------------ ------------------ ------------------------------
 EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
---------- ---------- ---------- ---------- ---------- ------------
SCOTT
SCOTT_BPS_05062014000_NOBP
                               TABLE              TBSDIP01_4M
         0        620     391689    4194304        512          620

SQL> select *
  2  from dba_extents
  3  where file_id = &AFN
  4  and &BL between block_id and block_id + blocks - 1;
Enter value for afn: 620
old   3: where file_id = &AFN
new   3: where file_id = 620
Enter value for bl: 392202
old   4: and &BL between block_id and block_id + blocks - 1
new   4: and 392202 between block_id and block_id + blocks - 1

OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
PARTITION_NAME                 SEGMENT_TYPE       TABLESPACE_NAME
------------------------------ ------------------ ------------------------------
 EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
---------- ---------- ---------- ---------- ---------- ------------
SCOTT
SCOTT_BPS_05062014000_NORUEB
                               TABLE              TBSDIP01_4M
         0        620     392201    4194304        512          620

SQL> select file_id AFN, relative_fno, tablespace_name
  2  from dba_data_files
  3  where relative_fno=&RFN;
Enter value for rfn: 661
old   3: where relative_fno=&RFN
new   3: where relative_fno=661

       AFN RELATIVE_FNO TABLESPACE_NAME
---------- ------------ ------------------------------
       661          661 TBSDIP01_4M

SQL> select *
  2  from dba_extents
  3  where file_id = &AFN
  4  and &BL between block_id and block_id + blocks - 1;
Enter value for afn: 661
old   3: where file_id = &AFN
new   3: where file_id = 661
Enter value for bl: 423946
old   4: and &BL between block_id and block_id + blocks - 1
new   4: and 423946 between block_id and block_id + blocks - 1

OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
PARTITION_NAME                 SEGMENT_TYPE       TABLESPACE_NAME
------------------------------ ------------------ ------------------------------
 EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
---------- ---------- ---------- ---------- ---------- ------------
SCOTT
SCOTT_SII_05062014000R_BAN
                               TABLE              TBSDIP01_4M
         0        661     423945    4194304        512          661
Example3
On MOS I also retrieved a note that offered RMAN as a tool to find logical corruption. Since I am curious by nature, of course I had to use that goodie too!
RMAN> run {
allocate channel d1 type disk;
allocate channel d2 type disk;
allocate channel d3 type disk;
allocate channel d4 type disk;
backup validate check logical database;
}
## Running this in RMAN will put rows, if any, into v$database_block_corruption. When I asked for more details on this view, I was told that it should return 0 rows when rerunning. I restarted the RMAN process after some time (so maybe, since it did not complete, it did not wrap up properly), but on the second run I noticed rows were still present there.
SQL> select * from v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
       235      34825         92         1.3760E+13 NOLOGGING
       235     323083        510         1.3760E+13 NOLOGGING
       548     315914        833         1.3760E+13 NOLOGGING
       648     148489        512         1.3760E+13 NOLOGGING
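To see which segments own the reported block ranges, the same dba_extents lookup from Example 2 can be joined against the view. A sketch (be aware that joining against dba_extents can be slow on databases with very many extents):

```sql
-- Map each corrupt block range reported by RMAN to the segment that owns it.
SELECT c.file#, c.block#, c.blocks, c.corruption_type,
       e.owner, e.segment_name, e.segment_type
  FROM v$database_block_corruption c
  JOIN dba_extents e
    ON e.file_id = c.file#
   AND c.block# BETWEEN e.block_id AND e.block_id + e.blocks - 1;
```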
## Well, the good thing was that this showed in one view what was most likely the cause of the issue: objects created with the NOLOGGING option, thus invalidating a 100% recovery after a restore (and recovery of a number of archives) and the upgrade to 11g after that.
From Oracle support I got these as good notes, as it turned out that the corruption was caused by NOLOGGING operations:
The Gains and Pains of Nologging Operations in a Data Guard Environment (Doc ID 290161.1)
ORA-1578 / ORA-26040 Corrupt blocks by NOLOGGING – Error explanation and solution (Doc ID 794505.1)
A quote from this Note really made me both frown and giggle:
…… Begin Quote
Note that the data inside the affected blocks is not salvageable. Methods like “Media Recovery” or “RMAN blockrecover” will not fix the problem unless the data file was backed up after the NOLOGGING operation was registered in the Redo Log.
…… END Quote
I also learned that the way Oracle identifies that a block was invalidated due to NOLOGGING is by updating most of the bytes in that block with 0xff, but only if that "invalidate" redo is applied in a recovery. If the associated redo/archived log file is used to RECOVER the data files (which was the case in this preproduction environment), Oracle invalidates such blocks, and the error ORA-26040 along with error ORA-1578 is reported by SQL statements on subsequent block reads. So in this case the recovery was done, but blocks that had been touched by NOLOGGING operations were marked soft corrupt during the recovery, as the redo records flagged "this block was nologging" were applied to them.
http://docs.oracle.com/cd/E11882_01/backup.112/e10642/osadvsce.htm#BRADV90047
The recommendation is: always perform a backup after NOLOGGING operations, don't touch those blocks again, and use that backup for recovery.
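A quick way to spot datafiles that have been touched by a NOLOGGING operation since their last backup is the documented unrecoverable columns of v$datafile (RMAN's REPORT UNRECOVERABLE command gives a similar overview):

```sql
-- Datafiles with NOLOGGING changes recorded; these need a fresh backup
-- before they can be fully recovered again.
SELECT file#, unrecoverable_change#, unrecoverable_time
  FROM v$datafile
 WHERE unrecoverable_change# > 0
 ORDER BY file#;
```

Checking this right after the batch windows, and backing up the listed files, is what keeps the recommendation above practical in day-to-day operations.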
As always happy reading and till we meet again,
Mathijs