ASM 11.2.0.2 Is Not Releasing File Descriptors After Drop or Dismount Diskgroup. [ID 1306574.1]

Introduction.

One of my Linux environments runs 11.2.0.2.0 for both Grid Infrastructure and RDBMS, with ASM using LUNs that are part of an EMC BCV copy of production for reporting purposes. This environment can be classified as an Oracle Restart environment. As part of a process that is supposed to run as an automated job during the night, the database goes through various stages (shutdown abort, then a dismount of the ASM diskgroups dedicated to that database). After that the BCVs are handed back to the source side for syncing, after which the database is rebuilt from the fresh BCV split.

Well, as I said, supposed to run automated… because at this point in time this works properly during the day, but during the night we get, for each disk in the diskgroup, a message similar to:

asm-diskgroupname-data-001: map in use

From reading and searching Metalink, the suggestion was to apply the latest PSU, January 2013, because of:

ASM 11.2.0.2 Is Not Releasing File Descriptors After Drop or Dismount Diskgroup. [ID 1306574.1]

So I started my journey of patching this environment.

PSU Patch January 2013

On Metalink I found the following combination: patching the Grid Infrastructure with the January 2013 PSU for Grid Infrastructure (Patch 14841385 – 11.2.0.2.9 GI Patch Set Update (Includes Database PSU 11.2.0.2.9)).

From the readme I took the first two action points:

  • Get the latest OPatch tool and install it (should be 11.2.0.3.0)
  • Since we will try opatch auto: create an OCM response file by running /opt/crs/product/112_ee_64/crs/OPatch/ocm/bin/emocmrsp (I kept the result as /opt/crs/product/112_ee_64/crs/OPatch/ocm/bin/ocm.rsp)

After that I worked my way through the readme and found this scenario (and it sounded so plain and simple…):

Patching Oracle Restart Home

You must keep the Oracle Restart stack up and running when you are patching. Use the following instructions to patch the ACFS software in an Oracle Restart home, which is not the case in all installations. This is regardless of whether or not opatch auto is being used to patch the GI home.

For each node, perform the following steps to patch Oracle Restart home:

  1. On the local node, unmount the ACFS file systems. Use instructions in Section 2.8 for unmounting ACFS file systems.
  2. Stop the has stack: crsctl stop has
  3. Run <GridHome>/bin/acfsroot install.
  4. As root user execute: opatch auto <UNZIPPED_PATCH_LOCATION> -oh <ORACLE_HOME>

Well, that does not sound too complicated (and since we do not use ACFS, steps one and three do not apply).

Stopping HAS

I issued the command:

crsctl stop has

In the output I saw that specific resources were failing to stop. So I ran this command several times, and each time checked the status of the HAS daemon with:

crsctl check has

After two retries I finally got the feedback I was looking for (unable to communicate with the HAS daemon, which means it was down).
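The stop-and-check routine above could be wrapped in a small retry loop. This is only a sketch: stop_until_down and RETRY_DELAY are names made up for the example, and it assumes the check command exits non-zero once the daemon can no longer be reached. If crsctl check has behaves differently on your version, test its output text instead.

```shell
#!/bin/sh
# Sketch: repeat a "stop" command until a "check" command fails,
# which is how the post detects that HAS is down.
# Usage: stop_until_down MAX_TRIES STOP_CMD CHECK_CMD   (hypothetical helper)
stop_until_down() {
    sud_max=$1; sud_stop=$2; sud_check=$3
    sud_try=1
    while [ "$sud_try" -le "$sud_max" ]; do
        # The stop may complain about resources that fail to stop; just retry.
        $sud_stop >/dev/null 2>&1
        # Assumption: the check exits non-zero when the daemon is unreachable.
        if ! $sud_check >/dev/null 2>&1; then
            echo "down after $sud_try attempt(s)"
            return 0
        fi
        sud_try=$((sud_try + 1))
        sleep "${RETRY_DELAY:-10}"
    done
    echo "still up after $sud_max attempt(s)" >&2
    return 1
}

# Example (as root, assuming crsctl is on the PATH):
#   stop_until_down 5 "crsctl stop has" "crsctl check has"
```

Passing the commands in as arguments keeps the loop itself free of anything Oracle-specific, so it can be tried out with harmless stand-ins first.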

Let's run the patch!

I had unzipped the patch in /opt/oracle/stage, and the January 2013 PSU shows the following directories:

drwxr-xr-x  5 oracle dba      4096 Apr  4  2012 13696242
drwxr-xr-x  6 oracle dba      4096 Feb 12 14:15 14727315

As the root user I ran the following command:

### 
opatch auto /opt/oracle/stage -oh /opt/crs/product/112_ee_64/crs 
## shows
MYSRVR:root:/root # opatch auto /opt/oracle/stage -oh /opt/crs/product/112_ee_64/crs
Executing /opt/crs/product/112_ee_64/crs/perl/bin/perl /opt/crs/product/112_ee_64/crs/OPatch/crs/patch11203.pl -patchdir /opt/oracle -patchn stage -oh /opt/crs/product/112_ee_64/crs -paramfile /opt/crs/product/112_ee_64/crs/crs/install/crsconfig_params
Bareword "ValidateCRSCTL" not allowed while "strict subs" in use at /opt/crs/product/112_ee_64/crs/crs/install/crspatch.pm line 166.
Bareword "ValidateCRSCTL" not allowed while "strict subs" in use at /opt/crs/product/112_ee_64/crs/crs/install/crspatch.pm line 177.
Compilation failed in require at /opt/crs/product/112_ee_64/crs/OPatch/crs/patch11203.pl line 397.
BEGIN failed--compilation aborted at /opt/crs/product/112_ee_64/crs/OPatch/crs/patch11203.pl line 397.

In plain text: that patch failed! So I started looking in Metalink and came across this. Quote: ”….

Applying PSU patch on top of Clusterware 11.2.0.2 with opatch 11.2.0.3.2 or higher, opatch fails with the following errors when the Clusterware stack is down.

Solution: Please apply the patch with the “-olderver” option … End quote.

Let's run the patch with the new parameters:

MYSRVR:root:/opt/oracle/stage # opatch auto -och /opt/crs/product/112_ee_64/crs -olderver

This showed the following output:

Using configuration parameter file: /opt/crs/product/112_ee_64/crs/crs/install/crsconfig_params
OPatch  is bundled with OCM, Enter the absolute OCM response file path:
/opt/crs/product/112_ee_64/crs/OPatch/ocm/bin/ocm.rsp
## enter location and enter
 
OPatch  is bundled with OCM, Enter the absolute OCM response file path:
/opt/crs/product/112_ee_64/crs/OPatch/ocm/bin/ocm.rsp
Invalid response file path, To regenerate an OCM response file run /opt/crs/product/112_ee_64/crs/OPatch/ocm/bin/emocmrsp

Even though I had created a response file before the first run, it had gone missing, so I recreated it and started again.

MYSRVR:root:/opt/oracle/stage # opatch auto -och /opt/crs/product/112_ee_64/crs -olderver

### The second patch failed, which I did not understand. But I came across this brilliant note on the web (with a big thank you for blogging about this): http://deryaoktay.wordpress.com/2012/02/08/prerequisite-check-checkactivefilesandexecutables-failed-error-while-issueing-opatch-apply/

So in the logfiles I checked:

[Feb 12, 2013 10:50:16 AM]   Finish fuser command /sbin/fuser /opt/crs/product/112_ee_64/crs/bin/kfod at Tue Feb 12 10:50:16 CET 2013
[Feb 12, 2013 10:50:16 AM]   Start fuser command /sbin/fuser /opt/crs/product/112_ee_64/crs/lib/libclntsh.so.11.1 at Tue Feb 12 10:50:16 CET 2013
[Feb 12, 2013 10:50:16 AM]   Finish fuser command /sbin/fuser /opt/crs/product/112_ee_64/crs/lib/libclntsh.so.11.1 at Tue Feb 12 10:50:16 CET 2013
[Feb 12, 2013 10:50:16 AM]   Following executables are active :
                             /opt/crs/product/112_ee_64/crs/lib/libclntsh.so.11.1
[Feb 12, 2013 10:50:16 AM]   Prerequisite check "CheckActiveFilesAndExecutables" failed.
                             The details are:
Following executables are active :/opt/crs/product/112_ee_64/crs/lib/libclntsh.so.11.1
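To avoid scrolling through a long OPatch log by hand, the blocking files can be pulled out with a bit of awk. A minimal sketch, assuming the log looks like the excerpt above (paths either on the same line as "Following executables are active" or on indented lines right after it); active_files_from_log is a name invented for this example.

```shell
#!/bin/sh
# Sketch: read an OPatch log on stdin and print each file that the
# CheckActiveFilesAndExecutables prerequisite reported as active.
active_files_from_log() {
    awk '
        /Following executables are active/ {
            collecting = 1
            # Drop everything up to and including the colon; a path may follow.
            sub(/.*Following executables are active[ ]*:[ ]*/, "")
            if ($0 ~ /^\//) print $1
            next
        }
        # Indented continuation lines directly after the marker hold paths.
        collecting && /^[ \t]+\// { print $1; next }
        { collecting = 0 }
    ' | sort -u
}

# Example (the log file name is a placeholder):
#   active_files_from_log < /path/to/opatch.log
```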

So let us check in the Linux environment:

MYSRVR:root:/opt/oracle/stage # /sbin/fuser /opt/crs/product/112_ee_64/crs/lib/libclntsh.so.11.1

/opt/crs/product/112_ee_64/crs/lib/libclntsh.so.11.1: 22789m 27072m 30586m 30603m 30638m 30649m 30651m 30668m 30684m 30734m

MYSRVR:root:/opt/oracle/stage # ps -ef|grep 22789

root     10672 22677  0 11:10 pts/10   00:00:00 grep 22789

oracle   22789 20875  0 Feb08 pts/3    00:00:00 sqlplus   as sysasm

MYSRVR:root:/opt/oracle/stage # ps -ef|grep 27072

root     11792 22677  0 11:11 pts/10   00:00:00 grep 27072

oracle   27072     1  0 10:50 ?        00:00:03 /opt/crs/product/112_ee_64/crs/bin/ohasd.bin reboot
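Looking up every PID by hand with ps -ef | grep gets tedious with ten holders, and the grep always matches itself too. Here is a small sketch that turns a fuser line like the one above into a plain list of PIDs; pids_from_fuser_line is a hypothetical helper name, and the parsing assumes the "path: PIDm PIDm" layout shown in this post.

```shell
#!/bin/sh
# Sketch: read fuser-style output on stdin and print one PID per line,
# with the leading "path:" and the access-type letters (such as "m") removed.
pids_from_fuser_line() {
    sed 's/^[^:]*:[[:space:]]*//' |
        tr -s ' \t' '\n\n' |
        sed -n 's/^\([0-9][0-9]*\)[a-zA-Z]*$/\1/p'
}

# Example: show owner, start time and command line of every holder.
# ps -p only lists the requested PID, so no stray grep line shows up.
#   lib=/opt/crs/product/112_ee_64/crs/lib/libclntsh.so.11.1
#   /sbin/fuser "$lib" 2>/dev/null | pids_from_fuser_line |
#       while read -r pid; do ps -p "$pid" -o pid,user,lstart,args; done
```

In the example fuser's stderr is discarded because the psmisc version writes the file name and access letters there and only the PIDs to stdout; the helper copes with either form.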

OK, so adapting my approach: I shut down the HAS daemon again, checked again, and killed the one process that kept the file open.

crsctl stop has

Then I checked again:

MYSRVR:root:/opt/oracle/stage # /sbin/fuser /opt/crs/product/112_ee_64/crs/lib/libclntsh.so.11.1

/opt/crs/product/112_ee_64/crs/lib/libclntsh.so.11.1: 22789m

MYSRVR:root:/opt/oracle/stage # ps -ef|grep 22789

root     20226 22677  0 11:14 pts/10   00:00:00 grep 22789

oracle   22789 20875  0 Feb08 pts/3    00:00:00 sqlplus   as sysasm

MYSRVR:root:/opt/oracle/stage # kill -9 22789

MYSRVR:root:/opt/oracle/stage # /sbin/fuser /opt/crs/product/112_ee_64/crs/lib/libclntsh.so.11.1

Hmm, I confess I smiled and said gotcha: fuser now returned nothing, so nobody was holding the library open any more. Which left us well prepared for the next step:

As Root:
opatch auto -och /opt/crs/product/112_ee_64/crs -olderver

The first patch was skipped, the second was applied. So far so good.

But now what?

Much to my surprise both patches had been applied against the GI home, and neither of them had ended up in the RDBMS home… and I thought I heard auto in the readme :0

And as always, a better and clearer readme would have been helpful. In the end I chose this approach.

In my opinion the readme should have said:

MYSRVR:root:/opt/oracle/stage # opatch auto -och /opt/crs/product/112_ee_64/crs -olderver
MYSRVR:root:/opt/oracle/stage # opatch auto -och /opt/oracle/product/112_ee_64/db/ -olderver

But I wanted to see the other method, so I followed the steps below.

The RDBMS home needs to be done separately…

opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir ./
cd /opt/oracle/stage/14727315
As the oracle user:
Set the correct environment: ORACLE_HOME (/opt/oracle/product/112_ee_64/db/)
Stop all instances and listeners that are using this Oracle home.
opatch apply

Output showed:

OPatch failed with error code 73
 oracle@MYSRVR:/opt/oracle/stage/14727315 [MYDBR]# /sbin/fuser /opt/oracle/product/112_ee_64/db/lib/libclntsh.so.11.1
/opt/oracle/product/112_ee_64/db/lib/libclntsh.so.11.1: 29523m
 
oracle@MYSRVR:/opt/oracle/stage/14727315 [MYDBR]# ps -ef|grep 29523
oracle    5935 16707  0 14:24 pts/10   00:00:00 grep 29523
oracle   29523 24662  0 Feb08 pts/6    00:00:00 sqlplus           
oracle@MYSRVR:/opt/oracle/stage/14727315 [MYDBR]# kill -9 29523

##### Killed that leftover sqlplus session and reran as the oracle user:

opatch apply

#### The following output made me a happy DBA:

OPatch found the word "warning" in the stderr of the make command.
Please look at this stderr. You can re-run this make command.
Stderr output:
ins_precomp.mk:19: warning: overriding commands for target `pcscfg.cfg'
/opt/oracle/product/112_ee_64/db/precomp/lib/env_precomp.mk:2158: warning: ignoring old commands for target `pcscfg.cfg'
/opt/oracle/product/112_ee_64/db/precomp/lib/ins_precomp.mk:19: warning: overriding commands for target `pcscfg.cfg'
/opt/oracle/product/112_ee_64/db/precomp/lib/env_precomp.mk:2158: warning: ignoring old commands for target `pcscfg.cfg'
Composite patch 14727315 successfully applied.
OPatch Session completed with warnings.
Log file location: /opt/oracle/product/112_ee_64/db/cfgtoollogs/opatch/opatch2013-02-12_14-26-46PM_1.log

OPatch completed with warnings.

#### And the warnings were:

[Feb 12, 2013 2:35:04 PM]    2) OUI-67215:
                             OPatch found the word "warning" in the stderr of the make command.
                             Please look at this stderr. You can re-run this make command.
                             Stderr output:
                             ins_precomp.mk:19: warning: overriding commands for target `pcscfg.cfg'
                             /opt/oracle/product/112_ee_64/db/precomp/lib/env_precomp.mk:2158: warning: ignoring old commands for target `pcscfg.cfg'
                             /opt/oracle/product/112_ee_64/db/precomp/lib/ins_precomp.mk:19: warning: overriding commands for target `pcscfg.cfg'
                             /opt/oracle/product/112_ee_64/db/precomp/lib/env_precomp.mk:2158: warning: ignoring old commands for target `pcscfg.cfg'
[Feb 12, 2013 2:35:04 PM]    --------------------------------------------------------------------------------
[Feb 12, 2013 2:35:04 PM]    OUI-67008:OPatch Session completed with warnings.

I checked MOS once more to make sure that these warnings can be ignored: Opatch warning: overriding commands for target xxxx [ID 1448337.1]
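If this patching ever gets scripted, the "are these only the known warnings?" question can be answered mechanically. A rough sketch, not anything OPatch ships: only_known_make_warnings is an invented name, and the pattern covers just the two make messages shown above. Feed it the make stderr only, not the whole OPatch log, because OPatch's own sentence about the word "warning" would count as unknown.

```shell
#!/bin/sh
# Sketch: read make stderr on stdin and succeed only if every line that
# mentions "warning" is one of the two pcscfg.cfg-style messages from the post.
only_known_make_warnings() {
    ! grep -i 'warning' |
        grep -v -E 'warning: (overriding|ignoring old) commands for target' |
        grep -q .
}

# Example (make_stderr.txt is a placeholder for the captured stderr):
#   if only_known_make_warnings < make_stderr.txt; then
#       echo "only the known make warnings"
#   else
#       echo "unexpected warnings, read the log"
#   fi
```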

Post-patch activities in all the databases

cd $ORACLE_HOME/rdbms/admin
sqlplus /nolog
CONNECT / AS SYSDBA
STARTUP
@catbundle.sql psu apply
QUIT
###

Sounds like a happy end, so let's check:

SQL> select * from registry$history;
ACTION_TIME
---------------------------------------------------------------------------
ACTION                         NAMESPACE
------------------------------ ------------------------------
VERSION                                ID
------------------------------ ----------
COMMENTS
--------------------------------------------------------------------------------
BUNDLE_SERIES
------------------------------
APPLY                          SERVER
11.2.0.2                                9
PSU 11.2.0.2.9
PSU

The Aftermath

Frankly, I still have the issue that the diskgroups in ASM with a status of dismounted are not returned to the OS immediately. At the moment we are adding some delays to the automated process (built-in sleeps between the commands), and if that does not work, well, we will open a TAR with MOS after all.
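Instead of a fixed sleep, the nightly job could poll until the device is really free and give up after a deadline. This is a sketch under assumptions: wait_for and device_is_free are hypothetical names, and using fuser to decide that a device is free is my guess at a suitable check, not something the MOS note prescribes.

```shell
#!/bin/sh
# Sketch: run a check command every INTERVAL seconds until it succeeds
# or roughly TIMEOUT seconds have been spent waiting.
# Usage: wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_for() {
    wf_timeout=$1; wf_interval=$2; shift 2
    wf_waited=0
    until "$@"; do
        if [ "$wf_waited" -ge "$wf_timeout" ]; then
            return 1            # deadline reached, condition still not met
        fi
        sleep "$wf_interval"
        wf_waited=$((wf_waited + wf_interval))
    done
    return 0
}

# Hypothetical check: succeeds once no process holds the device open.
#   device_is_free() { ! /sbin/fuser -s "$1"; }
#   wait_for 300 10 device_is_free /dev/mapper/asm-diskgroupname-data-001 ||
#       echo "device still in use after 5 minutes" >&2
```

Compared with a plain sleep this continues as soon as the descriptors are released, and it fails loudly when they never are.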

PS

The MOS note in the title suggests two patches. Well, I still have the issue even with the January 2013 PSU, and here is lsinventory as proof:

/opt/oracle/stage/logs [CUSTP1R]# grep 11666137 *
20130212CRS_After:     12595561, 9956835, 11666137, 10061015, 9672816, 11695416, 9744252
20130212CRS_Before:     12635537, 11733179, 11707699, 10281887, 10079168, 9651350, 11666137
20130212RDBMS_After:     12595561, 9956835, 11666137, 10061015, 9672816, 11695416, 9744252
/opt/oracle/stage/logs [CUSTP1R]# grep 11785938 *
20130212CRS_After:     10395345, 12596444, 9564886, 10373013, 11785938, 10213073, 11827088
20130212CRS_Before:     12620422, 10157402, 12797765, 10419984, 11785938, 11694127, 12586488
20130212RDBMS_After:     10395345, 12596444, 9564886, 10373013, 11785938, 10213073, 11827088
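The same check can be scripted so that a missing fix stands out. A small sketch, assuming the lsinventory output was saved to files like the ones above; has_bug_id and report_bug are made-up helper names. It relies on grep -w (available in GNU and BSD grep) so that a number is only matched as a whole word and not as part of a longer bug number.

```shell
#!/bin/sh
# Sketch: check saved lsinventory listings for a given bug number.
# has_bug_id BUG_ID FILE: succeeds if BUG_ID appears as a whole word in FILE.
has_bug_id() {
    grep -q -w -- "$1" "$2"
}

# report_bug BUG_ID FILE...: print one "present" or "MISSING" line per file.
report_bug() {
    rb_id=$1; shift
    for rb_file in "$@"; do
        if has_bug_id "$rb_id" "$rb_file"; then
            echo "$rb_file: $rb_id present"
        else
            echo "$rb_file: $rb_id MISSING"
        fi
    done
}

# Example, run in /opt/oracle/stage/logs:
#   for bug in 11666137 11785938; do report_bug "$bug" 20130212*; done
```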

So most likely… this is to be continued.

Happy Reading,

Mathijs
