When your Restart / Rac Node screams : [ohasd(1749)]CRS-0004:logging terminated

Introduction:

During  one of those days on a Monday where  a hot line ( ticket ) service was provided team received a ticket that all instances  of an Oracle Restart ( 11.2.0.3)  where down. I assisted in doing  troubleshooting  which is always fun because that gives a bit of a Sherlock Holmes idea investigating issues at hand .  After logging on to the box  I saw  that the ASM Instance was up  but a  “crsctl check  has”  showed that no communication with has daemon could be established.  So the puzzle became a bit more interesting at that moment .

Details:

  • Environment is an Oracle restart environment on Linux.
  • 11.2.0.3 Grid Infra with 11.2.0.3 Databases running on it
  • Grid infra has been installed in: /opt/crs/product/11203/crs/
  • Environment went down during the weekend ( fortunately it was a Test environment  with a standard office time SLA)

It was time to  take a peek in the alert-log of the cluster on that node.  Since my installation was done in: /opt/crs/product/11203/crs/  that meant I had to look for the log-file in the subdirectory:  /opt/crs/product/11203/crs/log/mysrvr01/   ( where in  the cluster mysrvr01 is the name of the node  i was working on ) to see the alert-file. In there I found this showing what was going on:

2014-10-18 10:47:37.060:
[ohasd(1749)]CRS-0004:logging terminated for the process. log file: “/opt/crs/product/11203/crs/log/mysrvr01/ohasd/ohasd.log”
2014-10-18 10:47:37.128:
[ohasd(1749)]CRS-10000:CLSU-00100: Operating System function: mkdir failed with error data: 28
CLSU-00101: Operating System error message: No space left on device
CLSU-00103: error location: authprep6
CLSU-00104: additional error information: failed to make dir /opt/crs/product/11203/crs/auth/ohasd/mysrvr01/A4107671
2014-10-18 10:47:37.146:
[ohasd(1749)]CRS-10000:CLSU-00100: Operating System function: mkdir failed with error data: 28
CLSU-00101: Operating System error message: No space left on device
CLSU-00103: error location: authprep6
CLSU-00104: additional error information: failed to make dir /opt/crs/product/11203/crs/auth/ohasd/mysrvr01/A5139099

With this as a clue dear Watson it was a lot easier to proceed.  Started hunting large files in the file system of the  /opt/crs/product/11203/crs/  and found a flood of XML files  that had not  been erased for ages there . Which made me realize that a clean up job in cron would become handy for this as well.

After  removing the xmls and checking the mount point again  I logged in as Root  and first issued a  crsctl start has.

After some time i noticed in the alert log of the node :
014-10-20 08:26:50.847:
[/opt/crs/product/11203/crs/bin/oraagent.bin(32461)]CRS-5818:Aborted command ‘check’ for resource ‘ora.DATA1.dg’. Details at (:CRSAGF00113:) {0:0:2} in /opt/crs/product/11203/crs/log/mysrvr01/agent/ohasd/oraagent_oracle/oraagent_oracle.log.
2014-10-20 08:26:50.847:
[/opt/crs/product/11203/crs/bin/oraagent.bin(32461)]CRS-5818:Aborted command ‘check’ for resource ‘ora.DUMMY_DATA.dg’. Details at (:CRSAGF00113:) {0:0:2} in /opt/crs/product/11203/crs/log/mysrvr01/agent/ohasd/oraagent_oracle/oraagent_oracle.log.
2014-10-20 08:26:50.847:
[/opt/crs/product/11203/crs/bin/oraagent.bin(32461)]CRS-5818:Aborted command ‘check’ for resource ‘ora.FRA1.dg’. Details at (:CRSAGF00113:) {0:0:2} in /opt/crs/product/11203/crs/log/mysrvr01/agent/ohasd/oraagent_oracle/oraagent_oracle.log.
2014-10-20 08:27:52.189:
[evmd(4105)]CRS-1401:EVMD started on node mysrvr01.
2014-10-20 08:27:53.685:
[cssd(4193)]CRS-1713:CSSD daemon is started in local-only mode
2014-10-20 08:27:55.581:
[ohasd(32421)]CRS-2767:Resource state recovery not attempted for ‘ora.diskmon’ as its target state is OFFLINE
2014-10-20 08:28:02.233:
[cssd(4193)]CRS-1601:CSSD Reconfiguration complete. Active nodes are mysrvr01 .

When I checked again 🙂 ah good news not only ASM instance running but also all Instances present on the box .  Another puzzle solved.

Happy reading and till next Time ,

Mathijs

PRVF-4557 During installation of Oracle Binaries ( 11.2.0.3)

Introduction:

During an Oracle software install for a new RAC environment once again  I was surprised by Oracle . I had installed the Grid infrastructure (GI) 11.2.0.3 on RedHat Linux with success and I was about to install the Oracle Binaries which should be a piece of cake with the installed  GI up and running properly.  According to the old quote “Never stop learning ( or being surprised  )”   during the preparations the runInstaller  managed to give me a surprise  and I was unable to continue with installation. So obviously this  surprise needed  fixing first.

 

Details:

When running the runInstaller , after choosing the option to install the Oracle binaries on a Real Application Cluster environment  and after  a couple of options  I received a pop up telling me :

PRVF-4557 : Node application “ora.svrb1hr.vip” is offline on node

I was unable to go ahead and had to investigate. As always Google and the Oracle Community where  my brothers in oracle arms for this so i Came across this scenario :

First I checked my hosts file to make sure  information to be present on both nodes. That showed  following details which looked ok:

oracle@svrb1hr:/opt/oracle [CRM]# grep  vip /etc/hosts 
10.97.242.32 svrb1hr-vip.myenv.dc-world.de svrb1hr-vip
10.97.242.33 svrb2hr-vip.myenv.dc-world.de svrb2hr-vip

Next step was  checking my cluster resources. As  mentioned before GI install had finished  properly the day before so that really made me wonder:

oracle@svrb1hr:/opt/oracle [CRM]# crsctl status resource -t 
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CLUSTERDATA.dg
               ONLINE  ONLINE       svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
ora.LISTENER.lsnr
               ONLINE  OFFLINE      svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
ora.asm
               ONLINE  ONLINE       svrb1hr                 Started             
               ONLINE  ONLINE       svrb2hr                 Started             
ora.gsd
               OFFLINE OFFLINE      svrb1hr                                     
               OFFLINE OFFLINE      svrb2hr                                     
ora.net1.network
               ONLINE  ONLINE       svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
ora.ons
               ONLINE  ONLINE       svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
ora.registry.acfs
               ONLINE  ONLINE       svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       svrb1hr                                     
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       svrb2hr                                     
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       svrb1hr                                     
ora.cvu
      1        ONLINE  ONLINE       svrb2hr                                     
ora.svrb1hr.vip
      1        ONLINE  INTERMEDIATE svrb2hr                 FAILED OVER         
ora.svrb2hr.vip
      1        ONLINE  ONLINE       svrb2hr                                     
ora.oc4j
      1        ONLINE  ONLINE       svrb1hr                                     
ora.scan1.vip
      1        ONLINE  ONLINE       svrb1hr                                     
ora.scan2.vip
      1        ONLINE  ONLINE       svrb2hr                                     
ora.scan3.vip
      1        ONLINE  ONLINE       svrb1hr                                     

Well that was unexpected, because  expectation was that the srvrb1hr.vip should be running on the first node. Still don’t understand what happened to cause this. Hmm and frankly if you have suggestions what happened please let me know . But  I did know  it was needed to bring back the vip  address to the first server.

First attempt  was to issue the command needed  ( crs_relocate ) on the node where  I was already working ( node 1) .

oracle@svrb1hr:/opt/oracle [CRM]# which crs_relocate
/opt/crs/product/11203/crs/bin/crs_relocate

oracle@svrb1hr:/opt/oracle [CRM]# crs_relocate svrb1hr.vip
CRS-0210: Could not find resource ‘svrb1hr.vip’.
###  activities needed 2 b done from second node

Grumbling with this  but well at least it was explained what to do next ….

So I opened a session against the second node , made sure my Oracle Home was pointing to the GI.

oracle@svrb1hr:/opt/oracle [CRM]# ssh svrb2hr

On the second box  the command crs_relocate was entered:

                     
oracle@svrb2hr:/opt/oracle [CRS]# crs_relocate ora.svrb1hr.vip
Attempting to stop `ora.svrb1hr.vip` on member `svrb2hr`
Stop of `ora.svrb1hr.vip` on member `svrb2hr` succeeded.
Attempting to start `ora.svrb1hr.vip` on member `svrb1hr`
Start of `ora.svrb1hr.vip` on member `svrb1hr` succeeded.
Attempting to start `ora.LISTENER.lsnr` on member `svrb1hr`
Start of `ora.LISTENER.lsnr` on member `svrb1hr` succeeded.

Well that  looked promessing  so let’s check one more time then:

 

oracle@svrb2hr:/opt/oracle [CRS]#  crsctl status resource -t 
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CLUSTERDATA.dg
               ONLINE  ONLINE       svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
ora.LISTENER.lsnr
               ONLINE  ONLINE       svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
ora.asm
               ONLINE  ONLINE       svrb1hr                 Started             
               ONLINE  ONLINE       svrb2hr                 Started             
ora.gsd
               OFFLINE OFFLINE      svrb1hr                                     
               OFFLINE OFFLINE      svrb2hr                                     
ora.net1.network
               ONLINE  ONLINE       svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
ora.ons
               ONLINE  ONLINE       svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
ora.registry.acfs
               ONLINE  ONLINE       svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       svrb1hr                                     
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       svrb2hr                                     
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       svrb1hr                                     
ora.cvu
      1        ONLINE  ONLINE       svrb2hr                                     
ora.svrb1hr.vip
      1        ONLINE  ONLINE       svrb1hr                                     
ora.svrb2hr.vip
      1        ONLINE  ONLINE       svrb2hr                                     
ora.oc4j
      1        ONLINE  ONLINE       svrb1hr                                     
ora.scan1.vip
      1        ONLINE  ONLINE       svrb1hr                                     
ora.scan2.vip
      1        ONLINE  ONLINE       svrb2hr                                     
ora.scan3.vip
      1        ONLINE  ONLINE       svrb1hr

 

Much better!!  after this I restarted the runInstaller again from the first node and indeed installation of  the Oracle Binaries was flawless.

 

As always  happy  reading ,

 

Mathijs