First encounters of the ACFS kind

Introduction:

This week's challenge was to set up ACFS on two newly to-be-implemented Real Application Clusters. As with any tooling, the first question is: what is your goal if you want to start using it? My purpose is to use ACFS as a shared mount point between the servers, the nodes in the cluster, so that each instance does its logging (ADR), audit and listener log on that mount.

This blog will tell you how I set it up, the issues I came across, and the way it was finally implemented in a working configuration.

To share the plot of the story: after implementing it I had the issue that, right after a cluster reboot, all services were launched automatically BUT the instances would NOT start automatically. In the end I ended up with ACFS being defined as a shared home for an Oracle RDBMS installation. And that works after a cluster reboot, because all resources are happy and online now.

Create a general purpose file system.

After setting up the 11.2.0.3 Grid Infrastructure in full, I added disks to an ASM disk group. Internally we have agreed on a couple of things with regard to ACFS and the way RAC databases are set up:

  1. Every database shall have its own ACFS disk group in ASM (a lot of servers are packed with databases), so we have better control.
  2. The ACFS disk group has to be set up with NORMAL redundancy (bad experiences from the past in ASM if done differently), and since we want the databases to start, this is a pretty important disk group.
  3. Every instance in the RAC database has its own mount point where it writes its ADR (diag_dest), audit files (audit_file_dest) and a text copy of the spfile.
  4. Every database in the 11gR2 environment has a dedicated listener, which writes its log file into one of the subdirectories of the same mount point mentioned in the previous point (3).

So this all sounded like a perfect case for us to implement ACFS.
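To make that concrete, here is a sketch of the kind of per-database layout on the shared mount that these agreements boil down to. The mount point /opt/oracle/<SID> matches the one used later in this post, but the subdirectory names are just examples, not our actual standard:

## Sketch of the per-database layout on the shared ACFS mount (subdirectory names are examples)
mkdir -p /opt/oracle/<SID>            # ADR base: diag_dest of the instances points here
mkdir -p /opt/oracle/<SID>/audit      # audit_file_dest of the instances
mkdir -p /opt/oracle/<SID>/admin      # text copy of the spfile
mkdir -p /opt/oracle/<SID>/listener   # log directory of the dedicated listener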

After reading the relevant chapter in one great reference book, Pro Oracle Database 11g RAC on Linux by Steve Shaw and Martin Bach (Apress, 2010, ISBN-13 (pbk): 978-1-4302-2958-2), I went to work.

I used ASMCA to implement the following:

  1. Added the disk I got from the Linux colleague to an ASM disk group (normal redundancy).
  2. Added a volume and set the amount of space that the volume should be able to use (which was all of it, 15 GB).
  3. Moved to the tab ACFS Cluster File Systems and had to make the following choices:
    1. Should I specify the ACFS home as a shared home for an Oracle RDBMS installation, or
    2. Create a general purpose file system?
    3. And would I like to register the mount point (if I chose the general purpose file system)?

Since I did not want to share Oracle software (still uncommon in RAC country, because patching that shared software tree means a total outage of your cluster), I chose the general purpose file system together with the option to register the mount point.
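For reference, roughly the same can be done on the command line. The sketch below is not what I ran (I used ASMCA), and the disk group name, volume name, device name and mount point are examples:

## Create the ASM volume in the disk group (as the grid infrastructure owner)
asmcmd volcreate -G <SID>_ACFS01 -s 15G acfsvol1
asmcmd volinfo -G <SID>_ACFS01 acfsvol1        # shows the volume device, e.g. /dev/asm/acfsvol1-123
## Create the ACFS file system on that volume device (as root)
/sbin/mkfs -t acfs /dev/asm/acfsvol1-123
## Register the mount point so the clusterware mounts it on all nodes (as root)
/sbin/acfsutil registry -a /dev/asm/acfsvol1-123 /opt/oracle/<SID>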

After that I did some testing on the mounted ACFS file system (added a file on node one, moved to node two and edited it, went to the third node and opened the shared file, etc.). All looked happy.
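Something along these lines (the path is an example):

echo "written on node1" >  /opt/oracle/<SID>/acfs_test.txt   # on node 1
echo "edited on node2"  >> /opt/oracle/<SID>/acfs_test.txt   # on node 2
cat /opt/oracle/<SID>/acfs_test.txt                          # on node 3: both lines visible
rm /opt/oracle/<SID>/acfs_test.txt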

The next test was to restart the clusterware as root on all nodes to see if the file system would come back. That also worked as expected.
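The restart itself is the standard stack bounce per node:

## As root on each node (crsctl lives in the bin directory of the Grid Infrastructure home)
crsctl stop crs
crsctl start crs
## Once the stack is back, check that the ACFS file system is mounted again
mount | grep acfs
df -h /opt/oracle/<SID>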

After that I implemented a two-node RAC database and a local listener dedicated to that specific database. All logging options were set up to write into the ACFS shared file system (so diag_dest, audit_file_dest and the listener log all point to that ACFS). Then I restarted the Grid Infrastructure on the first node. After the cluster came back I noticed that the Grid Infrastructure and ASM started properly, but that the instance did not start automatically. And if I started the instance manually with srvctl start database -d <SID> it worked… Sounds like a puzzle…
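For completeness, pointing the instance logging at the shared mount is just a matter of parameters. A minimal sketch, assuming one shared mount point /opt/oracle/<SID> and default 11.2 behaviour:

## Run once from either instance; audit_file_dest is a static parameter, so spfile scope only
sqlplus / as sysdba <<EOF
ALTER SYSTEM SET diagnostic_dest='/opt/oracle/<SID>' SCOPE=BOTH SID='*';
ALTER SYSTEM SET audit_file_dest='/opt/oracle/<SID>/audit' SCOPE=SPFILE SID='*';
EOF
## The listener log location is steered from the listener.ora of the dedicated listener
## (ADR_BASE_<listener_name> when ADR is used, LOG_DIRECTORY_<listener_name> otherwise)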

I found a note in MOS: RAC Instances Not Auto Starting After Server Reboot and failing with ORA-48141 [ID 1488582.1].

In the agent log of the Grid Infrastructure I found another clue:

2012-10-09 09:21:01.185: [ora.<db>.db][1812027712] {1:32921:2} [start] clsnUtils::error Exception type=2 string=CRS-5017: The resource action "ora.<db>.db start" encountered the following error: ORA-48141: error creating directory during ADR initialization [/opt/oracle/<db>/diag] ORA-48189: OS command to create directory failed Linux-x86_64 Error: 13: Permission denied.

The explanation for this is that after a node reboot or clusterware restart, the ASM instance is still mounting the ACFS disk group while the database already tries to access it. The database resource tries that two times by default and then gives up.
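You can see in the resource profile why the database does not wait for the ACFS mount (resource names below are examples):

## Dump the database resource profile and look at its start dependencies and retry count
crsctl status resource ora.<db>.db -p | grep -E 'START_DEPENDENCIES|RESTART_ATTEMPTS'
## With a general purpose ACFS there is no hard/pullup dependency on the ACFS resource here,
## so the instance start is not held back until the file system is actually mounted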

Speaking to a colleague from Oracle, he recommended two things: read the note ACFS Technical Overview and Deployment Guide [ID 948187.1]; and, even more important, even though a general purpose ACFS should work for ADR, in this case it comes highly recommended to change to a shared Oracle home ACFS.

I performed the following steps:

  1. Unmounted the ACFS file system as user root on all nodes: /bin/umount -t acfs -a
  2. Since that did not work (file system busy), grepped for the open files blocking the unmount: lsof | grep <SID>.
  3. Stopped or killed those processes (hint 1: shut down your database if diag_dest points to that ACFS; hint 2: stop your listener on all nodes if your listener log is also on this ACFS).
  4. Unregistered the mount point and decided to dismount the disk group and delete it in order to start from scratch (the unregister step in command form is sketched below).
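A sketch of that unregister step (the mount point is an example):

## Remove the mount point from the ACFS registry (as root)
/sbin/acfsutil registry -d /opt/oracle/<SID>
## After that the disk group was dismounted and dropped again (I did that part in ASMCA)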

Create a Database Home file system.

So, armed with the information that was shared with me, I started from scratch and performed the following steps:

  1. Used ASMCA to create a disk group with NORMAL redundancy as the ACFS disk group for the first database.
  2. Put a volume on that disk group (entered the volume name; note the limit of 11 characters, if I read correctly), assigned that volume to the disk group just created, and assigned the amount of storage I wanted to use for this volume (which was 14 GB, by the way).
  3. On the Oracle Cluster File System tab the volume just created was already showing. VERY important for my purpose was that I selected DATABASE HOME FILE SYSTEM. I entered the database home mount point, the Oracle home owner name (oracle) and the database home owner group (dba).
  4. On the next screen I was told to run a script as root: /opt/oracle/cfgtoollogs/asmca/scripts/acfs_script.sh. This script finished with: ACFS file system is running on node1r,node2r,node3r.
  5. Checked the clusterware resource for the new file system: ora.<sid>_acfs01.<sid>_acfs1.acfs
  6. Checked the resource in detail and saw:

START_DEPENDENCIES=hard(ora.<SID>_ACFS01.dg) pullup(ora.<SID>_ACFS01.dg) pullup:always(ora.asm)

COOL, so now my ACFS waits until the disk group is mounted.
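A quick way to double-check both the file system and the resource state across the nodes (names are examples):

## Show size, volume device and mount state of the new ACFS file system
/sbin/acfsutil info fs /opt/oracle/<SID>
## Show the ACFS resource state per node (should be ONLINE everywhere)
crsctl status resource ora.<sid>_acfs01.<sid>_acfs1.acfs -t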

The last steps to perform were to make the database resources in the clusterware aware of this ACFS too:

## First I removed the current database setting from the clusterware:

srvctl remove database -d <SID>

## Then I registered it again with all the parameters, including the ACFS file system:

srvctl add database -d <SID> -o /opt/oracle/product/11203_ee_64/db -c RAC -m test.nl -p +DATA01/<SID>/spfile<SID>.ora -t IMMEDIATE -a "<SID>_ACFS01,DATA01,<SID>_FRA01" -j "/opt/oracle/<SID>"

-j is the great one, pointing to the ACFS-mounted file system for my ADR.

## I checked my activities (the database has been added to the clusterware, still offline):

ora.<SID>.db                                       OFFLINE    OFFLINE

## Then I added the instances and a service:
srvctl add instance -d <SID> -i <SID>1 -n node1r
srvctl add instance -d <SID> -i <SID>2 -n node2r
srvctl add service -d <SID> -s <SID>_TAF.test.nl -r <SID>1,<SID>2
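A quick sanity check of the registration can be done with the standard srvctl commands:

## Show the registered configuration (spfile, disk groups, instances) and the current state
srvctl config database -d <SID>
srvctl status database -d <SID>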

## Checked:

srvctl stop database -d <SID> -o immediate
PRCC-1016 : <SID> was already stopped

## Started the database via the clusterware:

srvctl start database -d <SID>

Final check: as root I bounced the clusterware first. After it came back I had what I was looking for, my instances started automatically too! And as a coup de grâce I performed a node reboot. Everything started the way it was supposed to, so happy me.
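After each bounce, one overall look at the resource states is enough to confirm it:

## Tabular overview of all clusterware resources; the database, ACFS and listener resources should all show ONLINE
crsctl status resource -t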

One open issue is left: on node 3 a local listener is running (no databases though); that listener has, just like the other two local listeners, its log file in the ACFS file system. The listener starts too quickly, so I still see the message "directory does not exist". But in total that is only a minor issue…

And with some help the following changes were implemented for the local listener issue:

Dependency for the (local) Listener.

## First modified the ACFS resource:

crsctl modify resource ora.<sid>_acfs01.<sid>_acfs1.acfs -attr "START_DEPENDENCIES='hard(ora.<sid>_ACFS01.dg) pullup(ora.<sid>_ACFS01.dg) pullup:always(ora.asm)'"

## then:

crsctl modify resource ora.LISTENER_<sid>.lsnr -attr "START_DEPENDENCIES='hard(type:ora.cluster_vip_net1.type,ora.<sid>_acfs01.<sid>_acfs1.acfs) pullup(type:ora.cluster_vip_net1.type,ora.<sid>_acfs01.<sid>_acfs1.acfs)'"

Note the single quote just after the = and the one just before the final double quote: the attribute value contains spaces, so it needs them.

And so this chapter can be closed.

Happy reading.

Mathijs.
