ASM communication Issues ((ORA-01031: insufficient privileges) (WARNING: ASM communication error: op 0 state 0x0 (15055))

Introduction:

A couple of months ago  I had the issue that in an 11.2.0.2 environment  with Grid Infra structure  and ASM on Linux , without any specific reason at a specific point and time would  see  messages  about communication error between the databases and  the ASM instance and  I was not able to connect to the ASM instance with a sqlplus / as sysasm. I have opened a tar back then and even got myself  a fresh bug 14767353 number but no answers. So  But practically this remained unsolved. Friday 4th  I had same issue again, so it was time to get back to arms and investigate this.

Below you will find the steps , what i saw and how I solved it with the help of my friends(Google and Metalink).

Environment: 4 Node GI 11.2.0.2.0 on Red Hat Linux  with ASM. Installations performed as oracle:dba

Case:

Basically I saw the following:

  1. in the alerts of the databases connecting to the asm instance on the specific box i saw so that did not look all too good:
Fri Jan 04 15:58:06 2013
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
  1. when i tried to connect sqlplus  / as sysasm to one of my ASM instances on the first box without password I would get:
oracle@mynode01hr:/opt/oracle []# sqlplus / as sysasm
SQL*Plus: Release 11.2.0.2.0 Production on Fri Jan 4 15:18:56 2013
Copyright (c) 1982, 2010, Oracle.  All rights reserved.
ERROR:
ORA-01031: insufficient privileges

 

Approach / Findings:

First change to make things look more standard was that from the second ASM instance I have issued following changes:

In its setup these ASM instances only had local listener set up for the VIP  on the specific of the box but no remote listener. So that was one thing I thought needed setting up properly anyhow:

alter system set remote_listener='mynode01cl-scan.vfnl.dc-ratingen.de:1521' scope = both sid = '*';

After that  tested connectivity (s+ is an alias for sqlplus  ‘/  as sysdba’) :

oracle@mynode01hr:/opt/oracle []# s+                      

 SQL*Plus: Release 11.2.0.2.0 Production on Fri Jan 4 15:18:56 2013
Copyright (c) 1982, 2010, Oracle.  All rights reserved.
ERROR:
ORA-01031: insufficient privileges

By the way I could not connect to any of the instances on the box locally (without connect string). Connecting using the connect string however that did work:

oracle@mynode01hr:/opt/oracle []# sqlplus system@mydb11

SQL*Plus: Release 11.2.0.2.0 Production on Fri Jan 4 15:19:39 2013

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Enter password:

Connected to:

So after round one in this quest , after setting up remote listener I still had that same issue. Though it was better practice in place it only solved the issues partly.

On metalink  (which sometimes feels more like an Oracle of Delphi) I saw two workarounds , one would be change primary os-group to oinstall (not preferred because with us still old fashioned but working with dba as primary group so changing that in a running production box did not sound like a cunning plan). Second workaround gave me  a good clue , connect via connect string  which i tested with the normal databases and the ASM instance.

oracle@mynode01hr:/opt/oracle []# sqlplus sys@myasm1 / as sysdba
SQL*Plus: Release 11.2.0.2.0 Production on Fri Jan 4 15:21:01 2013
Copyright (c) 1982, 2010, Oracle.  All rights reserved.
Enter password: 
Connected to:
oracle@mynode01hr:/opt/oracle []# sqlplus sys@mydb11 as sysdba
SQL*Plus: Release 11.2.0.2.0 Production on Fri Jan 4 15:21:25 2013
Copyright (c) 1982, 2010, Oracle.  All rights reserved.
Enter password:
Connected to:

So after reading and test that was a good thing half the puzzle was solved since I was able to  communicate with both databases and the ASM instance as soon as I Added a connect string to my sqlplus session. I performed an extra test to see if the communication between the Databases Instance and the ASM instance was ok by  resizing a datafile.

SQL> alter database datafile '+MYDB1_DATA/mydb1/datafile/users.259.771667551' resize 150M;
Database altered.

So that did prove that the database and the ASM instance did work together properly. Which makes the issues at hand a little less dramatic.

As I wrote  on  Metalink   I had seen a suggestion to change to primary os- group  of oracle (my environments are all oracle:dba)  to oracle:oinstall).  Even though I did not like it in the first or second place it gave me a thought I should check about os-groups on the Linux level.  During that check the good thing was that then I noticed a delta in /etc/group on all 3 other  boxes.

oracle@mynode02hr:/opt/oracle []# cat /etc/group|grep oracle

oinstall:x:47374:oracle                                 <—-   not present on  mynode01hr , 04hr , 05hr

oracle@mynode02hr:/opt/oracle [MYDB11]# id -a oracle

uid=101(oracle) gid=101(dba) groups=47374(oinstall),101(dba)

This showed that oracle at least had the group oinstall available !!!! on the second node (02hr) where everything was working properly. The oinstall  group was lacking on all other 3 nodes including on the first node with the issues . Now that made me wonder. What if Oracle still want to have the group oinstall present even if was not the primary os group.. I contacted Linux guru and he was kind enough to help me , he added the group ( locally ). Typically to be tested with:

id -a oracle

I tested it on the first Box  01hr which was having issues and it worked!  I can connect  without ORA-01031: insufficient privileges directly again to ASM without password and without connect string. And also no more  messages in the Alerts of the instances that are connected to this Instance:

Fri Jan 04 15:58:06 2013
WARNING: ASM communication error: op 0 state 0x0 (15055)
Final check if all was the same:
for i in 1 2 4 5 ; do ssh -q oracle@mynode0${i}hr "id -a oracle"; done
 uid=101(oracle) gid=101(dba) groups=47374(oinstall),101(dba)
uid=101(oracle) gid=101(dba) groups=47374(oinstall),101(dba)
uid=101(oracle) gid=101(dba) groups=47374(oinstall),101(dba)
uid=101(oracle) gid=101(dba) groups=47374(oinstall),101(dba)

Conclusion:

The Linux boxes were not set-up in a similar way and fortunately  the correct set-up (with both the os-groups oinstall and dba being present) was in place on one box. By comparing that to the other boxes I found the missing piece in the puzzle after which my connect errors and the messages in the alerts of the databases vanished. It looks very much that the oinstall group simply has to be out there with the user oracle at least being a member of that group.

As always happy reading.

Mathijs.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s