Introduction
On January 5th I wrote a post about the issues we faced with an ASM instance that, at specific points in time, would not let me log in with sqlplus / as sysasm. During those periods the alert logs of the databases on that box also showed the warning “.. ASM communication error”. Based on information found on the web (Metalink), a solution and a workaround were implemented. The solution: on that specific box the oinstall gid was missing in the first place (the primary OS group is dba, oracle:dba), so I had a Linux colleague add oinstall on that box. The workaround: I created a tnsnames entry and connected via sys@asm as sysasm, which also worked well. So at that point we all thought: case closed.
Well… not entirely, because the issue showed up again recently, and even though the workaround (the connect string method) was working, I was not a happy Database Administrator. I opened a TAR with Oracle, but this time I was going in circles with it.
Work Info
Last Friday the issue showed up again on a box in one of the clusters. An internal mail about it was sent within our team, and a very interesting clue came back from one of the colleagues who had had a similar experience in a different project. He came up with the following notes on MOS:
Troubleshooting ORA-1031: Insufficient Privileges While Connecting As SYSDBA [ID 730067.1]
UNIX: Checklist for Resolving Connect AS SYSDBA Issues [ID 69642.1]
UNIX: Diagnostic C program for ORA-1031 from CONNECT INTERNAL / AS SYSDBA [ID 67984.1]
The last note in particular, 67984.1, was very useful because it showed that at the time of the issue the gid (group ID) was no longer valid due to an LDAP call.
With the output of that note and the analysis that followed, it turned out that the NSCD daemon, the name service cache daemon (http://www.linux.ncsu.edu/realm_linux/usersguide-EL4/ch04s06.php), might be part of the issue, given what came back when the group was queried on the OS:
# getent group dba
101
# getent group 5000
dba
# getent group dba
5000
When the Linux administrator configured the correct (exception) information in /etc/ldap.conf, the problem vanished and the phantom hunt ended.
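The manual getent checks above can be turned into a small repeatable test. What follows is only a minimal sketch, not what we actually ran: the function name check_group_roundtrip and the GETENT override variable are made up for illustration, and it assumes getent prints the usual colon-separated line (name:password:gid:members). It looks up a group by name, looks the resulting gid up again, and complains when the two answers do not agree.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# GETENT may be overridden (e.g. for testing); by default the real getent is used.
GETENT=${GETENT:-getent}

check_group_roundtrip() {
    name=$1
    # Assumed output format: name:password:gid:members, so field 3 is the gid.
    gid=$($GETENT group "$name" | cut -d: -f3)
    if [ -z "$gid" ]; then
        echo "MISSING: group $name not found"
        return 1
    fi
    # Look the gid up again; field 1 is the group name it resolves to.
    back=$($GETENT group "$gid" | cut -d: -f1)
    if [ "$back" != "$name" ]; then
        echo "MISMATCH: $name -> $gid -> ${back:-<nothing>}"
        return 1
    fi
    echo "OK: $name <-> $gid"
}
```

Calling check_group_roundtrip dba at regular intervals (from cron, for instance) and keeping the output with a timestamp would show when the name and the gid stop matching, which is the kind of flip-flop seen in the getent output above.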
Happy end
Bottom line of this:
- Never believe in phantoms; things like the ones described here happen for a reason.
- Always be willing to communicate within the team and beyond, because communication might bring a so-called Aha-Erlebnis (déjà vu).
- Standardize, standardize, standardize when you are using LDAP together with local configurations, because otherwise you really let the ghost out of the machine.
- A special thank you to the colleagues who started the internal mail and to the one who shared his experiences with the team.
Happy reading,
Mathijs