Introduction:
On a frequent bases team members do hotline service where they act as a first line of contact when there are issues with the Oracle Environment on the various boxes. In the last week of 2013 Monday 30st of December it was my turn to do that duty when one of the interesting issues showed that even caused a Priority One incident on the server srvr_33r which is part of a 4 node Rac cluster with the database MYDB. So once again time to gear up and go investigate what was going on. Users received the following Error Message: TNS-12514: TNS:listener does not know of service requested in connect descriptor
Summary:
In the end it showed that the Oracle Instance on the 33r Box was suffering from a communication issue with its ASM instance since this ASM instance had issues with its shared_pool ( in the alert file of the Oracle Instance at the start of the incident i found : ORA-04031: unable to allocate 760 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^928″,”kglss”).
Due to that shared_pool issue the Instance was unable to communicate with its ASM Instance when it needed the information about its Local listener in the spfile. That made that information unavailable to the Oracle instance so the Cluster agent performed this : ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.20.30.72)(PORT=1521))))’ SCOPE=MEMORY SID=’MYDB1′;
So It altered my LOCAL_LISTENER in memory to the default listener on the cluster, listening on 1521 with NO services defined for this specific database !!
Even worse was that the environment was hit by an Oracle Bug ( which was pointed out by a team member to me : Bug 11772838 – oraagent modifies LOCAL_LISTENER with an incorrect value, when the parameter was set in spfile (Doc ID 11772838.8)
I have implemented the workaround for the listener and will have to schedule a maintenance window on the cluster to fix the Memory issues in the ASM instance(s).
Details:
Environment setting: 4 node 11.2.0.3 Grid Infrastructure with a mix of Databases (and versions) present. Every database has a dedicated listener on a specific port. And services for that database register with that dedicated listener. Every database has its spfile in an ASM Diskgroup.
When i started performing checks i first looked at the listener on the first Node ( Instance1 is on srvr_33r)
# lsnrctl status listener_MYDB LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 30-DEC-2013 10:02:51 Copyright (c) 1991, 2011, Oracle. All rights reserved. Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=srvr_33r-vip)(PORT=20008)(IP=FIRST))) STATUS of the LISTENER ------------------------ Alias LISTENER_MYDB Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production Start Date 23-OCT-2013 01:21:27 Uptime 68 days 9 hr. 41 min. 24 sec Trace Level off Security ON: Local OS Authentication SNMP OFF Listener Parameter File /opt/oracle/product/11203_ee_64/db/network/admin/listener.ora Listener Log File /opt/oracle/MYDB/admin/network/log/listener_MYDB.log Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.20.30.72)(PORT=20008))) (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.20.30.71)(PORT=20008))) Services Summary... Service "MYDB" has 1 instance(s). Instance "MYDB", status UNKNOWN, has 1 handler(s) for this service... The command completed successfully
And i compared that to the information in the listener on the other node:
(Instance 2 is on srvr_34r) oracle@srvr_34r:/opt/oracle [MYDB2]# lsnrctl status listener_MYDB LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 30-DEC-2013 10:00:27 Copyright (c) 1991, 2011, Oracle. All rights reserved. Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=srvr_34r-vip)(PORT=20008)(IP=FIRST))) STATUS of the LISTENER ------------------------ Alias LISTENER_MYDB Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production Start Date 23-OCT-2013 03:10:15 Uptime 68 days 7 hr. 50 min. 12 sec Trace Level off Security ON: Local OS Authentication SNMP OFF Listener Parameter File /opt/oracle/product/11203_ee_64/db/network/admin/listener.ora Listener Log File /opt/oracle/MYDB/admin/network/log/listener_MYDB.log Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.20.30.75)(PORT=20008))) (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.20.30.73)(PORT=20008))) Services Summary... Service "MYDB" has 1 instance(s). Instance "MYDB", status UNKNOWN, has 1 handler(s) for this service... Service "MYDB.prod.nl" has 1 instance(s). Instance "MYDB2", status READY, has 1 handler(s) for this service... Service "MYDB_TAF.prod.nl" has 1 instance(s). Instance "MYDB2", status READY, has 1 handler(s) for this service... The command completed successfully
Conclusion: So in short the listener on srvr_33r for this specific database was unaware of its services even though the services where / are in place.
HOWEVER when I checked for the local listener on the srvr_33r server in an sqlplus session this is what I saw:
SQL> show parameter listener NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ listener_networks string local_listener string (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.20.30.72)(PORT=1521)))) remote_listener string srvr_3rcl-scan:33100
Which proves that the Spfile was at this morning time during incident ( December 30 as of 09:00) no longer aware of its LOCAL listener !!! In the end we fixed the issue by adding the local listener again and after that issues with connectivity where resolved.
Note when I check the listener Logs I do see that listener was unaware of services at the start of the incident .
/opt/oracle/MYDB/admin/network/log/listener_MYDB.log 30-DEC-2013 00:14:40 * (CONNECT_DATA=(SERVICE_NAME=MYDB.prod.nl)(SERVER=DEDICATED)(CID=(PROGRAM=c:\oraclexe\app\oracle\product\11.2.0\server\bin\ORACLE.EXE)(HOST=X1234567)(USER=SYSTEM))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.60.205.57)(PORT=58445)) * establish * MYDB.prod.nl * 12514 TNS-12514: TNS:listener does not currently know of service requested in connect descriptor 30-DEC-2013 00:14:40 * (CONNECT_DATA=(SERVICE_NAME=MYDB.prod.nl)(SERVER=DEDICATED)(CID=(PROGRAM=c:\oraclexe\app\oracle\product\11.2.0\server\bin\ORACLE.EXE)(HOST=X1234567)(USER=SYSTEM))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.60.205.57)(PORT=58446)) * establish * MYDB.prod.nl * 12514 TNS-12514: TNS:listener does not currently know of service requested in connect descriptor
In the alert logs of the Instance I found this information and I think that this has caused our PRiO 1 incident due to the instance could not talk to the asm instance:
view /opt/oracle/MYDB/admin/Arch/alert_MYDB1.log.20131230_000158 See Note 411.1 at My Oracle Support for error and packaging details. WARNING: ASM communication error: op 0 state 0x0 (15055) ERROR: direct connection failure with ASM Dumping diagnostic data in directory=[cdmp_20131229112802], requested by (instance=1, osid=29870 (M000)), summary=[incident=48997]. Sun Dec 29 11:28:07 2013 Errors in file /opt/oracle/MYDB/diag/rdbms/MYDB/MYDB1/trace/MYDB1_m000_29870.trc (incident=48998): ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^928","kglss") Incident details in: /opt/oracle/MYDB/diag/rdbms/MYDB/MYDB1/incident/incdir_48998/MYDB1_m000_29870_i48998.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. WARNING: ASM communication error: op 0 state 0x0 (15055) ERROR: direct connection failure with ASM Sun Dec 29 11:28:08 2013 Dumping diagnostic data in directory=[cdmp_20131229112808], requested by (instance=1, osid=29870 (M000)), summary=[incident=48998]. Sun Dec 29 11:28:33 2013 ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.20.30.72)(PORT=1521))))' SCOPE=MEMORY SID='MYDB1';
Addendum:
After Altering the local listener to the required and correct listener i closed the ticket. Much to my surprise after couple of hours to be kicked by the same issue on the same database again. A Colleague pointed me to Mos to Bug 11772838 – oraagent modifies LOCAL_LISTENER with an incorrect value, when the parameter was set in spfile (Doc ID 11772838.8) . So that proved i would need to make another step to have a proper workaround in place before i could start plan a maintenance window that would help to fix the cause of all this ( which was the Oracle Instance unable to communicate to its ASM instance when it needed information about the local listener in its spfile).
As Always Happy reading,
Mathijs