TNS-12514 in an Oracle Rac Environment

Introduction:

On a frequent bases team members do hotline service where they act as a first line of contact when there are issues with the Oracle Environment on the various boxes. In the last week of 2013 Monday 30st of December  it was my turn to do that duty when one of the interesting  issues showed that  even caused a Priority One incident on the server srvr_33r which is part of a  4 node Rac cluster with the  database MYDB. So once again time to gear up and go investigate what was going on. Users received the following Error Message:  TNS-12514: TNS:listener does not know of service requested in connect descriptor

Summary:

In the end it showed that the Oracle Instance on the 33r Box was suffering from a communication issue with its ASM instance since this ASM instance had issues with its shared_pool ( in the alert file of the Oracle Instance at the start of the incident i found : ORA-04031: unable to allocate 760 bytes of shared memory (“shared pool”,”unknown object”,”KKSSP^928″,”kglss”).

Due to that  shared_pool issue the Instance was unable to communicate with its ASM Instance when it needed the information about  its Local listener in the spfile. That made that information unavailable to the Oracle instance so the Cluster agent performed this :  ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.20.30.72)(PORT=1521))))’ SCOPE=MEMORY SID=’MYDB1′;

So It altered my LOCAL_LISTENER in memory to the default listener on the cluster, listening on 1521 with NO services defined for this specific database !!

Even worse was that the environment was hit by an Oracle Bug ( which was pointed out by a team member to me : Bug 11772838 – oraagent modifies LOCAL_LISTENER with an incorrect value, when the parameter was set in spfile (Doc ID 11772838.8)

I have implemented the workaround for the listener and will have to schedule a maintenance window on the cluster to fix the Memory issues in the ASM instance(s).

Details:

Environment setting:  4 node 11.2.0.3 Grid Infrastructure  with a mix of  Databases (and versions) present. Every database has a dedicated listener on a specific port. And services for that database register with that dedicated listener. Every database has its spfile in an ASM Diskgroup.

When i started performing checks i first looked at the listener on the first Node ( Instance1 is on srvr_33r)

# lsnrctl status listener_MYDB
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 30-DEC-2013 10:02:51
 Copyright (c) 1991, 2011, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=srvr_33r-vip)(PORT=20008)(IP=FIRST)))
 STATUS of the LISTENER
 ------------------------
 Alias LISTENER_MYDB
 Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production
 Start Date 23-OCT-2013 01:21:27
 Uptime 68 days 9 hr. 41 min. 24 sec
 Trace Level off
 Security ON: Local OS Authentication
 SNMP OFF
 Listener Parameter File /opt/oracle/product/11203_ee_64/db/network/admin/listener.ora
 Listener Log File /opt/oracle/MYDB/admin/network/log/listener_MYDB.log
 Listening Endpoints Summary...
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.20.30.72)(PORT=20008)))
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.20.30.71)(PORT=20008)))
 Services Summary...
 Service "MYDB" has 1 instance(s).
 Instance "MYDB", status UNKNOWN, has 1 handler(s) for this service...
 The command completed successfully

 

And i compared that to the information in the listener on the other node: 

(Instance 2 is on srvr_34r)
oracle@srvr_34r:/opt/oracle [MYDB2]# lsnrctl status listener_MYDB
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 30-DEC-2013 10:00:27
 Copyright (c) 1991, 2011, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=srvr_34r-vip)(PORT=20008)(IP=FIRST)))
 STATUS of the LISTENER
 ------------------------
 Alias LISTENER_MYDB
 Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production
 Start Date 23-OCT-2013 03:10:15
 Uptime 68 days 7 hr. 50 min. 12 sec
 Trace Level off
 Security ON: Local OS Authentication
 SNMP OFF
 Listener Parameter File /opt/oracle/product/11203_ee_64/db/network/admin/listener.ora
 Listener Log File /opt/oracle/MYDB/admin/network/log/listener_MYDB.log
 Listening Endpoints Summary...
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.20.30.75)(PORT=20008)))
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.20.30.73)(PORT=20008)))
 Services Summary...
 Service "MYDB" has 1 instance(s).
 Instance "MYDB", status UNKNOWN, has 1 handler(s) for this service...
 Service "MYDB.prod.nl" has 1 instance(s).
 Instance "MYDB2", status READY, has 1 handler(s) for this service...
 Service "MYDB_TAF.prod.nl" has 1 instance(s).
 Instance "MYDB2", status READY, has 1 handler(s) for this service...
 The command completed successfully

 

Conclusion: So in short the listener on srvr_33r for this specific database was unaware of its services even though  the services where / are in place.

HOWEVER  when I checked for the local listener on the srvr_33r server   in an sqlplus session  this is what I saw:

SQL> show parameter listener
NAME TYPE VALUE
 ------------------------------------ ----------- ------------------------------
 listener_networks string
 local_listener string (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.20.30.72)(PORT=1521))))
 remote_listener string srvr_3rcl-scan:33100

Which proves that the Spfile was at this morning time during incident ( December 30 as of 09:00)  no longer aware of its LOCAL listener !!! In the end we fixed the issue by adding the local listener again and after that issues with connectivity where resolved.

Note when I check the listener Logs I do see that listener was unaware of services at the start of  the incident .

/opt/oracle/MYDB/admin/network/log/listener_MYDB.log
30-DEC-2013 00:14:40 * (CONNECT_DATA=(SERVICE_NAME=MYDB.prod.nl)(SERVER=DEDICATED)(CID=(PROGRAM=c:\oraclexe\app\oracle\product\11.2.0\server\bin\ORACLE.EXE)(HOST=X1234567)(USER=SYSTEM))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.60.205.57)(PORT=58445)) * establish * MYDB.prod.nl * 12514
TNS-12514: TNS:listener does not currently know of service requested in connect descriptor
30-DEC-2013 00:14:40 * (CONNECT_DATA=(SERVICE_NAME=MYDB.prod.nl)(SERVER=DEDICATED)(CID=(PROGRAM=c:\oraclexe\app\oracle\product\11.2.0\server\bin\ORACLE.EXE)(HOST=X1234567)(USER=SYSTEM))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.60.205.57)(PORT=58446)) * establish * MYDB.prod.nl * 12514
TNS-12514: TNS:listener does not currently know of service requested in connect descriptor

In the alert logs of the Instance I found this information and I think that this has caused our PRiO 1 incident due to the instance could not talk to the asm instance:

view /opt/oracle/MYDB/admin/Arch/alert_MYDB1.log.20131230_000158
See Note 411.1 at My Oracle Support for error and packaging details.
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
Dumping diagnostic data in directory=[cdmp_20131229112802], requested by (instance=1, osid=29870 (M000)), summary=[incident=48997].
Sun Dec 29 11:28:07 2013
Errors in file /opt/oracle/MYDB/diag/rdbms/MYDB/MYDB1/trace/MYDB1_m000_29870.trc  (incident=48998):
ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^928","kglss")
Incident details in: /opt/oracle/MYDB/diag/rdbms/MYDB/MYDB1/incident/incdir_48998/MYDB1_m000_29870_i48998.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
Sun Dec 29 11:28:08 2013
Dumping diagnostic data in directory=[cdmp_20131229112808], requested by (instance=1, osid=29870 (M000)), summary=[incident=48998].
Sun Dec 29 11:28:33 2013
ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.20.30.72)(PORT=1521))))' SCOPE=MEMORY SID='MYDB1';

Addendum:

After Altering the local listener to the required and correct listener i closed the ticket. Much to my surprise after couple of hours to be kicked by the same issue on the same database again. A  Colleague pointed me to Mos to  Bug 11772838 – oraagent modifies LOCAL_LISTENER with an incorrect value, when the parameter was set in spfile (Doc ID 11772838.8) .  So that proved i would need to make another step to have a proper workaround in place before i could start plan a maintenance window that would help to fix the cause of all this ( which was the Oracle Instance unable to communicate to its ASM instance when it needed  information about  the local listener in its spfile).

As Always Happy reading,

Mathijs

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s