Is it really the good – the – bad – and the Gui Dba building Racs?

Introduction

From my experience for some Database Administrators / sites it  still seem to be more tough /  hardcore to use the command line instead of using tools that  could do the same for you only with a lot less effort. And maybe that is also regarded to be a risk cause it is so easy. Well my thoughts on this is that a healthy mix on the use Gui – tools and command lines should  be an asset  to every administrator. And indeed before even starting debate when using the Gui of course you should always know what you are doing .. but then again that is always the case is it not ?

Creating an Rac manually – using DBCA.

Personally i think that the DBCA tool is  your best way forward if you are creating a new Rac environment. Of course  this tool also has also its limits ( such as on a 4 Node environment it will  by default set your cluster_database_instances default to 4 even when you build only a two node Rac)  but in general it is best way to work cause not only will create the Database , and the instances on the various Nodes  but it also adds the information to the 11G Grid infrastructure ( the Cluster ware ). And  for documentation purposes you can / should also always generate the create scripts in a default ( for me always /opt/oracle//admin/create) directory.

Yesterday i had to support a colleague who decided to create a two node RAC by hand . Indeed  even  when i suggested to him that using the DBCA would be helpful and fast in his actions it was his call to do it all by hand. But then he got stuck in the process with various actions.

If you decide to do  a create manually ( as they say maybe you should try to not use the Gui just cause it is written in this blog).  I think following steps are important:

  1. Work accurately ( setting up init oras , one for single Instance – Db) add the Instance specific information (in my case undo tablespace and local_listener ) were needed. Create spfile from that pfile.
  2. Start – create the single instance-db in full with the spfile.
  3. Once the single instance is alive start alter your spfile:
    1. alter system set cluster_database=true scope = spfile:
    2. alter system set instance_number= 1 scope = spfile sid=”instance1″ ;
    3. alter system set instance_number= 2 scope = spfile sid=”instance2″ ;
    4. alter system set thread = 1 scope = spfile sid=”instance1″ ;
    5. alter system set thread = 2 scope = spfile sid=”instance2″ ;
  4. Add at least the 3 redo groups to the second thread( alter database add logfile thread 2 group x:)
  5. set up the remote_listener with the scan listener
  6. alter system enable thread 2 ;
  7. Add a Taf service.
  8. As a check stop and start and stop both instances manually to see if the database is shared as it should be between the two instances.
  9. Add the information to the cluster:

srvctl  add database -d DB -o /opt/oracle/product/oracleVersion

srvctl  add instance -d DB -i DB1 -n Node1 srvctl  add instance -d DB -i DB2 -n Node2

srvctl modify instance -d DB -i DB1 -s +ASM1 srvctl modify instance -d DB -i DB2 -s +ASM2

srvctl  add service -d DB -s DB_TAF.domain -r DB1,DB2 srvctl start database -d DB srvctl start  service -d DB -s DB_TAF.domain

Ok, are  you still with me , do i rest my case here that it is a lot of work to do this the manual way ? And that these steps can be very  lets say error prone if you do not work 100%  accurate..

Yesterdays event where indeed not successful at first . Amongst other issues He got :

PRCR-1079 : Failed to start resource ora.db.db
CRS-5017: The resource action “ora.db.db start” met the following error:
ORA-29760: instance_number parameter not specified

Once again Google was our friend it showed post of a fellow blogger with regard to a Mos note:

MOS note ‘ORA-29760: instance_number parameter not specified’ When Starting the Database with Srvctl [ID 749515.1].

After that . Things worked .

Bottom-line of this story ? Plain and simple use the right and easiest tool for the Job you need to do. If that is a Gui  there is no dishonor in that . If you decide to go manual create , make sure you work accurately  hmm and have Google available for when things come up during the process.

Happy Reading ,

Mathijs

Once upon a time there was a prince2 and a production frog from an Oracle Dba perspective

Introduction.

A lot of  us have to do both projects and  day – to – day business.  What really sometimes is a sad thing happening is the mere fact that project seem to have no wider horizon then the end  of project .. It seems that the project deliverable  project -end-date is the only sacred time line..   Also not all too uncommon is the Database might be already offered to the customer before ALL  Q&A steps have finished by project managers.  So sometimes a prince might turn into a production frog  if you look at it from a production administrator perspective in the end.

Personally as an Oracle Database Administrator  I think  improving working together with Project manages should have two needs for improvement:

  1. Project managers need to be  made more aware of the Post go live situation if the implementation is only targeting a  go-live-date.
  2. Project manages should never ever ( did i mention ever) be allowed to  have customers entering the systems before go-live ( simply cause they might claim live status once they are hooked up to the database)

Hm and I have a dream I know it is wishful thinking but sometimes I  really think  it would be great to have Project people with a more technical background , maybe even with a  base training about  standards,  production  rules of other departments like Oracle ..

🙂 and for the Admins ..  Make sure you get involved in projects as early as possible and make sure , standards are followed.. or if you are in production side and a  Databases administrator is part  the project team make sure that he – she  is aware of your production world and   your standards.

Taking a first look at Prince2 if you care..:

http://www.prince2.com/prince2-process-model.asp

Happy reading

Mathijs

Alert log shows Private strand flush not complete

Introduction

Recently came across the following issue that a database is throwing  this message : Private strand flush not complete  during the switch of the redo logfile group. It is interesting to see the various opinions on this matter on the web, and in Mos. That is why i think taking inventory   and a plan might still be useful.

Details:

Database at hand is a 10.2.3.0 EE. on a Solaris box.

Looking under the hood i see:

SQL> select GROUP#,THREAD#,bytes/1024/1024 MB,members, STATUS from v$log order by 2,1 ;
GROUP#    THREAD#         MB    MEMBERS STATUS
 ---------- ---------- ---------- ---------- ----------------
 1          1       2000          2 INACTIVE
 2          1       2000          2 INACTIVE
 3          1       2000          2 CURRENT
 4          1       2000          2 INACTIVE
 5          1       2000          2 INACTIVE

Database is 1.24 TB

and looking at:

show parameter db_writer_processes
NAME                                 TYPE        VALUE
 ------------------------------------ ----------- ------------------------------
 db_writer_processes                  integer     2
Switches per hour:
Date            Switches
 ------------- ----------
 2013-03-13 02          8
 2013-03-13 03          7
 2013-03-13 04          3
 2013-03-13 05         13
 2013-03-13 06          9
 2013-03-13 07          2
 2013-03-13 08          2
 2013-03-13 09          9
 2013-03-13 10          1

According to MOS note Alert Log Messages: Private Strand Flush Not Complete [ID 372557.1] i should/could consider increasing the  db_writer_processes  at least from the current 2 to  4..  Which  sounds like a plan 2 Me.

Happy reading.

Mathijs

Sqlplus / as sysdba throws ORA-01012

Introduction.

This morning the first challenge of the day became the fact that we could not get proper access to one of the databases. We would set the Environment, set the proper ORACLE_SID  and yet after logging on would get

ORA-01012: not logged on

with any query ( select * from v$instance ; show parameter blablah ).

Reading the posts on the web give a quick solution…  First  proof:  the alert log was flooded with messages like these:

ORA-04031: unable to allocate 2072 bytes of shared memory ("shared pool","unknown object","sga heap(1,1)","parameter table block
")
Wed Feb 27 09:31:49 2013
Process J000 died, see its trace file
Wed Feb 27 09:31:49 2013
kkjcre1p: unable to spawn jobq slave process

 

Sqlplus /  as sysdba worked, but any query failed with ORA-01012.

Solution:

Created a pfile from spfile , edited both  SGA_MAX_SIZE and SGA_TARGET  and increased  the  processes as well in the init.ora .

Created spfile file from the altered pfile.

Restarted the Database, and after that happy database ( and dba ).

Happy reading,

Mathijs

ASM 11.2.0.2 Is Not Releasing File Descriptors After Drop or Dismount Diskgroup. [ID 1306574.1]

Introduction.

One of my Linux environments is set up with 11.2.0.2.0. both Grid Infra structure and Rdbms together with ASM  using  luns that are part of EMC  BCVS copy  of production for reporting purposes.   This environment can be classified as a  Oracle – Restart environment. As part of a process that is supposed to run by  automated  Job during the night the Database should  go through various stages ( shutdown  abort,  dismount the dedicated ASM diskgroups  per database in ASM).  After that the BCVS are handed over to the source side again for syncing  after which the database will be rebuild using the fresh  BCV split.

Well as I said , supposed  to run automated…  cause at this point and time , during the day this is working  properly but during the night we get  per disk in the disk group a message similar to:

asm-diskgroupname-data-001: map in use

From Reading  and searching  metalink  it was suggested to implement the latest PSU Patch January 2013  due to:

ASM 11.2.0.2 Is Not Releasing File Descriptors After Drop or Dismount Diskgroup. [ID 1306574.1]

So i started my Journey on  patching  this environment.

Continue reading

Selects Against ALL_SYNONYMS Very Slow in 11.2.0.1. and 11.2.0.2. [ID 1242844.1]

Tools such as Toad appeared 2 b very slow.

From mos I quote:

…The bug is fixed in patchset 11.2.0.3 and higher versions.
Release 11.2.0.1 and the patchset 11.2.0.2 are affected by this.

end quote.

Since patching was not an option the workaround has been followed:

Mos note is offering create or replace view ALL_SYNONYMS.. and  to set spfile value (_FIX_CONTROL).

Workaround has been implemented in Test and Preprod environments, no issues found  with it and it showed improvement @  user side.

Happy reading

Mathijs

The return of ASM communication Issues ((ORA-01031: insufficient privileges) (WARNING: ASM communication(the aftermath)

Introduction

January  5th I wrote a post on  the issues we faced with ASM instance which would not let me log in as sqlplus /  as sysasm at specific point and time during which time alert log of the databases  on  that box would also be sending warnings to the alert log “.. ASM communication error”. With information on the web (Metalink)  a solution and a workaround had been offered and implemented.  For example on that specific box the oinstall  gid was lacking in the first place (primary os group is dba (oracle:dba) so I had th Linux colleague added the oinstall onthat box. And  as a workaround  I created a tnsnames entry  and connected via: sys@asm as sysasm that was also working well.  So at that point and time we all thought , case closed.

Well…… Not entirely cause the issue showed again recently and even though the workaround (using the connect string method was working)  I was not a happy Database Administrator with it.  I opened a Tar with Oracle  but  I was going in circles with it this time.

Work Info

Last Friday the Issue showed again on a box in one of the clusters. An internal mail was sent within our team about this and a very interesting clue came back from one of the Colleagues who had similar experience in different project. He came up with following information on MOS:

Troubleshooting ORA-1031: Insufficient Privileges While Connecting As SYSDBA [ID 730067.1]
UNIX: Checklist for Resolving Connect AS SYSDBA Issues [ID 69642.1]
UNIX: Diagnostic C program for ORA-1031 from CONNECT INTERNAL / AS SYSDBA [ID 67984.1]

Actually especially  last Note 67984.1 was very useful cause it showed  that during time of issue the gid  ( group Id ) was no longer valid due to an Ldap call.

With the Output of that note and the analyses after that it turned out that the NCSD daemon (http://www.linux.ncsu.edu/realm_linux/usersguide-EL4/ch04s06.php) might be part of the issue when something like that was queried on the OS:

# getent group dba
 101
 # getent group 5000
 dba
#getent group dba
 5000

When the Linux administrator configured the correct (exception) information in /etc/ldap.conf the problem vanished and the Phantom hunt ended.

Happy end

Bottom line of this:

  • Never believe in phantoms, thinks like described happen for a reason.
  • Always be willing to communicate with in the team and beyond cause communication might bring a so-called aha – Erlebnis (déjà vu).
  • Standardize, standardize,  standardize when you are using Ldap and local configurations cause you really let the ghost out of the machine otherwise.
  • A special thank  you to the colleagues who started the internal mail and to the one who shared his experiences with the team.

Happy  reading,

Mathijs

ASM communication Issues ((ORA-01031: insufficient privileges) (WARNING: ASM communication error: op 0 state 0x0 (15055))

Introduction:

A couple of months ago  I had the issue that in an 11.2.0.2 environment  with Grid Infra structure  and ASM on Linux , without any specific reason at a specific point and time would  see  messages  about communication error between the databases and  the ASM instance and  I was not able to connect to the ASM instance with a sqlplus / as sysasm. I have opened a tar back then and even got myself  a fresh bug 14767353 number but no answers. So  But practically this remained unsolved. Friday 4th  I had same issue again, so it was time to get back to arms and investigate this.

Below you will find the steps , what i saw and how I solved it with the help of my friends(Google and Metalink).

Environment: 4 Node GI 11.2.0.2.0 on Red Hat Linux  with ASM. Installations performed as oracle:dba

Continue reading

Food for thoughts Rac One a first Step to explore

Introduction.

Some months ago I have implemented an Active Passive Solution on Oracle 11gr2 for on one of the customers.  Basically this was almost reliving old days where  tools like HP MC Service Guard (using packages where one or more database and listener configurations are stored)  or  Veritas Clusterware  (where and Agent controls the status of the resources being a database and or a listener). They would  monitor  the health of the Database/Listener,  would stop and start the Database/ Listener. With the technology at hand at that point and time ( implemented 11.2.0.2 Grid Infra with ASM and 11.2.0.2 Rdbms )  it was not   having all the required   setups to implement anything else.  Now we are 9 months further in time,  standards have been adapted and we have latest Oracle version at hand ( being 11.2.0.3) so it is time for a fresh start. Besides that recently I have been asked to  implement the Active Passive solution on more environments (since it does cover  part off HA and it controls the monitoring  of the Database/ Instance  by HP-Openview).

Continue reading