Introduction:
Earlier this week I started preparations to add the GoldenGate software to the 11.2.0.3 Grid Infrastructure of the production Billing environment. As part of that scenario I also had to add a VIP address that the GoldenGate software will use for high(er) availability. In my design the GoldenGate daemons run on only one node by default. During a node crash (of course neither wanted nor desired), or as a way to balance work across the cluster, the VIP address and the GoldenGate software need to stop and restart on the other node. Below you will find a working example of the preparations I have performed, with some comments added to the specific steps.
Commands are shown on separate lines; my comments start with ##.
Details
## The first step is to add the VIP address 10.97.242.40 (oradb-gg-localp1.dc-lasvegas.us) to the Grid Infra (GI). Note that the IP address and its host name had already been defined in DNS: once I got feedback that the address was added, I was able to resolve it with nslookup. Of course it was not yet possible to ping the IP, because it first has to be added to the cluster, as is done here.
## As root:
/opt/crs/product/11203/crs/bin/appvipcfg create -network=1 -ip=10.97.242.40 -vipname=oradb-gg-localp1.dc-lasvegas.us -user=root
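The DNS and "not pingable yet" checks described in the first step can be scripted before running appvipcfg. This is a minimal sketch, assuming a Linux host with `getent` and iputils `ping`. Note that `getent` resolves through NSS, so `/etc/hosts` may answer before DNS does. The helper names `vip_dns_ok` and `vip_not_in_use` are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-check before "appvipcfg create": the VIP name must already
# resolve to the intended address, but that address must not answer yet.

# Succeeds if NAME resolves (via getent/NSS) to EXPECTED_IP as its first address.
vip_dns_ok() {
  name="$1"
  expected_ip="$2"
  resolved=$(getent hosts "$name" | awk '{ print $1; exit }')
  [ -n "$resolved" ] && [ "$resolved" = "$expected_ip" ]
}

# Succeeds if IP does NOT answer a single ping (Linux iputils syntax assumed).
vip_not_in_use() {
  ! ping -c 1 -W 2 "$1" >/dev/null 2>&1
}
```

For this post's VIP, `vip_dns_ok oradb-gg-localp1.dc-lasvegas.us 10.97.242.40 && vip_not_in_use 10.97.242.40` should succeed before `appvipcfg create`. Once the VIP has been started, `vip_not_in_use` should fail.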
## Once that is in place, grant the oracle user permission to work with the VIP address.
## As root, allow the Oracle Grid Infrastructure software owner (here: oracle) to run the script that starts the VIP:
/opt/crs/product/11203/crs/bin/crsctl setperm resource oradb-gg-localp1.dc-lasvegas.us -u user:oracle:r-x
## Now it is time to start the VIP:
## As Oracle, start the VIP:
/opt/crs/product/11203/crs/bin/crsctl start resource oradb-gg-localp1.dc-lasvegas.us
## Check the result:
## As Oracle:
/opt/crs/product/11203/crs/bin/crsctl status resource oradb-gg-localp1.dc-lasvegas.us -p
## In my setup GoldenGate is allowed to run on either node 1 (usapb1hr) or node 2 (usapb2hr) of my four-node cluster. Since I want to make sure it only runs on those two servers, I set HOSTING_MEMBERS to those nodes and PLACEMENT to restricted.
## As root:
/opt/crs/product/11203/crs/bin/crsctl modify resource oradb-gg-localp1.dc-lasvegas.us -attr "HOSTING_MEMBERS=usapb1hr usapb2hr"
/opt/crs/product/11203/crs/bin/crsctl modify resource oradb-gg-localp1.dc-lasvegas.us -attr "PLACEMENT=restricted"
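The same two attributes are set again for myGoldengate later on, so the pair of modify calls could be wrapped in a small helper. This is a sketch and not part of the author's setup. `restrict_placement`, `run` and the `DRY_RUN` switch are hypothetical, and `CRSCTL` defaults to the Grid home path used in this post.

```shell
#!/bin/sh
# Hypothetical helper: restrict a Clusterware resource to a fixed set of nodes.
# CRSCTL defaults to the Grid home used in this post; DRY_RUN=1 only prints.
CRSCTL="${CRSCTL:-/opt/crs/product/11203/crs/bin/crsctl}"

# Print the command when DRY_RUN=1, otherwise execute it with its arguments intact.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: restrict_placement <resource> <node> [<node> ...]
restrict_placement() {
  res="$1"
  shift
  # Remaining arguments are node names, joined with spaces into one attribute value.
  run "$CRSCTL" modify resource "$res" -attr "HOSTING_MEMBERS=$*"
  run "$CRSCTL" modify resource "$res" -attr "PLACEMENT=restricted"
}
```

`DRY_RUN=1 restrict_placement oradb-gg-localp1.dc-lasvegas.us usapb1hr usapb2hr` prints the two commands. Without `DRY_RUN`, and run as root, it executes them, setting HOSTING_MEMBERS before PLACEMENT in the same order as the commands above.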
## As always, the taste of the crème brûlée is in the details, so let's check:
## As Oracle:
/opt/crs/product/11203/crs/bin/crsctl status resource oradb-gg-localp1.dc-lasvegas.us -p
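Instead of reading the whole `-p` listing by eye, you can compare the two attributes against the intended values. This is a sketch that assumes the `-p` output has one `ATTRIBUTE=value` pair per line. `check_profile` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical check: compare HOSTING_MEMBERS and PLACEMENT from the output of
# "crsctl status resource <name> -p" (read on stdin, one ATTRIBUTE=value per line)
# against the expected values given as arguments.
check_profile() {
  want_members="$1"
  want_placement="$2"
  profile=$(cat)
  members=$(printf '%s\n' "$profile" | sed -n 's/^HOSTING_MEMBERS=//p')
  placement=$(printf '%s\n' "$profile" | sed -n 's/^PLACEMENT=//p')
  if [ "$members" = "$want_members" ] && [ "$placement" = "$want_placement" ]; then
    echo "OK: HOSTING_MEMBERS=$members PLACEMENT=$placement"
    return 0
  fi
  echo "MISMATCH: HOSTING_MEMBERS=$members PLACEMENT=$placement" >&2
  return 1
}
```

Example: `crsctl status resource oradb-gg-localp1.dc-lasvegas.us -p | check_profile "usapb1hr usapb2hr" restricted`.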
## Great, that worked. Now let's relocate the VIP to the other node as a test:
## As Oracle:
/opt/crs/product/11203/crs/bin/crsctl relocate resource oradb-gg-localp1.dc-lasvegas.us
## Completed with a smile, because it worked as planned.
## As always, the taste of the crème brûlée is in the details, so let's check:
## As Oracle:
/opt/crs/product/11203/crs/bin/crsctl status resource oradb-gg-localp1.dc-lasvegas.us -p
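Note that `-p` prints the resource profile (its configured attributes), not where the resource is running. The plain `crsctl status resource <name>` output includes a line such as `STATE=ONLINE on usapb2hr`. A quick before/after comparison around the relocate can therefore be sketched as follows. `current_node` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read "crsctl status resource <name>" output (without -p)
# on stdin and print the node from a "STATE=ONLINE on <node>" line.
# Prints nothing when the resource is not ONLINE.
current_node() {
  sed -n 's/^STATE=ONLINE on \([^ ,]*\).*/\1/p'
}
```

Running `crsctl status resource oradb-gg-localp1.dc-lasvegas.us | current_node` before and after the relocate should show the node name changing from one hosting member to the other.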
## To make sure the setup from scratch is the same on all machines (I had the same solution in the Pre Prod environment), let us first remove the existing GoldenGate resource and then add it to the GI again.
/opt/crs/product/11203/crs/bin/crsctl delete resource myGoldengate
## As oracle (the white paper was very specific about that: the first time I performed this as root and ended up with the wrong primary group in the ACL, which I only noticed at the end). So stick to the plan and do this as ORACLE! Add the resource to the GI with a dependency on the VIP address created in the GI earlier, AND tell the cluster which action script to use during a relocate, a server boot or a node crash. (In my case this is a shell script that handles actions such as start, stop and status and runs the corresponding GoldenGate commands for the GI.)
/opt/crs/product/11203/crs/bin/crsctl add resource myGoldengate -type cluster_resource -attr "ACTION_SCRIPT=/opt/crs/product/11203/crs/crs/public/gg.local.active, CHECK_INTERVAL=30, START_DEPENDENCIES='hard(oradb-gg-localp1.dc-lasvegas.us) pullup(oradb-gg-localp1.dc-lasvegas.us)', STOP_DEPENDENCIES='hard(oradb-gg-localp1.dc-lasvegas.us)'"
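The action script itself is not shown in this post. Below is a minimal sketch of what such a script could look like. It assumes that Clusterware's script agent calls it with `start`, `stop`, `check`, `clean` or `abort`, and treats exit code 0 as success (for `check`: ONLINE) and 1 as failure (OFFLINE). It also assumes that GGSCI reports a running manager with the text "Manager is running". The GoldenGate home path, the `GG_START_WAIT` delay and all function names are hypothetical. Check the GGSCI commands (`info manager`, `start manager`, `stop er *`, `stop manager!`) against your GoldenGate version. This is not the author's gg.local.active.

```shell
#!/bin/sh
# Hypothetical skeleton of an action script such as gg.local.active.
# Clusterware is assumed to call it with one argument: start, stop, check, clean or abort.
# GG_HOME is an assumed install path; add any ORACLE_HOME / LD_LIBRARY_PATH
# settings your ggsci needs.
GG_HOME="${GG_HOME:-/u01/app/goldengate}"
GG_START_WAIT="${GG_START_WAIT:-5}"

# Feed GGSCI commands on stdin; run ggsci from GG_HOME so it finds its subdirectories.
run_ggsci() {
  (cd "$GG_HOME" && ./ggsci)
}

# Return 0 (ONLINE) only if GGSCI reports the manager as running, else 1 (OFFLINE).
gg_check() {
  if printf 'info manager\nexit\n' | run_ggsci | grep -q "Manager is running"; then
    return 0
  fi
  return 1
}

# Start the manager, give it a moment, then report the result of a check.
gg_start() {
  printf 'start manager\nexit\n' | run_ggsci >/dev/null 2>&1
  sleep "$GG_START_WAIT"
  gg_check
}

# Stop all Extract/Replicat groups, then the manager without a confirmation prompt.
# Succeeds only if the manager is no longer reported as running.
gg_stop() {
  printf 'stop er *\nstop manager!\nexit\n' | run_ggsci >/dev/null 2>&1
  if gg_check; then
    return 1
  fi
  return 0
}

gg_action() {
  case "$1" in
    start) gg_start ;;
    stop|clean|abort) gg_stop ;;
    check) gg_check ;;
    *) echo "usage: $0 {start|stop|check|clean|abort}" >&2; return 1 ;;
  esac
}

# Dispatch only when called with an argument; without one, the functions are just defined.
if [ "$#" -gt 0 ]; then
  gg_action "$1"
  exit $?
fi
```

Clusterware would run it as `gg.local.active check` every CHECK_INTERVAL (30 seconds here). Sourcing it without arguments only defines the functions, so you can try it against a stub `ggsci` before handing it to the GI.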
## Alter the hosting members and placement again (by default only one node is part of HOSTING_MEMBERS and PLACEMENT is balanced).
## As root:
/opt/crs/product/11203/crs/bin/crsctl modify resource myGoldengate -attr "HOSTING_MEMBERS=usapb1hr usapb2hr"
/opt/crs/product/11203/crs/bin/crsctl modify resource myGoldengate -attr "PLACEMENT=restricted"
## In the end, check the result with:
/opt/crs/product/11203/crs/bin/crsctl status resource myGoldengate -p
## Time to set the permissions on myGoldengate: change the ownership to the myGoldengate user (which is my OS user for this).
## As root:
/opt/crs/product/11203/crs/bin/crsctl setperm resource myGoldengate -o myGoldengate
## Needed, and sometimes forgotten: make sure that the oracle user (who is also the owner of the Grid Infra software on these boxes) can still work with the resource.
## As root, allow oracle to run the script that starts the myGoldengate resource:
/opt/crs/product/11203/crs/bin/crsctl setperm resource myGoldengate -u user:oracle:r-x
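The first attempt ended with the wrong primary group in the ACL, so it is worth checking the resulting ACL explicitly. The `ACL=` line of `crsctl status resource myGoldengate -p` is a comma-separated list of `owner:`, `pgrp:`, `other:` and `user:` entries. The sketch below uses a hypothetical helper, `acl_entries`, to print the entries of one kind.

```shell
#!/bin/sh
# Hypothetical helper: read "crsctl status resource <name> -p" output on stdin
# and print the ACL entries of one kind (owner, pgrp, other or user), one per line.
acl_entries() {
  kind="$1"
  sed -n 's/^ACL=//p' | tr ',' '\n' | grep "^$kind:"
}
```

For example, `crsctl status resource myGoldengate -p | acl_entries pgrp` should show the intended primary group (for example `pgrp:oinstall:rwx`; the group name is an assumption). `acl_entries owner` should show `myGoldengate`, and `acl_entries user` should list `user:oracle:r-x`.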
Wrap-up:
All preparations are now in place. During an already scheduled maintenance window, the following steps will be performed to turn this scenario into an HA solution for GoldenGate:
- Stop the GoldenGate software daemons (at the moment they are stopped and started by hand).
- Start the GoldenGate resource via the Grid Infra (giving it control of status and activities).
- Check that GoldenGate starts its activities.
- Relocate the GoldenGate resource via the Grid Infra to the other node.
- Check that GoldenGate starts its activities on that node.
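The maintenance-window plan above can be rehearsed as a dry run first. This is a hedged sketch, not the author's runbook. `GG_HOME`, `run`, `ggsci_cmds` and `maintenance_window` are hypothetical, and `DRY_RUN=1` (the default here) only prints each step. The GGSCI stop commands assume that the manual stop consists of stopping all Extract/Replicat groups and then the manager.

```shell
#!/bin/sh
# Hypothetical dry-run rehearsal of the maintenance-window steps.
# GG_HOME is an assumed GoldenGate install path; CRSCTL defaults to this post's Grid home.
# With DRY_RUN=1 (the default) every step is printed instead of executed.
CRSCTL="${CRSCTL:-/opt/crs/product/11203/crs/bin/crsctl}"
GG_HOME="${GG_HOME:-/u01/app/goldengate}"
DRY_RUN="${DRY_RUN:-1}"

# Print or execute a command.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Print, or send to ggsci in GG_HOME, one GGSCI command per argument followed by exit.
ggsci_cmds() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "ggsci: $*"
  else
    printf '%s\n' "$@" exit | (cd "$GG_HOME" && ./ggsci)
  fi
}

maintenance_window() {
  # 1. On the node where GoldenGate currently runs: stop all Extract/Replicat
  #    groups and the manager, which until now are handled by hand.
  ggsci_cmds 'stop er *' 'stop manager!'
  # 2. Hand GoldenGate over to the Grid Infrastructure.
  run "$CRSCTL" start resource myGoldengate
  # 3. See on which node it came ONLINE, then check GoldenGate there (e.g. "info all" in ggsci).
  run "$CRSCTL" status resource myGoldengate
  # 4. Move it to the other hosting member. Because myGoldengate depends on the VIP,
  #    a plain relocate may be refused; relocating the VIP with -f, which also moves
  #    resources that depend on it, is an alternative to try in Pre Prod first.
  run "$CRSCTL" relocate resource myGoldengate
  # 5. Check the new node the same way as in step 3.
  run "$CRSCTL" status resource myGoldengate
}
```

Calling `maintenance_window` prints the five steps in order. Setting `DRY_RUN=0` executes them, which should only be done during the actual window.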
As an old quote states: success just loves preparation. With these preparations in place, I feel confident about putting this solution live during the maintenance window.
As always, happy reading and till next time,
Mathijs