Not sure if it was word from a wise Dba or just from a fortune cookie (might even have been from a Pink Panther movie). It said always expect the unexpected and as an add-on success just loves preparation.
This week one of my tasks was to upgrade a 4 node Oracle Rac cluster from 184.108.40.206 to 12c (220.127.116.11.0) grid infrastructure. And even though I came well prepared (see also detailed other blog for that ( https://mathijsbruggink.com/2017/05/01/upgrading-11g-gridinfra-to-12c-in-linux/) several small surprises occurred which will be used as a lesson learned in upcoming upgrades of the grid infra structure. Also I would like to offer some timeline as with regard to how long the upgrade process really took.
- During the preparations needed to order extra disks for ASM storage for the Grid Infrastructure management repository (GIMR). When i started the runInstaller as a first check if all was well prepared noticed that the installer software is indeed most likely looking for a diskgroup called +OCR or +VOTING. This could be a trap if you had not extended one of them ( but instead a +GRID diskgroup ). So when preparing look for either OCR or VOTING ( best both if present) to add extra disks ( and have some disks at spare).
- During the start of maintenance window the Linux colleague mentioned that he would have to stop the Hyperion services. This activity took some 45 minutes of the change window. Will have to find out if this was a justified claim to stop those services and will need to add an extra step to the pre-checks to find out about other services – daemons that are running on the cluster that might be impacted when doing an upgrade.
- Purpose of rootupgrade.sh after the installation part via the runinstaller completes and the upgrade part via the runinstaller commences. Rootupgrade.sh will perform the actual ASM upgrade, will configure the OLR (local registry) amongst other things.
Every change on a test or production environment will have to come with a plan with regard to an estimated time needed how long the change will take. First and most important of course choose the strategy, will a rolling window be used (thus minimizing impact since at least one node will be up ( thinking about a kind of batch where first batch will hold first node , second batch holding node 2 and 3 in my 4 node example, and a last batch holding the last node)).
Start of Change Window : 20:00 CET ( 6:00 UTC ) .
According to Linux expert Hyperion services needed to be stopped before we could continue.
Start of installation: 20:45 CET.
Started the runInstaller on the first node. Software was deployed to first node and all the nodes in the cluster (4 Node Rac).
Upgrade part of the existing 18.104.22.168 GridInfra structure:
21:30 – 21:58 on the first Node (MYSRVR09hr) the rootupgrade.sh was started. (used the manual upgrade ( still a bit hmm unwilling to leave it all to the automated option), this means set up a root session on first node and run: ./app/grid/product/`1102/grid/rootupgrade.sh).
In the runInstaller it was offered to automate and to run the rootupgrade.sh in parallel on Node number 2 and 3. So in separate windows but to me it felt better to open a terminal session as root in parallel to run the script on each server.
22:06 – 22:13 on MYSRVR10hr : ./app/grid/product/`1102/grid/rootupgrade.sh
22:06 – 22:24 on MYSRVR11hr : ./app/grid/product/`1102/grid/rootupgrade.sh
On the last node MYSRVR12hr:
22:28 – 22:48 ./app/grid/product/`1102/grid/rootupgrade.sh
After that install continued with the Grid Infrastructure management repository (GIMR) database and once completed i ran a number sanity checks in the cluster:
At 23:59:59 Reported mission completed.
Happy reading and till next time,