Oracle 10gR2 Clusterware Installation Bug

During our recent Oracle upgrade we encountered a lovely little bug.  This was a particularly annoying bug as it was not publicly acknowledged, nor was a patch publicly available.

The bug is apparently specific to deployments of Clusterware from release 2 of the 10g database server on EMC storage devices, or on devices that use the EMC PowerPath software for multi-channel communication with the disk device.

Clusterware is very touchy during installation, and if anything is not 100% right it complains and dies.  On a brand new storage device, using brand new disks with brand new partitioning on them, we were unable to complete the installation without help from Oracle Support.

If the installation scripts find data on the disk/partition that you nominate for the OCR disk, they check for the previous version of Clusterware.  If they can’t find it, an error is thrown and the script exits.  Unfortunately for us, there were a few posts on MetaLink describing similar problems, with no resolution.  Eventually I found, on a couple of forums, some references to commenting out sections of the install script, from people who appeared to be having the same problem.

At first this appeared to do the trick; however, after completing the installation we could only ever get one node of the cluster to come online.  Whichever node we started first would come up; the other node would simply time out waiting to talk to the OCR before starting.

After a long time on the phone and in web collaboration sessions with Oracle Support, we were told of an internal memo about a patch that was currently in “testing” and not yet available.  The interim solution that we had to employ was changing the manner in which we mounted the drives from the disk device.  By not using the PowerPath software to access the OCR volume, we were able to complete the installation and have a working cluster.  The cost is that there is only one channel of communication to the OCR volume from each node.  If the fibre card that this channel goes through fails, the cluster will fail.  Obviously this is not the most appealing solution; however, we needed the cluster up and running, and it got us through.
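For anyone wanting to double-check which kind of device their OCR volume is on, here is a rough sketch in Python.  It is not anything Oracle or EMC gave us.  It only looks at the device name, and it assumes that PowerPath pseudo devices follow the usual emcpower naming (for example /dev/emcpowera1) while native single-path devices do not.  The function names are made up for illustration.

```python
import os
import re

# Assumption: PowerPath pseudo devices are named "emcpower" plus a suffix,
# e.g. /dev/emcpowera1. Anything else is treated as a non-PowerPath device.
_POWERPATH_NAME = re.compile(r"^emcpower[0-9a-z]+$")


def is_powerpath_device(device_path: str) -> bool:
    """Return True if the device node name looks like a PowerPath pseudo device."""
    return bool(_POWERPATH_NAME.match(os.path.basename(device_path)))


def describe_ocr_device(device_path: str) -> str:
    """Give a one-line verdict on the device nominated for the OCR (name check only)."""
    if is_powerpath_device(device_path):
        return f"{device_path}: PowerPath pseudo device (multipath)"
    return f"{device_path}: not a PowerPath pseudo device"


if __name__ == "__main__":
    # Example paths only; substitute the device you nominated for the OCR.
    for path in ("/dev/emcpowera1", "/dev/sdc1"):
        print(describe_ocr_device(path))
```

This does not touch the disk or the Clusterware configuration at all; it is just a quick sanity check on the name of the device you are about to hand to the installer.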

I’m still waiting to see this patch become available, some two months after the installation of our shiny new cluster.  I figured that someone else may run into this problem as 10gR2 becomes more widely used, and if this helps them, then it has served its purpose.

2 Comments

  1. Sunderrajan Said,

    September 9, 2006 @ 9:34 pm

    Hi,

    I too have the same setup: EMC PowerPath, and I am trying to install Oracle 10g Standard Edition on RAC. The Clusterware installation fails after running ./root.sh
    Has Oracle released a patch for this? Awaiting your reply.

    thanks.
    Sunder

  2. Me. Said,

    September 12, 2006 @ 3:54 pm

    Not that I am aware of, although I have to admit that I have not been actively looking for it for a while.

    As I said above, my solution was to change the way that we mount the OCR drives so that they are not mounted over PowerPath. I’ve not had any issues with running the servers in this configuration.
