[Pacemaker] trigger STONITH for testing purposes

Wed May 20 16:39:39 UTC 2009

Hi Andrew,

> I'd say you removed no-quorum-policy=ignore

Actually, both no_quorum_policy and no-quorum-policy are set to
"ignore", and expected-quorum-votes is set to "2":

  <crm_config>
    <cluster_property_set id="cib-bootstrap-options">
      ...
      <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
      <nvpair id="cib-bootstrap-options-no_quorum_policy" name="no_quorum_policy" value="ignore"/>
      <nvpair id="nvpair-1d2c923d-7619-4b45-989a-698357f9f8cb" name="no-quorum-policy" value="ignore"/>
      ...
    </cluster_property_set>
  </crm_config>
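
As a quick cross-check of that excerpt, here is a minimal sketch that
lists every spelling of the quorum policy found in a crm_config fragment.
The fragment is the one shown above with the elided ("...") entries left
out; the helper names cluster_properties and quorum_policy_settings are
made up for illustration and are not part of any Pacemaker tool.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the CIB excerpt in this message; the elided
# ("...") entries are simply omitted, not guessed at.
CRM_CONFIG = """
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
    <nvpair id="cib-bootstrap-options-no_quorum_policy" name="no_quorum_policy" value="ignore"/>
    <nvpair id="nvpair-1d2c923d-7619-4b45-989a-698357f9f8cb" name="no-quorum-policy" value="ignore"/>
  </cluster_property_set>
</crm_config>
"""


def cluster_properties(xml_text):
    """Return {name: value} for every nvpair in the fragment (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {nv.get("name"): nv.get("value") for nv in root.iter("nvpair")}


def quorum_policy_settings(props):
    """Keep only the quorum policy entries, treating '_' and '-' spellings alike."""
    return {
        name: value
        for name, value in props.items()
        if name.replace("_", "-") == "no-quorum-policy"
    }


props = cluster_properties(CRM_CONFIG)
print(quorum_policy_settings(props))
print(props["expected-quorum-votes"])
```

Run against the fragment above, this reports both spellings with the
value "ignore", which matches what I see in the CIB.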

Removing both no-quorum-policy=ignore and no_quorum_policy=ignore (that
is, deleting the variables) left the cluster unable to fail over after
either an ifdown of the interface or a node reboot.  The state displayed
by the GUI did not agree with the state displayed by crm_mon: the GUI
showed the ifdown'd or rebooted node as still controlling resources,
whereas crm_mon showed the resources as unavailable.  Both showed the
inaccessible node as offline.

Setting no-quorum-policy=stop had the same results, including that the
resources did not migrate to the working system until no-quorum-policy
was set back to ignore.  One of the tests led to filesystem corruption.
Very messy.  (This is a test-only setup, so no real data is present.)

So, no, the change that I made was neither deleting no-quorum-policy nor
setting it to stop.  Setting no-quorum-policy=ignore seems to be
required for the cluster to support migrations and failovers.
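
To make the reasoning concrete, here is a minimal sketch of the quorum
arithmetic as I understand it from Andrew's note quoted below.  It
assumes a strict-majority rule and a simplified reading of
no-quorum-policy; the function names are hypothetical and this is not
Pacemaker's actual implementation.

```python
def has_quorum(live_votes, expected_votes):
    """Assumed rule: quorum means a strict majority of the expected votes."""
    return 2 * live_votes > expected_votes


def may_act(live_votes, expected_votes, no_quorum_policy):
    """Simplified model: a partition may fence or run resources if it
    has quorum, or if the policy tells it to ignore the loss of quorum."""
    return has_quorum(live_votes, expected_votes) or no_quorum_policy == "ignore"


# Two-node cluster with one node gone: 1 of 2 votes is not a majority.
print(has_quorum(1, 2))
print(may_act(1, 2, "stop"))
print(may_act(1, 2, "ignore"))

# Three-node cluster: a lone returning node lacks quorum, the other two keep it.
print(has_quorum(1, 3))
print(has_quorum(2, 3))
```

Under these assumptions a surviving node in a two-node cluster can only
take over when the policy is "ignore", which is consistent with what I
observed in the tests.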

Cheers and thanks,
Bob Haxo

On Wed, 2009-05-20 at 11:17 +0200, Andrew Beekhof wrote:

> On Wed, May 20, 2009 at 1:31 AM, Bob Haxo <bhaxo at sgi.com> wrote:
> > Greetings,
> >
> > I liked the idea of not starting the cluster at boot, and found that the
> > fenced node would reboot and then openais start brought the node onboard
> > without triggering a reboot of the already running node.
> >
> > Then magic happened.  I chkconfig'd openais to start with boot, re-ran the
> > "ifdown eth0" command that had been triggering STONITH and then the STONITH
> > deathmarch, and, well, everything worked.  I've done this test many 10s of
> > times without a STONITH deathmarch.
> >
> > Unfortunately, I haven't a clue as to what was changed that cleared the
> > issue.
> 
> At a guess, I'd say you removed no-quorum-policy=ignore
> OpenAIS based clusters don't pretend they have quorum when only 1 of
> the 2 nodes is available (and you can't start shooting until you have
> quorum or the above option is set).
> 
> 
> >
> > Thanks for all the suggestions.
> >
> > Cheers,
> > Bob Haxo
> >
> >
> > On Tue, 2009-05-19 at 14:03 +0200, Andrew Beekhof wrote:
> >
> > On Mon, May 18, 2009 at 8:12 PM, Bob Haxo <bhaxo at sgi.com> wrote:
> >>
> >> Any suggestions as to what needs changing so that the stonith deathmarch
> >> can be avoided?
> >
> > If you only have two nodes, the only two ways have already been discussed:
> > use poweroff, or don't start the cluster at boot.
> > If you don't want to do either of those, the only way to terminate the
> > stonith loop is to fix the network failure.
> >
> > If you had 3 or more nodes, the returning node wouldn't have quorum
> > and therefore wouldn't be allowed to shoot anyone.
> >
> > _______________________________________________
> > Pacemaker mailing list
> > Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> 