[Pacemaker] trigger STONITH for testing purposes

Mon May 18 18:12:31 UTC 2009

OK, I've set the stonith action to "poweroff", and I already had the quorum
action set to "ignore". The "poweroff" setting makes it much easier to reset
"stonith-enabled" to "false" so that I can get both systems online
again. ;-)
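
For reference, turning fencing back off is a one-attribute edit: the existing "stonith-enabled" nvpair in the "cib-bootstrap-options" set (id copied from the configuration below) gets the value "false". This is only a sketch of the changed fragment, not a complete CIB, and it assumes the nvpair id has not been changed since the configuration below was captured:

```xml
<!-- Sketch only: the edited nvpair inside <cluster_property_set id="cib-bootstrap-options">.
     The id matches the existing entry; only the value changes from "true" to "false".
     Set it back to "true" before the next fencing test. -->
<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
```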

However, what I was really hoping for is to be able to reboot the fenced
system without triggering a reboot (or halt) of the working system. Here
are some specifics:

SLES11 HAE (GA)
external/ipmi
two HA servers

<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.3-0080ec086ae9c20ad5c4c3562000c0ad68374f0a"/>
    <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
    <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1242661586"/>
    <nvpair id="cib-bootstrap-options-no_quorum_policy" name="no_quorum_policy" value="ignore"/>
    <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
    <nvpair id="nvpair-a8fa01f7-fd6c-4e9b-adf6-0e48250691f1" name="stonith-action" value="poweroff"/>
    <nvpair id="nvpair-1d2c923d-7619-4b45-989a-698357f9f8cb" name="no-quorum-policy" value="ignore"/>
  </cluster_property_set>

And the two stonith resources:

  <primitive class="stonith" id="ipmi_stonith_hikari" type="external/ipmi">
    <meta_attributes id="ipmi_stonith_hikari-meta_attributes"/>
    <operations id="ipmi_stonith_hikari-operations">
      <op id="ipmi_stonith_hikari-op-monitor-15" interval="30" name="monitor" start-delay="30" timeout="30"/>
    </operations>
    <instance_attributes id="ipmi_stonith_hikari-instance_attributes">
      <nvpair id="nvpair-d95c4018-1ebc-447b-9028-050e68c9929c" name="hostname" value="hikari"/>
      <nvpair id="nvpair-3aca66aa-bb82-43ec-8b63-e936b2507fa3" name="ipaddr" value="172.16.1.247"/>
      <nvpair id="nvpair-3f623098-c266-4132-8d9c-77744e0e8713" name="userid" value="ADMIN"/>
      <nvpair id="nvpair-04e6a6d7-6541-45d4-8d36-9768e240e79d" name="passwd" value="ADMIN"/>
      <nvpair id="nvpair-1a90ef3c-3b67-41c2-98cf-58b8a2f9cfe0" name="interface" value="lanplus"/>
    </instance_attributes>
  </primitive>
  <primitive class="stonith" id="ipmi_stonith_hikari2" type="external/ipmi">
    <meta_attributes id="ipmi_stonith_hikari2-meta_attributes">
      <nvpair id="nvpair-88049439-39e2-459d-9820-78cdeb9ae282" name="target-role" value="started"/>
    </meta_attributes>
    <operations id="ipmi_stonith_hikari2-operations">
      <op id="ipmi_stonith_hikari2-op-monitor-15" interval="30" name="monitor" start-delay="30" timeout="30"/>
    </operations>
    <instance_attributes id="ipmi_stonith_hikari2-instance_attributes">
      <nvpair id="nvpair-c4b4e4ce-6f9a-4a8d-a7fb-b8726f09ccf0" name="hostname" value="hikari2"/>
      <nvpair id="nvpair-e9d42aca-110f-4308-a3dd-645d793e49d3" name="ipaddr" value="172.16.1.248"/>
      <nvpair id="nvpair-31b086de-5209-4361-a4b8-55460cad95a8" name="userid" value="ADMIN"/>
      <nvpair id="nvpair-5b3c6b97-a49e-4d18-beea-6d7aaec000fa" name="passwd" value="ADMIN"/>
      <nvpair id="nvpair-6f98c068-7b2e-4309-8f5b-2c7c2527cc93" name="interface" value="lanplus"/>
    </instance_attributes>
  </primitive>

And the relevant pair of constraints:

  <rsc_location id="stonith_hikari_on_hikari2" node="hikari" rsc="ipmi_stonith_hikari" score="-INFINITY"/>
  <rsc_location id="stonith_hikari2_on_hikari" node="hikari2" rsc="ipmi_stonith_hikari2" score="-INFINITY"/>

Any suggestions as to what needs changing so that the stonith deathmarch
can be avoided?

Cheers and thanks,
Bob Haxo
SGI

On Fri, 2009-05-15 at 20:26 -0500, Karl Katzke wrote:

> Bob, as we've discussed a few other times recently, when you're
> testing (and depending on your preference in production), you may want
> to set the stonith policy to 'poweroff' as opposed to 'reboot'. 
> Also, if you have a two-node cluster, Pacemaker depends on quorum, and
> losing it (which happens as soon as one of the two nodes goes away)
> creates another stonith event. You'll want to set the loss-of-quorum
> action to 'ignore'.
> ... in short, RTFM: http://www.clusterlabs.org/wiki/Documentation --
> Pacemaker Configuration Explained 1.0 has *everything* you need to
> know in it. 
>
> -K
>
> ---
> Karl Katzke
> Systems Analyst II
> TAMU - DRGS
>
> >>> On 5/15/2009 at  7:22 PM, in message
> <1242433367.21186.4.camel at nalu.engr.sgi.com>, Bob Haxo <bhaxo at sgi.com> wrote:
> 
> > OK, never mind this question. Running "ifdown" on the interface works
> > nicely to trigger a STONITH action.
> >  
> > Unfortunately (if I may ask a new question) ... I now have one server 
> > rebooting, then the other rebooting, and back to the first rebooting in 
> > what looks to be an endless loop of reboots. 
> >  
> > Suggestions? 
> >  
> > Cheers, 
> > Bob Haxo 
> > SGI 
> >  
> > On Fri, 2009-05-15 at 16:53 -0700, Bob Haxo wrote: 
> >  
> > > Greetings, 
> > >  
> > > What manual administrative actions can be used to trigger a STONITH
> > > action?
> > >  
> > > I have created a pair of STONITH resources (external/ipmi) and would 
> > > like to test that these resources work as expected (which, if I 
> > > understand the default correctly, is to reboot the node). 
> > >  
> > > Thanks, 
> > > Bob Haxo 
> > > SGI 
> > >  
> > > SLES11 HAE  
> > >  
> > > _______________________________________________ 
> > > Pacemaker mailing list 
> > > Pacemaker at oss.clusterlabs.org 
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker 
> >  