[Pacemaker] chain/cascade stonith agents?

Thu Aug 16 16:21:01 UTC 2012

On Thu, 2012-08-16 at 09:37 +1000, Andrew Beekhof wrote:
> On Thu, Aug 16, 2012 at 1:59 AM, Bob Haxo <bhaxo at sgi.com> wrote:
> > Hi All,
> >
> > Is chaining/cascading of stonith agents implemented?
> 
> Yes, but you'll want to use the current git HEAD.
> 
> > If yes, would
> > someone please point me to the documentation?
> 
> Um, I'm sorry to say that it's not actually documented yet :-(
> 
> I can provide an example, though; it should be reasonably self-explanatory:
> 
> <cib crm_feature_set="3.0.6" validate-with="pacemaker-1.2"
>      admin_epoch="1" epoch="0" num_updates="0">
>   <configuration>
> ...
>     <fencing-topology>
>       <!-- try poison-pill and fall back to power -->
>       <fencing-level id="f-p1.1" target="pcmk-1" index="1" devices="poison-pill"/>
>       <fencing-level id="f-p1.2" target="pcmk-1" index="2" devices="power"/>
> 
>       <!-- try disk and network, and fall back to power -->
>       <fencing-level id="f-p2.1" target="pcmk-2" index="1" devices="disk,network"/>
>       <fencing-level id="f-p2.2" target="pcmk-2" index="2" devices="power"/>
>     </fencing-topology>
>   </configuration>
>   <status/>
> </cib>
> 
> > I'd like to implement a stonith chain in which stonith_ipmilan is the
> > first stonith agent, and if that fails, a second stonith agent gets
> > called (for example stonith_apc).
> >
> > ((In short, I find it tiresome to pull the power cable(s) for an HA
> > failover demonstration only to have the failover, well, fail, when
> > stonith_ipmilan goes into a failure loop when it doesn't get a response
> > from the powered-off BMC.))
> >
> > Is there a way of setting stonith_ipmilan to give up and return a
> > "stonith success"?  I was thinking that I would chain stonith_ipmilan
> > with the ever popular stonith_null to achieve this end.
> 
> For a demo, sure.
> But in production, how do you tell the difference between "I can't
> reach the BMC because it's powered off" and "I can't reach the BMC
> because my network link to it is disrupted"?
> 
> Note there is also 'stonith_admin --confirm $node' which will tell
> stonith-ng and the rest of pacemaker that $node is safely down.

Yes, it is a trade-off.  Certainly during development, I'm less
concerned about a corrupted virt than I am about the hang that occurs
when stonith_ipmilan gets no response from the powered-off system.
The virt can easily be re-imaged.

Is there an easier way of forcing stonith_ipmilan to give up than
chaining to stonith_null?
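
For reference, here is a minimal sketch of the chain described above
(stonith_ipmilan first, then stonith_apc), modeled on the quoted
example. The target name node1 and the device ids ipmi-node1 and
apc-node1 are hypothetical; they would have to match the node name and
the ids of stonith resources defined elsewhere in the CIB. Levels are
tried in ascending index order, so level 2 is used only if level 1
fails.

```xml
<fencing-topology>
  <!-- level 1: try IPMI first (ipmi-node1 is a hypothetical device id) -->
  <fencing-level id="f-n1.1" target="node1" index="1" devices="ipmi-node1"/>
  <!-- level 2: fall back to the APC switch (apc-node1 is hypothetical) -->
  <fencing-level id="f-n1.2" target="node1" index="2" devices="apc-node1"/>
</fencing-topology>
```

For a demo, the level 2 device could instead be a stonith_null
resource, with the caveat raised above about production use.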

Thanks,
Bob Haxo

> 
> >
> > Cheers,
> > Bob Haxo
> > bhaxo at sgi.com
> >
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> 