[Pacemaker] trigger STONITH for testing purposes

Tim Serong tim at wirejunkie.com
Tue May 19 09:48:32 UTC 2009


Bob Haxo wrote:
> OK, I've set the stonith action to "poweroff" and I already had quorum
> action set to "ignore".  The "poweroff" makes it much easier to re-set
> "stonith-enabled" to "false" so that I can get two systems online
> again. ;-)
> 
> However, I was more hoping to be able to reboot the fenced system
> without triggering a reboot (or halt) of the working system.  Here are
> some specifics:
> 
> SLES11 HAE (GA)
> external/ipmi
> two HA servers
> 
> 
> ...
> 
> Any suggestions as to what needs changing so that the stonith deathmarch
> can be avoided?

I can't offer any useful commentary on your config, but I can suggest
another trick for debugging this:

1) Change the IPMI password, so that STONITH will still be attempted,
   but will fail (can't reboot the node due to authentication failure).
2) This will put the cluster into a slightly bizarre state, where
   (ultimately) no resources will run properly, but at least your
   machines won't be continually rebooting.
3) tail and/or cat /var/log/ha_log and /var/log/ha_debug (or wherever
   the log files are) on both nodes.  This should tell you what it was
   that failed that resulted in STONITH, and hopefully give you some
   idea of where to look next (eg: if a "stop" action failed, maybe
   instrument that resource agent to log more detailed failure
   messages).
4) Don't forget to reset your IPMI passwords once the problem is
   solved! :)
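
If the logs are large, step 3 can be narrowed down with a small filter.
The following is only a rough sketch in Python: the keyword list
(stonith, fail, error) is my assumption about what the interesting lines
contain, not a documented log format, and the helper names are made up
for illustration.  Adjust the pattern to match what your logs actually
say.

```python
import re
import sys

# Assumed keywords only; tune this to whatever your logs actually contain.
PATTERN = re.compile(r"stonith|fail|error", re.IGNORECASE)


def interesting_lines(lines, pattern=PATTERN):
    """Yield (line_number, text) for every line that matches the pattern."""
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")


def scan(path):
    """Print matching lines from one log file, prefixed with file and line."""
    with open(path, errors="replace") as handle:
        for number, text in interesting_lines(handle):
            print(f"{path}:{number}: {text}")


if __name__ == "__main__":
    # Log file paths are taken from the command line.
    for path in sys.argv[1:]:
        scan(path)
```

Saved as, say, scanlog.py (hypothetical name), it would be run on each
node as "python scanlog.py /var/log/ha_log /var/log/ha_debug", and the
output from both nodes compared around the time of the fencing attempt.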

Hope that helps,

Tim



