[Pacemaker] trigger STONITH for testing purposes
Tim Serong
tim at wirejunkie.com
Tue May 19 09:48:32 UTC 2009
Bob Haxo wrote:
> OK, I've set the stonith action to "poweroff" and I already had quorum
> action set to "ignore". The "poweroff" makes it much easier to reset
> "stonith-enabled" to "false" so that I can get two systems online
> again. ;-)
>
> However, I was more hoping to be able to reboot the fenced system
> without triggering a reboot (or halt) of the working system. Here are
> some specifics:
>
> SLES11 HAE (GA)
> external/ipmi
> two HA servers
>
>
> ...
>
> Any suggestions as to what needs changing so that the stonith deathmarch
> can be avoided?
I can't offer any useful commentary on your config, but I can suggest
another trick for debugging this:
1) Change the IPMI password, so that STONITH will still be attempted,
but will fail (it can't reboot the node because IPMI authentication fails).
2) This will put the cluster into a slightly bizarre state, where
(ultimately) no resources will run properly, but at least your
machines won't be continually rebooting.
3) tail and/or cat /var/log/ha_log and /var/log/ha_debug (or wherever
the log files are) on both nodes. This should tell you what failed
and resulted in STONITH, and hopefully give you some
idea of where to look next (e.g., if a "stop" action failed, maybe
instrument that resource agent to log more detailed failure
messages).
4) Don't forget to reset your IPMI passwords once the problem is
solved! :)
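If you'd rather not eyeball the whole log for step 3, here's a minimal Python sketch that pulls out the interesting lines. It assumes the log paths mentioned above, and the keywords it looks for ("stonith", "fail"/"failed"/"failure", "timed out"/"timeout") are my guesses at what's worth reading, not a documented Pacemaker log format. Adjust the patterns to whatever your logs actually contain. It prints each matching line with its file name and line number, so you can open the log at that spot and read what happened just before the fence.

```python
import re
import sys
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumed keywords only; tune these to what your own ha_log/ha_debug show.
PATTERNS = [
    re.compile(r"stonith", re.IGNORECASE),
    re.compile(r"\bfail(ed|ure)?\b", re.IGNORECASE),
    re.compile(r"\btimed? ?out\b", re.IGNORECASE),
]


def suspicious_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line matching any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(p.search(line) for p in PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path: str) -> List[Tuple[int, str]]:
    """Scan one log file; a missing file yields no hits instead of an error."""
    p = Path(path)
    if not p.exists():
        return []
    with p.open(errors="replace") as f:
        return suspicious_lines(f)


if __name__ == "__main__":
    # Default to the paths from step 3; pass other paths on the command line.
    logs = sys.argv[1:] or ["/var/log/ha_log", "/var/log/ha_debug"]
    for log in logs:
        for number, text in scan_log(log):
            print(f"{log}:{number}: {text}")
```

Run it on both nodes. The first "failed" line before the STONITH entries is usually the one to chase.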
Hope that helps,
Tim