[Pacemaker] STONITH Deathmatch Explained

Dejan Muhamedagic dejanmm at fastmail.fm
Thu May 14 09:32:09 UTC 2009


Hi,

On Thu, May 14, 2009 at 06:32:00PM +1000, Tim Serong wrote:
> Greetings,
> 
> I've written up a brief document entitled "STONITH Deathmatch Explained
> (and Some Hints for Resource Agent Authors and Systems Engineers)":
> 
>   http://ourobengr.com/ha
> 
> It's a description of causes of STONITH deathmatch in
> Heartbeat/Pacemaker HA clusters, where two nodes continually shoot each
> other, thus rendering the system less available than a non-HA system
> would be.
> 
> Hopefully publishing this will save at least a few people from some of
> the pain myself and a couple of others experienced last year, in
> particular when trying to debug resource agents that were misbehaving in
> unexpected ways.
> 
> Comments, feedback, etc. welcome.

Great document! A very funny illustration too :)

Just a few remarks:

- in "Causes ..." you didn't mention split-brain (no
  communication channels working); that would also be the place
  to stress how important it is to have redundant communications :)

- even though you mention it in the title, I'd still move the
  resource agent intricacies into another document; it is all
  very valuable advice, but of no concern to cluster
  administrators, and it's good to keep the focus on our little
  problem; you'd then have to find other "Things You Didn't
  Think Of" (or just keep the title and leave the section empty,
  since it is important; or insert another illustration)

- devote more space/thought to the part on how to avoid a
  "deathmatch"; there's only a mention of chkconfig within
  "Debugging ..." (one can also use the "poweroff" fencing
  operation instead); also, note that this occurs only in cases
  where a reboot doesn't fix the problem (e.g. split-brain)
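
To make the last point concrete, here is a toy model in Python, not
anything taken from Pacemaker itself. It assumes two nodes, that
chkconfig is used to keep the cluster stack from starting at boot, and
that a fenced node which comes back with the stack running and still
cannot see its peer will shoot it in turn. The function name and its
parameters are made up for illustration.

```python
def count_fences(reboot_fixes_problem, autostart=True,
                 fence_action="reboot", max_rounds=5):
    """Count fencing operations in a toy two-node cluster.

    Hypothetical model: each round, the node currently running the
    cluster stack cannot see its peer and fences it.
    """
    fences = 0
    for _ in range(max_rounds):
        fences += 1  # the running node shoots its peer
        if fence_action == "poweroff":
            break  # the victim stays off, nobody is left to shoot back
        if not autostart:
            break  # the victim reboots, but its cluster stack stays down
        if reboot_fixes_problem:
            break  # the victim comes back and rejoins cleanly
        # Otherwise the victim starts the stack, still cannot see its
        # peer, and becomes the shooter in the next round.
    return fences


if __name__ == "__main__":
    print("split-brain, autostart on:", count_fences(False))
    print("reboot fixes the problem: ", count_fences(True))
    print("autostart disabled:       ", count_fences(False, autostart=False))
    print("poweroff instead of reboot:",
          count_fences(False, fence_action="poweroff"))
```

In this sketch only the first case keeps going until max_rounds; the
other three stop after a single fencing operation.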

Thanks,

Dejan

> Thanks,
> 
> Tim
> 
> 



