[Pacemaker] trying to stabilize sbd stonith

Sat Feb 27 15:09:43 UTC 2010

Hi,

Found it. For the sake of anyone Googling this, here is the solution as well:

1.	From hb_gui (or whatever method you prefer), stop all resources
2.	Stop all nodes in the cluster
3.	Start all the nodes in the cluster again
4.	Start the resources one by one manually, respecting all ordering
constraints.
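
For anyone who prefers the command line over hb_gui, the four steps above could be sketched roughly as follows. This is only a sketch under assumptions, not a tested procedure: the node names, the resource names, and the use of ssh to reach each node are hypothetical, and it assumes the crm shell is installed. Only rcopenais comes from this thread. By default it just prints the commands (DRY_RUN=1) so you can review them first.

```shell
#!/bin/sh
# Sketch of the four recovery steps. All names below are hypothetical
# placeholders; replace them with your own nodes and resources.
DRY_RUN="${DRY_RUN:-1}"                   # 1 = only print the commands
NODES="${NODES:-node1 node2 node3}"       # assumed node names
RESOURCES="${RESOURCES:-rsc_a rsc_b}"     # assumed resources, in start order

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover_cluster() {
    # 1. Stop all resources.
    for r in $RESOURCES; do
        run crm resource stop "$r"
    done
    # 2. Stop the cluster stack on all nodes (ssh access is an assumption).
    for n in $NODES; do
        run ssh "$n" rcopenais stop
    done
    # 3. Start the cluster stack on all nodes again.
    for n in $NODES; do
        run ssh "$n" rcopenais start
    done
    # 4. Start the resources one by one, in the order listed in RESOURCES.
    for r in $RESOURCES; do
        run crm resource start "$r"
    done
}
```

Calling recover_cluster with the defaults prints the command sequence without touching the cluster. With DRY_RUN=0 you would still need to wait until all nodes are online again after step 3 before step 4 can work, so running the steps by hand is probably the safer choice.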

Cheers,
Sander

On Thu, 2010-02-25 at 20:49 +0100, Sander van Vugt wrote:
> Hi,
> 
> Some additions
> On Thu, 2010-02-25 at 20:30 +0100, Sander van Vugt wrote:
> > Following up on the message that I sent yesterday. In my 2-node test
> > cluster, sbd stonith works in an excellent way. In my customer's 3-node
> > cluster it almost works in an excellent way. That is: I've got one node
> > that is in an uninterrupted STONITH loop. It comes up with a status
> > online, then online becomes online(clean), after which it receives a
> > stonith and restarts. I think it's kind of cool to see that it works, but
> > I would like to get out of this loop. I've got the feeling that I'm
> > missing something very obvious. What I know is that it does see the
> > stonith device. But: the softdog watchdog module doesn't want to load,
> > and I have no clue what the watchdog module for this server (Dell
> > PowerEdge 2950) might be. Or am I looking in the wrong direction? I've
> > got the impression that I am overlooking something very obvious
> > (hence no log files or other information yet).
> 
> So I decided to have a look at the logs anyway, after verifying that
> I've applied the complete procedure that Lars sent me yesterday. Now
> it appears that the Meatware stonith resource that I've used for testing
> purposes is doing something nasty. Here's what happens:
> 
> 1.	I start openais (rcopenais start) on node3 (for some reason it
> doesn't come up automatically).
> 2.	It comes up, node1 sees that and says "hey, I've got a meatware
> stonith waiting for you, please admin run meatclient -c node3 to make
> sure it's gone" and then the 3rd node reboots. 
> 
> Now the interesting part is that I've already removed the meatware
> stonith agent from the cluster, so it looks like it is still lingering
> around as a zombie. Is there any way to get this meatclient zombie out of
> the system without actually restarting the entire cluster?
> 
> Thanks,
> Sander