[Pacemaker] Question on ILO stonith resource config and restarting

Thu Oct 30 13:26:06 UTC 2008

On Wed, Oct 29, 2008 at 12:51:44PM -0400, Aaron Bush wrote:
> I have a 0.6 pacemaker/heartbeat cluster setup in a lab with resources
> as follows:
> 
> Group-lvs (ordered): two primitives -> ocf/IPaddr2 and ocf/ldirectord.
> Clone-pingd: set to monitor a couple of IPs and used to set a weight for
> where to run the LVS group.
> 
> -- This is the area that I have a question on --
> Clone-stonith-node1: HP ILO to shoot node1
> Clone-stonith-node2: HP ILO to shoot node2
> 
> I read on the old linux-ha site that using a clone for ILO/stonith was
> the way to go.  I'm not sure I see how this would work correctly and be
> preferred over a standard resource.  What I am confused about is this:
> the external/riloe stonith plugin only knows how to shoot one node so

Please make sure that you use the latest edition of
external/riloe. The previous one didn't work under all
circumstances.

Thanks,

Dejan

> why would you want to run it as a clone, since each external/riloe is
> configured differently?  I went ahead and configured the riloes as
> clones feeling that the docs are correct and that the reason would
> become obvious to me later.  (I also saw a similar post with no
> response:
> http://www.gossamer-threads.com/lists/linuxha/users/35685?nohighlight=1#35685)
> 
> I then noticed that my ILO clones were starting on the 'wrong' nodes:
> the stonith resource to kill node 2 was actually running on node 2,
> which is pointless if node 2 locks up.  So I added resource
> constraints to force the stonith clone to stay on a node that was not
> the one to be shot.  This seemed to work well.
> 
> The next issue I have is that when I disconnect the LAN cable on a
> single node that connects it to the rest of the network the clone
> stonith monitor will fail since it can't connect to the other node's ILO
> for status.  After some time (minutes, let's say) I reconnect the LAN
> cable but never see the clone stonith come back to life; it just stays
> failed.  What should I be looking at to make sure that the clone stonith
> restarts properly?
> 
> Any advice on how to properly set up an HP ILO stonith in this
> scenario would be greatly appreciated.  (I can see where a clone stonith
> would be useful in a large cluster of n>2 nodes since all nodes could
> have a chance to shoot a failed node and maybe this is the reason for
> cloned stonith with ILO?  Basically in a cluster of N nodes each node
> would be running N-1 stonith resources, ready to shoot a dead node.)
> 
> Thanks in advance,
> -ab
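
The placement rule described in the question (keep each stonith resource
off the node it is meant to shoot, so that in a cluster of N nodes the
other N-1 nodes can run it) can be sketched in a few lines of Python. This
is only an illustration under assumptions: the helper names allowed_hosts
and ban_constraint and the constraint ids are hypothetical, and the
rsc_location/rule/expression layout follows the 0.6-era CIB as I remember
it, so check it against the DTD shipped with your version before use.

```python
import xml.etree.ElementTree as ET


def allowed_hosts(nodes):
    """Map each target node to the nodes that may run its stonith resource."""
    return {target: [n for n in nodes if n != target] for target in nodes}


def ban_constraint(resource, target):
    """Build a location constraint that keeps `resource` off `target`.

    Assumed layout (verify against your CIB DTD): a rule with score
    -INFINITY that matches the node name via the #uname attribute.
    """
    cid = f"loc-{resource}-not-on-{target}"  # hypothetical id scheme
    loc = ET.Element("rsc_location", id=cid, rsc=resource)
    rule = ET.SubElement(loc, "rule", id=f"{cid}-rule", score="-INFINITY")
    ET.SubElement(
        rule,
        "expression",
        id=f"{cid}-expr",
        attribute="#uname",
        operation="eq",
        value=target,
    )
    return loc


if __name__ == "__main__":
    nodes = ["node1", "node2"]
    for target, hosts in allowed_hosts(nodes).items():
        resource = f"Clone-stonith-{target}"
        print(f"{resource} may run on: {', '.join(hosts)}")
        print(ET.tostring(ban_constraint(resource, target), encoding="unicode"))
```

With two nodes, each stonith resource ends up with exactly one eligible
host (the peer); with more nodes every surviving node stays eligible.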