[Pacemaker] questions about expected behaviour stonith:meatware

Thu Jun 16 10:23:50 UTC 2011

On 2011-06-16 12:11, Jelle de Jong wrote:
> On 16-06-11 08:38, Florian Haas wrote:
>> On 06/16/2011 12:50 AM, imnotpc wrote:
>>>> Funny but it looks fairly unequivocal to me.
>>>
>>> Yes and no. The message is clear but unless you have someone sitting at
>>> a console 24/7 running tail on the log file, it has little value.
>>> According to the ClusterLabs stonith docs (which I just realized you
>>> wrote, haha):
>>
>> Meatware requires operator intervention, that much is a given.
>> _Notifying_ an operator that intervention is necessary, beyond logging
>> to the console or a log file, is beyond meatware's domain.
> 
> Just to be sure: I understand that meatware needs operator intervention.
> However, my problem was that there is no failover anymore... (this may
> be the intended behaviour?).

Yes it is.

> I thought that when both nodes A and B are running, with node A running
> the resources, and node A dies, node B would take over and A would get
> fenced. But with meatware, the cluster will fence A but will not take
> over the resources until the operator intervenes...

You're getting it wrong. Fencing cannot be considered completed until
the node is confirmed down, and with meatware the only way to confirm
that is by an operator intervening.

> I did notice that when node A runs the resources and node B dies,
> node A keeps running the resources (as expected, but this does not
> match the behaviour when the other node dies).
> 
> I was just expecting that a failover would still work but the failed
> node would be fenced

Again, fencing must complete _before_ a takeover can occur. Anything
else would be utterly pointless. Suppose a node is accessing shared
storage, and that node becomes unreliable, fails to communicate in the
cluster, or fails to stop a resource properly, so it must be assumed
that it still holds open handles on that shared storage. Then you
_must_ kill that node _before_ you fail over; otherwise you risk
concurrent, uncoordinated access to your storage, wrecking your data.
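
To make the ordering concrete, here is a minimal toy model in Python.
It is only a sketch of the rule described above, not Pacemaker's
actual implementation or API: the names ToyCluster, FenceState,
node_failed and operator_confirms are all hypothetical, and
operator_confirms merely stands in for the operator's manual
confirmation that meatware waits for.

```python
from enum import Enum


class FenceState(Enum):
    PENDING = "pending"      # fencing requested, node not yet confirmed down
    CONFIRMED = "confirmed"  # operator has confirmed the node is down


class ToyCluster:
    """Hypothetical two-node model; illustrative only, not Pacemaker code."""

    def __init__(self, active, standby):
        self.resources_on = active  # last known location of the resources
        self.standby = standby      # surviving peer that could take over
        self.fencing = {}           # node name -> FenceState

    def node_failed(self, node):
        # A failed node must be fenced. With a meatware-style device the
        # request stays PENDING until a human confirms the node is down,
        # so nothing is moved here.
        self.fencing[node] = FenceState.PENDING

    def operator_confirms(self, node):
        # Stand-in for the operator's manual confirmation step.
        if self.fencing.get(node) is FenceState.PENDING:
            self.fencing[node] = FenceState.CONFIRMED
            self._recover(node)

    def _recover(self, fenced_node):
        # Only reachable after fencing has completed.
        if fenced_node == self.resources_on:
            # The node that held the resources is confirmed down:
            # the peer may now safely take over.
            self.resources_on, self.standby = self.standby, None
        else:
            # The standby died; resources simply stay where they are.
            self.standby = None


# Node A holds the resources and dies: no takeover until confirmation.
cluster = ToyCluster(active="A", standby="B")
cluster.node_failed("A")
assert cluster.resources_on == "A"   # takeover is blocked, fencing pending
cluster.operator_confirms("A")
assert cluster.resources_on == "B"   # takeover happens only now
```

In this model, the case where the standby node dies never needs a
takeover at all, which is why the two failure cases look different
from the outside.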

> and needed to be cleared. I tried this in a three-node set-up and a
> two-node set-up. I also went back to a two-node set-up for my KVM host
> cluster because I got some unexplainable behaviour and wanted to keep
> it more KISS.

If behavior is "unexplainable", then it's a good idea to a) consult the
documentation, and b) post to the mailing list. One of the two will
usually lead to a good explanation.

Florian
