[Pacemaker] RFC: What part of the XML configuration do you hate the most?
Keisuke MORI
kskmori at intellilink.co.jp
Fri Jun 27 12:21:46 UTC 2008
Dejan Muhamedagic <dejanmm at fastmail.fm> writes:
>> 4) node fencing without the poweroff
>> (this is a kind of a new feature request)
>> Node fencing is simple and good enough in most of our cases, but
>> we hesitate to use STONITH (poweroff/reboot) as the first response
>> to a failure, because:
>
> Do you mean on operation (such as stop) failures? Or other
> failures?
>
I meant monitor/start failures in this particular scenario.
>> - we want to shut down the services gracefully whenever possible.
>
> Well, if the stop op failed, one can't do anything but shutdown,
> right?
Yes, I agree with you in the case of stop failures.
What I want here is that, even when only a monitor fails,
all the resources (including other group or clone
resources) move away from the failed node.
>
>> - rebooting the failed node may lose the evidence of the
>> real cause of a failure. We want to preserve it as much as possible
>> to investigate it later and to ensure that all the problems are resolved.
>>
>> We think that, ideally, when a resource fails the node would
>> try to go into the 'standby' state, and only if that fails
>> would it escalate to STONITH to power off.
>
> Perhaps another on_fail action. But I still don't see how that
> could help.
>
> Also, if there's a split brain one can of course only do stonith.
sfex can be used for that; it is one of the main reasons
we developed it.
>
>> 5) STONITH priority
>> Another reason why we hesitate to use STONITH is the "cross counter"
>> problem when split-brain occurs.
>> It would be great if we could tune it so that a node with resources
>> running is the most likely to survive.
>
> I guess that you mean the case when two nodes try to shoot each
> other. OK, one node could know if it's holding the majority of
> resources, but how does the other node know what its peer is
> doing? Or did I completely misunderstand your point?
>
You're exactly right. Thank you for clarifying my explanation.
But I'm not expecting a _perfect_ solution here, one that would work
in every situation _fully automatically_, as you suggested.
Manually tunable parameters for a specific configuration would be
just fine, I think.
One idea I have in mind is something like a 'stonith-delay'.
The intention is: "If you're going to shoot a node on which
a specific resource is running, then please hold on a second,"
which gives the active node a chance to shoot the others and survive.
Obviously it will increase the failover time when the node is
really down, but I think that would be only 2-3 seconds (or roughly
the keepalive interval).
It's a trade-off, and it's up to the users.
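To make the idea a bit more concrete, here is a rough sketch in Python of
the race I have in mind. It is only a toy model, not Pacemaker code: the
names Node, runs_key_resource, fence_time and survivor are all made up for
illustration, and the 1-second fencing latency and 2-second delay are just
example values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    # Hypothetical flag: True if the "specific resource" runs on this node.
    runs_key_resource: bool


def fence_time(target: Node, base_latency: float, stonith_delay: float) -> float:
    """Time at which a shot aimed at `target` lands.

    The shooter holds off for `stonith_delay` seconds when the target
    is running the protected resource.
    """
    extra = stonith_delay if target.runs_key_resource else 0.0
    return base_latency + extra


def survivor(a: Node, b: Node,
             base_latency: float = 1.0,
             stonith_delay: float = 2.0) -> Optional[Node]:
    """Return the node that wins a mutual-fencing race, or None on a tie.

    Both nodes start shooting at the same moment (split-brain). The node
    that is hit later survives, because its own shot lands first.
    """
    a_hit_at = fence_time(a, base_latency, stonith_delay)
    b_hit_at = fence_time(b, base_latency, stonith_delay)
    if a_hit_at == b_hit_at:
        return None  # the "cross counter" case: both shots land together
    return a if a_hit_at > b_hit_at else b


active = Node("active", runs_key_resource=True)
passive = Node("passive", runs_key_resource=False)

winner = survivor(active, passive)
no_delay = survivor(active, passive, stonith_delay=0.0)
```

With the delay, the shot aimed at the active node lands 2 seconds later than
the shot aimed at the passive node, so the active node survives. With
stonith_delay=0.0 the two shots land at the same time, which is exactly the
"cross counter" problem. The cost is also visible in the model: if the active
node is really dead, the passive node still waits the extra 2 seconds before
fencing it.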
Thanks,
--
Keisuke MORI
NTT DATA Intellilink Corporation