[Pacemaker] RFC: What part of the XML configuration do you hate the most?
Andrew Beekhof
beekhof at gmail.com
Fri Jun 27 12:23:51 UTC 2008
On Jun 27, 2008, at 2:18 PM, Keisuke MORI wrote:
> Hi,
>
> just about topic 4) in this mail...
>
> Andrew Beekhof <beekhof at gmail.com> writes:
>>> 4) node fencing without the poweroff
>>> (this is a kind of a new feature request)
>>> Node fencing is just simple and good enough in most of our cases but
>>> we hesitate to use STONITH(poweroff/reboot) as the first action
>>> of a failure, because:
>>> - we want to shut down the services gracefully whenever possible.
>>> - rebooting the failed node may lose the evidence of the
>>> real cause of a failure. We want to preserve it as far as possible
>>> to investigate it later and to ensure that all the problems are
>>> resolved.
>>>
>>> We think that, ideally, when a resource failed the node would
>>> try to go to 'standby' state, and only when it failed it
>>> would escalate to STONITH to poweroff.
>>
>> The problem with this is that it directly (and negatively) impacts
>> service availability.
>> It is unsafe to start services elsewhere until they are confirmed dead
>> on the existing node.
>>
>> So relying on manual shutdowns greatly increases failover time.
>
>
> Right, but I think it depends on applications.
>
> In the case of database applications such as pgsql or oracle,
> the most dominant factor of failover time is the recovery time.
> Shutting down a node in the middle of a transaction will cause a
> rollback action and will increase the recovery time more and more.
> We estimate 3-5 minutes at most for the recovery time in our
> configuration.
>
> Another case is Filesystem on a shared storage.
> You should run fsck before mounting it on the failover-ed node
> for the safety of the data if the filesystem was not umounted cleanly.
> It would take a very long time particularly if the filesystem
> is very large as used by a database.
>
> In addition to this, there may be a risk of data loss if the power
> was suddenly down. Such risks may be neglected, but if there's
> anything we can do to avoid or minimize such risks then we want
> to take the steps for that.
I think you want on_fail=block.
The cluster won't do anything itself but will instead wait for human
intervention.
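Roughly, that is set per operation in the resource definition. A minimal
sketch only: the resource, the ids and the interval/timeout values below are
made up for illustration, and the attribute is spelled on_fail as above
(check the DTD/schema of your version, since the spelling may differ there).

```xml
<primitive id="pgsql" class="ocf" provider="heartbeat" type="pgsql">
  <operations>
    <!-- Hypothetical ids and timings, for illustration only.
         on_fail="block": if this monitor fails, the cluster takes no
         further action on the resource and waits for an administrator. -->
    <op id="pgsql-monitor" name="monitor" interval="30s" timeout="60s"
        on_fail="block"/>
  </operations>
</primitive>
```

The trade-off is the one already mentioned: nothing gets recovered elsewhere
until someone has looked at the node and cleaned up by hand.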
>
>
>
>>
>> One thing we used to do (but had to disable because we couldn't get it
>> 100% right at the time) was move off the healthy resources before
>> shooting the node. I think resurrecting this feature is a better
>> approach.
>
> Yes, that sounds good to me.
> One thing I'm wondering is that if the cluster manager was able
> to confirm all the resources were stopped on the failed node, it
> does not necessarily need to be turned off, does it?
If it could do that - then it wouldn't have tried to shoot it in the
first place :-)