[Pacemaker] RFC: What part of the XML configuration do you hate the most?

Fri Jun 27 12:23:51 UTC 2008

On Jun 27, 2008, at 2:18 PM, Keisuke MORI wrote:

> Hi,
>
> just about topic 4) in this mail...
>
> Andrew Beekhof <beekhof at gmail.com> writes:
>>> 4) node fencing without the poweroff
>>>  (this is a kind of a new feature request)
>>>  Node fencing is just simple and good enough in most of our cases but
>>>  we hesitate to use STONITH(poweroff/reboot) as the first action
>>>  of a failure, because:
>>>  - we want to shut down the services gracefully whenever possible.
>>>  - rebooting the failed node may destroy the evidence of the
>>>    real cause of a failure. We want to preserve it as far as possible
>>>    so we can investigate later and ensure that all problems are
>>>    resolved.
>>>
>>>  We think that, ideally, when a resource fails the node would
>>>  try to go into 'standby' state, and only if that fails would it
>>>  escalate to STONITH to power off.
>>
>> The problem with this is that it directly (and negatively) impacts
>> service availability.
>> It is unsafe to start services elsewhere until they are confirmed dead
>> on the existing node.
>>
>> So relying on manual shutdowns greatly increases failover time.
>
>
> Right, but I think it depends on the application.
>
> In the case of database applications such as pgsql or oracle,
> the most dominant factor of failover time is the recovery time.
> Shutting down a node in the middle of a transaction will cause a
> rollback and will further increase the recovery time.
> We estimate 3-5 minutes at most for the recovery time in our
> configuration.
>
> Another case is a Filesystem on shared storage.
> You should run fsck before mounting it on the failover node
> for the safety of the data if the filesystem was not unmounted cleanly.
> That would take a very long time, particularly if the filesystem
> is very large, as is typical for one used by a database.
>
> In addition to this, there may be a risk of data loss if the power
> suddenly goes down.  Such risks may be neglected, but if there's
> anything we can do to avoid or minimize such risks then we want
> to take the steps for that.

I think you want on_fail=block.
The cluster won't do anything itself but will instead wait for human
intervention.
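
Roughly, that would be set on the operation whose failure should not
trigger automatic recovery. This is only a sketch: the resource, the ids
and the timings below are made up for illustration, and depending on the
version the attribute may be spelled on-fail rather than on_fail.

```xml
<!-- Hypothetical pgsql resource; ids, interval and timeout are illustrative only -->
<primitive id="pgsql" class="ocf" provider="heartbeat" type="pgsql">
  <operations>
    <!-- If this monitor fails, the cluster takes no further action on the
         resource and waits for an administrator to intervene -->
    <op id="pgsql-monitor" name="monitor" interval="30s" timeout="60s"
        on_fail="block"/>
  </operations>
</primitive>
```

The trade-off is the one discussed above: the node is left untouched for
investigation, but nothing is recovered until someone steps in.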

>
>
>
>>
>> One thing we used to do (but had to disable because we couldn't get it
>> 100% right at the time) was move off the healthy resources before
>> shooting the node.  I think resurrecting this feature is a better
>> approach.
>
> Yes, that sounds good to me.
> One thing I'm wondering: if the cluster manager was able
> to confirm all the resources were stopped on the failed node, it
> does not necessarily need to be turned off, does it?

If it could do that, then it wouldn't have tried to shoot it in the first
place :-)