[Pacemaker] RFC: What part of the XML configuration do you hate the most?

Wed Jul 9 09:46:24 UTC 2008

Hi,

Andrew Beekhof wrote:
(snip)
>> 1) migration-threshold
>>   As you've already implemented it, this is one of the most
>>   requested features from our customers (as Ikeda-san also mentioned
>>   in the other mail). Thank you for that.
>>
>>   But, to be precise, there are two scenarios we want to configure:
>>   a) monitor NG -> stop -> start on the same node
>>      -> monitor NG (Nth time) -> stop -> failover to another node
>>   b) monitor NG -> monitor NG (Nth time) -> stop -> failover to
>>      another node
>>
>>   I think the current Pacemaker behaves as in a), but b) is also
>>   useful when you want to ignore a transient error.
> 
> how was b) possible?
> I've not done anything deliberately to break it...
> 

I'm thinking of implementing it as follows:
when a monitor NG is detected, consider the operation to have succeeded
until the fail-count exceeds the new threshold,
as if the operation's "on_fail" setting were "ignore".
This will probably require some changes
in unpack_rsc_op().
The advantage is that the count of failed operations is unified
in the fail-count.
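To make the difference between a) and b) concrete, here is a minimal simulation, assuming "NG" means a failed monitor and the threshold plays the role of migration-threshold. The function `simulate` and the action labels are hypothetical; this is not Pacemaker code, and it assumes failover happens once the fail-count reaches the threshold.

```python
# Hypothetical model of recovery policies a) and b); not Pacemaker code.

def simulate(results, threshold, policy):
    """Return the actions taken for a sequence of monitor results.

    results: booleans, True = monitor OK, False = monitor NG.
    policy "a": each NG causes stop+start on the same node.
    policy "b": each NG is ignored (like on_fail="ignore").
    In both cases, stop+failover happens once fail_count reaches threshold
    (an assumption; "exceeds" would mean using > instead of >=).
    """
    fail_count = 0
    actions = []
    for ok in results:
        if ok:
            actions.append("ok")
            continue
        fail_count += 1
        if fail_count >= threshold:
            actions.append("stop+failover")
            break
        actions.append("stop+start" if policy == "a" else "ignore")
    return actions
```

With `simulate([True, False, False, False], 3, "a")` the resource is restarted twice before failing over, while policy `"b"` leaves it running untouched until the third NG.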

But there is a problem.
When monitor NG occurs several times in succession, the pengine cannot detect it.
This is because lrmd doesn't notify its client
when the RA's return code is the same as the last time.
As a trial, I cancelled the monitor operation and set it again
to reset the status held in lrmd.
The problem is that, once the monitor has failed,
the next monitor action is executed immediately instead of after the interval.
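The notification problem can be modelled in a few lines. This is a hypothetical sketch of a "report only on change" filter, not lrmd's actual code; the class name `ChangeOnlyNotifier` is invented, and return code 1 is used as a generic monitor failure. It shows why repeated failures collapse into a single event, and why resetting the remembered status (the cancel-and-resubmit workaround) makes every failure visible again. The side effect of an immediate re-run is not modelled.

```python
# Hypothetical model of "notify the client only when the RA return code changes".

class ChangeOnlyNotifier:
    def __init__(self):
        self.last_rc = None      # return code of the previous monitor run
        self.delivered = []      # return codes actually reported to the client

    def monitor_result(self, rc):
        # Only a change in return code is forwarded to the client.
        if rc != self.last_rc:
            self.delivered.append(rc)
        self.last_rc = rc

    def reset(self):
        # Corresponds to cancelling and re-registering the recurring monitor.
        self.last_rc = None


quiet = ChangeOnlyNotifier()
for rc in [0, 1, 1, 1]:
    quiet.monitor_result(rc)
# quiet.delivered == [0, 1]: three failures, but only one is seen.

reset_each_time = ChangeOnlyNotifier()
for rc in [0, 1, 1, 1]:
    reset_each_time.monitor_result(rc)
    if rc != 0:
        reset_each_time.reset()
# reset_each_time.delivered == [0, 1, 1, 1]: every failure is seen.
```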

Do you have any ideas on how to solve this problem?
I would be very interested to hear any comments.

Regards,
Satomi Taniguchi

>>
>>
>>
>> 2) auto-failback = on/off
>>   Another frequent question we get is how to configure 'auto-failback off'.
>>   Currently we achieve this with 'default-resource-stickiness=INFINITY',
>>   but it would be great to have a more purpose-oriented parameter
>>   (one that is easier for users to understand).
> 
> "back" is a concept that only applies to 2 node clusters.
> the only sane way to make this work in 3+ node clusters is to invert it 
> and instead have a parameter that indicates if the resource should stay 
> where it is.
> 
> that's what default-resource-stickiness is.
> 
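A simplified placement sketch shows why "stay where you are" generalizes to 3+ nodes while "go back" does not. It assumes each node gets a location preference score and the node currently running the resource gets the stickiness added. Pacemaker does treat INFINITY as 1,000,000, but the function `choose_node` and the example scores are hypothetical and skip most of the real allocation logic.

```python
# Simplified, hypothetical placement model; not the real policy engine.
INFINITY = 1_000_000  # Pacemaker's value for INFINITY

def choose_node(preferences, current, stickiness):
    """Return the node with the highest score.

    preferences: node -> location preference score (hypothetical inputs)
    current: node the resource runs on now (or None)
    stickiness: bonus for staying put (cf. default-resource-stickiness)
    """
    scores = dict(preferences)
    if current in scores:
        scores[current] = min(INFINITY, scores[current] + stickiness)
    # Ties go to the first node in insertion order (a simplification).
    return max(scores, key=scores.get)


prefs = {"node1": 100, "node2": 50, "node3": 0}
moves_back = choose_node(prefs, "node2", 0)        # returns to preferred node1
stays = choose_node(prefs, "node2", INFINITY)      # stays on node2
```

No notion of a "home" node is needed: with enough stickiness the resource simply stays wherever it last landed, whichever of the three nodes that is.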
>>
>>
>>
>> 3) the standard location of the "initial (or bootstrap) cib.xml"
>>   I have seen many people confused about where to store cib.xml and
>>   how to start the cluster at first boot. As a result, they each use
>>   different methods (one may use cibadmin -U, another may place it
>>   in /var/lib/heartbeat/crm/ by hand, etc.), and the original
>>   cib.xml disappears somewhere.
> 
> Actually it gets moved out of the way.
> We do a lot of work to prevent admin changes from being overwritten/lost 
> (even when they specifically go against the documentation).
> 
>>
>>
>>   I think it would be good to have a standard location for
>>   the initial cib.xml and to provide an official procedure for
>>   bootstrapping with it.
> 
> The documentation is very clear on this.
> Do not modify the real CIB by hand. Ever.
> 
> "Create" is ok, but never ever modify.
> 
> Adding an extra location is simply going to make it even more complex.
> 
>> 4) node fencing without power-off
>>   (this is a kind of new feature request)
>>   Node fencing is simple and good enough in most of our cases, but
>>   we hesitate to use STONITH (poweroff/reboot) as the first response
>>   to a failure, because:
>>   - we want to shut down the services gracefully whenever possible.
>>   - rebooting the failed node may destroy the evidence of the
>>     real cause of the failure. We want to preserve it as much as
>>     possible so we can investigate later and ensure that all problems
>>     are resolved.
>>
>>   We think that, ideally, when a resource fails the node should
>>   try to go into 'standby' state, and only if that fails should it
>>   escalate to STONITH to power off.
> 
> The problem with this is that it directly (and negatively) impacts 
> service availability.
> It is unsafe to start services elsewhere until they are confirmed dead 
> on the existing node.
> 
> So relying on manual shutdowns greatly increases failover time.
> 
> One thing we used to do (but had to disable because we couldn't get it 
> 100% right at the time) was move off the healthy resources before 
> shooting the node.  I think resurrecting this feature is a better approach.
> 
>> 5) STONITH priority
>>   Another reason we hesitate to use STONITH is the "cross counter"
>>   problem (both nodes shooting each other) when split-brain occurs.
>>   It would be great if we could tune it so that a node with resources
>>   running is the most likely to survive.
>>
>>
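One way to bias the outcome, sketched here under loudly stated assumptions: in a split-brain each side races to fence the other, so delaying the fencing requests issued by nodes that run fewer resources lets the busiest node shoot first. The functions `fencing_delays` and `survivor` and the 5-second base delay are hypothetical illustrations of the idea, not an existing Pacemaker or STONITH option.

```python
# Hypothetical sketch: give the node running the most resources a head start
# in the fencing race, so it is the one most likely to survive.

def fencing_delays(resource_counts, base_delay=5.0):
    """Seconds each node waits before fencing its peer(s)."""
    busiest = max(resource_counts.values())
    return {node: 0.0 if count == busiest else base_delay
            for node, count in resource_counts.items()}

def survivor(resource_counts, base_delay=5.0):
    """Node whose fencing request fires first in this simple model.

    With equal resource counts the delays tie, so the "cross counter"
    is not avoided; the tie falls to the first node listed here.
    """
    delays = fencing_delays(resource_counts, base_delay)
    return min(delays, key=delays.get)
```

For two nodes where node1 runs three resources and node2 none, node2 waits 5 seconds before shooting, so node1 fences it first and keeps its services.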
>> 6) node fencing when a connectivity failure is detected by pingd.
>>   Currently we have to add pingd constraints to all resources.
>>   It would help simplify the configuration and the recovery operation
>>   if we could configure the behavior the same way as for a resource failure.
> 
> I think this could be easily done by creating a new mode for pingd - 
> such that it "fails" when all connectivity is lost.
> Then it would just be a matter of setting on_fail=fence for pingd's 
> monitor op.
> 
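The proposed mode can be sketched as two separate outputs: the usual node attribute (pingd adds its multiplier once per reachable ping target) and a monitor result that only fails when every target is unreachable. OCF_SUCCESS (0) and OCF_ERR_GENERIC (1) are the standard OCF return codes. The functions `pingd_score` and `pingd_monitor` and the `fail_on_total_loss` switch are hypothetical names for the new mode; they are not existing pingd options.

```python
# Hypothetical sketch of a pingd mode that "fails" on total connectivity loss.
OCF_SUCCESS = 0       # standard OCF return code: running
OCF_ERR_GENERIC = 1   # standard OCF return code: generic error

def pingd_score(reachable, multiplier):
    """Attribute value: multiplier added once per reachable ping target."""
    return multiplier * sum(1 for up in reachable.values() if up)

def pingd_monitor(reachable, fail_on_total_loss=True):
    """Monitor result for the proposed mode.

    reachable: ping target -> bool. An empty mapping counts as total loss.
    """
    if fail_on_total_loss and not any(reachable.values()):
        return OCF_ERR_GENERIC
    return OCF_SUCCESS
```

With such a monitor, a failed result would feed into the normal failure handling, so setting on_fail=fence on pingd's monitor op would fence the isolated node without per-resource pingd constraints.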
>> Regarding 1)-b), 4) and 5), my colleagues and I think that they
>> are important, and we're now studying how to implement them.
> 
> Please let me know if you come up with anything.
> I don't have any real objection to the concepts - well, except maybe 4).
> 
> You might also want to file bugzilla enhancement requests for these so
> that we (or I) don't forget about them.
> 
>>
>>
>> I hope this helps the evolution of Pacemaker.
>>
>> Thanks,
>>
>> Keisuke MORI
>> NTT DATA Intellilink Corporation
>>
>>
>> _______________________________________________
>> Pacemaker mailing list
>> Pacemaker at clusterlabs.org
>> http://list.clusterlabs.org/mailman/listinfo/pacemaker