[Pacemaker] RFC: What part of the XML configuration do you hate the most?

Tue Jun 24 13:48:12 UTC 2008

Hi Keisuke-san,

On Tue, Jun 24, 2008 at 10:01:19PM +0900, Keisuke MORI wrote:
> 
> Andrew Beekhof <abeekhof at suse.de> writes:
> 
> > The changes for Pacemaker 1.0 include an overhaul of configuration
> > syntax.
> >
> > We have a few things in mind for this already, but I'd also like to
> > get people's opinion on which parts need the most attention.
> >
> > looking forward to hearing your answers...
> 
> It's a bit late, but I would like to offer some random comments
> about the configuration and about some features.
> They come from questions and requests from our customers and the field.
> 
> 
> 1) migration-threshold
>    As you've already implemented it, this is one of the most
>    requested features from our customers (as Ikeda-san also mentioned
>    in the other mail). Thank you for that.
> 
>    But to be precise, we have two scenarios to configure:
>    a) monitor NG -> stop -> start on the same node
>       -> monitor NG (Nth time) -> stop -> failover to another node
>    b) monitor NG -> monitor NG (Nth time) -> stop -> failover to another node
> 
>    The current pacemaker behaves as a), I think, but b) is also
>    useful when you want to ignore a transient error.

The b) part has already been discussed on the list and it's
supposed to be implemented in lrmd. I still don't have the API
defined, but thought about something like

	max-total-failures (how many times a monitor may fail)
	max-consecutive-failures (how many times in a row a monitor may fail)

These should probably be attributes defined on the monitor
operation level.
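
To make the difference between the two counters concrete, here is a
rough sketch in Python. It is only an illustration: no such API exists
in lrmd yet, the class and method names are made up, and it assumes
"may fail N times" means that failure N+1 is the first one reported.

```python
from dataclasses import dataclass


@dataclass
class MonitorFailurePolicy:
    """Hypothetical per-monitor-operation counters (not an existing lrmd API)."""

    max_total_failures: int        # how many times a monitor may fail
    max_consecutive_failures: int  # how many times in a row a monitor may fail
    total: int = 0
    consecutive: int = 0

    def record(self, monitor_ok: bool) -> bool:
        """Record one monitor result; return True if the failure should be reported."""
        if monitor_ok:
            # A success ends the current run of failures but keeps the total.
            self.consecutive = 0
            return False
        self.total += 1
        self.consecutive += 1
        # Assumption: the limits are inclusive, so only exceeding one reports.
        return (self.total > self.max_total_failures
                or self.consecutive > self.max_consecutive_failures)
```

With max-consecutive-failures set to 2, a transient error followed by a
successful monitor is ignored, and only the third failure in a row
would be passed up, which is roughly scenario b).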

> 3) the standard location of the "initial (or bootstrap) cib.xml"
>    I have seen many people confused about where to store the cib.xml
>    and how to start at first boot. They each end up using a
>    different method (one may use cibadmin -U, another may place it
>    into /var/lib/heartbeat/crm/ by hand, etc., and the original
>    cib.xml gets lost somewhere).
> 
>    I think it would be good to have a standard location for
>    the initial cib.xml and to provide an official procedure for
>    bootstrapping with it.

I guess that a tool clearly named "apply this CIB" could do
(which is basically cibadmin -R -x). Not sure if a standard
bootstrap location would make things easier or just more
confusing.
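
Such a tool could be little more than a wrapper. A minimal sketch in
Python, assuming cibadmin is in the PATH and the cluster is already
running; the function names are invented for illustration:

```python
import subprocess
from typing import List


def build_apply_cib_command(cib_path: str) -> List[str]:
    """Build the cibadmin call that replaces the CIB with the given file."""
    # -R replaces the existing configuration, -x reads the XML from a file.
    return ["cibadmin", "-R", "-x", cib_path]


def apply_cib(cib_path: str) -> None:
    """Hypothetical 'apply this CIB' helper; raises if cibadmin fails."""
    subprocess.run(build_apply_cib_command(cib_path), check=True)
```

The original cib.xml stays wherever the admin keeps it, since the tool
only reads it.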

> 4) node fencing without the poweroff
>    (this is a kind of a new feature request)
>    Node fencing is just simple and good enough in most of our cases but
>    we hesitate to use STONITH(poweroff/reboot) as the first action
>    of a failure, because:

Do you mean on operation (such as stop) failures? Or other
failures?

>    - we want to shut down the services gracefully as far as possible.

Well, if the stop op failed, one can't do anything but shutdown,
right?

>    - rebooting the failed node may lose the evidence of the
>      real cause of a failure. We want to preserve it as far as possible
>      to investigate it later and to ensure that all the problems are resolved.
> 
>    We think that, ideally, when a resource fails the node would
>    try to go into 'standby' state, and only if that fails would it
>    escalate to STONITH to power off.

Perhaps another on_fail action. But I still don't see how that
could help.

Also, if there's a split brain one can of course only do stonith.

> 5) STONITH priority
>    Another reason why we hesitate to use STONITH is the "cross counter"
>    problem when split-brain occurs.
>    It would be great if we could tune it so that a node with resources
>    running is most likely to survive.

I guess that you mean the case when two nodes try to shoot each
other. OK, one node could know if it's holding the majority of
resources, but how does the other node know what its peer is
doing? Or did I completely misunderstand your point?

> 6) node fencing when the connectivity failure is detected by pingd.
>    Currently we have to have the pingd constraints for all resources.
>    It would help simplify the config and the recovery operation
>    if we could configure the behavior the same way as for a resource failure.

Agreed. Just not sure how this could be implemented. Perhaps an
RA which would monitor the attributes created by pingd and for
which one could set on_fail to fence.
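
The monitor logic of such an RA could be as simple as the sketch
below (Python, for illustration only). How the attribute value is read
is left out, and the function name and the "minimum" parameter are
assumptions; only the OCF return codes are standard.

```python
OCF_SUCCESS = 0      # standard OCF return code: resource is fine
OCF_ERR_GENERIC = 1  # standard OCF return code: generic failure


def connectivity_monitor(pingd_value, minimum=1):
    """Map the pingd attribute value to a monitor return code.

    pingd_value is the attribute as a string, or None if it is unset.
    An unset, unparsable, or too-low value counts as a failure, so the
    operation's on_fail setting decides what happens next.
    """
    try:
        score = int(pingd_value)
    except (TypeError, ValueError):
        return OCF_ERR_GENERIC
    return OCF_SUCCESS if score >= minimum else OCF_ERR_GENERIC
```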

> Regarding 1)-b), 4) and 5), my colleagues and I think that they
> are important and we're now studying how we can implement them.
> 
> I hope this helps the evolution of Pacemaker.

Definitely!

Cheers,

Dejan