[Pacemaker] Failure after intermittent network outage
andrew at beekhof.net
Mon Mar 14 04:52:00 EDT 2011
On Fri, Mar 11, 2011 at 1:31 PM, Pavel Levshin <pavel at levshin.spb.ru> wrote:
> Hi Andrew.
> I'm sorry, but I can not agree.
> Look again at the DC log. Here it says: "Action lost". This is why I use
> this term.
Including those in the first email might have been helpful, don't you think?
Please file a bug and attach a hb_report archive for the period
covered by your problem.
>> Either remove the RA, or make sure it returns something sensible when
>> tools or configuration it needs are not available.
> This is what I mean by "error-prone". Such an RA may appear again from a fresh
> RPM. And errors in RAs just happen.
Which is why the second option is the preferred one.
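
To make the second option concrete, here is a minimal sketch of a monitor
action that returns something sensible when its tools are missing. It is
written in Python for illustration only; it is not code from VirtualDomain or
any shipped RA. The function name monitor() and its parameters are
hypothetical. The exit code values are the standard OCF ones. The assumption
is that on a probe, a node lacking the required tools cannot be running the
resource, so "not running" is the sensible answer rather than an error.

```python
import shutil

# Standard OCF exit codes (only the ones this sketch uses).
OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5
OCF_NOT_RUNNING = 7


def monitor(required_tools, is_probe, check_running, which=shutil.which):
    """Hypothetical monitor action for an OCF-style resource agent.

    required_tools: names of binaries the agent needs on this node.
    is_probe:       True when the cluster is only checking whether the
                    resource is already active (a one-off monitor).
    check_running:  callable returning True if the resource is active.
    which:          lookup function, injectable for testing.
    """
    missing = [tool for tool in required_tools if which(tool) is None]
    if missing:
        # Assumption: without its tools the resource cannot be active here,
        # so a probe reports "not running" instead of failing. Outside a
        # probe, the missing software is reported as an installation error.
        return OCF_NOT_RUNNING if is_probe else OCF_ERR_INSTALLED
    return OCF_SUCCESS if check_running() else OCF_NOT_RUNNING
```

With an agent behaving like this, a probe on a node that lacks the tools tells
the cluster the resource is not active there, which is the verification it
needs before starting the resource elsewhere.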
> OK, I see, there is a way: I could copy each RA to a new location (like
> ocf:safe:VirtualDomain), so they will not be touched by RPMs.
That would also work.
> I could even give each resource its own RA, such as VirtualDomain-X,
> VirtualDomain-Y and so on, and place them only on those nodes where the
> resource can run.
> I just don't think it is the best possible way to go.
>> No. For safety we still need to verify that X is not running on node
>> C before we allow it to be active anywhere else.
>> That you know X is unavailable on C is one thing, but the cluster
>> needs to know too.
> Therefore, I propose an addition to Pacemaker: a way to tell the cluster
> that resource X cannot be executed on node C.
Highly unlikely to happen.
> Currently, this is done through the
> status section of the CIB. I wish there was a way to do the same via
> configuration. Then the cluster could get rid of the quirks with unneeded RAs.
> Maybe someone will support my proposal?
> Pavel Levshin //flicker