[Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

Tue May 21 01:23:40 UTC 2013

On 21/05/2013, at 1:39 AM, Andrew Widdersheim <awiddersheim at hotmail.com> wrote:

> Have I just run into a shortcoming with pacemaker?

Short answer: yes, but there is a workaround.

Basically, attrd should be, but is not, truly atomic.
Despite its best efforts, the attribute updates from different nodes can still arrive at sufficiently different times to produce the behavior you saw.
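To make the race concrete, here is a toy Python model. It is not Pacemaker code. It assumes two hypothetical nodes, `node1` and `node2`, a ping attribute worth 1000 while the target is reachable, a location score equal to that attribute, and a resource stickiness of 100. All of these numbers are illustrative. Placement is recomputed after each individual update, which is a simplification of what happens when the updates are not applied atomically.

```python
# Toy model only: this is NOT how Pacemaker is implemented. Node names,
# scores and stickiness are hypothetical values chosen for illustration.

STICKINESS = 100  # hypothetical resource-stickiness on the current node


def place(current, pingd):
    """Return the node with the highest score; the current node gets a stickiness bonus."""
    def score(node):
        return pingd[node] + (STICKINESS if node == current else 0)
    # sorted() makes tie-breaking deterministic (max keeps the first best node)
    return max(sorted(pingd), key=score)


def simulate(updates, pingd, current):
    """Apply (node, value) ping updates one at a time, recomputing placement after each."""
    moves = []
    for node, value in updates:
        pingd[node] = value
        target = place(current, pingd)
        if target != current:
            moves.append((current, target))  # a move means stop + start, i.e. a restart
            current = target
    return current, moves


if __name__ == "__main__":
    # Both nodes lose the same ping target, but node1's update lands first.
    final, moves = simulate([("node1", 0), ("node2", 0)],
                            {"node1": 1000, "node2": 1000}, "node1")
    print(final, moves)  # prints: node2 [('node1', 'node2')]
```

If node1's update arrives first, the resource moves to node2 even though both nodes end up equally disconnected; if node2's update arrives first, nothing moves. That order sensitivity is the behavior described above.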

> Should I file a bug or RFE somewhere? Seems like there should be another parameter when setting up a pingd resource to tell the DC/policy engine to wait x amount of seconds so that all nodes have shared their connection state before it makes a decision about moving resources.

That would be:

       crmd-transition-delay = time [0s]
           *** Advanced Use Only *** Enabling this option will slow down cluster recovery under all conditions

           Delay cluster recovery for the configured interval to allow for additional/related events to occur. Useful if your configuration is sensitive to the order in which ping updates arrive.

from "man crmd" :) 
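Here is a sketch of why the delay helps, using the same kind of toy model (hypothetical node names, scores and timings). It assumes the delay behaves like a debounce: placement is only recomputed once `delay` seconds pass with no further update. The man page only says recovery is delayed by the configured interval, so treat the exact timer behavior as an assumption.

```python
# Toy model only: NOT Pacemaker code. The debounce behavior of the delay is an
# assumption made for illustration; all names and numbers are hypothetical.

STICKINESS = 100  # hypothetical resource-stickiness on the current node


def place(current, pingd):
    """Return the node with the highest score; the current node gets a stickiness bonus."""
    def score(node):
        return pingd[node] + (STICKINESS if node == current else 0)
    return max(sorted(pingd), key=score)


def simulate_with_delay(updates, pingd, current, delay):
    """updates: list of (time_s, node, value) sorted by time.

    Placement is recomputed only when no other update arrives within
    `delay` seconds of the current one.
    """
    moves = []
    for i, (t, node, value) in enumerate(updates):
        pingd[node] = value
        next_t = updates[i + 1][0] if i + 1 < len(updates) else None
        # Another update lands inside the delay window: wait for it instead.
        if next_t is not None and next_t - t <= delay:
            continue
        target = place(current, pingd)
        if target != current:
            moves.append((current, target))  # stop + start, i.e. a restart
            current = target
    return current, moves


if __name__ == "__main__":
    # Both nodes lose the ping target; their updates arrive 2 seconds apart.
    updates = [(1, "node1", 0), (3, "node2", 0)]
    for delay in (0, 5):
        print(delay, simulate_with_delay(updates, {"node1": 1000, "node2": 1000},
                                         "node1", delay))
```

With `delay=0` the staggered updates trigger the same spurious move from node1 to node2; with `delay=5` both updates are seen together and the resource stays on node1. The cost is the one the man page warns about: recovery is slowed under all conditions.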

For some reason it's not in pacemaker-explained; I'll fix that now.