[Pacemaker] 1.1.10 rc7 + final strange behaviour

Wed Jul 31 01:33:01 UTC 2013

On 30/07/2013, at 12:29 AM, Johan Huysmans <johan.huysmans at inuits.be> wrote:

> Hi,
> 
> I was testing the latest rc7 and the final version of 1.1.10.
> 
> My test combines cloned resources with one resource group, plus constraints so that the resource group can only run on a node where the cloned resource is running.
> 
> With a 1 node setup everything works ok.

Are you sure about that?
You've configured both on-fail=block (do nothing) and an aggressive failure-timeout (pretend the failure never happened after 10s).
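
For illustration only, a minimal crm shell sketch of the combination I mean. The resource names (p_dummy, cl_dummy), the Dummy agent and the monitor interval are hypothetical and not taken from your report; only on-fail=block and the 10s failure-timeout come from your configuration:

```
# Hypothetical resource; names and agent are assumptions for illustration.
primitive p_dummy ocf:pacemaker:Dummy \
    op monitor interval=10s on-fail=block \
    meta failure-timeout=10s
clone cl_dummy p_dummy
```

With on-fail=block the cluster takes no recovery action after a failed monitor, while failure-timeout=10s lets the recorded failure expire shortly afterwards, so the two settings work against each other.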

> Triggering a failure in the cloned resource triggered a stop of the resource group.
> Recovering the failure in the cloned resource triggered a recovery of the group.
> 
> I added my other node, and performed the same failure.
> However here strange things happened.
> The failure is not correctly shown in crm_mon. A failing resource can be shown as OK, and when it has recovered it can be shown as failing.
> The resource group is still running on the node where the cloned resource is failing; it should fail over instead.
> 
> It seems that something got broken in one of the latest RCs, as this used to work.
> 
> 
> I have included a crm_report from the moment when a cloned resource failed.
> 
> Greetings,
> Johan
> <pcmk-Mon-29-Jul-2013.tar.bz2>