[Pacemaker] Pacemaker resource migration behaviour

Wed Feb 6 19:14:47 UTC 2013

----- Original Message -----
> From: "James Guthrie" <jag at open.ch>
> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> Sent: Wednesday, February 6, 2013 6:52:07 AM
> Subject: Re: [Pacemaker] Pacemaker resource migration behaviour
> 
> A quick addendum to this message:
> 
> The log files I provided actually continue until the resources do get
> started on the host. The trigger for that is the 6-minute
> failure-timeout timer that pops. As can be seen in pe-input-50, the
> resources conntrackd, condition, sub-ospfd and sub-ripd are in slave
> on both hosts and sub-squid is not started on either. This shows
> that the desired end-state of the transitions produced with
> pe-input-49 is never reached.
> 

Yep, this looks like a bug in attrd. I see the command going out to delete the fail-count for squid, but it fails. Since the fail-count isn't properly expired, the sub-squid resource can't start.
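To make this failure mode concrete, here is a small Python sketch of how a fail-count that never gets cleared keeps a resource off a node. It is a simplified model, not Pacemaker's code. `can_run_on_node` and `expire_failcount` are hypothetical names. The migration-threshold of 2 is an assumption based on the two failures that moved the resources in this thread, and the 360-second timeout is the 6-minute failure-timeout quoted above.

```python
# Hypothetical, simplified model of how a fail-count gates placement.
# Not Pacemaker's implementation; names and parameters are illustrative.

def can_run_on_node(fail_count: int, migration_threshold: int) -> bool:
    """A node is ineligible once fail_count reaches migration_threshold.

    In this sketch a threshold of 0 means "never ban".
    """
    if migration_threshold <= 0:
        return True
    return fail_count < migration_threshold


def expire_failcount(fail_count: int, last_failure: float, now: float,
                     failure_timeout: float, delete_succeeds: bool) -> int:
    """Return the fail-count left after an expiry attempt at time `now`."""
    if now - last_failure < failure_timeout:
        return fail_count  # failure-timeout has not elapsed yet
    # Expiry is supposed to clear the count. If the request to delete the
    # fail-count attribute is lost, the stale count stays in place.
    return 0 if delete_succeeds else fail_count


if __name__ == "__main__":
    THRESHOLD = 2          # assumed: two failures moved the resources away
    TIMEOUT = 6 * 60       # the 6-minute failure-timeout from the thread
    for ok in (True, False):
        count = expire_failcount(2, last_failure=0.0, now=400.0,
                                 failure_timeout=TIMEOUT, delete_succeeds=ok)
        print(f"delete_succeeds={ok}: fail_count={count}, "
              f"sub-squid can start: {can_run_on_node(count, THRESHOLD)}")
```

With a successful delete the count drops to 0 and sub-squid becomes eligible again; with a lost delete the stale count of 2 still meets the threshold, so the resource stays blocked.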

Can you please open an issue for this at bugs.clusterlabs.org? Include the logs.

Thanks,
-- Vossel

> James
> 
> On Feb 6, 2013, at 1:41 PM, James Guthrie <jag at open.ch> wrote:
> 
> > Hi David,
> > 
> > Unfortunately crm_report doesn't work correctly on my hosts as we
> > have compiled from source with custom paths and apparently the
> > crm_report and associated tools are not built to use the paths
> > that can be customised with autoconf.
> > 
> > Despite that, I have done some investigation and think I may have
> > found an inconsistency. I have attached the pacemaker-relevant
> > syslog, including the pe-input files. The logfile starts where
> > pacemaker detects that sub-squid is not running on mu. It then
> > fails over to nu, where two further failures take place. In order
> > to recover from these failures, the pengine produces transitions
> > 106, 107, 108 and 109, with the corresponding pe-input files 46,
> > 47, 48 and 49.
> > 
> > The way I understand it, pacemaker works through the transitions
> > until something happens from outside, at which point the
> > transitions are recalculated and pacemaker continues on.
> > 
> > Using crm_simulate to observe the transitions that should happen
> > tells me that the transitions that were calculated from
> > pe-input-49 ought to have resulted in the resources conntrackd,
> > condition, sub-ospfd, sub-ripd and sub-squid being promoted to
> > master. In fact, this never happens, but the crmd reports the
> > transition as being complete. It appears as though nowhere is it
> > acknowledged that the current state is not the desired outcome as
> > calculated by the pengine. Is it possible that this is a bug?
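The gap described here, a transition reported as complete while the computed end state was never reached, can be illustrated with a short Python sketch. It compares the roles that crm_simulate predicted from pe-input-49 against the state seen in pe-input-50. The role data is transcribed by hand from the thread. The function `unmet_goals` and the dictionary layout are hypothetical and are not part of crmd or any Pacemaker tool.

```python
# Hypothetical consistency check: which resources did not reach the role
# the policy engine planned for them? Role data is transcribed by hand
# from the thread; the layout and function name are illustrative only.
from typing import Dict, List

Roles = Dict[str, str]  # node name -> role ("Master", "Slave", "Stopped")


def unmet_goals(desired: Dict[str, str], actual: Dict[str, Roles]) -> List[str]:
    """Return resources whose desired role is not held on any node."""
    return sorted(
        rsc for rsc, role in desired.items()
        if role not in actual.get(rsc, {}).values()
    )


# Desired outcome per crm_simulate on pe-input-49: promote to Master.
DESIRED = {rsc: "Master" for rsc in
           ("conntrackd", "condition", "sub-ospfd", "sub-ripd", "sub-squid")}

# Observed state in pe-input-50: Slave on both hosts, sub-squid not started.
ACTUAL: Dict[str, Roles] = {
    "conntrackd": {"mu": "Slave", "nu": "Slave"},
    "condition": {"mu": "Slave", "nu": "Slave"},
    "sub-ospfd": {"mu": "Slave", "nu": "Slave"},
    "sub-ripd": {"mu": "Slave", "nu": "Slave"},
    "sub-squid": {"mu": "Stopped", "nu": "Stopped"},
}

if __name__ == "__main__":
    print("Not in desired role:", ", ".join(unmet_goals(DESIRED, ACTUAL)))
```

All five resources are reported, which matches the observation that the transition was marked complete even though none of the planned promotions took place.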
> > 
> > Regards,
> > James
> > 
> > <pacemaker-not-starting-resources.tar.gz>
> > On Feb 5, 2013, at 7:41 PM, David Vossel <dvossel at redhat.com>
> > wrote:
> > 
> >> 
> >> 
> >> ----- Original Message -----
> >>> From: "James Guthrie" <jag at open.ch>
> >>> To: "The Pacemaker cluster resource manager"
> >>> <pacemaker at oss.clusterlabs.org>
> >>> Sent: Tuesday, February 5, 2013 8:12:57 AM
> >>> Subject: Re: [Pacemaker] Pacemaker resource migration behaviour
> >>> 
> >>> Hi all,
> >>> 
> >>> as a follow-up to this, I realised that I needed to slightly change
> >>> the way the resource constraints are put together, but I'm still
> >>> seeing the same behaviour.
> >>> 
> > 
> >>> Below is an excerpt from the logs on the host and the revised xml
> >>> configuration. In this case, I caused two failures on the host mu,
> >>> which forced the resources onto nu; then I forced two failures on nu.
> >>> What can be seen in the logs are the two detected failures on nu
> >>> (the "warning: update_failcount:" lines). After the two failures on
> >>> nu, the VIP is migrated back to mu, but none of the "support"
> >>> resources are promoted with it.
> >> 
> >> I can't tell much from this output.
> >> 
> >> Run the steps you use to reproduce this and create a crm_report of
> >> the issue so we can see both the logs and the pengine transition
> >> files that precede this.
> >> 
> >> -- Vossel
> >> 
> >> 
> >>> Regards,
> >>> James
> >>> 
> >> 
> >> _______________________________________________
> >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >> 
> >> Project Home: http://www.clusterlabs.org
> >> Getting started:
> >> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >> Bugs: http://bugs.clusterlabs.org
> > 
> 