[Pacemaker] Pacemaker resource migration behaviour

Wed Feb 6 12:52:07 UTC 2013

A quick addendum to this message:

The log files I provided actually continue until the resources do get started on the host. What triggers this is the expiry of the 6-minute failure-timeout. As can be seen in pe-input-50, the resources conntrackd, condition, sub-ospfd and sub-ripd are in the slave role on both hosts and sub-squid is not started on either. This shows that the desired end state of the transition calculated from pe-input-49 is never reached.

James

On Feb 6, 2013, at 1:41 PM, James Guthrie <jag at open.ch> wrote:

> Hi David,
> 
> Unfortunately crm_report doesn't work correctly on my hosts: we compiled from source with custom paths, and apparently crm_report and its associated tools are not built to use the paths that can be customised with autoconf.
> 
> Despite that, I have done some investigation and think I may have found an inconsistency. I have attached the pacemaker-relevant syslog, including the pe-input files. The logfile starts where pacemaker detects that sub-squid is not running on mu. It then fails over to nu, where two further failures take place. In order to recover from these failures, the pengine produces transitions 106, 107, 108 and 109, with the corresponding pe-input files 46, 47, 48 and 49.
> 
> The way I understand it, pacemaker works through the transitions until something happens from outside, at which point the transitions are recalculated and pacemaker continues on.
> 
> Using crm_simulate to observe the transitions that should happen tells me that the transition calculated from pe-input-49 ought to have resulted in the resources conntrackd, condition, sub-ospfd, sub-ripd and sub-squid being promoted to master. In fact, this never happens, but the crmd reports the transition as complete. Nothing appears to acknowledge that the current state is not the desired outcome calculated by the pengine. Is it possible that this is a bug?
> 
> Regards,
> James
> 
> <pacemaker-not-starting-resources.tar.gz>
> On Feb 5, 2013, at 7:41 PM, David Vossel <dvossel at redhat.com> wrote:
> 
>> 
>> 
>> ----- Original Message -----
>>> From: "James Guthrie" <jag at open.ch>
>>> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
>>> Sent: Tuesday, February 5, 2013 8:12:57 AM
>>> Subject: Re: [Pacemaker] Pacemaker resource migration behaviour
>>> 
>>> Hi all,
>>> 
>>> as a follow-up to this, I realised that I needed to slightly change
>>> the way the resource constraints are put together, but I'm still
>>> seeing the same behaviour.
>>> 
>>> Below are an excerpt from the logs on the host and the revised XML
>>> configuration. In this case, I caused two failures on the host mu,
>>> which forced the resources onto nu, then I forced two failures on nu.
>>> What can be seen in the logs are the two detected failures on nu
>>> (the "warning: update_failcount:" lines). After the two failures on
>>> nu, the VIP is migrated back to mu, but none of the "support"
>>> resources are promoted with it.
>> 
>> I can't tell much from this output.
>> 
>> Run the steps you use to reproduce this and create a crm_report of the issue so we can see both the logs and pengine transition files that precede this.
>> 
>> -- Vossel
>> 
>> 
>>> Regards,
>>> James
>>> 
>> 
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org