[Pacemaker] Pacemaker resource migration behaviour

Wed Mar 6 01:34:59 EST 2013

On Wed, Feb 6, 2013 at 11:41 PM, James Guthrie <jag at open.ch> wrote:
> Hi David,
>
> Unfortunately crm_report doesn't work correctly on my hosts: we compiled from source with custom paths, and apparently crm_report and its associated tools are not built to use the paths that can be customised with autoconf.

It certainly tries to:

   https://github.com/beekhof/pacemaker/blob/master/tools/report.common#L99

What does it say on your system (or, what paths did you give to autoconf)?

>
> Despite that, I have done some investigation and think I may have found an inconsistency. I have attached the pacemaker-relevant syslog, including the pe-input files.

Great, I'll take a look now.

> The logfile starts where pacemaker detects that sub-squid is not running on mu. It then fails over to nu, where two further failures take place. In order to recover from these failures, the pengine produces transitions 106, 107, 108 and 109, with the corresponding pe-input files 46, 47, 48 and 49.
>
> The way I understand it, pacemaker works through the transitions until something happens from outside, at which point the transitions are recalculated and pacemaker continues on.
>
> Using crm_simulate to observe the transitions that should happen tells me that the transitions calculated from pe-input-49 ought to have resulted in the resources conntrackd, condition, sub-ospfd, sub-ripd and sub-squid being promoted to master. In fact, this never happens, yet the crmd reports the transition as complete. Nothing appears to acknowledge that the current state is not the desired outcome as calculated by the pengine. Is it possible that this is a bug?
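
For anyone following along, the replay described above could be scripted roughly as follows. This is only a sketch under assumptions: build_replay_command is a made-up helper name, and the pe-input path is a placeholder, since the real location depends on the paths given to autoconf. It only assembles a command line using crm_simulate's --simulate and --xml-file options (optionally --show-scores); it does not run anything.

```python
import shlex
from typing import List


def build_replay_command(pe_input: str, show_scores: bool = False) -> List[str]:
    """Build a crm_simulate command line that replays a saved pengine input.

    pe_input is the path to a saved pe-input file; the caller supplies it,
    because the directory differs between builds.
    """
    cmd = ["crm_simulate", "--simulate", "--xml-file", pe_input]
    if show_scores:
        # Also print the allocation scores used for placement decisions.
        cmd.append("--show-scores")
    return cmd


if __name__ == "__main__":
    # Placeholder path (assumption); substitute the real pe-input location.
    print(shlex.join(build_replay_command("/path/to/pe-input-49.bz2")))
```

Comparing the simulated outcome for each of the four pe-input files against what the cluster actually did should show at which transition the two diverge.
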
>
> Regards,
> James
>
>
>
> On Feb 5, 2013, at 7:41 PM, David Vossel <dvossel at redhat.com> wrote:
>
>>
>>
>> ----- Original Message -----
>>> From: "James Guthrie" <jag at open.ch>
>>> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
>>> Sent: Tuesday, February 5, 2013 8:12:57 AM
>>> Subject: Re: [Pacemaker] Pacemaker resource migration behaviour
>>>
>>> Hi all,
>>>
>>> as a follow-up to this, I realised that I needed to slightly change
>>> the way the resource constraints are put together, but I'm still
>>> seeing the same behaviour.
>>>
>>>
>>> Below are an excerpt from the logs on the host and the revised XML
>>> configuration. In this case, I caused two failures on the host mu,
>>> which forced the resources onto nu, then I forced two failures on nu.
>>> What can be seen in the logs are the two detected failures on nu
>>> (the "warning: update_failcount:" lines). After the two failures on
>>> nu, the VIP is migrated back to mu, but none of the "support"
>>> resources are promoted with it.
>>
>> I can't tell much from this output.
>>
>> Run the steps you use to reproduce this and create a crm_report of the issue so we can see both the logs and the pengine transition files that precede it.
>>
>> -- Vossel
>>
>>
>>> Regards,
>>> James
>>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
>