[Pacemaker] Pacemaker resource migration behaviour

Wed Mar 6 02:39:34 EST 2013

Evidently this is something that has since been fixed.

In your logs pe-input-47 results in:

<1d>Feb  6 09:37:52 mu pengine[6257]:   notice: LogActions: Demote
conntrackd:1        (Master -> Slave nu)\
<1d>Feb  6 09:37:52 mu pengine[6257]:   notice: LogActions: Demote
condition:1 (Master -> Slave nu)\
<1d>Feb  6 09:37:52 mu pengine[6257]:   notice: LogActions: Demote
sub-ospfd:1 (Master -> Slave nu)\
<1d>Feb  6 09:37:52 mu pengine[6257]:   notice: LogActions: Demote
sub-ripd:1  (Master -> Slave nu)\
<1d>Feb  6 09:37:52 mu pengine[6257]:   notice: LogActions: Demote
sub-squid:0 (Master -> Stopped nu)\
<1d>Feb  6 09:37:52 mu pengine[6257]:   notice: LogActions: Move
eth1-0-192.168.1.10 (Started nu -> mu)\
<1d>Feb  6 09:37:52 mu pengine[6257]:   notice: process_pe_message:
Calculated Transition 107:
/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-47.bz2\

Testing with the latest code shows:

Transition Summary:
 * Promote conntrackd:0	(Slave -> Master mu)
 * Demote  conntrackd:1	(Master -> Slave nu)
 * Promote condition:0	(Slave -> Master mu)
 * Demote  condition:1	(Master -> Slave nu)
 * Promote sub-ospfd:0	(Slave -> Master mu)
 * Demote  sub-ospfd:1	(Master -> Slave nu)
 * Promote sub-ripd:0	(Slave -> Master mu)
 * Demote  sub-ripd:1	(Master -> Slave nu)
 * Demote  sub-squid:0	(Master -> Slave nu)
 * Start   sub-squid:1	(mu)
 * Promote sub-squid:1	(Stopped -> Master mu)
 * Move    eth1-0-192.168.1.10	(Started nu -> mu)

Which looks more like what you're after.
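
For comparing the two plans without reading them by eye, here is a minimal
Python sketch. It assumes the wrapped log lines have been rejoined into one
line each. The helper name actions_by_type and the regular expression are
illustrative assumptions, not part of Pacemaker.

```python
import re
from collections import defaultdict

# Matches both syslog "LogActions: <Action> <resource>" lines and
# crm_simulate-style " * <Action> <resource>" summary lines.
ACTION_RE = re.compile(
    r"(?:LogActions:|\*)\s+(Promote|Demote|Start|Stop|Move)\s+(\S+)"
)


def actions_by_type(lines):
    """Group resource names by the action planned for them."""
    grouped = defaultdict(list)
    for line in lines:
        match = ACTION_RE.search(line)
        if match:
            grouped[match.group(1)].append(match.group(2))
    return dict(grouped)


# Plan logged for pe-input-47 (lines unwrapped from the excerpt above).
old_plan = [
    "mu pengine[6257]: notice: LogActions: Demote conntrackd:1 (Master -> Slave nu)",
    "mu pengine[6257]: notice: LogActions: Demote condition:1 (Master -> Slave nu)",
    "mu pengine[6257]: notice: LogActions: Demote sub-ospfd:1 (Master -> Slave nu)",
    "mu pengine[6257]: notice: LogActions: Demote sub-ripd:1 (Master -> Slave nu)",
    "mu pengine[6257]: notice: LogActions: Demote sub-squid:0 (Master -> Stopped nu)",
    "mu pengine[6257]: notice: LogActions: Move eth1-0-192.168.1.10 (Started nu -> mu)",
]

# Plan produced from the same input by the latest code.
new_plan = [
    " * Promote conntrackd:0 (Slave -> Master mu)",
    " * Demote  conntrackd:1 (Master -> Slave nu)",
    " * Promote condition:0 (Slave -> Master mu)",
    " * Demote  condition:1 (Master -> Slave nu)",
    " * Promote sub-ospfd:0 (Slave -> Master mu)",
    " * Demote  sub-ospfd:1 (Master -> Slave nu)",
    " * Promote sub-ripd:0 (Slave -> Master mu)",
    " * Demote  sub-ripd:1 (Master -> Slave nu)",
    " * Demote  sub-squid:0 (Master -> Slave nu)",
    " * Start   sub-squid:1 (mu)",
    " * Promote sub-squid:1 (Stopped -> Master mu)",
    " * Move    eth1-0-192.168.1.10 (Started nu -> mu)",
]

old = actions_by_type(old_plan)
new = actions_by_type(new_plan)
print("Promotions in logged plan:", old.get("Promote", []))
print("Promotions with latest code:", new.get("Promote", []))
```

The point of the comparison is only that the logged plan contains no Promote
actions at all, while the newer plan promotes every resource on mu.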

I'm still very confused about why you're using master/slave though.

On Wed, Feb 6, 2013 at 11:41 PM, James Guthrie <jag at open.ch> wrote:
> Hi David,
>
> Unfortunately crm_report doesn't work correctly on my hosts as we have compiled from source with custom paths and apparently the crm_report and associated tools are not built to use the paths that can be customised with autoconf.
>
> Despite that, I have done some investigation and think I may have found an inconsistency. I have attached the pacemaker-relevant syslog, including the pe-input files. The logfile starts where pacemaker detects that sub-squid is not running on mu. It then fails over to nu, where two further failures take place. In order to recover from these failures, the pengine produces transitions 106, 107, 108 and 109, with the corresponding pe-input files 46, 47, 48 and 49.
>
> The way I understand it, pacemaker works through the transitions until something happens from outside, at which point the transitions are recalculated and pacemaker continues on.
>
> Using crm_simulate to observe the transitions that should happen tells me that the transitions that were calculated from pe-input-49 ought to have resulted in the resources conntrackd, condition, sub-ospfd, sub-ripd and sub-squid being promoted to master. In fact, this never happens, but the crmd reports the transition as being complete. It appears as though nowhere is it acknowledged that the current state is not the desired outcome as calculated by the pengine. Is it possible that this is a bug?

Not really, it means something* happened that we didn't expect.
Pacemaker stops the current transition** and automatically asks the
pengine for another set of calculations.

* sub-squid failing by the looks of it
<1c>Feb  6 09:37:52 mu crmd[6258]:  warning: update_failcount:
Updating failcount for sub-squid on nu after failed monitor: rc=9
(update=value++, time=1360139872)\

** That's what this line is, notice the Skipped=15:

<1d>Feb  6 09:37:52 mu crmd[6258]:   notice: run_graph: Transition 107
(Complete=21, Pending=0, Fired=0, Skipped=15, Incomplete=6,
Source=/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-47.bz2):
Stopped\
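
If you want to find such interrupted transitions in a longer log, something
along these lines might help. It is only a sketch in Python: it assumes the
wrapped run_graph line has been rejoined into a single line, and it assumes
that a non-zero Skipped or Incomplete count is what marks a transition that
was cut short, as the line above suggests. The function names are made up
for illustration.

```python
import re

# Captures the transition number and the comma-separated counters that
# crmd prints inside the parentheses of a run_graph line.
RUN_GRAPH_RE = re.compile(
    r"run_graph: Transition (?P<id>\d+)\s+\((?P<fields>[^)]*)\)"
)


def parse_run_graph(line):
    """Return (transition_id, counters) or None if the line does not match."""
    match = RUN_GRAPH_RE.search(line)
    if match is None:
        return None
    counters = {}
    for field in match.group("fields").split(","):
        key, _, value = field.strip().partition("=")
        if value.isdigit():  # ignores non-numeric fields such as Source=...
            counters[key] = int(value)
    return int(match.group("id")), counters


def was_interrupted(counters):
    """Assumption: skipped or incomplete actions mean the graph was aborted."""
    return counters.get("Skipped", 0) > 0 or counters.get("Incomplete", 0) > 0


line = (
    "Feb  6 09:37:52 mu crmd[6258]:   notice: run_graph: Transition 107 "
    "(Complete=21, Pending=0, Fired=0, Skipped=15, Incomplete=6, "
    "Source=/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-47.bz2): "
    "Stopped"
)

transition_id, counters = parse_run_graph(line)
print(transition_id, counters, was_interrupted(counters))
```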

>
> Regards,
> James
>
>
>
> On Feb 5, 2013, at 7:41 PM, David Vossel <dvossel at redhat.com> wrote:
>
>>
>>
>> ----- Original Message -----
>>> From: "James Guthrie" <jag at open.ch>
>>> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
>>> Sent: Tuesday, February 5, 2013 8:12:57 AM
>>> Subject: Re: [Pacemaker] Pacemaker resource migration behaviour
>>>
>>> Hi all,
>>>
>>> as a follow-up to this, I realised that I needed to slightly change
>>> the way the resource constraints are put together, but I'm still
>>> seeing the same behaviour.
>>>
>>>
>>> Below are an excerpt from the logs on the host and the revised xml
>>> configuration. In this case, I caused two failures on the host mu,
>>> which forced the resources onto nu then I forced two failures on nu.
>>> What can be seen in the logs are the two detected failures on nu
>>> (the "warning: update_failcount:" lines). After the two failures on
>>> nu, the VIP is migrated back to mu, but none of the "support"
>>> resources are promoted with it.
>>
>> I can't tell much from this output.
>>
>> Run the steps you use to reproduce this and create a crm_report of the issue so we can see both the logs and pengine transition files that precede this.
>>
>> -- Vossel
>>
>>
>>> Regards,
>>> James
>>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org