[Pacemaker] Unique clone instance is stopped too early on move

Mon Jan 19 18:44:37 EST 2015

> On 16 Jan 2015, at 3:59 pm, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
> 
> 16.01.2015 07:44, Andrew Beekhof wrote:
>> 
>>> On 15 Jan 2015, at 3:11 pm, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>> 
>>> 13.01.2015 11:32, Andrei Borzenkov wrote:
>>>> On Tue, Jan 13, 2015 at 10:20 AM, Vladislav Bogdanov
>>>> <bubble at hoster-ok.com> wrote:
>>>>> Hi Andrew, David, all.
>>>>> 
>>>>> I found somewhat strange operation ordering during transition execution.
>>>>> 
>>>>> Could you please look at the following partial configuration (crmsh syntax)?
>>>>> 
>>>>> ===
>>>>> ...
>>>>> clone cl-broker broker \
>>>>>         meta interleave=true target-role=Started
>>>>> clone cl-broker-vips broker-vips \
>>>>>         meta clone-node-max=2 globally-unique=true interleave=true resource-stickiness=0 target-role=Started
>>>>> clone cl-ctdb ctdb \
>>>>>         meta interleave=true target-role=Started
>>>>> colocation broker-vips-with-broker inf: cl-broker-vips cl-broker
>>>>> colocation broker-with-ctdb inf: cl-broker cl-ctdb
>>>>> order broker-after-ctdb inf: cl-ctdb cl-broker
>>>>> order broker-vips-after-broker 0: cl-broker cl-broker-vips
>>>>> ...
>>>>> ===
>>>>> 
>>>>> After I put one node to standby and then back to online, I see the following transition (relevant excerpt):
>>>>> 
>>>>> ===
>>>>>  * Pseudo action:   cl-broker-vips_stop_0
>>>>>  * Resource action: broker-vips:1   stop on c-pa-0
>>>>>  * Pseudo action:   cl-broker-vips_stopped_0
>>>>>  * Pseudo action:   cl-ctdb_start_0
>>>>>  * Resource action: ctdb            start on c-pa-1
>>>>>  * Pseudo action:   cl-ctdb_running_0
>>>>>  * Pseudo action:   cl-broker_start_0
>>>>>  * Resource action: ctdb            monitor=10000 on c-pa-1
>>>>>  * Resource action: broker          start on c-pa-1
>>>>>  * Pseudo action:   cl-broker_running_0
>>>>>  * Pseudo action:   cl-broker-vips_start_0
>>>>>  * Resource action: broker          monitor=10000 on c-pa-1
>>>>>  * Resource action: broker-vips:1   start on c-pa-1
>>>>>  * Pseudo action:   cl-broker-vips_running_0
>>>>>  * Resource action: broker-vips:1   monitor=30000 on c-pa-1
>>>>> ===
>>>>> 
>>>>> What could be the reason for stopping the unique clone instance so early on a move?
>>>>> 
>>>> 
>>>> Do not take this as a definitive answer, but cl-broker-vips cannot run
>>>> unless both other resources are started. So if you compute the closure of
>>>> all required transitions it looks rather logical. Having
>>>> cl-broker-vips started while broker is still stopped would violate
>>>> a constraint.
>>> 
>>> The problem is that broker-vips:1 is stopped on one (source) node unnecessarily early.
>> 
>> It looks to be moving from c-pa-0 to c-pa-1.
>> It might be unnecessarily early, but it is what you asked for... we have to unwind the resource stack before we can build it up.
> 
> Yes, I understand that it is valid, but could its stop be delayed until the cluster is in a state where all dependencies are satisfied to start it on another node (like a migration)?

No, because "we have to unwind the resource stack before we can build it up."
Doing anything else would be one of those things that is trivial for a human to identify but rather complex for a computer.

Better to look at why broker-vips:1 needed to be moved.
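To make the ordering argument concrete, here is a minimal Python sketch that models the relevant part of the transition as a dependency graph. It groups actions into the "waves" an executor would launch if it fires every action as soon as all of its predecessors have completed. The edge set is an assumption reconstructed from the transition excerpt and the `order` constraints (monitor operations omitted); `EDGES` and `waves` are hypothetical names, and this is not Pacemaker's policy engine code. In this assumed graph nothing orders the stop of broker-vips:1 on c-pa-0 after the ctdb/broker start-up on c-pa-1, so the stop is runnable immediately.

```python
from collections import defaultdict

# Hypothetical, simplified action graph for moving broker-vips:1 from
# c-pa-0 to c-pa-1. Each edge (a, b) means "a must complete before b runs".
# Names mirror the transition excerpt; the edge set itself is an assumption.
EDGES = [
    # assumed clone-level stop-before-start for the unique clone
    ("broker-vips:1 stop c-pa-0", "cl-broker-vips_stopped_0"),
    ("cl-broker-vips_stopped_0", "cl-broker-vips_start_0"),
    # order broker-after-ctdb
    ("ctdb start c-pa-1", "cl-ctdb_running_0"),
    ("cl-ctdb_running_0", "broker start c-pa-1"),
    # order broker-vips-after-broker
    ("broker start c-pa-1", "cl-broker_running_0"),
    ("cl-broker_running_0", "cl-broker-vips_start_0"),
    ("cl-broker-vips_start_0", "broker-vips:1 start c-pa-1"),
]


def waves(edges):
    """Group actions into waves: each wave holds every not-yet-run action
    whose predecessors have all completed, i.e. what an executor that fires
    actions as soon as they become runnable would launch together."""
    preds = defaultdict(set)
    nodes = set()
    for a, b in edges:
        preds[b].add(a)
        nodes.update((a, b))
    done, result = set(), []
    while len(done) < len(nodes):
        ready = sorted(n for n in nodes - done if preds[n] <= done)
        if not ready:
            raise ValueError("ordering cycle in action graph")
        result.append(ready)
        done.update(ready)
    return result


if __name__ == "__main__":
    for i, wave in enumerate(waves(EDGES)):
        print(i, wave)
```

In this model the first wave contains both `broker-vips:1 stop c-pa-0` and `ctdb start c-pa-1`. The alternative ordering Vladislav proposes in the quoted message corresponds to adding a hypothetical extra edge such as `("cl-broker_running_0", "broker-vips:1 stop c-pa-0")`; with it, the stop moves to wave 4, immediately before the start on c-pa-1. The sketch does not address when such an edge could be inferred safely in general.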

> 
> Like:
> ===
> * Pseudo action:   cl-ctdb_start_0
> * Resource action: ctdb            start on c-pa-1
> * Pseudo action:   cl-ctdb_running_0
> * Pseudo action:   cl-broker_start_0
> * Resource action: ctdb            monitor=10000 on c-pa-1
> * Resource action: broker          start on c-pa-1
> * Pseudo action:   cl-broker_running_0
> * Pseudo action:   cl-broker-vips_start_0
> * Resource action: broker          monitor=10000 on c-pa-1
> * Pseudo action:   cl-broker-vips_stop_0
> * Resource action: broker-vips:1   stop on c-pa-0
> * Pseudo action:   cl-broker-vips_stopped_0
> * Resource action: broker-vips:1   start on c-pa-1
> * Pseudo action:   cl-broker-vips_running_0
> * Resource action: broker-vips:1   monitor=30000 on c-pa-1
> ===
> That would be a great optimization toward five nines...
> 
> Best,
> Vladislav
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org