[Pacemaker] Can't failover Master/Slave with group(primitive x3) setting

Thu Sep 29 05:39:21 UTC 2011

On Tue, Sep 27, 2011 at 2:31 PM, Junko IKEDA <tsukishima.ha at gmail.com> wrote:
> Hi,
>
>> Which version did you check?
>
> Pacemaker 1.0.11.

I meant which version of 1.1, since you said:

  "Pacemaker 1.1 shows the same behavior."

>
>> The latest from git seems to work fine:
>>
>> Current cluster status:
>> Online: [ bl460g1n13 bl460g1n14 ]
>>
>>  Resource Group: grpDRBD
>>     dummy01    (ocf::pacemaker:Dummy): Started bl460g1n13 FAILED
>>     dummy02    (ocf::pacemaker:Dummy): Started bl460g1n13
>>     dummy03    (ocf::pacemaker:Dummy): Started bl460g1n13
>>  Master/Slave Set: msDRBD [prmDRBD]
>>     Masters: [ bl460g1n13 ]
>>     Slaves: [ bl460g1n14 ]
>>
>> Transition Summary:
>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Recover
>> dummy01 (Started bl460g1n13)
>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Restart
>> dummy02 (Started bl460g1n13)
>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Restart
>> dummy03 (Started bl460g1n13)
>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Leave
>> prmDRBD:0       (Master bl460g1n13)
>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Leave
>> prmDRBD:1       (Slave bl460g1n14)
>>
>> Executing cluster transition:
>>  * Executing action 14: dummy03_stop_0 on bl460g1n13
>>  * Executing action 12: dummy02_stop_0 on bl460g1n13
>>  * Executing action 2: dummy01_stop_0 on bl460g1n13
>>  * Executing action 11: dummy01_start_0 on bl460g1n13
>>  * Executing action 1: dummy01_monitor_10000 on bl460g1n13
>>  * Executing action 13: dummy02_start_0 on bl460g1n13
>>  * Executing action 3: dummy02_monitor_10000 on bl460g1n13
>>  * Executing action 15: dummy03_start_0 on bl460g1n13
>>  * Executing action 4: dummy03_monitor_10000 on bl460g1n13
>
> dummy01 has a fail-count,
> so it should move from bl460g1n13 to bl460g1n14.
> Why is it restarted on the node where it failed?
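As a rough illustration of the rule in question (a sketch only, not Pacemaker's actual code): once a resource's fail-count on a node reaches its migration-threshold, that node should no longer be eligible for the resource. The function name, the argument names, and the treatment of a threshold of 0 as "no limit" are assumptions made for this example.

```python
# Hypothetical sketch of the fail-count / migration-threshold rule.
# This is NOT Pacemaker source code; all names are made up for illustration.

def allowed_nodes(nodes, fail_counts, migration_threshold):
    """Return the nodes a resource may still run on.

    A node is excluded once the resource's fail-count there has reached
    migration_threshold. A threshold of 0 is treated here as "no limit"
    (an assumption for this sketch).
    """
    if migration_threshold <= 0:
        return list(nodes)
    return [n for n in nodes if fail_counts.get(n, 0) < migration_threshold]


nodes = ["bl460g1n13", "bl460g1n14"]
fail_counts = {"bl460g1n13": 1}  # dummy01 failed once on bl460g1n13
remaining = allowed_nodes(nodes, fail_counts, migration_threshold=1)
print(remaining)
```

With one failure and a threshold of 1 (the "after 1 failures (max=1)" case in the logs), only the other node remains eligible, which is why a move rather than a local restart would be expected.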
>
> I pulled the latest changeset from hg:
>
> # hg log | head -n 7
> changeset:   15777:a15ead49e20f
> branch:      stable-1.0
> tag:         tip
> user:        Andrew Beekhof <andrew at beekhof.net>
> date:        Thu Aug 25 16:49:59 2011 +1000
> summary:     changeset: 15775:fe18a1ad46f8
>
> # crm
> crm(live)# cib import pe-input-7.bz2
> crm(pe-input-7)# configure ptest vvv
> ptest[19194]: 2011/09/27_11:53:45 notice: unpack_config: On loss of
> CCM Quorum: Ignore
> ptest[19194]: 2011/09/27_11:53:45 WARN: unpack_nodes: Blind faith: not
> fencing unseen nodes
> ptest[19194]: 2011/09/27_11:53:45 notice: group_print:  Resource Group: grpDRBD
> ptest[19194]: 2011/09/27_11:53:45 notice: native_print:      dummy01
>  (ocf::pacemaker:Dummy): Started bl460g1n13
> ptest[19194]: 2011/09/27_11:53:45 notice: native_print:      dummy02
>  (ocf::pacemaker:Dummy): Started bl460g1n13
> ptest[19194]: 2011/09/27_11:53:45 notice: native_print:      dummy03
>  (ocf::pacemaker:Dummy): Started bl460g1n13
> ptest[19194]: 2011/09/27_11:53:45 notice: clone_print:  Master/Slave Set: msDRBD
> ptest[19194]: 2011/09/27_11:53:45 notice: short_print:      Masters: [
> bl460g1n13 ]
> ptest[19194]: 2011/09/27_11:53:45 notice: short_print:      Slaves: [
> bl460g1n14 ]
> ptest[19194]: 2011/09/27_11:53:45 WARN: common_apply_stickiness:
> Forcing dummy01 away from bl460g1n13 after 1 failures (max=1)
> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop    resource
> dummy01  (bl460g1n13)
> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop    resource
> dummy02  (bl460g1n13)
> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop    resource
> dummy03  (bl460g1n13)
> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Leave   resource
> prmDRBD:0        (Master bl460g1n13)
> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Leave   resource
> prmDRBD:1        (Slave bl460g1n14)
> INFO: install graphviz to see a transition graph
> crm(pe-input-7)# quit
>
>
> After reverting to Pacemaker 1.0.11:
>
> # hg revert -a -r b2e39d318fda
> # make install
>
> # crm
> crm(live)# cib import pe-input-7.bz2
> crm(pe-input-7)# configure ptest vvv
> ptest[751]: 2011/09/27_11:57:50 notice: unpack_config: On loss of CCM
> Quorum: Ignore
> ptest[751]: 2011/09/27_11:57:50 WARN: unpack_nodes: Blind faith: not
> fencing unseen nodes
> ptest[751]: 2011/09/27_11:57:50 notice: group_print:  Resource Group: grpDRBD
> ptest[751]: 2011/09/27_11:57:50 notice: native_print:      dummy01
>  (ocf::pacemaker:Dummy): Started bl460g1n13
> ptest[751]: 2011/09/27_11:57:50 notice: native_print:      dummy02
>  (ocf::pacemaker:Dummy): Started bl460g1n13
> ptest[751]: 2011/09/27_11:57:50 notice: native_print:      dummy03
>  (ocf::pacemaker:Dummy): Started bl460g1n13
> ptest[751]: 2011/09/27_11:57:50 notice: clone_print:  Master/Slave Set: msDRBD
> ptest[751]: 2011/09/27_11:57:50 notice: short_print:      Masters: [
> bl460g1n13 ]
> ptest[751]: 2011/09/27_11:57:50 notice: short_print:      Slaves: [ bl460g1n14 ]
> ptest[751]: 2011/09/27_11:57:50 WARN: common_apply_stickiness: Forcing
> dummy01 away from bl460g1n13 after 1 failures (max=1)
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp:  Start recurring
> monitor (10s) for dummy01 on bl460g1n14
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp:  Start recurring
> monitor (10s) for dummy02 on bl460g1n14
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp:  Start recurring
> monitor (10s) for dummy03 on bl460g1n14
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp:  Start recurring
> monitor (20s) for prmDRBD:0 on bl460g1n13
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp:  Start recurring
> monitor (10s) for prmDRBD:1 on bl460g1n14
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp:  Start recurring
> monitor (20s) for prmDRBD:0 on bl460g1n13
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp:  Start recurring
> monitor (10s) for prmDRBD:1 on bl460g1n14
> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource
> dummy01       (Started bl460g1n13 -> bl460g1n14)
> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource
> dummy02       (Started bl460g1n13 -> bl460g1n14)
> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource
> dummy03       (Started bl460g1n13 -> bl460g1n14)
> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Demote prmDRBD:0
>  (Master -> Slave bl460g1n13)
> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Promote prmDRBD:1
>  (Slave -> Master bl460g1n14)
> INFO: install graphviz to see a transition graph
>
> Pacemaker 1.0.10 moved the failed resource to the other node.
> That is the expected behavior.
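To compare two ptest runs like the ones quoted here without reading the whole log, something along these lines might help. It is only a sketch: the regular expressions are guesses based on the LogActions lines in this thread (including the mail line wrapping and the "> " quote markers), and the sample strings are a small excerpt, not the full output.

```python
import re

# Strips leading mail quote markers such as "> " or ">> ".
QUOTE_PREFIX = re.compile(r"^(?:>\s?)+")
# Matches e.g. "LogActions: Stop    resource dummy01  (bl460g1n13)"
# and "LogActions: Demote prmDRBD:0 (Master -> Slave bl460g1n13)".
LOG_ACTION = re.compile(
    r"LogActions:\s+(\w+)\s+(?:resource\s+)?(\S+)\s+\(([^)]*)\)"
)


def planned_actions(text):
    """Map resource name -> (action, detail) from quoted ptest output."""
    # Undo the mail wrapping by joining all lines into one string.
    joined = " ".join(
        QUOTE_PREFIX.sub("", line).strip() for line in text.splitlines()
    )
    return {name: (action, detail)
            for action, name, detail in LOG_ACTION.findall(joined)}


# Excerpts copied from the two runs quoted in this thread.
stable_tip = """\
> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop    resource
> dummy01  (bl460g1n13)
> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Leave   resource
> prmDRBD:0        (Master bl460g1n13)
"""
release_1_0_11 = """\
> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource
> dummy01       (Started bl460g1n13 -> bl460g1n14)
> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Demote prmDRBD:0
>  (Master -> Slave bl460g1n13)
"""

before = planned_actions(release_1_0_11)
after = planned_actions(stable_tip)
# Print only the resources whose planned action differs between the runs.
for name in sorted(set(before) | set(after)):
    if before.get(name) != after.get(name):
        print(name, before.get(name), "->", after.get(name))
```

Run against the full outputs, this should reduce each transition to one line per resource, so the Stop versus Move difference for the group members stands out.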
>
> I have attached the hb_report, which includes the pe-input-7.bz2 used above.
>
> Thanks,
> Junko