[Pacemaker] Can't failover Master/Slave with group(primitive x3) setting
Andrew Beekhof
andrew at beekhof.net
Thu Sep 29 05:39:21 UTC 2011
On Tue, Sep 27, 2011 at 2:31 PM, Junko IKEDA <tsukishima.ha at gmail.com> wrote:
> Hi,
>
>> Which version did you check?
>
> Pacemaker 1.0.11.
I meant which version of 1.1, since you said:
"Pacemaker 1.1 shows the same behavior."
>
>> The latest from git seems to work fine:
>>
>> Current cluster status:
>> Online: [ bl460g1n13 bl460g1n14 ]
>>
>> Resource Group: grpDRBD
>> dummy01 (ocf::pacemaker:Dummy): Started bl460g1n13 FAILED
>> dummy02 (ocf::pacemaker:Dummy): Started bl460g1n13
>> dummy03 (ocf::pacemaker:Dummy): Started bl460g1n13
>> Master/Slave Set: msDRBD [prmDRBD]
>> Masters: [ bl460g1n13 ]
>> Slaves: [ bl460g1n14 ]
>>
>> Transition Summary:
>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Recover dummy01 (Started bl460g1n13)
>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Restart dummy02 (Started bl460g1n13)
>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Restart dummy03 (Started bl460g1n13)
>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Leave prmDRBD:0 (Master bl460g1n13)
>> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Leave prmDRBD:1 (Slave bl460g1n14)
>>
>> Executing cluster transition:
>> * Executing action 14: dummy03_stop_0 on bl460g1n13
>> * Executing action 12: dummy02_stop_0 on bl460g1n13
>> * Executing action 2: dummy01_stop_0 on bl460g1n13
>> * Executing action 11: dummy01_start_0 on bl460g1n13
>> * Executing action 1: dummy01_monitor_10000 on bl460g1n13
>> * Executing action 13: dummy02_start_0 on bl460g1n13
>> * Executing action 3: dummy02_monitor_10000 on bl460g1n13
>> * Executing action 15: dummy03_start_0 on bl460g1n13
>> * Executing action 4: dummy03_monitor_10000 on bl460g1n13
>
> dummy01 has a fail-count on bl460g1n13,
> so it should move from bl460g1n13 to bl460g1n14.
> Why is it restarted on the node where it failed?
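
To make the expectation in the question concrete, here is a toy model of the rule that the "Forcing dummy01 away from bl460g1n13 after 1 failures (max=1)" log line expresses. It is only a sketch, not Pacemaker code: the function name allowed_nodes is made up for illustration, and it assumes "max" in that log line is the resource's migration-threshold.

```python
# Toy model only; this is not how the policy engine is implemented.
def allowed_nodes(nodes, fail_counts, migration_threshold):
    """Return the nodes still eligible to run a resource.

    A node is dropped once the resource's fail-count on it reaches
    migration_threshold. A threshold of 0 is treated as "no limit".
    """
    eligible = []
    for node in nodes:
        failures = fail_counts.get(node, 0)
        if migration_threshold > 0 and failures >= migration_threshold:
            continue  # resource is forced away from this node
        eligible.append(node)
    return eligible


nodes = ["bl460g1n13", "bl460g1n14"]
fail_counts = {"bl460g1n13": 1}  # dummy01 failed once on bl460g1n13
print(allowed_nodes(nodes, fail_counts, migration_threshold=1))
# prints ['bl460g1n14']
```

Under this model only bl460g1n14 is left for dummy01, which is why a move is expected rather than a restart in place.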
>
> I got the latest changeset from hg:
>
> # hg log | head -n 7
> changeset: 15777:a15ead49e20f
> branch: stable-1.0
> tag: tip
> user: Andrew Beekhof <andrew at beekhof.net>
> date: Thu Aug 25 16:49:59 2011 +1000
> summary: changeset: 15775:fe18a1ad46f8
>
> # crm
> crm(live)# cib import pe-input-7.bz2
> crm(pe-input-7)# configure ptest vvv
> ptest[19194]: 2011/09/27_11:53:45 notice: unpack_config: On loss of CCM Quorum: Ignore
> ptest[19194]: 2011/09/27_11:53:45 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> ptest[19194]: 2011/09/27_11:53:45 notice: group_print: Resource Group: grpDRBD
> ptest[19194]: 2011/09/27_11:53:45 notice: native_print: dummy01 (ocf::pacemaker:Dummy): Started bl460g1n13
> ptest[19194]: 2011/09/27_11:53:45 notice: native_print: dummy02 (ocf::pacemaker:Dummy): Started bl460g1n13
> ptest[19194]: 2011/09/27_11:53:45 notice: native_print: dummy03 (ocf::pacemaker:Dummy): Started bl460g1n13
> ptest[19194]: 2011/09/27_11:53:45 notice: clone_print: Master/Slave Set: msDRBD
> ptest[19194]: 2011/09/27_11:53:45 notice: short_print: Masters: [ bl460g1n13 ]
> ptest[19194]: 2011/09/27_11:53:45 notice: short_print: Slaves: [ bl460g1n14 ]
> ptest[19194]: 2011/09/27_11:53:45 WARN: common_apply_stickiness: Forcing dummy01 away from bl460g1n13 after 1 failures (max=1)
> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop resource dummy01 (bl460g1n13)
> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop resource dummy02 (bl460g1n13)
> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop resource dummy03 (bl460g1n13)
> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Leave resource prmDRBD:0 (Master bl460g1n13)
> ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Leave resource prmDRBD:1 (Slave bl460g1n14)
> INFO: install graphviz to see a transition graph
> crm(pe-input-7)# quit
>
>
> After reverting to Pacemaker 1.0.11:
>
> # hg revert -a -r b2e39d318fda
> # make install
>
> # crm
> crm(live)# cib import pe-input-7.bz2
> crm(pe-input-7)# configure ptest vvv
> ptest[751]: 2011/09/27_11:57:50 notice: unpack_config: On loss of CCM Quorum: Ignore
> ptest[751]: 2011/09/27_11:57:50 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> ptest[751]: 2011/09/27_11:57:50 notice: group_print: Resource Group: grpDRBD
> ptest[751]: 2011/09/27_11:57:50 notice: native_print: dummy01 (ocf::pacemaker:Dummy): Started bl460g1n13
> ptest[751]: 2011/09/27_11:57:50 notice: native_print: dummy02 (ocf::pacemaker:Dummy): Started bl460g1n13
> ptest[751]: 2011/09/27_11:57:50 notice: native_print: dummy03 (ocf::pacemaker:Dummy): Started bl460g1n13
> ptest[751]: 2011/09/27_11:57:50 notice: clone_print: Master/Slave Set: msDRBD
> ptest[751]: 2011/09/27_11:57:50 notice: short_print: Masters: [ bl460g1n13 ]
> ptest[751]: 2011/09/27_11:57:50 notice: short_print: Slaves: [ bl460g1n14 ]
> ptest[751]: 2011/09/27_11:57:50 WARN: common_apply_stickiness: Forcing dummy01 away from bl460g1n13 after 1 failures (max=1)
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for dummy01 on bl460g1n14
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for dummy02 on bl460g1n14
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for dummy03 on bl460g1n14
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (20s) for prmDRBD:0 on bl460g1n13
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for prmDRBD:1 on bl460g1n14
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (20s) for prmDRBD:0 on bl460g1n13
> ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp: Start recurring monitor (10s) for prmDRBD:1 on bl460g1n14
> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy01 (Started bl460g1n13 -> bl460g1n14)
> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy02 (Started bl460g1n13 -> bl460g1n14)
> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy03 (Started bl460g1n13 -> bl460g1n14)
> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Demote prmDRBD:0 (Master -> Slave bl460g1n13)
> ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Promote prmDRBD:1 (Slave -> Master bl460g1n14)
> INFO: install graphviz to see a transition graph
>
> Pacemaker 1.0.10 moved the failed resource to the other node.
> That is the expected behavior.
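
For comparing two ptest runs like the ones above side by side, a small script along these lines may help. It is a rough sketch: planned_actions and diff_actions are hypothetical helper names, the regular expression only covers the two LogActions formats seen in this thread, and it assumes each log entry is on a single line (mail-wrapped lines rejoined).

```python
import re

# Matches both "LogActions: Stop resource dummy01 (...)" and
# "LogActions: Demote prmDRBD:0 (...)" as pasted in this thread.
ACTION_RE = re.compile(r"LogActions:\s+(\w+)\s+(?:resource\s+)?(\S+)")


def planned_actions(log_lines):
    """Map resource name -> planned action, taken from LogActions lines."""
    actions = {}
    for line in log_lines:
        match = ACTION_RE.search(line)
        if match:
            action, resource = match.groups()
            actions[resource] = action
    return actions


def diff_actions(first, second):
    """Return {resource: (action_in_first, action_in_second)} where they differ."""
    resources = sorted(set(first) | set(second))
    return {
        name: (first.get(name), second.get(name))
        for name in resources
        if first.get(name) != second.get(name)
    }


# LogActions lines from the stable-1.0 tip run quoted above.
tip_log = [
    "ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop resource dummy01 (bl460g1n13)",
    "ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop resource dummy02 (bl460g1n13)",
    "ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop resource dummy03 (bl460g1n13)",
    "ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Leave resource prmDRBD:0 (Master bl460g1n13)",
    "ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Leave resource prmDRBD:1 (Slave bl460g1n14)",
]

# LogActions lines from the run after the revert quoted above.
reverted_log = [
    "ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy01 (Started bl460g1n13 -> bl460g1n14)",
    "ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy02 (Started bl460g1n13 -> bl460g1n14)",
    "ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy03 (Started bl460g1n13 -> bl460g1n14)",
    "ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Demote prmDRBD:0 (Master -> Slave bl460g1n13)",
    "ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Promote prmDRBD:1 (Slave -> Master bl460g1n14)",
]

changes = diff_actions(planned_actions(tip_log), planned_actions(reverted_log))
for resource, (tip, reverted) in changes.items():
    print(f"{resource}: tip={tip} reverted={reverted}")
```

With the lines from this thread, every resource differs: the tip run only stops the group and leaves the master/slave set alone, while the reverted run moves the group and swaps the DRBD roles.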
>
> I attached the hb_report which includes the above pe-input-7.bz2.
>
> Thanks,
> Junko
>