[Pacemaker] Can't failover Master/Slave with group(primitive x3) setting
Andrew Beekhof
andrew at beekhof.net
Mon Sep 26 05:02:15 UTC 2011
On Wed, Sep 14, 2011 at 5:25 PM, Junko IKEDA <tsukishima.ha at gmail.com> wrote:
> Hi,
>
> Pacemaker 1.1 shows the same behavior.
Which version did you check?
The latest from git seems to work fine:
Current cluster status:
Online: [ bl460g1n13 bl460g1n14 ]

 Resource Group: grpDRBD
     dummy01    (ocf::pacemaker:Dummy): Started bl460g1n13 FAILED
     dummy02    (ocf::pacemaker:Dummy): Started bl460g1n13
     dummy03    (ocf::pacemaker:Dummy): Started bl460g1n13
 Master/Slave Set: msDRBD [prmDRBD]
     Masters: [ bl460g1n13 ]
     Slaves: [ bl460g1n14 ]
Transition Summary:
crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Recover dummy01 (Started bl460g1n13)
crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Restart dummy02 (Started bl460g1n13)
crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Restart dummy03 (Started bl460g1n13)
crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Leave prmDRBD:0 (Master bl460g1n13)
crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Leave prmDRBD:1 (Slave bl460g1n14)
Executing cluster transition:
* Executing action 14: dummy03_stop_0 on bl460g1n13
* Executing action 12: dummy02_stop_0 on bl460g1n13
* Executing action 2: dummy01_stop_0 on bl460g1n13
* Executing action 11: dummy01_start_0 on bl460g1n13
* Executing action 1: dummy01_monitor_10000 on bl460g1n13
* Executing action 13: dummy02_start_0 on bl460g1n13
* Executing action 3: dummy02_monitor_10000 on bl460g1n13
* Executing action 15: dummy03_start_0 on bl460g1n13
* Executing action 4: dummy03_monitor_10000 on bl460g1n13
Revised cluster status:
Online: [ bl460g1n13 bl460g1n14 ]

 Resource Group: grpDRBD
     dummy01    (ocf::pacemaker:Dummy): Started bl460g1n13
     dummy02    (ocf::pacemaker:Dummy): Started bl460g1n13
     dummy03    (ocf::pacemaker:Dummy): Started bl460g1n13
 Master/Slave Set: msDRBD [prmDRBD]
     Masters: [ bl460g1n13 ]
     Slaves: [ bl460g1n14 ]
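
For anyone wanting to replay this, a result like the above can be reproduced offline with crm_simulate by saving the CIB and injecting the failed monitor; a minimal sketch (the file name is just an example):

    # save the current configuration and status
    cibadmin --query > /tmp/cib.xml
    # inject a failed monitor (rc=7, not running) for dummy01 on bl460g1n13 and simulate
    crm_simulate -S -x /tmp/cib.xml -i dummy01_monitor_10000@bl460g1n13=7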
> It seems that the following changeset introduced the problem.
>
> http://hg.clusterlabs.org/pacemaker/stable-1.0/diff/281c8c03a8c2/pengine/native.c
>
> I could get the expected behavior with the latest Pacemaker 1.0 after
> reverting the above change.
>
> Thanks,
> Junko
>
> 2011/9/13 Junko IKEDA <tsukishima.ha at gmail.com>:
>> Hi,
>>
>> I have the following resource configuration:
>>
>> - msDRBD : Master/Slave (drbd)
>> - grpDRBD : group (containing 3 Dummy primitives)
>>
>> and the constraint settings are here:
>>
>> location rsc_location-1 msDRBD \
>>     rule role=master 200: #uname eq bl460g1n13 \
>>     rule role=master 100: #uname eq bl460g1n14
>> colocation rsc_colocation-1 inf: grpDRBD msDRBD:Master
>> order rsc_order-1 0: msDRBD:promote grpDRBD:start
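
For reference, a minimal configuration reproducing this layout could look like the sketch below; the DRBD resource parameters and monitor intervals are assumptions, since the original primitive definitions are only in the attached hb_report:

    primitive prmDRBD ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="10s" role="Master" \
        op monitor interval="20s" role="Slave"
    ms msDRBD prmDRBD \
        meta master-max="1" clone-max="2" notify="true"
    primitive dummy01 ocf:pacemaker:Dummy op monitor interval="10s"
    primitive dummy02 ocf:pacemaker:Dummy op monitor interval="10s"
    primitive dummy03 ocf:pacemaker:Dummy op monitor interval="10s"
    group grpDRBD dummy01 dummy02 dummy03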
>>
>>
>> * Initial starting;
>> ============
>> Last updated: Tue Sep 13 22:09:17 2011
>> Stack: Heartbeat
>> Current DC: bl460g1n14 (22222222-2222-2222-2222-222222222222) - partition with quorum
>> Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
>> 2 Nodes configured, unknown expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ bl460g1n13 bl460g1n14 ]
>>
>>  Resource Group: grpDRBD
>>      dummy01 (ocf::pacemaker:Dummy): Started bl460g1n13
>>      dummy02 (ocf::pacemaker:Dummy): Started bl460g1n13
>>      dummy03 (ocf::pacemaker:Dummy): Started bl460g1n13
>>  Master/Slave Set: msDRBD
>>      Masters: [ bl460g1n13 ]
>>      Slaves: [ bl460g1n14 ]
>>
>>
>> * break dummy01;
>> ============
>> Last updated: Tue Sep 13 22:09:44 2011
>> Stack: Heartbeat
>> Current DC: bl460g1n14 (22222222-2222-2222-2222-222222222222) - partition with quorum
>> Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
>> 2 Nodes configured, unknown expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ bl460g1n13 bl460g1n14 ]
>>
>>  Resource Group: grpDRBD
>>      dummy01 (ocf::pacemaker:Dummy): Started bl460g1n13 FAILED
>>      dummy02 (ocf::pacemaker:Dummy): Started bl460g1n13
>>      dummy03 (ocf::pacemaker:Dummy): Stopped
>>  Master/Slave Set: msDRBD
>>      Masters: [ bl460g1n13 ]
>>      Slaves: [ bl460g1n14 ]
>>
>> Failed actions:
>>     dummy01_monitor_10000 (node=bl460g1n13, call=13, rc=7, status=complete): not running
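
For anyone reproducing this by hand: ocf:pacemaker:Dummy's monitor simply checks whether its state file exists, so a failure like the one above can be provoked by deleting that file. The path below is the usual default and may differ per build:

    # removing Dummy's state file makes the next monitor return rc=7 (not running)
    rm /var/run/Dummy-dummy01.state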
>>
>>
>> * grpDRBD can't fail over...
>> ============
>> Last updated: Tue Sep 13 22:09:48 2011
>> Stack: Heartbeat
>> Current DC: bl460g1n14 (22222222-2222-2222-2222-222222222222) - partition with quorum
>> Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
>> 2 Nodes configured, unknown expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ bl460g1n13 bl460g1n14 ]
>>
>>  Master/Slave Set: msDRBD
>>      Masters: [ bl460g1n13 ]
>>      Slaves: [ bl460g1n14 ]
>>
>> Failed actions:
>>     dummy01_monitor_10000 (node=bl460g1n13, call=13, rc=7, status=complete): not running
>>
>>
>> Please see the attached hb_report.
>>
>> I tried reducing the number of primitives in the group from 3 to 2,
>> and grpDRBD can fail over in that case.
>>
>> If dummy02 or dummy03 breaks down instead of dummy01,
>> grpDRBD can fail over, too.
>>
>> In short, a Master/Slave resource combined with a group of 3 or more resources won't fail over.
>>
>> Regards,
>> Junko IKEDA
>>
>>
>> NTT DATA INTELLILINK CORPORATION
>>