[Pacemaker] Can't failover Master/Slave with group(primitive x3) setting

Wed Sep 14 07:25:04 UTC 2011

Hi,

Pacemaker 1.1 shows the same behavior.
It seems that the following changeset causes the problem.

http://hg.clusterlabs.org/pacemaker/stable-1.0/diff/281c8c03a8c2/pengine/native.c

I could get the expected behavior with the latest Pacemaker 1.0 after
reverting the above change.
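
For anyone who wants to reproduce this, the setup from the report quoted
below could be sketched roughly as follows. The three constraints are copied
from the report. Everything else is an assumption, because the original post
does not show it: the DRBD primitive name (prmDRBD), the resource agent
ocf:linbit:drbd, the drbd_resource value "r0", the DRBD monitor intervals and
the ms meta attributes are hypothetical. Only the 10s monitor interval on the
Dummy resources can be inferred, from the failed action dummy01_monitor_10000.

```
# Sketch only: primitive and ms definitions are assumed, not from the report.
primitive dummy01 ocf:pacemaker:Dummy \
        op monitor interval="10s"
primitive dummy02 ocf:pacemaker:Dummy \
        op monitor interval="10s"
primitive dummy03 ocf:pacemaker:Dummy \
        op monitor interval="10s"
group grpDRBD dummy01 dummy02 dummy03

# Hypothetical DRBD primitive; agent, name and parameters are placeholders.
primitive prmDRBD ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="20s" role="Slave" \
        op monitor interval="10s" role="Master"
ms msDRBD prmDRBD \
        meta master-max="1" master-node-max="1" \
        clone-max="2" clone-node-max="1" notify="true"

# Constraints as given in the report.
location rsc_location-1 msDRBD \
        rule role=master  200: #uname eq bl460g1n13 \
        rule role=master  100: #uname eq bl460g1n14
colocation rsc_colocation-1 inf: grpDRBD msDRBD:Master
order rsc_order-1 0: msDRBD:promote grpDRBD:start
```

With a configuration like this, removing dummy03 from the group line gives
the two-member case, which fails over as expected according to the report.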

Thanks,
Junko

2011/9/13 Junko IKEDA <tsukishima.ha at gmail.com>:
> Hi,
>
> I have the following resource settings:
>
> - msDRBD : Master/Slave(drbd)
> - grpDRBD : group(including 3 Dummy)
>
> and the constraints are as follows:
>
> location rsc_location-1 msDRBD \
>        rule role=master  200: #uname eq bl460g1n13 \
>        rule role=master  100: #uname eq bl460g1n14
> colocation rsc_colocation-1 inf: grpDRBD msDRBD:Master
> order rsc_order-1 0: msDRBD:promote grpDRBD:start
>
>
> * Initial start:
> ============
> Last updated: Tue Sep 13 22:09:17 2011
> Stack: Heartbeat
> Current DC: bl460g1n14 (22222222-2222-2222-2222-222222222222) - partition with quorum
> Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
> 2 Nodes configured, unknown expected votes
> 2 Resources configured.
> ============
>
> Online: [ bl460g1n13 bl460g1n14 ]
>
>  Resource Group: grpDRBD
>     dummy01    (ocf::pacemaker:Dummy): Started bl460g1n13
>     dummy02    (ocf::pacemaker:Dummy): Started bl460g1n13
>     dummy03    (ocf::pacemaker:Dummy): Started bl460g1n13
>  Master/Slave Set: msDRBD
>     Masters: [ bl460g1n13 ]
>     Slaves: [ bl460g1n14 ]
>
>
> * After breaking dummy01:
> ============
> Last updated: Tue Sep 13 22:09:44 2011
> Stack: Heartbeat
> Current DC: bl460g1n14 (22222222-2222-2222-2222-222222222222) - partition with quorum
> Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
> 2 Nodes configured, unknown expected votes
> 2 Resources configured.
> ============
>
> Online: [ bl460g1n13 bl460g1n14 ]
>
>  Resource Group: grpDRBD
>     dummy01    (ocf::pacemaker:Dummy): Started bl460g1n13 FAILED
>     dummy02    (ocf::pacemaker:Dummy): Started bl460g1n13
>     dummy03    (ocf::pacemaker:Dummy): Stopped
>  Master/Slave Set: msDRBD
>     Masters: [ bl460g1n13 ]
>     Slaves: [ bl460g1n14 ]
>
> Failed actions:
>    dummy01_monitor_10000 (node=bl460g1n13, call=13, rc=7, status=complete): not running
>
>
> * grpDRBD does not fail over:
> ============
> Last updated: Tue Sep 13 22:09:48 2011
> Stack: Heartbeat
> Current DC: bl460g1n14 (22222222-2222-2222-2222-222222222222) - partition with quorum
> Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
> 2 Nodes configured, unknown expected votes
> 2 Resources configured.
> ============
>
> Online: [ bl460g1n13 bl460g1n14 ]
>
>  Master/Slave Set: msDRBD
>     Masters: [ bl460g1n13 ]
>     Slaves: [ bl460g1n14 ]
>
> Failed actions:
>    dummy01_monitor_10000 (node=bl460g1n13, call=13, rc=7, status=complete): not running
>
>
> Please see the attached hb_report.
>
> When I reduce the number of primitives in the group from 3 to 2,
> grpDRBD fails over as expected.
>
> If dummy02 or dummy03 fails instead of dummy01,
> grpDRBD fails over as expected, too.
>
> It seems that a Master/Slave set combined with a group of 3 or more
> resources does not fail over when the first resource in the group fails.
>
> Regards,
> Junko IKEDA
>
>
> NTT DATA INTELLILINK CORPORATION
>