[Pacemaker] the behavior of clone resource
Andrew Beekhof
andrew at beekhof.net
Thu Mar 18 10:22:59 UTC 2010
2010/3/16 Junko IKEDA <ikedaj at intellilink.co.jp>:
> Hi,
>
> I noticed some slightly strange clone behavior.
> Here is what I found:
>
> (1) Start the group, which contains three primitive resources,
> and the clone set.
>
> # crm_mon -1
>
> ============
> Last updated: Tue Mar 16 21:39:10 2010
> Stack: openais
> Current DC: cspm01 - partition with quorum
> Version: 1.0.8-a77303a7adce stable-1.0 tip
> 4 Nodes configured, 4 expected votes
> 2 Resources configured.
> ============
>
> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>
> Resource Group: UMgroup01
> UmDummy01 (ocf::heartbeat:Dummy): Started cspm01
> UmDummy02 (ocf::heartbeat:Dummy): Started cspm01
> UmDummy03 (ocf::heartbeat:Dummy): Started cspm01
> Clone Set: clnUMgroup01
> Started: [ cspm01 cspm04 ]
>
> (2) Edit the Dummy RA (Dummy01) so that stopping clnUMgroup01 fails.
>
> # vim /usr/lib/ocf/resource.d/heartbeat/Dummy01
> -----------------------------------------------
> dummy_stop() {
> exit $OCF_ERR_GENERIC # intentional error: every stop fails, the code below is never reached
>
> dummy_monitor
> if [ $? = $OCF_SUCCESS ]; then
> rm ${OCF_RESKEY_state}
> fi
> return $OCF_SUCCESS
> }
> -----------------------------------------------
>
> (then, on cspm01, delete the state file so that the next monitor
> reports the resource as not running)
> # rm -f /var/run/heartbeat/rsctmp/Dummy-clnUMdummy01:0.state
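The `exit` at the top of `dummy_stop` makes every stop fail and leaves the rest of the function unreachable, so the agent has to be edited again to undo the test. Below is a minimal sketch of a switchable variant, assuming (as in the stock Dummy agent) that the action function's return value becomes the agent's exit code. The marker file `FAIL_STOP_MARKER` is hypothetical, and the OCF constants and `dummy_monitor` are stubbed only so the snippet runs on its own; in a real agent they come from the OCF shell functions library and the Dummy agent itself.

```shell
# Stubs so this sketch runs standalone. ":=" only assigns when unset,
# so the standard OCF return codes from a real agent are not overridden.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_NOT_RUNNING:=7}"

# Hypothetical marker file: while it exists, every stop fails.
FAIL_STOP_MARKER=${FAIL_STOP_MARKER:-/tmp/Dummy01.fail-stop}

# Stub monitor: "running" means the state file exists, as in Dummy.
dummy_monitor() {
    if [ -f "${OCF_RESKEY_state}" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

dummy_stop() {
    if [ -e "$FAIL_STOP_MARKER" ]; then
        return $OCF_ERR_GENERIC    # injected stop failure
    fi
    # Normal path: remove the state file if the resource is running.
    if dummy_monitor; then
        rm -f "${OCF_RESKEY_state}"
    fi
    return $OCF_SUCCESS
}
```

Only `dummy_stop` would go into the agent. Creating the marker file on cspm01 (`touch /tmp/Dummy01.fail-stop`) arms the failure, and deleting it restores the normal stop without touching the RA again.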
>
> (3) Check the status of each resource:
>
> # crm_mon -1
>
> ============
> Last updated: Tue Mar 16 21:40:11 2010
> Stack: openais
> Current DC: cspm01 - partition with quorum
> Version: 1.0.8-a77303a7adce stable-1.0 tip
> 4 Nodes configured, 4 expected votes
> 2 Resources configured.
> ============
>
> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>
> Clone Set: clnUMgroup01
> Resource Group: clnUmResource:0
> clnUMdummy01:0 (ocf::heartbeat:Dummy01): Started cspm01 (unmanaged) FAILED
> clnUMdummy02:0 (ocf::heartbeat:Dummy02): Stopped
> Started: [ cspm04 ]
>
> Failed actions:
> clnUMdummy01:0_monitor_10000 (node=cspm01, call=8, rc=7, status=complete): not running
> clnUMdummy01:0_stop_0 (node=cspm01, call=18, rc=1, status=complete): unknown error
> UmDummy03_monitor_10000 (node=cspm01, call=16, rc=7, status=complete): not running
> UmDummy01_monitor_10000 (node=cspm01, call=12, rc=7, status=complete): not running
> clnUMdummy02:0_monitor_10000 (node=cspm01, call=10, rc=7, status=complete): not running
>
>
> In this case, the clone instance on cspm04 keeps running.
Which makes sense. It hasn't failed, so there's no reason to stop it.
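Whether the clone instance on cspm04 survived can be read from the `Started: [ ... ]` line under the clone in `crm_mon -1` output. A rough sketch of a helper that extracts it, assuming the plain-text layout shown in this thread (the clone's block ends at the next blank line). `clone_started_nodes` is a hypothetical name, and crm_mon's text output is not a stable interface, so other versions may lay it out differently.

```shell
# Print the nodes listed on the "Started: [ ... ]" line of a clone set
# in `crm_mon -1` output read from stdin. Prints nothing if no instance
# is reported as started.
clone_started_nodes() {
    awk -v clone="$1" '
        # Enter the clone block at its "Clone Set:" header line.
        index($0, "Clone Set: " clone) { inclone = 1; next }
        # A blank line ends the block (assumed layout).
        inclone && /^[ \t]*$/ { inclone = 0 }
        inclone && /Started: \[/ {
            line = $0
            sub(/.*Started: \[ */, "", line)   # drop everything up to "[ "
            sub(/ *\].*/, "", line)            # drop " ]" and anything after
            print line
            exit
        }
    '
}
```

For example, `crm_mon -1 | clone_started_nodes clnUMgroup01` would print `cspm01 cspm04` for the first status above and `cspm04` after the stop failure, while in the four-resource case below it prints nothing.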
>
> But when I added another resource to the group, like this:
>
> ============
> Last updated: Tue Mar 16 21:53:26 2010
> Stack: openais
> Current DC: cspm01 - partition with quorum
> Version: 1.0.8-a77303a7adce stable-1.0 tip
> 4 Nodes configured, 4 expected votes
> 2 Resources configured.
> ============
>
> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>
> Resource Group: UMgroup01
> UmDummy01 (ocf::heartbeat:Dummy): Started cspm01
> UmDummy02 (ocf::heartbeat:Dummy): Started cspm01
> UmDummy03 (ocf::heartbeat:Dummy): Started cspm01
> UmDummy04 (ocf::heartbeat:Dummy): Started cspm01
> Clone Set: clnUMgroup01
> Started: [ cspm01 cspm04 ]
>
>
> after triggering the same error as above,
> the crm_mon output looked strange:
>
> ============
> Last updated: Tue Mar 16 21:54:46 2010
> Stack: openais
> Current DC: cspm01 - partition with quorum
> Version: 1.0.8-a77303a7adce stable-1.0 tip
> 4 Nodes configured, 4 expected votes
> 2 Resources configured.
> ============
>
> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>
> Clone Set: clnUMgroup01
> Resource Group: clnUmResource:0
> clnUMdummy01:0 (ocf::heartbeat:Dummy01): Started cspm01 (unmanaged) FAILED
> clnUMdummy02:0 (ocf::heartbeat:Dummy02): Stopped
> Stopped: [ clnUmResource:1 ]
>
> Failed actions:
> clnUMdummy01:0_monitor_10000 (node=cspm01, call=9, rc=7, status=complete): not running
> clnUMdummy01:0_stop_0 (node=cspm01, call=21, rc=1, status=complete): unknown error
>
>
> In this case, the clone instance on cspm04 was stopped.
> I didn't change the rsc_colocation or order settings.
> Which behavior is the expected one?
The first. You could be seeing a bug that's already fixed, though.
With 1.0.8, the policy engine wants to start the clone:
[11:22 AM] beekhof at mobile ~/Development/pacemaker/stable-1.0 # pengine/ptest -VVV -x /Users/beekhof/Downloads/Dummy_x4/cspm01/pengine/pe-warn-2.bz2
ptest[21686]: 2010/03/18_11:22:08 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
ptest[21686]: 2010/03/18_11:22:08 WARN: unpack_rsc_op: Processing failed op clnUMdummy01:0_monitor_10000 on cspm01: not running (7)
ptest[21686]: 2010/03/18_11:22:08 WARN: unpack_rsc_op: Processing failed op clnUMdummy01:0_stop_0 on cspm01: unknown error (1)
ptest[21686]: 2010/03/18_11:22:08 notice: group_print: Resource Group: UMgroup01
ptest[21686]: 2010/03/18_11:22:08 notice: native_print: UmDummy01 (ocf::heartbeat:Dummy): Started cspm01
ptest[21686]: 2010/03/18_11:22:08 notice: native_print: UmDummy02 (ocf::heartbeat:Dummy): Started cspm01
ptest[21686]: 2010/03/18_11:22:08 notice: native_print: UmDummy03 (ocf::heartbeat:Dummy): Started cspm01
ptest[21686]: 2010/03/18_11:22:08 notice: native_print: UmDummy04 (ocf::heartbeat:Dummy): Started cspm01
ptest[21686]: 2010/03/18_11:22:08 notice: clone_print: Clone Set: clnUMgroup01
ptest[21686]: 2010/03/18_11:22:08 notice: group_print: Resource Group: clnUmResource:0
ptest[21686]: 2010/03/18_11:22:08 notice: native_print: clnUMdummy01:0 (ocf::heartbeat:Dummy01): Started cspm01 (unmanaged) FAILED
ptest[21686]: 2010/03/18_11:22:08 notice: native_print: clnUMdummy02:0 (ocf::heartbeat:Dummy02): Stopped
ptest[21686]: 2010/03/18_11:22:08 notice: short_print: Stopped: [ clnUmResource:1 ]
ptest[21686]: 2010/03/18_11:22:08 WARN: common_apply_stickiness: Forcing clnUMgroup01 away from cspm01 after 1000000 failures (max=10)
ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start recurring monitor (10s) for UmDummy01 on cspm04
ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start recurring monitor (10s) for UmDummy02 on cspm04
ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start recurring monitor (10s) for UmDummy03 on cspm04
ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start recurring monitor (10s) for UmDummy04 on cspm04
ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start recurring monitor (10s) for clnUMdummy01:1 on cspm04
ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start recurring monitor (10s) for clnUMdummy02:1 on cspm04
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy01 (Started cspm01 -> cspm04)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy02 (Started cspm01 -> cspm04)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy03 (Started cspm01 -> cspm04)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy04 (Started cspm01 -> cspm04)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Leave resource clnUMdummy01:0 (Started unmanaged)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Leave resource clnUMdummy02:0 (Stopped)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Start clnUMdummy01:1 (cspm04)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Start clnUMdummy02:1 (cspm04)
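The `LogActions` lines are the policy engine's summary of what it intends to do with each resource, so they are the part of the `ptest` run above that answers "will the clone be started on cspm04?". A small sketch for pulling them out and listing the resources planned to start on a given node. Both function names are hypothetical; `2>&1` is used because the log lines may go to stderr rather than stdout, and no particular column padding is assumed (awk splits on any run of whitespace).

```shell
# Print the text after "LogActions: " for every matching log line on stdin.
planned_actions() {
    sed -n 's/.*LogActions: //p'
}

# From planned_actions output, print resources with a Start action on node $1,
# e.g. "Start clnUMdummy01:1 (cspm04)" -> "clnUMdummy01:1".
starts_on() {
    awk -v want="($1)" '$1 == "Start" && $NF == want { print $2 }'
}

# Example, reusing the command and file from the run above:
# pengine/ptest -VVV -x /Users/beekhof/Downloads/Dummy_x4/cspm01/pengine/pe-warn-2.bz2 2>&1 \
#     | planned_actions | starts_on cspm04
```

Against the log above, the pipeline would print `clnUMdummy01:1` and `clnUMdummy02:1`, the clone instance that crm_mon showed as stopped.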