[Pacemaker] the behavior of clone resource

Thu Mar 18 10:22:59 UTC 2010

2010/3/16 Junko IKEDA <ikedaj at intellilink.co.jp>:
> Hi,
>
> I have found some slightly strange clone behavior.
> The steps to reproduce it are as follows:
>
> (1) start the group, which contains three primitive resources,
>          and the clone set
>
> # crm_mon -1
>
> ============
> Last updated: Tue Mar 16 21:39:10 2010
> Stack: openais
> Current DC: cspm01 - partition with quorum
> Version: 1.0.8-a77303a7adce stable-1.0 tip
> 4 Nodes configured, 4 expected votes
> 2 Resources configured.
> ============
>
> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>
>       Resource Group: UMgroup01
>           UmDummy01  (ocf::heartbeat:Dummy): Started cspm01
>           UmDummy02  (ocf::heartbeat:Dummy): Started cspm01
>           UmDummy03  (ocf::heartbeat:Dummy): Started cspm01
>       Clone Set: clnUMgroup01
>           Started: [ cspm01 cspm04 ]
>
> (2) edit the Dummy RA so that the stop of clnUMgroup01 fails.
>
> # vim /usr/lib/ocf/resource.d/heartbeat/Dummy01
> -----------------------------------------------
> dummy_stop() {
>          exit $OCF_ERR_GENERIC # intentional error
>
>          dummy_monitor
>          if [ $? =  $OCF_SUCCESS ]; then
>              rm ${OCF_RESKEY_state}
>          fi
>          return $OCF_SUCCESS
> }
> -----------------------------------------------
>
> (on cspm01)
> # rm -f /var/run/heartbeat/rsctmp/Dummy-clnUMdummy01:0.state
>
> (3) check the status of each resource
>
> # crm_mon -1
>
> ============
> Last updated: Tue Mar 16 21:40:11 2010
> Stack: openais
> Current DC: cspm01 - partition with quorum
> Version: 1.0.8-a77303a7adce stable-1.0 tip
> 4 Nodes configured, 4 expected votes
> 2 Resources configured.
> ============
>
> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>
>       Clone Set: clnUMgroup01
>           Resource Group: clnUmResource:0
>               clnUMdummy01:0 (ocf::heartbeat:Dummy01):       Started cspm01 (unmanaged) FAILED
>               clnUMdummy02:0 (ocf::heartbeat:Dummy02):       Stopped
>           Started: [ cspm04 ]
>
> Failed actions:
>          clnUMdummy01:0_monitor_10000 (node=cspm01, call=8, rc=7, status=complete): not running
>          clnUMdummy01:0_stop_0 (node=cspm01, call=18, rc=1, status=complete): unknown error
>          UmDummy03_monitor_10000 (node=cspm01, call=16, rc=7, status=complete): not running
>          UmDummy01_monitor_10000 (node=cspm01, call=12, rc=7, status=complete): not running
>          clnUMdummy02:0_monitor_10000 (node=cspm01, call=10, rc=7, status=complete): not running
>
>
> In this case, the clone instance on cspm04 keeps running.

Which makes sense.  It hasn't failed, so there's no reason to stop it.
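
For reading the "Failed actions" list above: the rc values are OCF exit
codes, and the text after the colon is just their description. A minimal
sketch of a decoder for the codes that appear in this thread follows. The
function name ocf_rc_name is made up for illustration and is not part of
Pacemaker or the resource-agents package; only rc 0, 1 and 7 are covered.

```shell
# Hypothetical helper (not a Pacemaker tool): map an OCF return code
# to the description shown in the "Failed actions" list.
ocf_rc_name() {
    case "$1" in
        0) echo "ok" ;;              # OCF_SUCCESS
        1) echo "unknown error" ;;   # OCF_ERR_GENERIC, returned by the edited dummy_stop
        7) echo "not running" ;;     # OCF_NOT_RUNNING, reported by the monitor ops
        *) echo "rc $1 (not covered by this sketch)" ;;
    esac
}

ocf_rc_name 7
ocf_rc_name 1
```

So rc=7 on the monitor ops means the state file was gone, and rc=1 on
stop_0 is the intentional error from the modified RA.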

>
> But when I added another resource to the group, like this:
>
> ============
> Last updated: Tue Mar 16 21:53:26 2010
> Stack: openais
> Current DC: cspm01 - partition with quorum
> Version: 1.0.8-a77303a7adce stable-1.0 tip
> 4 Nodes configured, 4 expected votes
> 2 Resources configured.
> ============
>
> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>
>       Resource Group: UMgroup01
>           UmDummy01  (ocf::heartbeat:Dummy): Started cspm01
>           UmDummy02  (ocf::heartbeat:Dummy): Started cspm01
>           UmDummy03  (ocf::heartbeat:Dummy): Started cspm01
>           UmDummy04  (ocf::heartbeat:Dummy): Started cspm01
>       Clone Set: clnUMgroup01
>           Started: [ cspm01 cspm04 ]
>
>
> After triggering the same error as above,
> the result of crm_mon was strange.
>
> ============
> Last updated: Tue Mar 16 21:54:46 2010
> Stack: openais
> Current DC: cspm01 - partition with quorum
> Version: 1.0.8-a77303a7adce stable-1.0 tip
> 4 Nodes configured, 4 expected votes
> 2 Resources configured.
> ============
>
> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>
>       Clone Set: clnUMgroup01
>           Resource Group: clnUmResource:0
>               clnUMdummy01:0 (ocf::heartbeat:Dummy01):       Started cspm01 (unmanaged) FAILED
>               clnUMdummy02:0 (ocf::heartbeat:Dummy02):       Stopped
>           Stopped: [ clnUmResource:1 ]
>
> Failed actions:
>          clnUMdummy01:0_monitor_10000 (node=cspm01, call=9, rc=7, status=complete): not running
>          clnUMdummy01:0_stop_0 (node=cspm01, call=21, rc=1, status=complete): unknown error
>
>
> In this case, the clone instance on cspm04 was stopped.
> I didn't change the rsc_colocation or order settings.
> Which case is the expected behavior?

The first. You could be seeing a bug that's already fixed, though.
With 1.0.8 it wants to start the clone:

[11:22 AM] beekhof at mobile ~/Development/pacemaker/stable-1.0 #
pengine/ptest -VVV -x /Users/beekhof/Downloads/Dummy_x4/cspm01/pengine/pe-warn-2.bz2
ptest[21686]: 2010/03/18_11:22:08 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
ptest[21686]: 2010/03/18_11:22:08 WARN: unpack_rsc_op: Processing failed op clnUMdummy01:0_monitor_10000 on cspm01: not running (7)
ptest[21686]: 2010/03/18_11:22:08 WARN: unpack_rsc_op: Processing failed op clnUMdummy01:0_stop_0 on cspm01: unknown error (1)
ptest[21686]: 2010/03/18_11:22:08 notice: group_print:  Resource Group: UMgroup01
ptest[21686]: 2010/03/18_11:22:08 notice: native_print: UmDummy01	(ocf::heartbeat:Dummy):	Started cspm01
ptest[21686]: 2010/03/18_11:22:08 notice: native_print: UmDummy02	(ocf::heartbeat:Dummy):	Started cspm01
ptest[21686]: 2010/03/18_11:22:08 notice: native_print: UmDummy03	(ocf::heartbeat:Dummy):	Started cspm01
ptest[21686]: 2010/03/18_11:22:08 notice: native_print: UmDummy04	(ocf::heartbeat:Dummy):	Started cspm01
ptest[21686]: 2010/03/18_11:22:08 notice: clone_print:  Clone Set: clnUMgroup01
ptest[21686]: 2010/03/18_11:22:08 notice: group_print:      Resource Group: clnUmResource:0
ptest[21686]: 2010/03/18_11:22:08 notice: native_print: clnUMdummy01:0	(ocf::heartbeat:Dummy01):	Started cspm01 (unmanaged) FAILED
ptest[21686]: 2010/03/18_11:22:08 notice: native_print: clnUMdummy02:0	(ocf::heartbeat:Dummy02):	Stopped
ptest[21686]: 2010/03/18_11:22:08 notice: short_print:      Stopped: [ clnUmResource:1 ]
ptest[21686]: 2010/03/18_11:22:08 WARN: common_apply_stickiness: Forcing clnUMgroup01 away from cspm01 after 1000000 failures (max=10)
ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp:  Start recurring monitor (10s) for UmDummy01 on cspm04
ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp:  Start recurring monitor (10s) for UmDummy02 on cspm04
ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp:  Start recurring monitor (10s) for UmDummy03 on cspm04
ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp:  Start recurring monitor (10s) for UmDummy04 on cspm04
ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp:  Start recurring monitor (10s) for clnUMdummy01:1 on cspm04
ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp:  Start recurring monitor (10s) for clnUMdummy02:1 on cspm04
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy01	(Started cspm01 -> cspm04)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy02	(Started cspm01 -> cspm04)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy03	(Started cspm01 -> cspm04)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy04	(Started cspm01 -> cspm04)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Leave resource clnUMdummy01:0	(Started unmanaged)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Leave resource clnUMdummy02:0	(Stopped)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Start clnUMdummy01:1	(cspm04)
ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Start clnUMdummy02:1	(cspm04)
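
The part of that output that answers the question is the LogActions
lines: the group moves to cspm04 and the :1 clone instance gets started
there. If you want to run the same check against your own pe-warn file,
a rough sketch of a filter that keeps only those lines is below. The
function name summarize_actions is made up for this mail, and it assumes
each log record is on one line (as ptest prints it, not as the mailer
wrapped it).

```shell
# Hypothetical helper: keep only the planned actions from ptest output
# and strip the "ptest[pid]: date notice: LogActions:" prefix.
summarize_actions() {
    grep 'LogActions:' | sed 's/^.*LogActions: *//'
}

# Stand-in input: two lines in the same shape as the output above.
printf '%s\n' \
    'ptest[21686]: 2010/03/18_11:22:08 WARN: unpack_nodes: Blind faith: not fencing unseen nodes' \
    'ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Start clnUMdummy01:1 (cspm04)' \
    | summarize_actions
```

Against a real file it would be something like
"ptest -VVV -x pe-warn-2.bz2 2>&1 | summarize_actions"; the 2>&1 is there
on the assumption that the log lines may be written to stderr.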