[Pacemaker] the behavior of clone resource

Mon Apr 19 09:19:25 UTC 2010

I tried to look at this one (finally!) but the two PE files I need
(pe-input-7.bz2 and pe-input-8.bz2 from cspm01) are missing.
Very strange.

2010/3/18 Andrew Beekhof <andrew at beekhof.net>:
> 2010/3/16 Junko IKEDA <ikedaj at intellilink.co.jp>:
>> Hi,
>>
>> I noticed some slightly strange clone behavior.
>> Here is what I found:
>>
>> (1) Start the group, which contains three primitive resources,
>>          and the clone set.
>>
>> # crm_mon -1
>>
>> ============
>> Last updated: Tue Mar 16 21:39:10 2010
>> Stack: openais
>> Current DC: cspm01 - partition with quorum
>> Version: 1.0.8-a77303a7adce stable-1.0 tip
>> 4 Nodes configured, 4 expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>>
>>       Resource Group: UMgroup01
>>           UmDummy01  (ocf::heartbeat:Dummy): Started cspm01
>>           UmDummy02  (ocf::heartbeat:Dummy): Started cspm01
>>           UmDummy03  (ocf::heartbeat:Dummy): Started cspm01
>>       Clone Set: clnUMgroup01
>>           Started: [ cspm01 cspm04 ]
>>
>> (2) Edit the Dummy RA so that stopping clnUMgroup01 fails (stop NG).
>>
>> # vim /usr/lib/ocf/resource.d/heartbeat/Dummy01
>> -----------------------------------------------
>> dummy_stop() {
>>          exit $OCF_ERR_GENERIC # intentional error
>>
>>          dummy_monitor
>>          if [ $? =  $OCF_SUCCESS ]; then
>>              rm ${OCF_RESKEY_state}
>>          fi
>>          return $OCF_SUCCESS
>> }
>> -----------------------------------------------
>>
>> (on cspm01)
>> # rm -f /var/run/heartbeat/rsctmp/Dummy-clnUMdummy01:0.state
>>
>> (3) Check the status of each resource.
>>
>> # crm_mon -1
>>
>> ============
>> Last updated: Tue Mar 16 21:40:11 2010
>> Stack: openais
>> Current DC: cspm01 - partition with quorum
>> Version: 1.0.8-a77303a7adce stable-1.0 tip
>> 4 Nodes configured, 4 expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>>
>>       Clone Set: clnUMgroup01
>>           Resource Group: clnUmResource:0
>>               clnUMdummy01:0 (ocf::heartbeat:Dummy01):       Started cspm01 (unmanaged) FAILED
>>               clnUMdummy02:0 (ocf::heartbeat:Dummy02):       Stopped
>>           Started: [ cspm04 ]
>>
>> Failed actions:
>>          clnUMdummy01:0_monitor_10000 (node=cspm01, call=8, rc=7, status=complete): not running
>>          clnUMdummy01:0_stop_0 (node=cspm01, call=18, rc=1, status=complete): unknown error
>>          UmDummy03_monitor_10000 (node=cspm01, call=16, rc=7, status=complete): not running
>>          UmDummy01_monitor_10000 (node=cspm01, call=12, rc=7, status=complete): not running
>>          clnUMdummy02:0_monitor_10000 (node=cspm01, call=10, rc=7, status=complete): not running
>>
>>
>> In this case, the clone instance on cspm04 keeps running.
>
> Which makes sense.  It hasn't failed, there's no reason to stop it.
>
>>
>> But when I added another resource to the group, like this:
>>
>> ============
>> Last updated: Tue Mar 16 21:53:26 2010
>> Stack: openais
>> Current DC: cspm01 - partition with quorum
>> Version: 1.0.8-a77303a7adce stable-1.0 tip
>> 4 Nodes configured, 4 expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>>
>>       Resource Group: UMgroup01
>>           UmDummy01  (ocf::heartbeat:Dummy): Started cspm01
>>           UmDummy02  (ocf::heartbeat:Dummy): Started cspm01
>>           UmDummy03  (ocf::heartbeat:Dummy): Started cspm01
>>           UmDummy04  (ocf::heartbeat:Dummy): Started cspm01
>>       Clone Set: clnUMgroup01
>>           Started: [ cspm01 cspm04 ]
>>
>>
>> After injecting the same error as above,
>> the crm_mon output was strange.
>>
>> ============
>> Last updated: Tue Mar 16 21:54:46 2010
>> Stack: openais
>> Current DC: cspm01 - partition with quorum
>> Version: 1.0.8-a77303a7adce stable-1.0 tip
>> 4 Nodes configured, 4 expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>>
>>       Clone Set: clnUMgroup01
>>           Resource Group: clnUmResource:0
>>               clnUMdummy01:0 (ocf::heartbeat:Dummy01):       Started cspm01 (unmanaged) FAILED
>>               clnUMdummy02:0 (ocf::heartbeat:Dummy02):       Stopped
>>           Stopped: [ clnUmResource:1 ]
>>
>> Failed actions:
>>          clnUMdummy01:0_monitor_10000 (node=cspm01, call=9, rc=7, status=complete): not running
>>          clnUMdummy01:0_stop_0 (node=cspm01, call=21, rc=1, status=complete): unknown error
>>
>>
>> In this case, the clone instance on cspm04 was stopped.
>> I didn't change the rsc_colocation or order settings.
>> Which case is the expected one?
>
> The first. You could be seeing a bug that's already fixed though.
> With 1.0.8 it wants to start the clone:
>
> [11:22 AM] beekhof at mobile ~/Development/pacemaker/stable-1.0 # pengine/ptest -VVV -x /Users/beekhof/Downloads/Dummy_x4/cspm01/pengine/pe-warn-2.bz2
> ptest[21686]: 2010/03/18_11:22:08 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> ptest[21686]: 2010/03/18_11:22:08 WARN: unpack_rsc_op: Processing failed op clnUMdummy01:0_monitor_10000 on cspm01: not running (7)
> ptest[21686]: 2010/03/18_11:22:08 WARN: unpack_rsc_op: Processing failed op clnUMdummy01:0_stop_0 on cspm01: unknown error (1)
> ptest[21686]: 2010/03/18_11:22:08 notice: group_print:  Resource Group: UMgroup01
> ptest[21686]: 2010/03/18_11:22:08 notice: native_print: UmDummy01       (ocf::heartbeat:Dummy): Started cspm01
> ptest[21686]: 2010/03/18_11:22:08 notice: native_print: UmDummy02       (ocf::heartbeat:Dummy): Started cspm01
> ptest[21686]: 2010/03/18_11:22:08 notice: native_print: UmDummy03       (ocf::heartbeat:Dummy): Started cspm01
> ptest[21686]: 2010/03/18_11:22:08 notice: native_print: UmDummy04       (ocf::heartbeat:Dummy): Started cspm01
> ptest[21686]: 2010/03/18_11:22:08 notice: clone_print:  Clone Set: clnUMgroup01
> ptest[21686]: 2010/03/18_11:22:08 notice: group_print:      Resource Group: clnUmResource:0
> ptest[21686]: 2010/03/18_11:22:08 notice: native_print: clnUMdummy01:0  (ocf::heartbeat:Dummy01):       Started cspm01 (unmanaged) FAILED
> ptest[21686]: 2010/03/18_11:22:08 notice: native_print: clnUMdummy02:0  (ocf::heartbeat:Dummy02):       Stopped
> ptest[21686]: 2010/03/18_11:22:08 notice: short_print:      Stopped: [ clnUmResource:1 ]
> ptest[21686]: 2010/03/18_11:22:08 WARN: common_apply_stickiness: Forcing clnUMgroup01 away from cspm01 after 1000000 failures (max=10)
> ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp:  Start recurring monitor (10s) for UmDummy01 on cspm04
> ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp:  Start recurring monitor (10s) for UmDummy02 on cspm04
> ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp:  Start recurring monitor (10s) for UmDummy03 on cspm04
> ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp:  Start recurring monitor (10s) for UmDummy04 on cspm04
> ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp:  Start recurring monitor (10s) for clnUMdummy01:1 on cspm04
> ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp:  Start recurring monitor (10s) for clnUMdummy02:1 on cspm04
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy01       (Started cspm01 -> cspm04)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy02       (Started cspm01 -> cspm04)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy03       (Started cspm01 -> cspm04)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource UmDummy04       (Started cspm01 -> cspm04)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Leave resource clnUMdummy01:0  (Started unmanaged)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Leave resource clnUMdummy02:0  (Stopped)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Start clnUMdummy01:1  (cspm04)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Start clnUMdummy02:1  (cspm04)
>
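
For anyone trying to reproduce the failure injection from step (2) outside a cluster, here is a rough, self-contained sketch of why the modified agent yields rc=7 for monitor and rc=1 for stop. It is not the real Dummy RA: the state file path, the simplified dummy_monitor, and the subshell wrapper are assumptions made only for this demo. The numeric values are the standard OCF return codes (0, 1 and 7).

```shell
#!/bin/sh
# Illustrative stand-in for the modified agent; NOT the real Dummy RA.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical state file location, used only by this demo.
state_file="${TMPDIR:-/tmp}/dummy-demo.$$.state"

dummy_monitor() {
    # Simplified assumption: "running" just means the state file exists.
    if [ -f "$state_file" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

dummy_stop() {
    exit $OCF_ERR_GENERIC   # intentional error, as in the quoted edit

    # Never reached: the normal cleanup path is skipped entirely.
    dummy_monitor && rm -f "$state_file"
    return $OCF_SUCCESS
}

touch "$state_file"   # simulate a started resource
rm -f "$state_file"   # remove the state file by hand, as done on cspm01

# The monitor now reports "not running".
monitor_rc=0
dummy_monitor || monitor_rc=$?

# Run stop in a subshell so its "exit" does not end this script.
stop_rc=0
( dummy_stop ) || stop_rc=$?

echo "monitor rc=$monitor_rc stop rc=$stop_rc"
```

The two return codes line up with the monitor and stop entries listed under "Failed actions" in the crm_mon output above.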