[Pacemaker] the behavior of clone resource
Andrew Beekhof
andrew at beekhof.net
Mon Apr 19 09:19:25 UTC 2010
I tried to look at this one (finally!), but the two PE files I need
(pe-input-7.bz2 and pe-input-8.bz2 from cspm01) are missing.
Very strange.
2010/3/18 Andrew Beekhof <andrew at beekhof.net>:
> 2010/3/16 Junko IKEDA <ikedaj at intellilink.co.jp>:
>> Hi,
>>
>> I am seeing some slightly strange clone behavior.
>> Here is what I found:
>>
>> (1) start the group which contains three primitive resources,
>> and clone set
>>
>> # crm_mon -1
>>
>> ============
>> Last updated: Tue Mar 16 21:39:10 2010
>> Stack: openais
>> Current DC: cspm01 - partition with quorum
>> Version: 1.0.8-a77303a7adce stable-1.0 tip
>> 4 Nodes configured, 4 expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>>
>> Resource Group: UMgroup01
>> UmDummy01 (ocf::heartbeat:Dummy): Started cspm01
>> UmDummy02 (ocf::heartbeat:Dummy): Started cspm01
>> UmDummy03 (ocf::heartbeat:Dummy): Started cspm01
>> Clone Set: clnUMgroup01
>> Started: [ cspm01 cspm04 ]
>>
>> (2) edit the Dummy RA so that the stop of clnUMgroup01 fails (stop NG).
>>
>> # vim /usr/lib/ocf/resource.d/heartbeat/Dummy01
>> -----------------------------------------------
>> dummy_stop() {
>>     exit $OCF_ERR_GENERIC # intentional error
>>
>>     dummy_monitor
>>     if [ $? = $OCF_SUCCESS ]; then
>>         rm ${OCF_RESKEY_state}
>>     fi
>>     return $OCF_SUCCESS
>> }
>> -----------------------------------------------
>>
>> (on cspm01)
>> # rm -f /var/run/heartbeat/rsctmp/Dummy-clnUMdummy01:0.state
>>
>> (3) check the status of each resource
>>
>> # crm_mon -1
>>
>> ============
>> Last updated: Tue Mar 16 21:40:11 2010
>> Stack: openais
>> Current DC: cspm01 - partition with quorum
>> Version: 1.0.8-a77303a7adce stable-1.0 tip
>> 4 Nodes configured, 4 expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>>
>> Clone Set: clnUMgroup01
>> Resource Group: clnUmResource:0
>> clnUMdummy01:0 (ocf::heartbeat:Dummy01): Started cspm01
>> (unmanaged) FAILED
>> clnUMdummy02:0 (ocf::heartbeat:Dummy02): Stopped
>> Started: [ cspm04 ]
>>
>> Failed actions:
>> clnUMdummy01:0_monitor_10000 (node=cspm01, call=8, rc=7,
>> status=complete): not running
>> clnUMdummy01:0_stop_0 (node=cspm01, call=18, rc=1,
>> status=complete):
>> unknown error
>> UmDummy03_monitor_10000 (node=cspm01, call=16, rc=7,
>> status=complete):
>> not running
>> UmDummy01_monitor_10000 (node=cspm01, call=12, rc=7,
>> status=complete):
>> not running
>> clnUMdummy02:0_monitor_10000 (node=cspm01, call=10, rc=7,
>> status=complete): not running
>>
>>
>> In this case, the clone instance on cspm04 keeps running.
>
> Which makes sense. It hasn't failed, so there's no reason to stop it.
>
>>
>> But when I added another resource to the group, like this:
>>
>> ============
>> Last updated: Tue Mar 16 21:53:26 2010
>> Stack: openais
>> Current DC: cspm01 - partition with quorum
>> Version: 1.0.8-a77303a7adce stable-1.0 tip
>> 4 Nodes configured, 4 expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>>
>> Resource Group: UMgroup01
>> UmDummy01 (ocf::heartbeat:Dummy): Started cspm01
>> UmDummy02 (ocf::heartbeat:Dummy): Started cspm01
>> UmDummy03 (ocf::heartbeat:Dummy): Started cspm01
>> UmDummy04 (ocf::heartbeat:Dummy): Started cspm01
>> Clone Set: clnUMgroup01
>> Started: [ cspm01 cspm04 ]
>>
>>
>> after triggering the same error as above,
>> the crm_mon output was strange.
>>
>> ============
>> Last updated: Tue Mar 16 21:54:46 2010
>> Stack: openais
>> Current DC: cspm01 - partition with quorum
>> Version: 1.0.8-a77303a7adce stable-1.0 tip
>> 4 Nodes configured, 4 expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ cspm01 cspm02 cspm03 cspm04 ]
>>
>> Clone Set: clnUMgroup01
>> Resource Group: clnUmResource:0
>> clnUMdummy01:0 (ocf::heartbeat:Dummy01): Started cspm01
>> (unmanaged) FAILED
>> clnUMdummy02:0 (ocf::heartbeat:Dummy02): Stopped
>> Stopped: [ clnUmResource:1 ]
>>
>> Failed actions:
>> clnUMdummy01:0_monitor_10000 (node=cspm01, call=9, rc=7,
>> status=complete): not running
>> clnUMdummy01:0_stop_0 (node=cspm01, call=21, rc=1,
>> status=complete):
>> unknown error
>>
>>
>> In this case, the clone instance on cspm04 was stopped.
>> I didn't change the rsc_colocation or order settings.
>> Which case is the expected behavior?
>
> The first. You could be seeing a bug that's already been fixed though.
> With 1.0.8 it wants to start the clone:
>
> [11:22 AM] beekhof at mobile ~/Development/pacemaker/stable-1.0 #
> pengine/ptest -VVV -x
> /Users/beekhof/Downloads/Dummy_x4/cspm01/pengine/pe-warn-2.bz2
> ptest[21686]: 2010/03/18_11:22:08 WARN: unpack_nodes: Blind faith: not
> fencing unseen nodes
> ptest[21686]: 2010/03/18_11:22:08 WARN: unpack_rsc_op: Processing
> failed op clnUMdummy01:0_monitor_10000 on cspm01: not running (7)
> ptest[21686]: 2010/03/18_11:22:08 WARN: unpack_rsc_op: Processing
> failed op clnUMdummy01:0_stop_0 on cspm01: unknown error (1)
> ptest[21686]: 2010/03/18_11:22:08 notice: group_print: Resource
> Group: UMgroup01
> ptest[21686]: 2010/03/18_11:22:08 notice: native_print:
> UmDummy01 (ocf::heartbeat:Dummy): Started cspm01
> ptest[21686]: 2010/03/18_11:22:08 notice: native_print:
> UmDummy02 (ocf::heartbeat:Dummy): Started cspm01
> ptest[21686]: 2010/03/18_11:22:08 notice: native_print:
> UmDummy03 (ocf::heartbeat:Dummy): Started cspm01
> ptest[21686]: 2010/03/18_11:22:08 notice: native_print:
> UmDummy04 (ocf::heartbeat:Dummy): Started cspm01
> ptest[21686]: 2010/03/18_11:22:08 notice: clone_print: Clone Set: clnUMgroup01
> ptest[21686]: 2010/03/18_11:22:08 notice: group_print: Resource
> Group: clnUmResource:0
> ptest[21686]: 2010/03/18_11:22:08 notice: native_print:
> clnUMdummy01:0 (ocf::heartbeat:Dummy01): Started cspm01 (unmanaged)
> FAILED
> ptest[21686]: 2010/03/18_11:22:08 notice: native_print:
> clnUMdummy02:0 (ocf::heartbeat:Dummy02): Stopped
> ptest[21686]: 2010/03/18_11:22:08 notice: short_print: Stopped: [
> clnUmResource:1 ]
> ptest[21686]: 2010/03/18_11:22:08 WARN: common_apply_stickiness:
> Forcing clnUMgroup01 away from cspm01 after 1000000 failures (max=10)
> ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start
> recurring monitor (10s) for UmDummy01 on cspm04
> ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start
> recurring monitor (10s) for UmDummy02 on cspm04
> ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start
> recurring monitor (10s) for UmDummy03 on cspm04
> ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start
> recurring monitor (10s) for UmDummy04 on cspm04
> ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start
> recurring monitor (10s) for clnUMdummy01:1 on cspm04
> ptest[21686]: 2010/03/18_11:22:08 notice: RecurringOp: Start
> recurring monitor (10s) for clnUMdummy02:1 on cspm04
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource
> UmDummy01 (Started cspm01 -> cspm04)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource
> UmDummy02 (Started cspm01 -> cspm04)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource
> UmDummy03 (Started cspm01 -> cspm04)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Move resource
> UmDummy04 (Started cspm01 -> cspm04)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Leave resource
> clnUMdummy01:0 (Started unmanaged)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Leave resource
> clnUMdummy02:0 (Stopped)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Start
> clnUMdummy01:1 (cspm04)
> ptest[21686]: 2010/03/18_11:22:08 notice: LogActions: Start
> clnUMdummy02:1 (cspm04)
>
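
For anyone reproducing this, the rc values in the failed actions quoted
above (rc=1 for the stop, rc=7 for the monitor) follow directly from the
modified dummy_stop in step (2). The following is only a minimal
standalone sketch, not the real Dummy RA: the numeric values are the
standard OCF return codes, while state_file, stop_rc and monitor_rc are
hypothetical names used for illustration, and dummy_monitor is reduced
to a bare state-file check.

```shell
#!/bin/sh
# Standard OCF return codes.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical stand-in for ${OCF_RESKEY_state}.
state_file="${TMPDIR:-/tmp}/dummy-sketch.$$.state"

# Simplified monitor: "running" means the state file exists.
dummy_monitor() {
    [ -f "$state_file" ] && return $OCF_SUCCESS
    return $OCF_NOT_RUNNING
}

# Stop with the intentional error from step (2).
dummy_stop() {
    exit $OCF_ERR_GENERIC   # intentional error
    # Never reached: the state file is not cleaned up.
    dummy_monitor && rm -f "$state_file"
    return $OCF_SUCCESS
}

touch "$state_file"             # resource is "started"

stop_rc=0
( dummy_stop ) || stop_rc=$?    # subshell, so exit does not end this script

rm -f "$state_file"             # the manual rm from step (2)

monitor_rc=0
dummy_monitor || monitor_rc=$?

echo "stop rc=$stop_rc, monitor rc=$monitor_rc"
```

The monitor reports "not running" once the state file is gone, and the
stop that recovery then attempts can never succeed, which is consistent
with clnUMdummy01:0 ending up unmanaged and FAILED on cspm01.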