[Pacemaker] Apparent delay in starting resource during migration
Phil Armstrong
pma at sgi.com
Tue Mar 22 14:35:47 UTC 2011
Hi,
I have 4 primitive resources that I was in the process of migrating from
one node (lotus) to the other (aston). I was using the 'crm resource
migrate' command (4 times). Normally this process works just fine, but
in this instance, one of the resources (copan_shelf_07) got 'stuck', in
that it didn't get started on the target node for about 15 minutes after
the others (04, 05, 06), and then it seems it was kicked off by
crm_timer_popped. I'm hoping someone can tell me why copan_shelf_07 took
so long to be started. I have the complete hb_report if that is
required. Here are the basics:
Corosync Cluster Engine, version '1.2.6' SVN revision 'exported'
Copyright (c) 2006-2009 Red Hat, Inc.
cluster-glue: 1.0.6 (ba146a145a3ede967af48e8936ac414984aa1e5f)
resource-agents: # Build version: d206bc8f13039b332e76a93a86e8e550b67781da
CRM Version: 1.1.2 (ecb1e2ea172ba2551f0bd763e557fccde68c849b)
pacemaker 1.1.2-0.7.1 - SUSE Linux Enterprise 11 x86_64
libpacemaker3 1.1.2-0.7.1 - SUSE Linux Enterprise 11 x86_64
openais 1.1.3-0.2.3 - SUSE Linux Enterprise 11 x86_64
corosync 1.2.6-0.2.2 - SUSE Linux Enterprise 11 x86_64
libcorosync4 1.2.6-0.2.2 - SUSE Linux Enterprise 11 x86_64
resource-agents 1.0.3-0.4.8 - SUSE Linux Enterprise 11 x86_64
cluster-glue 1.0.6-0.3.7 - SUSE Linux Enterprise 11 x86_64
libglue2 1.0.6-0.3.7 - SUSE Linux Enterprise 11 x86_64
ldirectord 1.0.3-0.4.8 - SUSE Linux Enterprise 11 x86_64
Here are the copan_shelf_07 syslog messages:
Mar 22 07:45:07 aston pengine: [10961]: notice: LogActions: Leave
resource copan_shelf_07 (Started lotus)
Mar 22 07:45:08 aston cib: [10958]: info: log_data_element: cib:diff:
+ <rsc_location id="cli-prefer-copan_shelf_07"
rsc="copan_shelf_07" __crm_diff_marker__="added:top" >
Mar 22 07:45:08 aston cib: [10958]: info: log_data_element: cib:diff:
+ <rule id="cli-prefer-rule-copan_shelf_07" score="INFINITY"
boolean-op="and" >
Mar 22 07:45:08 aston cib: [10958]: info: log_data_element: cib:diff:
+ <expression id="cli-prefer-expr-copan_shelf_07"
attribute="#uname" operation="eq" value="aston" type="string" />
Mar 22 07:45:10 aston pengine: [10961]: notice: native_print:
copan_shelf_07 (ocf::sgi:copan_ov_client): Started lotus
Mar 22 07:45:10 aston pengine: [10961]: notice: RecurringOp: Start
recurring monitor (30s) for copan_shelf_07 on aston
Mar 22 07:45:10 aston pengine: [10961]: notice: LogActions: Move
resource copan_shelf_07 (Started lotus -> aston)
Mar 22 07:45:10 aston pengine: [10961]: notice: native_print:
copan_shelf_07 (ocf::sgi:copan_ov_client): Started lotus
Mar 22 07:45:10 aston pengine: [10961]: notice: RecurringOp: Start
recurring monitor (30s) for copan_shelf_07 on aston
Mar 22 07:45:10 aston pengine: [10961]: notice: LogActions: Move
resource copan_shelf_07 (Started lotus -> aston)
Mar 22 07:45:10 aston crmd: [10962]: info: te_rsc_command: Initiating
action 17: stop copan_shelf_07_stop_0 on lotus
Mar 22 07:45:12 aston crmd: [10962]: WARN: status_from_rc: Action 17
(copan_shelf_07_stop_0) on lotus failed (target: 0 vs. rc: 1): Error
Mar 22 07:45:12 aston crmd: [10962]: WARN: update_failcount: Updating
failcount for copan_shelf_07 on lotus after failed stop: rc=1
(update=INFINITY, time=1300797912)
Mar 22 07:45:12 aston crmd: [10962]: info: abort_transition_graph:
match_graph_event:276 - Triggered transition abort (complete=0,
tag=lrm_rsc_op, id=copan_shelf_07_stop_0,
magic=0:1;17:509:0:cebe491c-0397-4ee8-bdd0-ae6ccd1e5c7d, cib=0.920.5) :
Event failed
Mar 22 07:45:12 aston crmd: [10962]: info: match_graph_event: Action
copan_shelf_07_stop_0 (17) confirmed on lotus (rc=4)
Mar 22 07:45:17 aston pengine: [10961]: WARN: unpack_rsc_op: Processing
failed op copan_shelf_07_stop_0 on lotus: unknown error (1)
Mar 22 07:45:17 aston pengine: [10961]: ERROR: unpack_rsc_op: Making
sure copan_shelf_07 doesn't come up again
Mar 22 07:45:17 aston pengine: [10961]: notice: native_print:
copan_shelf_07 (ocf::sgi:copan_ov_client): Started lotus FAILED
Mar 22 07:45:17 aston pengine: [10961]: info: get_failcount:
copan_shelf_07 has failed INFINITY times on lotus
Mar 22 07:45:17 aston pengine: [10961]: WARN: common_apply_stickiness:
Forcing copan_shelf_07 away from lotus after 1000000 failures (max=1)
Mar 22 07:45:17 aston pengine: [10961]: info: rsc_merge_weights:
cxfs-client-clone: Rolling back scores from copan_shelf_07
Mar 22 07:45:17 aston pengine: [10961]: info: native_color: Resource
copan_shelf_07 cannot run anywhere
Mar 22 07:45:17 aston pengine: [10961]: notice: LogActions: Stop
resource copan_shelf_07 (lotus)
Mar 22 07:45:17 aston pengine: [10961]: WARN: unpack_rsc_op: Processing
failed op copan_shelf_07_stop_0 on lotus: unknown error (1)
Mar 22 07:45:17 aston pengine: [10961]: ERROR: unpack_rsc_op: Making
sure copan_shelf_07 doesn't come up again
Mar 22 07:45:17 aston pengine: [10961]: notice: native_print:
copan_shelf_07 (ocf::sgi:copan_ov_client): Started lotus FAILED
Mar 22 07:45:17 aston pengine: [10961]: info: get_failcount:
copan_shelf_07 has failed INFINITY times on lotus
Mar 22 07:45:17 aston pengine: [10961]: WARN: common_apply_stickiness:
Forcing copan_shelf_07 away from lotus after 1000000 failures (max=1)
Mar 22 07:45:17 aston pengine: [10961]: info: rsc_merge_weights:
cxfs-client-clone: Rolling back scores from copan_shelf_07
Mar 22 07:45:17 aston pengine: [10961]: info: native_color: Resource
copan_shelf_07 cannot run anywhere
Mar 22 07:45:17 aston pengine: [10961]: notice: LogActions: Stop
resource copan_shelf_07 (lotus)
Mar 22 07:45:18 aston crmd: [10962]: info: te_rsc_command: Initiating
action 11: stop copan_shelf_07_stop_0 on lotus
Mar 22 07:45:19 aston crmd: [10962]: WARN: status_from_rc: Action 11
(copan_shelf_07_stop_0) on lotus failed (target: 0 vs. rc: 1): Error
Mar 22 07:45:19 aston crmd: [10962]: WARN: update_failcount: Updating
failcount for copan_shelf_07 on lotus after failed stop: rc=1
(update=INFINITY, time=1300797919)
Mar 22 07:45:19 aston crmd: [10962]: info: abort_transition_graph:
match_graph_event:276 - Triggered transition abort (complete=0,
tag=lrm_rsc_op, id=copan_shelf_07_stop_0,
magic=0:1;11:511:0:cebe491c-0397-4ee8-bdd0-ae6ccd1e5c7d, cib=0.920.16) :
Event failed
Mar 22 07:45:19 aston crmd: [10962]: info: match_graph_event: Action
copan_shelf_07_stop_0 (11) confirmed on lotus (rc=4)
Mar 22 07:45:22 aston pengine: [10961]: WARN: unpack_rsc_op: Processing
failed op copan_shelf_07_stop_0 on lotus: unknown error (1)
Mar 22 07:45:22 aston pengine: [10961]: ERROR: unpack_rsc_op: Making
sure copan_shelf_07 doesn't come up again
Mar 22 07:45:22 aston pengine: [10961]: notice: native_print:
copan_shelf_07 (ocf::sgi:copan_ov_client): Started lotus FAILED
Mar 22 07:45:22 aston pengine: [10961]: info: get_failcount:
copan_shelf_07 has failed INFINITY times on lotus
Mar 22 07:45:22 aston pengine: [10961]: WARN: common_apply_stickiness:
Forcing copan_shelf_07 away from lotus after 1000000 failures (max=1)
Mar 22 07:45:22 aston pengine: [10961]: info: rsc_merge_weights:
cxfs-client-clone: Rolling back scores from copan_shelf_07
Mar 22 07:45:22 aston pengine: [10961]: info: native_color: Resource
copan_shelf_07 cannot run anywhere
Mar 22 07:45:22 aston pengine: [10961]: notice: LogActions: Stop
resource copan_shelf_07 (lotus)
Mar 22 07:45:23 aston crmd: [10962]: info: te_rsc_command: Initiating
action 14: stop copan_shelf_07_stop_0 on lotus
Mar 22 07:45:24 aston crmd: [10962]: info: match_graph_event: Action
copan_shelf_07_stop_0 (14) confirmed on lotus (rc=0)
Mar 22 08:00:25 aston pengine: [10961]: notice: native_print:
copan_shelf_07 (ocf::sgi:copan_ov_client): Stopped
Mar 22 08:00:25 aston pengine: [10961]: info: get_failcount:
copan_shelf_07 has failed INFINITY times on lotus
Mar 22 08:00:25 aston pengine: [10961]: WARN: common_apply_stickiness:
Forcing copan_shelf_07 away from lotus after 1000000 failures (max=1)
Mar 22 08:00:25 aston pengine: [10961]: notice: RecurringOp: Start
recurring monitor (30s) for copan_shelf_07 on aston
Mar 22 08:00:25 aston pengine: [10961]: notice: LogActions: Start
copan_shelf_07 (aston)
Mar 22 08:00:25 aston crmd: [10962]: info: te_rsc_command: Initiating
action 16: start copan_shelf_07_start_0 on aston (local)
Mar 22 08:00:25 aston crmd: [10962]: info: do_lrm_rsc_op: Performing
key=16:513:0:cebe491c-0397-4ee8-bdd0-ae6ccd1e5c7d
op=copan_shelf_07_start_0 )
Mar 22 08:00:25 aston lrmd: [10959]: debug: on_msg_perform_op:2359:
copying parameters for rsc copan_shelf_07
Mar 22 08:00:25 aston lrmd: [10959]: debug: on_msg_perform_op: add an
operation operation start[142] on ocf::copan_ov_client::copan_shelf_07
for client 10962, its parameters: CRM_meta_name=[start]
crm_feature_set=[3.0.2] CRM_meta_on_fail=[restart]
CRM_meta_timeout=[60000] shelf_name=[c07] to the operation list.
Mar 22 08:00:25 aston lrmd: [10959]: info: rsc:copan_shelf_07:142: start
Thanks,
Phil
--
Phil Armstrong pma at sgi.com
Phone: 651-683-5561 VNET 233-5561
More information about the Pacemaker
mailing list