[Pacemaker] Critical: Monitor operation of IPaddr2 timing out, taking more than 60s. Fails to recover.
Parshvi
parshvi.17 at gmail.com
Thu Aug 9 05:44:19 UTC 2012
Parshvi <parshvi.17 at ...> writes:
>
> Hi,
>
> The monitor operation of IPaddr2 rsc agent is timing out.
> Interval: 5s
> Timeout: 60s
> The timeout was increased from an earlier 20s to now 60s. Even then, there are
> multiple logs of monitor op. timing out.
>
> 1) What can cause the monitor to take so long ?
> 2) Looking at the pe-input, what contributes to the operation time ? Is it
just
> the exec-time or exec-time + queue-time ?
> 3) Any solution proposed ?
>
> I have lrm pe-input when the timeout was configured at 20s:
> Here, is pe-input snapshot where monitor op. timed out (with timeout=20s)
>
> <lrm_resource id="Group_1_ClusterIP" type="IPaddr2" class="ocf"
> provider="heartbeat">
> <lrm_rsc_op id="Group_1_ClusterIP_monitor_0" operation="monitor"
> crm-debug-origin="build_active_RAs" crm_feature_set="3.0.1" transition-
> key="28:0:7:6b445452-980a-455f-8616-7bd12f20843e" transition-
> magic="0:7;28:0:7:6b445452-980a-455f-8616-7bd12f20843e" call-id="10" rc-
code="7"
> op-status="0" interval="0" last-run="1343738096" last-rc-change="1343738096"
> exec-time="20" queue-time="30" op-digest="f22a042c86b227078b239707d4e4d4a2"/>
>
> <lrm_rsc_op id="Group_1_ClusterIP_start_0" operation="start" crm-
> debug-origin="do_update_resource" crm_feature_set="3.0.1" transition-
> key="87:27957:0:6b445452-980a-455f-8616-7bd12f20843e" transition-
> magic="0:0;87:27957:0:6b445452-980a-455f-8616-7bd12f20843e" call-id="83503"
rc-
> code="0" op-status="0" interval="0" last-run="1343928908" last-rc-
> change="1343928908" exec-time="280" queue-time="20" op-
> digest="f22a042c86b227078b239707d4e4d4a2"/>
>
> <lrm_rsc_op id="Group_1_ClusterIP_monitor_5000"
operation="monitor"
> crm-debug-origin="do_update_resource" crm_feature_set="3.0.1" transition-
> key="12:27957:0:6b445452-980a-455f-8616-7bd12f20843e" transition-
> magic="2:-2;12:27957:0:6b445452-980a-455f-8616-7bd12f20843e" call-id="83504"
rc-
> code="-2" op-status="2" interval="5000" last-rc-change="1343928921" exec-
> time="20000" queue-time="0" op-digest="79c3bdd01c6e0fd819484536a54bf7a2"/>
> (Please note exec-time=20000)
>
Following are the details of packages:
cluster-glue: 1.0.6 (1c87a0c58c59fc384b93ec11476cefdbb6ddc1e1)
resource-agents: # Build version: 7a11934b142d1daf42a04fbaa0391a3ac47cee4c
CRM Version: 1.0.12 (unknown)
pacemaker 1.0.12-1.el5.centos - (none) x86_64
corosync 1.2.7-1.1.el5 - (none) x86_64
resource-agents 1.0.4-1.1.el5 - (none) x86_64
There are 4 virtual IP resources configued:
Out of these, 3 recover after a monitor timeout but one Virtual IP rsc does not
recover. Following are the logs that are observed:
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: run_graph: Transition 63579
(Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=8,
Source=/var/lib/pengine/pe-input-1660.bz2): Terminated
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: ERROR: te_graph_trigger: Transition
failed: terminated
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_graph: Graph 63579 (9
actions in 9 synapses): batch-limit=30 jobs, network-delay=60000ms
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_graph: Synapse 0 was
confirmed (priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_graph: Synapse 1 is pending
(priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_elem: [Action 8]:
Pending (id: Rsc1_GroupClusterIP_stop_0, loc: CSS-FU-1, priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_elem: * [Input 103]:
Pending (id: Rsc2_stop_0, loc: CSS-FU-1, priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_graph: Synapse 2 is pending
(priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_elem: [Action 97]:
Pending (id: Rsc1_GroupClusterIP_start_0, loc: CSS-FU-2, priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_elem: * [Input 8]:
Pending (id: Rsc1_GroupClusterIP_stop_0, loc: CSS-FU-1, priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_graph: Synapse 3 is pending
(priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_elem: [Action 98]:
Pending (id: Rsc1_GroupClusterIP_monitor_1000, loc: CSS-FU-2, priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_elem: * [Input 97]:
Pending (id: Rsc1_GroupClusterIP_start_0, loc: CSS-FU-2, priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_graph: Synapse 4 is pending
(priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_elem: [Action 99]:
Pending (id: Rsc3_start_0, loc: CSS-FU-2, priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_elem: * [Input 97]:
Pending (id: Rsc1_GroupClusterIP_start_0, loc: CSS-FU-2, priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_graph: Synapse 5 is pending
(priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_elem: [Action 100]:
Pending (id: Rsc3_monitor_1000, loc: CSS-FU-2, priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_elem: * [Input 99]:
Pending (id: Rsc3_start_0, loc: CSS-FU-2, priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_graph: Synapse 6 is pending
(priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_elem: [Action 101]:
Pending (id: Rsc4_start_0, loc: CSS-FU-2, priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_elem: * [Input 97]:
Pending (id: Rsc1_GroupClusterIP_start_0, loc: CSS-FU-2, priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_graph: Synapse 7 is pending
(priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_elem: [Action 102]:
Pending (id: Rsc4_monitor_1000, loc: CSS-FU-2, priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_elem: * [Input 101]:
Pending (id: Rsc4_start_0, loc: CSS-FU-2, priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_graph: Synapse 8 is pending
(priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_elem: [Action 36]:
Pending (id: all_stopped, type: pseduo, priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: WARN: print_elem: * [Input 8]:
Pending (id: Rsc1_GroupClusterIP_stop_0, loc: CSS-FU-1, priority: 0)
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: info: te_graph_trigger: Transition 63579
is now complete
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: info: notify_crmd: Transition 63579
status: done - <null>
Jul 29 13:41:52 CSS-FU-1 crmd: [11165]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
1) Please help as to why a monitor is timing out ?
2) Why does one of the VIP's fails to recover after a timeout ?
More information about the Pacemaker
mailing list