[Pacemaker] Critical: Monitor operation of IPaddr2 timing out, taking more than 60s. Fails to recover.
Parshvi
parshvi.17 at gmail.com
Thu Aug 9 05:14:02 UTC 2012
Hi,
The monitor operation of the IPaddr2 resource agent is timing out.
Interval: 5s
Timeout: 60s
The timeout was increased from 20s to 60s, but the logs still show the
monitor operation timing out repeatedly.
1) What can cause the monitor to take so long?
2) Looking at the pe-input, what counts toward the operation time? Is it just
the exec-time, or exec-time + queue-time?
3) Is there a known solution?
Below is the lrm section of a pe-input snapshot in which the monitor op timed
out (taken while the timeout was still configured at 20s):
<lrm_resource id="Group_1_ClusterIP" type="IPaddr2" class="ocf" provider="heartbeat">
  <lrm_rsc_op id="Group_1_ClusterIP_monitor_0" operation="monitor"
      crm-debug-origin="build_active_RAs" crm_feature_set="3.0.1"
      transition-key="28:0:7:6b445452-980a-455f-8616-7bd12f20843e"
      transition-magic="0:7;28:0:7:6b445452-980a-455f-8616-7bd12f20843e"
      call-id="10" rc-code="7" op-status="0" interval="0"
      last-run="1343738096" last-rc-change="1343738096"
      exec-time="20" queue-time="30"
      op-digest="f22a042c86b227078b239707d4e4d4a2"/>
  <lrm_rsc_op id="Group_1_ClusterIP_start_0" operation="start"
      crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
      transition-key="87:27957:0:6b445452-980a-455f-8616-7bd12f20843e"
      transition-magic="0:0;87:27957:0:6b445452-980a-455f-8616-7bd12f20843e"
      call-id="83503" rc-code="0" op-status="0" interval="0"
      last-run="1343928908" last-rc-change="1343928908"
      exec-time="280" queue-time="20"
      op-digest="f22a042c86b227078b239707d4e4d4a2"/>
  <lrm_rsc_op id="Group_1_ClusterIP_monitor_5000" operation="monitor"
      crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
      transition-key="12:27957:0:6b445452-980a-455f-8616-7bd12f20843e"
      transition-magic="2:-2;12:27957:0:6b445452-980a-455f-8616-7bd12f20843e"
      call-id="83504" rc-code="-2" op-status="2" interval="5000"
      last-rc-change="1343928921"
      exec-time="20000" queue-time="0"
      op-digest="79c3bdd01c6e0fd819484536a54bf7a2"/>
(Please note exec-time=20000)
  <lrm_rsc_op id="Group_1_ClusterIP_stop_0" operation="stop"
      crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
      transition-key="13:27957:0:6b445452-980a-455f-8616-7bd12f20843e"
      transition-magic="0:0;13:27957:0:6b445452-980a-455f-8616-7bd12f20843e"
      call-id="83497" rc-code="0" op-status="0" interval="0"
      last-run="1343928906" last-rc-change="1343928906"
      exec-time="1190" queue-time="30"
      op-digest="f22a042c86b227078b239707d4e4d4a2"/>
</lrm_resource>
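For anyone comparing the timings in the snapshot above, here is a minimal Python sketch that lists exec-time and queue-time per operation and picks out the ones whose sum reaches a given threshold. It is only an illustration, not part of Pacemaker: the function names op_timings and slow_ops are made up, the embedded XML is a trimmed copy of the snapshot (only the attributes used here), and it assumes both time attributes are in milliseconds. Whether the timeout is actually measured against exec-time alone or against exec-time + queue-time is exactly the open question, so the sketch prints both.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the snapshot above; only the attributes used below are kept.
SNAPSHOT = """
<lrm_resource id="Group_1_ClusterIP" type="IPaddr2" class="ocf" provider="heartbeat">
  <lrm_rsc_op id="Group_1_ClusterIP_monitor_0" operation="monitor"
      op-status="0" interval="0" exec-time="20" queue-time="30"/>
  <lrm_rsc_op id="Group_1_ClusterIP_start_0" operation="start"
      op-status="0" interval="0" exec-time="280" queue-time="20"/>
  <lrm_rsc_op id="Group_1_ClusterIP_monitor_5000" operation="monitor"
      op-status="2" interval="5000" exec-time="20000" queue-time="0"/>
  <lrm_rsc_op id="Group_1_ClusterIP_stop_0" operation="stop"
      op-status="0" interval="0" exec-time="1190" queue-time="30"/>
</lrm_resource>
"""


def op_timings(xml_text):
    """Return (id, op-status, exec-time, queue-time) for every lrm_rsc_op.

    The two time values are assumed to be milliseconds.
    """
    root = ET.fromstring(xml_text)
    return [
        (
            op.get("id"),
            int(op.get("op-status")),
            int(op.get("exec-time")),
            int(op.get("queue-time")),
        )
        for op in root.iter("lrm_rsc_op")
    ]


def slow_ops(xml_text, threshold_ms):
    """Return the operations whose exec-time + queue-time reaches threshold_ms."""
    return [row for row in op_timings(xml_text) if row[2] + row[3] >= threshold_ms]


if __name__ == "__main__":
    for op_id, status, exec_ms, queue_ms in op_timings(SNAPSHOT):
        print(f"{op_id}: op-status={status} exec={exec_ms}ms queue={queue_ms}ms")
    # 20000 ms matches the 20s timeout that was configured for this snapshot.
    print("at or over 20s:", [row[0] for row in slow_ops(SNAPSHOT, 20000)])
```

With the 20s threshold, only the recurring monitor (interval 5000) is reported. Its queue-time is 0, so in this snapshot the whole 20s was spent in execution rather than waiting in the queue.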
Please tell me if any other input is required. I would appreciate any early
help/solution.
Thanks,
Parshvi