[Pacemaker] Time out issue while stopping resource in pacemaker
Andrew Beekhof
andrew at beekhof.net
Fri Oct 10 01:42:28 UTC 2014
On 10 Oct 2014, at 12:12 pm, Lax <lkota at cisco.com> wrote:
> Hi All,
>
> I ran into a timeout while failing over from the master to the peer server
> on a 2-node setup with 2 resources. It had been working all along; this is
> the first time I have seen this issue.
>
> It fails with the following error: 'error: process_lrm_event: LRM operation
> resourceB_stop_0 (40) Timed Out (timeout=20000ms)'.
>
Have you considered making the timeout longer?
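
The 20000ms in the error matches Pacemaker's default operation timeout of
20s, which suggests the stop operation for resourceB has no explicit timeout
of its own. A minimal sketch of a longer stop timeout in the CIB, assuming the
op id and the 60s value (both are illustrative and do not come from the
poster's configuration):

```xml
<!-- Goes inside the existing <primitive id="resourceB" ...> element.
     The op id and timeout="60s" are assumptions chosen for illustration;
     size the timeout to how long the stop action really needs. -->
<operations>
  <op id="resourceB-stop-timeout" name="stop" interval="0" timeout="60s"/>
</operations>
```

The captured stop output in the log below looks like a curl progress meter
that transferred nothing for several seconds, so it may also be worth checking
what the stop action is waiting on; a longer timeout only helps if that call
eventually returns.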
>
>
> Here is the complete log snippet from Pacemaker; I would appreciate your help with this.
>
>
> Oct 9 14:57:38 server1 cib[368]: notice: cib:diff: Diff: +++ 0.3.1
> 4e9bfa03cf2fef61843c18e127044d81
> Oct 9 14:57:38 server1 cib[368]: notice: cib:diff: -- <cib
> admin_epoch="0" epoch="2" num_updates="8" />
> Oct 9 14:57:38 server1 crmd[373]: notice: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
> origin=abort_transition_graph ]
> Oct 9 14:57:38 server1 cib[368]: notice: cib:diff: ++
> <instance_attributes id="nodes-server1" >
> Oct 9 14:57:38 server1 cib[368]: notice: cib:diff: ++ <nvpair
> id="nodes-server1-standby" name="standby" value="true" />
> Oct 9 14:57:38 server1 cib[368]: notice: cib:diff: ++
> </instance_attributes>
> Oct 9 14:57:38 server1 pengine[372]: notice: unpack_config: On loss of
> CCM Quorum: Ignore
> Oct 9 14:57:38 server1 pengine[372]: notice: LogActions: Move
> ClusterIP#011(Started server1 -> 172.28.0.64)
> Oct 9 14:57:38 server1 pengine[372]: notice: LogActions: Move
> resourceB#011(Started server1 -> 172.28.0.64)
> Oct 9 14:57:38 server1 pengine[372]: notice: process_pe_message:
> Calculated Transition 11: /var/lib/pacemaker/pengine/pe-input-1710.bz2
> Oct 9 14:57:58 server1 lrmd[370]: warning: child_timeout_callback:
> resourceB_stop_0 process (PID 17327) timed out
> Oct 9 14:57:58 server1 lrmd[370]: warning: operation_finished:
> resourceB_stop_0:17327 - timed out after 20000ms
> Oct 9 14:57:58 server1 lrmd[370]: notice: operation_finished:
> resourceB_stop_0:17327 [ % Total % Received % Xferd Average Speed
> Time Time Time Current ]
> Oct 9 14:57:58 server1 lrmd[370]: notice: operation_finished:
> resourceB_stop_0:17327 [ Dload Upload
> Total Spent Left Speed ]
> Oct 9 14:57:58 server1 lrmd[370]: notice: operation_finished:
> resourceB_stop_0:17327 [ #015 0 0 0 0 0 0 0 0
> --:--:-- --:--:-- --:--:-- 0#015 0 0 0 0 0 0 0
> 0 --:--:-- 0:00:01 --:--:-- 0#015 0 0 0 0 0 0
> 0 0 --:--:-- 0:00:02 --:--:-- 0#015 0 0 0 0 0
> 0 0 0 --:--:-- 0:00:03 --:--:-- 0#015 0 0 0 0
> 0 0 0 0 --:--:-- 0:00:04 --:--:-- 0#015 0 0 0
> 0 0 0 0 0 --:--:-- 0:00:05 -
> Oct 9 14:57:58 server1 crmd[373]: error: process_lrm_event: LRM
> operation resourceB_stop_0 (40) Timed Out (timeout=20000ms)
> Oct 9 14:57:58 server1 crmd[373]: warning: status_from_rc: Action 10
> (resourceB_stop_0) on server1 failed (target: 0 vs. rc: 1): Error
> Oct 9 14:57:58 server1 crmd[373]: warning: update_failcount: Updating
> failcount for resourceB on server1 after failed stop: rc=1 (update=INFINITY,
> time=1412891878)
> Oct 9 14:57:58 server1 attrd[371]: notice: attrd_trigger_update: Sending
> flush op to all hosts for: fail-count-resourceB (INFINITY)
> Oct 9 14:57:58 server1 crmd[373]: warning: update_failcount: Updating
> failcount for resourceB on server1 after failed stop: rc=1 (update=INFINITY,
> time=1412891878)
> Oct 9 14:57:58 server1 crmd[373]: notice: run_graph: Transition 11
> (Complete=2, Pending=0, Fired=0, Skipped=9, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-1710.bz2): Stopped
> Oct 9 14:57:58 server1 attrd[371]: notice: attrd_perform_update: Sent
> update 11: fail-count-resourceB=INFINITY
>
>
> Thanks
> Lax
>
>