[ClusterLabs] Antw: Is there a way to ignore a single monitoring timeout

Fri Sep 1 10:52:25 EDT 2017

On 1.09.2017 17:21, Jan Pokorný wrote:
> On 01/09/17 09:48 +0300, Klechomir wrote:
>> I have cases where, for an unknown reason, a single monitoring request
>> never returns a result.
>> So having bigger timeouts doesn't resolve this problem.
> If I get you right, the pain point here is a command called by the
> resource agents during monitor operation, while this command under
> some circumstances _never_ terminates (for dead waiting, infinite
> loop, or whatever other reason) or possibly terminates based on
> external/asynchronous triggers (e.g. network connection gets
> reestablished).
>
> Stating the obvious, the solution should be:
> - work towards fixing such particular command if blocking
>    is an unexpected behaviour (clarify this with upstream
>    if needed)
> - find more reliable way for the agent to monitor the resource
>
> For the planned soft-recovery options Ken talked about, I am not
> sure if it would be trivially possible to differentiate exceeded
> monitor timeout from a plain monitor failure.

In any case, there is currently no differentiation between a failed
monitoring request and a timed-out one, so a parameter for ignoring X
failures in a row would be very welcome.
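
To illustrate the behaviour I have in mind, here is a minimal sketch in
Python. It is not an existing Pacemaker option or part of any resource
agent: the names ConsecutiveFailureFilter, tolerated and run_check are
made up for this example, and only the two return codes follow the OCF
convention (0 = success, 1 = generic error). A timeout is treated like
any other failed check, and a failure is reported only once more than
"tolerated" checks in a row have failed.

```python
import subprocess
import sys

OCF_SUCCESS = 0      # OCF return code: resource is running fine
OCF_ERR_GENERIC = 1  # OCF return code: generic failure


class ConsecutiveFailureFilter:
    """Hypothetical helper: hide up to `tolerated` failures in a row."""

    def __init__(self, tolerated):
        self.tolerated = tolerated
        self.streak = 0  # number of consecutive failed checks so far

    def report(self, check_ok):
        """Return the code to report for one check result."""
        if check_ok:
            self.streak = 0
            return OCF_SUCCESS
        self.streak += 1
        if self.streak > self.tolerated:
            return OCF_ERR_GENERIC
        return OCF_SUCCESS  # failure swallowed, still within tolerance


def run_check(cmd, timeout_s):
    """Run `cmd`; a non-zero exit or a timeout both count as failure."""
    try:
        # subprocess.run kills the child when the timeout expires.
        return subprocess.run(cmd, timeout=timeout_s).returncode == 0
    except subprocess.TimeoutExpired:
        return False


if __name__ == "__main__":
    # Demo: a single hanging check is ignored when tolerated=1.
    flt = ConsecutiveFailureFilter(tolerated=1)
    hanging = [sys.executable, "-c", "import time; time.sleep(5)"]
    healthy = [sys.executable, "-c", "pass"]
    print(flt.report(run_check(hanging, 0.5)))
    print(flt.report(run_check(healthy, 10)))
```

Note that the sketch keeps the failure streak in memory only. A real
agent runs each monitor as a separate process, so the counter would have
to be stored somewhere between calls; that part is left out here.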

Here is a very recent example, entirely unrelated to LV and I/O:
Aug 30 10:44:19 [1686093] CLUSTER-1       crmd:    error: process_lrm_event:    LRM operation p_PingD_monitor_0 (1148) Timed Out (timeout=20000ms)
Aug 30 10:44:56 [1686093] CLUSTER-1       crmd:   notice: process_lrm_event:    LRM operation p_PingD_stop_0 (call=1234, rc=0, cib-update=40, confirmed=true) ok
Aug 30 10:45:26 [1686093] CLUSTER-1       crmd:   notice: process_lrm_event:    LRM operation p_PingD_start_0 (call=1240, rc=0, cib-update=41, confirmed=true) ok
In this case PingD is fencing drbd, which causes a restart of all
related resources. The restart is unneeded, because the next monitoring
request succeeds.