[ClusterLabs] Antw: Is there a way to ignore a single monitoring timeout
Jan Pokorný
jpokorny at redhat.com
Fri Sep 1 10:21:03 EDT 2017
On 01/09/17 09:48 +0300, Klechomir wrote:
> I have cases, when for an unknown reason a single monitoring request
> never returns result.
> So having bigger timeouts doesn't resolve this problem.
If I understand you correctly, the pain point here is a command called by
the resource agent during the monitor operation, where this command under
some circumstances _never_ terminates (blocked waiting indefinitely, stuck
in an infinite loop, or whatever other reason) or possibly terminates only
in response to external/asynchronous triggers (e.g. the network connection
gets reestablished).
Stating the obvious, the solution should be to:
- work towards fixing that particular command if blocking
  is unexpected behaviour (clarify this with upstream
  if needed)
- find a more reliable way for the agent to monitor the resource
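
To illustrate the second point, here is a minimal sketch of how an agent
could put its own hard upper bound on the check command, so that a hang
is caught by the agent itself before the cluster's monitor timeout is
ever reached. It is written in Python only for brevity. The `monitor`
helper, the check command passed to it, and the mapping of a non-zero
exit or a hang to particular OCF codes are all assumptions made for this
example, not something taken from an existing agent. Only the numeric
OCF return codes are the standard ones.

```python
import subprocess
import sys

# Standard OCF return codes used by resource agents
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def monitor(check_cmd, timeout_s):
    """Run a (hypothetical) health-check command with a hard time limit.

    Returns a (return_code, reason) tuple so the caller can tell a hung
    check apart from a check that ran and reported a failure.
    """
    try:
        result = subprocess.run(
            check_cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising, so the
        # stuck command does not linger. Assumption: a hang is reported
        # as a generic error rather than "not running".
        return OCF_ERR_GENERIC, "timeout"
    if result.returncode == 0:
        return OCF_SUCCESS, "ok"
    # Assumption: any non-zero exit means the resource is down.
    return OCF_NOT_RUNNING, "failed"


if __name__ == "__main__":
    # Stand-in for the real check: a child process that sleeps far
    # longer than the limit we allow.
    hanging_check = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(monitor(hanging_check, timeout_s=1))
```

With the limit set below the operation timeout configured in the
cluster, the agent gets to decide what a hang means (log it, retry the
check once, report an error) instead of having the whole operation
killed from outside.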
For the planned soft-recovery options Ken talked about, I am not
sure whether it would be trivially possible to differentiate an exceeded
monitor timeout from a plain monitor failure.
--
Jan (Poki)