[Pacemaker] pingd process dies for no reason
Patrik.Rapposch at knapp.com
Tue Jan 11 14:15:57 UTC 2011
Hi,
thanks, I have configured these values now. I hope we won't face this problem
again. Otherwise, as I said, I have turned on the debug mode of the ping RA,
and at the next maintenance window I'll turn on cluster debug mode,
so we'll have more log information to find the reason for this problem.
Thanks again.
Kind regards, Patrik
Best Regards
Patrik Rapposch, BSc
System Administration
KNAPP Systemintegration GmbH
Waltenbachstraße 9
8700 Leoben, Austria
Phone: +43 3842 805-915
Fax: +43 3842 82930-500
patrik.rapposch at knapp.com
www.KNAPP.com
Commercial register number: FN 138870x
Commercial register court: Leoben
From: Lars Ellenberg <lars.ellenberg at linbit.com>
Date: 11.01.2011 14:47
Reply-To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
To: pacemaker at oss.clusterlabs.org
Subject: Re: [Pacemaker] pingd process dies for no reason
On Tue, Jan 11, 2011 at 11:24:35AM +0100, Patrik.Rapposch at knapp.com wrote:
> We already made changes to the interval and timeout (<op
> id="pingd-op-monitor-30s" interval="30s" name="monitor" timeout="10s"/>).
>
> How big should dampen be set?
>
> Please correct me if I am wrong; I calculate it as follows:
> assuming the last check was OK and the failure takes place in the next
> second, there would be 29s until the next check starts, plus another
> 10 seconds of timeout, plus 5 seconds of dampen. That would be 44 seconds.
> Isn't that enough?
I think "dampen" needs to be larger than the monitoring interval.
And the timeout on the operation should be large enough that
ping, even if the remote is unreachable for the first time,
will time out by itself (and is not killed prematurely by lrmd because
the operation timeout elapsed).
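Patrik's worst-case estimate and Lars's first rule can be written out side by side as a rough sketch. The sketch treats detection time as the remaining monitor interval plus the monitor operation timeout plus the dampen delay, exactly as in Patrik's calculation. It ignores scheduling and attrd/CIB write latency. The symbols I (monitor interval), T_op (monitor operation timeout) and D (dampen) are introduced here only for illustration.

```latex
\begin{aligned}
t_{\text{worst}} &\approx (I - 1\,\mathrm{s}) + T_{\text{op}} + D \\
                 &= (30 - 1)\,\mathrm{s} + 10\,\mathrm{s} + 5\,\mathrm{s} = 44\,\mathrm{s} \\[4pt]
\text{rule of thumb: } D &> I \\
\text{current: } 5\,\mathrm{s} &< 30\,\mathrm{s} \quad (\text{not met}) \\
\text{suggested: } 20\,\mathrm{s} &> 15\,\mathrm{s} \quad (\text{met})
\end{aligned}
```

The thread does not state why dampen should exceed the interval. One plausible interpretation is that a dampen window shorter than the interval lets every individual monitor result be written on its own. In that case, a single lost ping is not smoothed out by the next successful one.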
Try with:
  monitor interval 15s, dampen 20s,
  instance parameter timeout: something explicit, if you want to,
  instance parameter attempts: something explicit, if you want to,
  monitor operation timeout=60s.
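Lars's suggestions could be expressed in CIB XML roughly as below. This is a hedged sketch, not the poster's actual configuration. It assumes the ocf:pacemaker:ping resource agent running as a clone. The ids (`ping`, `ping-clone`, `ping-params`, ...), the `host_list` address and the concrete `timeout`/`attempts` values are hypothetical placeholders. `debug` is included because the first message mentions enabling the RA's debug mode.

```xml
<!-- Sketch only: ids, host_list and timeout/attempts values are hypothetical placeholders -->
<clone id="ping-clone">
  <primitive id="ping" class="ocf" provider="pacemaker" type="ping">
    <instance_attributes id="ping-params">
      <!-- hypothetical ping target (documentation address); use a reliable upstream host -->
      <nvpair id="ping-host_list" name="host_list" value="192.0.2.1"/>
      <!-- dampen longer than the 15s monitor interval -->
      <nvpair id="ping-dampen" name="dampen" value="20s"/>
      <!-- explicit per-ping timeout (seconds) and attempts, as suggested -->
      <nvpair id="ping-timeout" name="timeout" value="2"/>
      <nvpair id="ping-attempts" name="attempts" value="3"/>
      <!-- RA debug logging -->
      <nvpair id="ping-debug" name="debug" value="true"/>
    </instance_attributes>
    <operations>
      <!-- generous op timeout so lrmd does not kill ping before ping gives up by itself -->
      <op id="ping-op-monitor-15s" interval="15s" name="monitor" timeout="60s"/>
    </operations>
  </primitive>
</clone>
```

This assumes the agent passes `timeout` and `attempts` to ping's reply timeout and packet count. With these placeholder values, an unreachable host should then cost only a few seconds per monitor, far below the 60s operation timeout. That margin, rather than the exact numbers, is what the advice above is aiming for.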
BTW, someone should really implement the fping-based ping RA ...
Or did I miss it?
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker