[Pacemaker] pingd process dies for no reason

Tue Jan 11 14:15:57 UTC 2011

Hi,

Thanks, I have configured these values now. I hope that we won't face this 
problem again. Otherwise, as I said, I have turned on the debug mode of the 
ping RA, and at the next maintenance window I'll turn on cluster debug mode 
as well, so we'd have more log info to find the reason for this problem.

Thanks again.

Kind regards,
Patrik

Patrik Rapposch, BSc
System Administration

KNAPP Systemintegration GmbH
Waltenbachstraße 9
8700 Leoben, Austria 
Phone: +43 3842 805-915
Fax: +43 3842 82930-500
patrik.rapposch at knapp.com 
www.KNAPP.com 

From: Lars Ellenberg <lars.ellenberg at linbit.com>
Date: 11.01.2011 14:47
Reply-To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
To: pacemaker at oss.clusterlabs.org
Subject: Re: [Pacemaker] pingd process dies for no reason

On Tue, Jan 11, 2011 at 11:24:35AM +0100, Patrik.Rapposch at knapp.com wrote:
> we already made changes to the interval and timeout
> (<op id="pingd-op-monitor-30s" interval="30s" name="monitor" timeout="10s"/>).
> 
> how big should dampen be set?
> 
> please correct me if i am wrong, as i calculate it as follows:
> assuming the last check was ok and the failure takes place in the next
> second, there would be 29s until the next check starts, plus another 10
> seconds of timeout, plus 5 seconds of dampen. this would be 44 seconds;
> isn't that enough?

I think "dampen" needs to be larger than the monitoring interval.
And the timeout on the operation should be large enough that
ping, even if the remote is unreachable for the first time,
will time out by itself (and is not killed prematurely by lrmd because
the operation timeout elapsed).

Try with:
  monitor operation interval=15s
  instance parameter dampen: 20
  instance parameter timeout: something explicit, if you want to.
  instance parameter attempts: something explicit, if you want to.
  monitor operation timeout=60s
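
As a rough sketch, the values suggested above might look like this in
CIB XML. The primitive id, the nvpair ids, and the host_list address are
placeholders that do not come from this thread, and the resource is
assumed to be ocf:pacemaker:ping (the ping RA mentioned above). The
instance parameters timeout and attempts are left at their defaults
because no concrete values were given.

```xml
<!-- Sketch only: ids and host_list are hypothetical placeholders. -->
<primitive id="pingd" class="ocf" provider="pacemaker" type="ping">
  <instance_attributes id="pingd-params">
    <!-- Replace with the address(es) you actually ping. -->
    <nvpair id="pingd-host_list" name="host_list" value="192.0.2.1"/>
    <!-- dampen (20s) is larger than the monitor interval (15s). -->
    <nvpair id="pingd-dampen" name="dampen" value="20s"/>
  </instance_attributes>
  <operations>
    <!-- Operation timeout is generous so that ping can time out by
         itself instead of being killed by lrmd. -->
    <op id="pingd-op-monitor-15s" interval="15s" name="monitor" timeout="60s"/>
  </operations>
</primitive>
```

With these numbers the 20s dampen exceeds the 15s monitor interval, and
the 60s operation timeout leaves ping room to give up on its own.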

BTW, someone should really implement the fping based ping RA ...
Or did I miss it?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
