[Pacemaker] pingd process dies for no reason

Tue Jan 11 14:53:29 UTC 2011

On Tue, Jan 11, 2011 at 2:45 PM, Lars Ellenberg
<lars.ellenberg at linbit.com> wrote:
> On Tue, Jan 11, 2011 at 11:24:35AM +0100, Patrik.Rapposch at knapp.com wrote:
>> we already made changes to the interval and timeout (<op
>> id="pingd-op-monitor-30s" interval="30s" name="monitor" timeout="10s"/>).
>>
>> how big should dampen be set?
>>
>> please correct me if i am wrong, as i calculate it as follows:
>> assuming the last check was ok and the failure takes place in the
>> next second:
>> then there would be 29s until the next check starts, and another 10
>> seconds timeout, plus 5 seconds dampen. this would be 44 seconds, isn't
>> that enough?
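
Spelled out, the estimate in the quoted message is plain addition. A minimal sketch, using the values given in the thread; the variable names are only illustrative and are not Pacemaker parameters:

```python
# Worst-case delay as estimated in the quoted message.
# Values come from the thread; the names are illustrative only.
interval_s = 30      # monitor interval
op_timeout_s = 10    # monitor operation timeout
dampen_s = 5         # dampen value assumed in the estimate

# The failure happens 1 s after the last successful check,
# so the next check starts (interval - 1) seconds later.
wait_for_next_check_s = interval_s - 1

worst_case_s = wait_for_next_check_s + op_timeout_s + dampen_s
print(worst_case_s)  # 44
```

This only reproduces the 44-second figure; it says nothing about whether those settings are safe.
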
>
> I think "dampen" needs to be larger than the monitoring interval.
> And the timeout on the operation should be large enough that
> ping, even if the remote is unreachable for the first time,
> will time out by itself (and not be killed prematurely by lrmd because
> the operation timeout elapsed).
>
> try with interval 15s, dampen 20,
>  instance parameter timeout: something explicit, if you want to.
>  instance parameter attempts: something explicit, if you want to.
>  monitor operation timeout=60s
>
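
The two rules of thumb quoted above (dampen larger than the monitor interval, operation timeout larger than the time ping needs to give up by itself) can be written down as a quick sanity check. This is only a sketch: `check_ping_settings` and `ping_worst_case_s` are hypothetical names, and the 20-second worst case for ping is a made-up placeholder, not a value from the resource agent.

```python
def check_ping_settings(interval_s, dampen_s, op_timeout_s, ping_worst_case_s):
    """Return warnings for the two rules of thumb from the quoted advice."""
    warnings = []
    if dampen_s <= interval_s:
        warnings.append("dampen should be larger than the monitor interval")
    if op_timeout_s <= ping_worst_case_s:
        warnings.append("monitor timeout may kill ping before it times out by itself")
    return warnings

# Original settings from the thread: interval 30s, dampen 5s, op timeout 10s.
# ping_worst_case_s=20 is an assumed placeholder, not a measured value.
print(check_ping_settings(30, 5, 10, ping_worst_case_s=20))

# Suggested settings: interval 15s, dampen 20, op timeout 60s.
print(check_ping_settings(15, 20, 60, ping_worst_case_s=20))
```

With the placeholder worst case, the original settings trip both checks and the suggested ones trip neither. How long ping really takes depends on the timeout and attempts instance parameters and the number of hosts, so measure that rather than trusting the placeholder.
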
> BTW, someone should really implement the fping based ping RA ...

Thank you for volunteering :-)

> Or did I miss it?
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com