[Pacemaker] how to get pacemaker:ping recheck before promoting drbd resources on a node
Jelle de Jong
jelledejong at powercraft.nl
Tue Apr 19 09:54:01 UTC 2011
On 19-04-11 11:31, Andrew Beekhof wrote:
> If the underlying messaging/membership layer goes into spasms,
> there's not much ping can do to help you. What version of corosync
> have you got? Some versions have been better than others.
corosync 1.2.1-4
pacemaker 1.0.9.1+hg15626-1
/etc/debian_version 6.0.1 (stable)
> Correct, it's checked periodically.
Can I change the config so that a ping check is done before drbd is
promoted? I tried adding a separate ping0 resource:
http://pastebin.com/raw.php?i=2WD1HKnC
I thought it worked, but ping0 starts and drbd is still promoted,
probably because the start of ping0 succeeds even when the actual ping
fails; it does not return an error. So I tried adding additional
location rules for ping0, but then the resources are not started at all
anymore:
http://pastebin.com/raw.php?i=DXqRzMNs
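A commonly used pattern for this, sketched here under assumptions: run the ping agent as a clone on every node and restrict only the Master role of the DRBD master/slave resource with a rule on the node attribute the agent maintains (pingd is the agent's default attribute name). The names p_ping, cl_ping, ms_drbd and l_drbd_master_needs_ping, and the address 192.168.1.1, are hypothetical placeholders, not taken from the pastebin configs. Scoping the rule to $role="Master" may explain the "nothing starts" result: a -inf rule on the resource as a whole also blocks the Slave instances while the attribute is undefined or zero, whereas this one only blocks promotion. The rule is evaluated against the last value the ping monitor wrote, so it does not force a fresh ping right before promotion.

```
# Hypothetical names and address: adjust to the real configuration.
primitive p_ping ocf:pacemaker:ping \
        params host_list="192.168.1.1" multiplier="1000" \
        op monitor interval="10s" timeout="60s"
clone cl_ping p_ping
# Forbid the Master role where pingd is missing or zero;
# Slave instances may still run on that node.
location l_drbd_master_needs_ping ms_drbd \
        rule $role="Master" -inf: not_defined pingd or pingd lte 0
```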
> That is something that would need to be added to the drbd
> agent. Alternatively, configure the ping resource to update more
> frequently.
How can this be done? crm ra info ocf:ping doesn't show much info. I
tried using attempts="1" dampen="1" timeout="1" and monitor
interval="1". An example of how to configure a fast, frequent ping
check would be welcome. If I can make the ping check fast enough to
detect network failures before corosync tells pacemaker that the other
node has disappeared or failed, this may provide a workaround.
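A sketch of a faster-updating ping clone, assuming the same hypothetical names as any earlier example (p_ping, cl_ping, host 192.168.1.1). The relevant knobs are the monitor interval (how often the agent pings), dampen (how long the attribute daemon waits before writing a changed value, which absorbs flapping but also delays detection) and attempts/timeout (how each host is pinged). The values are illustrative, not tested recommendations. Keep the monitor op timeout comfortably above the worst-case ping duration, which grows with attempts, timeout and the number of hosts, or the monitor itself can time out. Even with short intervals, detection still races the corosync token timeout, so this narrows the window rather than closing it.

```
# Illustrative values only; tune and test on the real cluster.
primitive p_ping ocf:pacemaker:ping \
        params host_list="192.168.1.1" multiplier="1000" \
               dampen="2s" attempts="2" timeout="1" \
        op monitor interval="2s" timeout="20s"
clone cl_ping p_ping
```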
> But you did lose the node. The cluster can't see into the future to
> know that it will come back in a bit. What token timeouts are you
> using?
True, but the node should see that its own network is down, realize that
it is the one that is failing, and wait until its network is back before
checking its situation again and doing anything with its resources.
My corosync.conf, with token: 3000 (ms): http://pastebin.com/Y5Lkf4Ch
Thanks in advance,
Any help is much appreciated,
Kind regards,
Jelle de Jong