[Pacemaker] How to really deal with gateway restarts?
Andrew Beekhof
andrew at beekhof.net
Mon Jun 14 06:13:59 UTC 2010
On Thu, Jun 10, 2010 at 9:22 PM, Maros Timko <timkom at gmail.com> wrote:
> Hi all,
>
> I know this has been asked here a number of times, but with no real
> conclusive answer. The answer to every request was to update Pacemaker
> and use the ping RA.
>
> Setup:
> - simple symmetric 2 node DRBD-Xen cluster
> - both nodes connected to the same network and gateway
> - cloned ping RA to monitor gateway and update pingd attribute
> - pingd:defined used to migrate resources to the node with better
> connectivity
>
> Scenario:
> - simulate gateway failure or restart
>
> Expected outcome:
> - active node should remain active without touching resources because
> both nodes have the same score (pingd=0) and pingd:defined means "do
> not shut down resources when a node loses connectivity"
>
> Experienced outcome:
> - CRM initiates resource migration
> - Xen VM is stopped
> - CRM aborts resource migration
> - Xen VM is started
> - active node is active again, but VM was restarted
>
> Analysis of the problem:
> - because the currently active node is the DC (but probably not only
> for this reason), its pingd update is processed first, before the
> update from the standby. For a moment the standby therefore has the
> better score, so the CRM decides to migrate resources.
> - the attribute update from the standby node is then processed, which
> rolls back the migration
>
> Possible resolutions:
> - tweak the standby ping RA to postpone updates a bit (a bit stupid
> and asymmetric)
> - ensure that standby is DC (no CLI option and not sure if that would
> help though)
> - ensure that standby monitoring cycle is delayed after active one
> (but how with cloned RA)
> - any other proposal?
>
> I thought the "dampen" attribute could help with some of the options,
> but it actually does not.
It should do. Hard to say without any logs from the two machines.
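
For reference, a minimal sketch of where dampen sits in a crm shell
configuration matching the setup described above. The resource names
(p_ping, cl_ping, vm1, loc_vm_connectivity), the gateway address and all
timing values are made-up placeholders, not taken from your cluster:

```
# Hypothetical names and values throughout; adjust to the real cluster.
primitive p_ping ocf:pacemaker:ping \
    params host_list="192.0.2.1" multiplier="100" dampen="30s" \
    op monitor interval="10s" timeout="60s"
clone cl_ping p_ping
# Score each node by its pingd value, as with pingd:defined above.
location loc_vm_connectivity vm1 \
    rule pingd: defined pingd
```

dampen is the time to wait for further changes before the attribute is
written, so presumably it needs to be longer than the gap between the
monitor runs on the two nodes. Whether that holds for your timing is
exactly what the logs would show.
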
> The only thing that worked for me was restarting the standby CRM
> until its monitoring cycle was a bit behind the active one's. But I
> would not be happy with that.
> Does anybody have any idea if there could be some option like "Hey,
> change of this attribute can trigger resource migration. Let's wait a
> while (configured) for standby value update..."? Or any other crazy
> ideas?
>
> Thanks,
> Tino