[Pacemaker] Split Site 2-way clusters

Andrew Beekhof andrew at beekhof.net
Mon Jan 18 10:14:58 UTC 2010


On Thu, Jan 14, 2010 at 11:44 PM, Miki Shapiro
<Miki.Shapiro at coles.com.au> wrote:
> Confused.
>
>
>
> I *am* running DRBD in dual-master mode

/me cringes... this sounds to me like an impossibly dangerous idea.
Can someone from linbit comment on this please?  Am I imagining this?
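For reference, dual-primary is switched on in the net section of the DRBD
configuration. A from-memory sketch for DRBD 8.x (resource name, hostnames
and addresses below are made up):

  resource r0 {
    net {
      allow-two-primaries;          # both nodes may be Primary at once
      # what to do when DRBD detects a split-brain after reconnect:
      after-sb-0pri discard-zero-changes;
      after-sb-1pri discard-secondary;
      after-sb-2pri disconnect;     # two diverged primaries: give up, let the admin pick
    }
    on alpha {
      device /dev/drbd0; disk /dev/sda7;
      address 10.0.0.1:7788; meta-disk internal;
    }
    on bravo {
      device /dev/drbd0; disk /dev/sda7;
      address 10.0.0.2:7788; meta-disk internal;
    }
  }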

> (apologies, I should have mentioned
> that earlier), and there will be both WAN clients as well as
> local-to-datacenter clients writing to both nodes on both ends. It’s safe to
> assume the clients will not know of the split.
>
>
>
> In a WAN split I need to ensure that the node whose copy of the drbd
> volume will be kept once resync happens stays up, and that the node
> that’ll get blown away and re-synced/overwritten becomes dead asap.

Won't you _always_ lose some data in a WAN split though?
AFAICS, what you're doing here is preventing "some" from becoming "lots".

Is master/master really a requirement?

> NodeX (successfully) taking on data from clients while in a
> quorumless-freeze-still-providing-service state, then discarding its
> hitherto-collected client data on realizing that the other node has
> quorum, isn’t good.

Agreed - freeze isn't an option if you're doing master/master.
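In Pacemaker terms that means not relying on no-quorum-policy=freeze. A
minimal sketch with the crm shell (the possible policy values are ignore,
freeze, stop and suicide):

  # stop resources in a partition that has lost quorum, rather than
  # letting already-active ones keep running (which "freeze" would do -
  # fatal with dual-primary DRBD taking writes on both sides)
  crm configure property no-quorum-policy=stop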

>
> To recap what I understood so far:
>
> 1. CRM Availability on the multicast channel drives DC election, but
> DC election is irrelevant to us here.
>
> 2. CRM Availability on the multicast channel (rather than resource
> failure) drives who-is-in-quorum-and-who-is-not decisions [not sure here...
> correct?

correct

> Or does resource failure drive quorum? ]

quorum applies to node availability - resource failures have no impact
(unless they lead to fencing, which then leads to the node leaving the membership)
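To make a resource failure escalate that far you have to ask for it
explicitly, e.g. with on-fail=fence on the monitor operation. A sketch
only - the resource below is hypothetical:

  # a failed monitor fences the node instead of merely restarting the
  # resource; only then does the failure affect membership (and quorum)
  crm configure primitive fs0 ocf:heartbeat:Filesystem \
      params device=/dev/drbd0 directory=/srv fstype=ocfs2 \
      op monitor interval=20s on-fail=fence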

>
> 3. Steve to clarify what happens quorum-wise if one of the three nodes
> sees both others, but the other two only see the first (“broken
> triangle”), and whether this behaviour may differ based on whether the
> first node (the odd one out, as it sees both others) happens to be the
> DC at the time or not.

Try it in a cluster of 3 VMs?
Just use iptables rules to simulate the broken links, along the lines of
the sketch below.
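Assuming the nodes are node1/node2/node3 at 192.168.122.11/.12/.13
(addresses made up), cutting only the node2<->node3 link gives you the
broken triangle - node1 still sees both, the other two see only node1:

  # on node2: drop everything arriving from node3
  iptables -A INPUT -s 192.168.122.13 -j DROP
  # on node3: drop everything arriving from node2
  iptables -A INPUT -s 192.168.122.12 -j DROP

Matching on the source address also catches corosync's multicast
traffic, since that still arrives with the sender's unicast source IP.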

>
> Given that anyone who goes about building a production cluster would want to
> identify all likely failure modes and be able to predict how the cluster
> behaves in each one, is there any user-targeted doco/rtfm material one could
> read regarding how quorum establishment works in such scenarios?

I don't think corosync has such a doc at the moment.
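The quorum arithmetic itself is the easy part: with N voting nodes you
need floor(N/2)+1 votes, so in your 3-node case any partition of two
keeps quorum and a lone node loses it. What's genuinely undocumented is
which membership totem converges on when connectivity is asymmetric, as
in the broken triangle above - hence the suggestion to just test it.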

> Setting up a 3-way with intermittent WAN links without getting a clear
> understanding in advance of how the software will behave is … scary :-)



