[Pacemaker] Split Site 2-way clusters

Andrew Beekhof andrew at beekhof.net
Thu Jan 14 03:55:39 EST 2010


On Thu, Jan 14, 2010 at 1:40 AM, Miki Shapiro <Miki.Shapiro at coles.com.au> wrote:

>  When you suggest:
>
> >>> What about setting no-quorum-policy to freeze and making the third
> >>> node a full cluster member (that just doesn't run any resources)?
> >>>
> >>> That way, if you get a 1-1-1 split, the nodes will leave all services
> >>> running where they were while the cluster waits for quorum.
> >>>
> >>> And if it heals into a 1-2 split, then the majority will terminate the
> >>> rogue node and acquire all the services.
>
> No-quorum-policy ‘Freeze’ rather than ‘Stop’ pretty much ASSURES me of
> getting a split brain for my fileserver cluster. Sounds like the last thing
> I want to have. Any data local clients write to the cutoff node (and its
> DRBD split-brain volume) cannot be later reconciled and will need to be
> discarded. I’d rather not give the local clients that can still reach that
> node a false sense of security of their data having been written to disk (to
> a DRBD volume that will be blown away and resynced with the quorum side once
> connectivity is re-established). ‘Stop’ policy sounds safer.
>
Are you using DRBD in dual-master mode?
I ask because freeze won't start anything new, so if you had one writer before
the split, you'll still have only one during it.
Then when the split heals, DRBD can resync as normal. Right?
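
To make the freeze-versus-stop distinction concrete, here is a minimal Python
sketch of what each no-quorum-policy value means for a partition. It is only a
model for reasoning, not Pacemaker code: the names Policy, has_quorum and
partition_action are hypothetical, and quorum is assumed to be a simple
majority of the configured nodes.

```python
from enum import Enum


class Policy(Enum):
    """The no-quorum-policy values discussed in this thread."""
    STOP = "stop"
    FREEZE = "freeze"
    IGNORE = "ignore"


def has_quorum(partition_size: int, total_nodes: int) -> bool:
    # Assumption: quorum means strictly more than half of all configured nodes.
    return partition_size > total_nodes / 2


def partition_action(partition_size: int, total_nodes: int, policy: Policy) -> str:
    """Describe what a partition of the given size does with its resources."""
    if has_quorum(partition_size, total_nodes):
        return "manage resources normally"
    if policy is Policy.STOP:
        return "stop all resources in this partition"
    if policy is Policy.FREEZE:
        return ("keep running resources where they are; "
                "do not take over resources from unreachable nodes")
    return "manage resources as if the partition had quorum"


if __name__ == "__main__":
    # 1-1-1 split of a three-node cluster: no partition has quorum.
    for policy in Policy:
        print(policy.value, "->", partition_action(1, 3, policy))
    # After healing into a 1-2 split, the two-node side is quorate again.
    print("2 of 3 ->", partition_action(2, 3, Policy.FREEZE))
```

In this model a lone node under freeze keeps its existing writer but never
gains a second one, which is the point made above about a single-primary DRBD
setup.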


>  *Question 1:* Am I missing anything re stop/ignore no-quorum policy?
>

Possibly, regarding the freeze option (see my comments above).


> Further, I’m having more trouble working out a list of tabulated failure
> modes for this 3-way scenario, where 3-way outage-prone WAN links get
> introduced.
>
> *Question 2:* If one WAN link is broken – (A) can speak to (B), (B) can
> speak to (C), but (A) CANNOT speak to (C), what drives the quorum decision
> and what would happen? In particular, what would happen if the node that can
> see both is the DC?
>

I'm not sure how totem works in this situation, but whether the node that
can see both is the DC is irrelevant.
Membership happens at a much lower level.

I _think_ you'd end up with a 2-1 split, but I'm not sure how it decides whether
it's A-B or B-C.
Steve could probably tell you.
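
As a rough illustration of why a 2-1 split is the plausible outcome, the
Python sketch below lists the largest groups of nodes in which every pair
still has a working link. It assumes that a membership needs full mutual
connectivity, which is an assumption made for this example and not a
description of how totem actually forms memberships. The function name
candidate_memberships is hypothetical, and the sketch deliberately does not
say which candidate wins.

```python
from itertools import combinations


def candidate_memberships(nodes, links):
    """Return the largest node groups in which every pair has a working link.

    Assumption (not taken from totem): a membership can only form among
    nodes that can all talk to each other directly.
    """
    link_set = {frozenset(link) for link in links}

    def fully_connected(group):
        return all(frozenset(pair) in link_set for pair in combinations(group, 2))

    # Try the biggest group size first and stop at the first size that works.
    for size in range(len(nodes), 0, -1):
        groups = [set(g) for g in combinations(nodes, size) if fully_connected(g)]
        if groups:
            return groups
    return []


if __name__ == "__main__":
    # The scenario from Question 2: A-B and B-C are up, A-C is down.
    print(candidate_memberships(["A", "B", "C"], [("A", "B"), ("B", "C")]))
```

For the broken A-C link, the two candidates are A-B and B-C, each of which
would leave the third node on its own.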

> *Question 3:* Just to verify I got this right - what drives pacemaker’s
> STONITH events,
>
> [a] RESOURCE monitoring failure,
>
> or
>
> [b] CRM’s crosstalk that establishes quorum-state / DC-election?
>

It's not an XOR condition.
Membership events (from openais) AND resource failures can both result in
fencing occurring.

But please forget about DCs and DC elections - they are really not relevant
to any of this.