No subject
Thu Dec 24 21:07:46 UTC 2009
ugh.
I'd like to get a clear idea of what the roadblocks --actually are-- (not
on a "The WAN link" level but what the WAN link -actually breaks-) to doing
what I suggested.
Assuming I can get it to work, are there any other specific reasons it
wouldn't?
To recap, in my proposed solution, an outage will result in four things:
---
1. A "Race" by both nodes to a 3rd site, to perform an atomic operation (a
mkdir for instance). Following it, it will be abundantly clear to both
nodes "who is right, and who is dead".
---
2. A hard-iLO-poweroff STONITH (NOT reboot!) from the winner to the loser's
iLO. It can also iptables-block all comms from the loser until further
notice as an extra safety-net.
---
3. A hard-own-iLO-poweroff-else-kernel-halt SMITH (NOT reboot!) suicide by
the loser (SMITH is our pet acronym for Shoot-Myself-...).
---
4. A "WAN-PROBLEM=[true|false]" flag immediately raised (locally) by the
winner based on pinging the OTHER SITE's ROUTER. A separate resource on the
winner will, in the presence of this flag, monitor the same router of the
other site for life, and when the other site comes back up (perhaps
-and-stays-up-for-an-hour- or some similar flap-avoiding logic) issue a
POWERON to the other node's iLO, which will come back up as a drbd slave,
resync and get re-promoted to master.
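A minimal Python sketch of the race in step 1, to make the "atomic operation" concrete. It assumes the 3rd site exposes a directory on which mkdir is atomic (true for a local POSIX filesystem; whether it holds for whatever the 3rd site actually exports is something to verify). The names try_claim, lock_root and outage_id are made up for illustration, and treating "cannot reach the 3rd site" as a lost race is my assumption, not something stated above.

```python
import os
import tempfile


def try_claim(lock_root: str, outage_id: str, node: str) -> bool:
    """Return True if this node won the race for the given outage."""
    claim_dir = os.path.join(lock_root, f"outage-{outage_id}")
    try:
        # mkdir either creates the directory or fails; two callers
        # cannot both succeed for the same path.
        os.mkdir(claim_dir)
    except OSError:
        # Directory already exists (the peer won) or the 3rd site is
        # unreachable. Assumption: both cases count as losing.
        return False
    # Record the winner's name for later inspection (informational only).
    with open(os.path.join(claim_dir, "winner"), "w") as fh:
        fh.write(node)
    return True


if __name__ == "__main__":
    # Stand-in for the 3rd site: a temporary local directory.
    with tempfile.TemporaryDirectory() as root:
        print("node-a won:", try_claim(root, "demo", "node-a"))
        print("node-b won:", try_claim(root, "demo", "node-b"))
```

The winner would then go on to step 2 and the loser to step 3.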
As an attractive side-benefit, this is a deathmatch-proof design.
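The flap-avoiding logic mentioned in step 4 could look roughly like the following Python sketch: the POWERON is only allowed once the other site's router has answered every probe for a full hold-down period. HoldDown and observe are hypothetical names, the one-hour default is just the figure floated above, and the probe itself (the ping) is left out.

```python
from typing import Optional


class HoldDown:
    """Tracks how long the remote router has been continuously reachable."""

    def __init__(self, hold_seconds: float = 3600.0) -> None:
        self.hold_seconds = hold_seconds
        self.up_since: Optional[float] = None

    def observe(self, now: float, reachable: bool) -> bool:
        """Feed one probe result; return True once POWERON is allowed."""
        if not reachable:
            # Any failed probe restarts the clock.
            self.up_since = None
            return False
        if self.up_since is None:
            self.up_since = now
        return now - self.up_since >= self.hold_seconds
```

The monitoring resource would call observe() after each ping with the current time and only talk to the other node's iLO once it returns True.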
----
NOTE: There's a departure from common wisdom here, and I am not sure
whether this is one of the issues you're pointing at.
Common wisdom states: SMITH BAD, not reliable (obvious reasons - no
success/failure etc)
In this solution I claim: SMITH BAD, not reliable, except in one specific
failure mode (WAN outage) where SMITH GOOD, is reliable, shortcomings can
be worked around.
Both steps [2] and [3] are issued on EVERY TYPE of outage, regardless of
whether it's WAN-related or not.
In non-WAN issues the loser is considered compromised, thus making [3]
unreliable, but [2] is reliable.
In WAN issues, the WAN is considered compromised, thus making [2]
unreliable, but the node itself is sound, so [3] still is reliable.
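Spelled out as a tiny Python sketch, the claim is that each single failure leaves one of the two fencing paths intact. This only models the argument as stated (two boolean failure conditions, hypothetical function name reliable_fencing); it says nothing about failure modes outside that model.

```python
def reliable_fencing(wan_down: bool, loser_compromised: bool) -> set:
    """Return which of steps [2] and [3] can be trusted in this scenario."""
    steps = set()
    if not wan_down:
        # [2]: the winner can still reach the loser's iLO.
        steps.add("STONITH")
    if not loser_compromised:
        # [3]: the loser is sound enough to power itself off.
        steps.add("SMITH")
    return steps


if __name__ == "__main__":
    for wan_down in (False, True):
        for loser_compromised in (False, True):
            print(wan_down, loser_compromised,
                  sorted(reliable_fencing(wan_down, loser_compromised)))
```

Only the case where both conditions are true comes back empty, i.e. the double failure.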
To sum up, it looks to me like the "data safety" is provided by the layer
underneath DRBD, not DRBD itself, and if it works as advertised, DRBD
should have no problem, thus we have a system sufficiently reliable to
withstand any scenario short of a double failure.
... thoughts?
--
-----Original Message-----
From: Florian Haas [mailto:florian.haas at linbit.com]
Sent: Monday, 18 January 2010 9:36 PM
To: pacemaker at oss.clusterlabs.org
Subject: Re: [Pacemaker] Split Site 2-way clusters
On 2010-01-18 11:14, Andrew Beekhof wrote:
> On Thu, Jan 14, 2010 at 11:44 PM, Miki Shapiro
> <Miki.Shapiro at coles.com.au> wrote:
>> Confused.
>>
>>
>>
>> I *am* running DRBD in dual-master mode
>
> /me cringes... this sounds to me like an impossibly dangerous idea.
> Can someone from linbit comment on this please? Am I imagining this?
Dual-Primary DRBD in a split site cluster? Really really bad idea.
Anyone attempting this, please search the drbd-user archives for multiple
discussions about this in the past. Then reconsider.
Hope that makes it clear enough.
Florian