[ClusterLabs] Antw: Re: reproducible split brain

Digimer lists at alteeve.ca
Sat Mar 19 16:43:59 CET 2016


On 19/03/16 10:10 AM, Dennis Jacobfeuerborn wrote:
> On 18.03.2016 00:50, Digimer wrote:
>> On 17/03/16 07:30 PM, Christopher Harvey wrote:
>>> On Thu, Mar 17, 2016, at 06:24 PM, Ken Gaillot wrote:
>>>> On 03/17/2016 05:10 PM, Christopher Harvey wrote:
>>>>> If I ignore pacemaker's existence, and just run corosync, corosync
>>>>> disagrees about node membership in the situation presented in the first
>>>>> email. While it's true that stonith just happens to quickly correct the
>>>>> situation after it occurs it still smells like a bug in the case where
>>>>> corosync is used in isolation. Corosync is after all a membership and
>>>>> total ordering protocol, and the nodes in the cluster are unable to
>>>>> agree on membership.
>>>>>
>>>>> The Totem protocol specifies a ring_id in the token passed in a ring.
>>>>> Since all of the 3 nodes but one have formed a new ring with a new id
>>>>> how is it that the single node can survive in a ring with no other
>>>>> members passing a token with the old ring_id?
>>>>>
>>>>> Are there network failure situations that can fool the Totem membership
>>>>> protocol or is this an implementation problem? I don't see how it could
>>>>> not be one or the other, and it's bad either way.
>>>>
>>>> Neither, really. In a split brain situation, there simply is not enough
>>>> information for any protocol or implementation to reliably decide what
>>>> to do. That's what fencing is meant to solve -- it provides the
>>>> information that certain nodes are definitely not active.
>>>>
>>>> There's no way for either side of the split to know whether the opposite
>>>> side is down, or merely unable to communicate properly. If the latter,
>>>> it's possible that they are still accessing shared resources, which
>>>> without proper communication, can lead to serious problems (e.g. data
>>>> corruption of a shared volume).
>>>
>>> The totem protocol is silent on the topic of fencing and resources, much
>>> the way TCP is.
>>>
>>> Please explain to me what needs to be fenced in a cluster without
>>> resources where membership and total message ordering are the only
>>> concern. If fencing were a requirement for membership and ordering,
>>> wouldn't stonith be part of corosync and not pacemaker?
>>
>> Corosync is a membership and communication layer (and in v2+, a quorum
>> provider). It doesn't care about or manage anything higher up. So it
>> doesn't care about fencing itself.
>>
>> It simply cares about:
>>
>> * Who is in the cluster?
>> * How do the members communicate?
>> * (v2+) Are there enough members for quorum?
>> * Notify resource managers of membership changes (join or loss).
>>
>> The resource manager, pacemaker or rgmanager, cares about resources, so
>> it is what cares about making smart decisions. As Ken pointed out,
>> without fencing, it can never tell the difference between no access and
>> dead peer.
>>
>> This is (again) why fencing is critical.
> 
> I think the key issue here is that when people think about corosync they
> believe there can only be two states for membership (true or false) when
> in reality there are three possible states: true, false and unknown.
> 
> The problem then is that corosync apparently has no built-in way to deal
> with the "unknown" situation and requires guidance from an external
> entity for that (in this case pacemaker's fencing).
> 
> This means that corosync alone simply cannot give you reliable
> membership guarantees. It strictly requires external help to be able to
> provide that.
> 
> Regards,
>   Dennis

I'm not sure that is accurate.

If corosync declares a node lost (failed to receive X tokens in Y time),
it reforms a new cluster without the lost member. So from corosync's
perspective, the lost node is no longer a member (it won't receive
messages). It is possible that the lost node is itself still alive, in
which case its corosync will do the same thing (reform a new cluster,
possibly with itself as the sole member).
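
To make that concrete, here is a rough toy model in Python. It is not
corosync code and does not use any corosync API; the function names,
node names and the simple majority rule are all assumptions made up for
illustration. It only shows that after a split each side ends up with
its own self-consistent membership view, and that a majority check (as
a v2+ quorum provider might apply) tells the sides apart.

```python
def views_after_partition(partitions):
    """Toy model: every node's new membership is the set of nodes it can
    still reach, i.e. its own side of the split (hypothetical helper)."""
    views = {}
    for side in partitions:
        for node in side:
            views[node] = frozenset(side)
    return views


def has_quorum(view, expected_votes):
    """Simple majority rule, assuming one vote per node (an assumption)."""
    return len(view) > expected_votes // 2


# A three-node cluster where node3 loses contact with the other two.
partitions = [{"node1", "node2"}, {"node3"}]
views = views_after_partition(partitions)

for node in sorted(views):
    members = sorted(views[node])
    print(node, members, "quorate" if has_quorum(views[node], 3) else "inquorate")
```

Each node has a definite answer to "who is in my cluster?"; the two
sides just disagree with each other, and neither view says whether the
missing peers are dead or merely unreachable.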

If you're trying to have corosync *do* something, then that is missing
the point of corosync, I think. In all cases I've ever seen, you need a
separate resource manager to actually react to the membership changes.
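
As a minimal sketch of what "react" could mean, assuming a hypothetical
resource manager (this is not pacemaker's or rgmanager's real
interface; on_membership_change and the fence callback are invented
names): when a membership change arrives, the lost nodes are in the
"unknown" state until fencing confirms they are really off.

```python
def on_membership_change(old_members, new_members, fence):
    """Hypothetical handler: split lost nodes into those confirmed dead
    by fencing and those whose state is still unknown.

    fence is an assumed callback that returns True only when the node
    was successfully fenced.
    """
    lost = sorted(set(old_members) - set(new_members))
    confirmed_dead, unknown = [], []
    for node in lost:
        if fence(node):
            confirmed_dead.append(node)  # safe to recover its resources
        else:
            unknown.append(node)         # might still be touching shared data
    return confirmed_dead, unknown


# Example: fencing works for node3 but fails for node4.
dead, unknown = on_membership_change(
    {"node1", "node2", "node3", "node4"},
    {"node1", "node2"},
    fence=lambda node: node == "node3",
)
print(dead, unknown)
```

Only resources from the nodes in the first list would be safe to
recover; anything that was on a node in the second list has to wait.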

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?


