[ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

Digimer lists at alteeve.ca
Sat Apr 22 03:20:10 EDT 2017


On 22/04/17 03:05 AM, Andrei Borzenkov wrote:
> 18.04.2017 10:47, Ulrich Windl wrote:
> ...
>>>
>>> Now let me come back to quorum vs. stonith:
>>>
>>> Said simply: quorum is a tool for when everything is working. Fencing is
>>> a tool for when things go wrong.
>>
>> I'd say: Quorum is the tool to decide who'll be alive and who's going to die,
>> and STONITH is the tool to make nodes die.
> 
> If I had PROD, QA and DEV in a cluster and PROD were separated from
> QA+DEV, I'd be very sad if PROD were shut down.
> 
> Neither simple node majority as a kill policy nor simple node-based
> delays is appropriate. I wish pacemaker supported a scoring system for
> resources so that we could base stonith delays on it (the most
> important sub-cluster starts fencing first).
> 
> 
>> If everything is working you need
>> neither quorum nor STONITH.
>>
> 
> I wonder how SBD fits into this discussion. It is marketed as a stonith
> agent, but it is based on nodes committing suicide, so it relies on
> well-behaved nodes. By definition we cannot trust those nodes to behave
> well, otherwise we wouldn't need stonith in the first place.
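To make the quoted disagreement concrete, here is a minimal sketch contrasting a plain node-majority quorum check with the kind of resource-score-based fencing delay Andrei wishes for. Everything in it is hypothetical: `Partition`, `has_quorum`, `fencing_delay`, the scores and `max_delay` are illustrative names, not pacemaker configuration or API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical view of one side of a split: its nodes and the
    importance scores of the resources running on them."""
    nodes: list[str]
    resource_scores: dict[str, int]  # resource name -> importance score


def has_quorum(partition: Partition, total_nodes: int) -> bool:
    # Simple node majority: strictly more than half of all configured votes.
    return len(partition.nodes) * 2 > total_nodes


def fencing_delay(partition: Partition, all_partitions: list[Partition],
                  max_delay: float = 10.0) -> float:
    """Hypothetical scoring policy: the partition with the highest total
    resource score fences with no delay; the others wait proportionally
    longer, so the most important side should win the fencing race."""
    top = max(sum(p.resource_scores.values()) for p in all_partitions)
    mine = sum(partition.resource_scores.values())
    if top == 0:
        return 0.0  # no scores configured: nobody gets an advantage
    return max_delay * (top - mine) / top


prod = Partition(["node1"], {"prod-db": 100})
devqa = Partition(["node2", "node3"], {"qa-web": 10, "dev-web": 5})
print(has_quorum(prod, 3), has_quorum(devqa, 3))  # majority favours QA+DEV
print(fencing_delay(prod, [prod, devqa]), fencing_delay(devqa, [prod, devqa]))
```

With three nodes, plain majority keeps QA+DEV and kills PROD, which is exactly Andrei's complaint; the score-weighted delay lets PROD fence first instead. With only two nodes, neither half of an even split has a majority (1 * 2 > 2 is false), which is why a two-node cluster has to lean on fencing rather than quorum to break the tie.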

The logic, when using a watchdog timer, is that if the node is alive
enough to kick the watchdog, it's alive enough to not do something dumb
to the cluster. If it's not able to kick the timer, the watchdog timer
will reset the machine. This works *if* all resources hang when messages
stop coming back from the peer (a side effect of corosync's virtual
synchrony).
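The watchdog logic above can be sketched as a small deterministic simulation. This is an illustrative model only, assuming integer time steps and a hypothetical `healthy_until` moment after which the node can no longer pass its health check (for example because it hangs waiting on a peer that has gone silent). `Watchdog` and `run_node` are made-up names; a real deployment uses a hardware watchdog (on Linux, exposed through `/dev/watchdog`) kicked by the SBD daemon, not this code.

```python
from __future__ import annotations


class Watchdog:
    """Simulated watchdog timer: if it is not kicked within `timeout`
    time units, it fires and the node is reset. Time is passed in
    explicitly so the model is deterministic."""

    def __init__(self, timeout: int, now: int = 0):
        self.timeout = timeout
        self.last_kick = now
        self.fired = False

    def kick(self, now: int) -> None:
        # Once the timer has fired, kicking can no longer save the node.
        if not self.fired:
            self.last_kick = now

    def check(self, now: int) -> bool:
        """Return True if the timer has expired (the node gets reset)."""
        if now - self.last_kick >= self.timeout:
            self.fired = True
        return self.fired


def run_node(watchdog: Watchdog, healthy_until: int, end: int) -> int | None:
    """Hypothetical node loop: the node kicks the watchdog only while it is
    still healthy (t < healthy_until). Returns the time at which the
    watchdog reset the node, or None if it never fired before `end`."""
    for t in range(end + 1):
        if t < healthy_until:
            watchdog.kick(t)
        if watchdog.check(t):
            return t
    return None


print(run_node(Watchdog(timeout=5), healthy_until=10, end=30))   # hung node
print(run_node(Watchdog(timeout=5), healthy_until=100, end=30))  # healthy node
```

In this model the node last kicks at t=9 and is reset once the timeout has elapsed. The design consequence is that the surviving peer can only treat the silent node as fenced after waiting at least the watchdog timeout since it last heard from it; if the hung node could keep kicking while misbehaving, the whole scheme would be unsafe.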

So as I understand it, for SBD to be safe, it requires a hardware
watchdog timer and a properly configured cluster.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
