[Pacemaker] Reply: Re: fencing with multiple node cluster

Tue Oct 28 12:32:09 EDT 2014

hi,

From:    Dejan Muhamedagic <dejanmm at fastmail.fm>
To:      The Pacemaker cluster resource manager 
<pacemaker at oss.clusterlabs.org>
Date:    28.10.2014 16:45
Subject: Re: [Pacemaker] fencing with multiple node cluster

>
>
>Hi,
>
>On Tue, Oct 28, 2014 at 09:51:02AM -0400, Digimer wrote:
>>> On 28/10/14 05:59 AM, philipp.achmueller at arz.at wrote:
>>> hi,
>>>
>>> is there any recommendation/documentation for a reliable fencing
>>> implementation on a multi-node cluster (4 or 6 nodes on 2 sites)?
>>> i am thinking of implementing multiple node-fencing devices for each
>>> host to stonith the remaining nodes on the other site?
>>>
>>> thank you!
>>> Philipp
>>
>> Multi-site clustering is very hard to do well because of fencing issues. 
>> How do you distinguish a site failure from severed links?
>
>Indeed. There's a booth server managing the tickets in
>pacemaker, which uses arbitrators to resolve ties. booth source
>is available at github.com and packaged for several
>distributions at OBS
>(http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/)
>It's also supported in the newly released SLE12.
>
>Thanks,
>
>Dejan
>
hi,

@Digimer: thank you for the explanation, but manual failover between sites 
isn't what I'm looking for.

@Dejan: yes, I already tried a cluster (SLES11 SP3) with a booth setup, using 
the SLE HA 11 SP3 documentation. 
But I'm afraid it is unclear to me how exactly "fencing" with booth works 
when something fails (loss-policy=fence). The documentation says something 
like: ...to speed up recovery process nodes get fenced... Do I need 
classic node fencing (IPMI) when I configure a booth setup? Do you have 
any more information about that?

For a correct setup, the arbitrator needs a suitable third location. Site A 
and site B each need a separate connection to site C, otherwise some 
scenarios will fail.
Is there any way to get this running with only 2 sites?
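
To make the tie-break problem concrete, here is a minimal sketch of a 
strict-majority vote. It is a simplified model and not booth's actual 
protocol; the member names and the function can_grant_ticket are 
hypothetical, made up for illustration only.

```python
from typing import Set

# Hypothetical member names, for illustration only.
TWO_SITES = {"siteA", "siteB"}
THREE_MEMBERS = {"siteA", "siteB", "arbitratorC"}


def can_grant_ticket(members: Set[str], reachable: Set[str]) -> bool:
    """Return True if the reachable members form a strict majority.

    `reachable` is the requesting site itself plus every member it can
    still talk to. This models a plain majority vote, nothing more.
    """
    votes = len(members & reachable)
    return 2 * votes > len(members)


# Link between the sites is cut, no arbitrator: neither side has a majority.
print(can_grant_ticket(TWO_SITES, {"siteA"}))
# Same cut, but site A still reaches the arbitrator over its own link.
print(can_grant_ticket(THREE_MEMBERS, {"siteA", "arbitratorC"}))
# Site A reaches C only through site B, and B is down: A is alone again.
print(can_grant_ticket(THREE_MEMBERS, {"siteA"}))
```

In this model the last case is why each site would need its own path to 
site C: if A can only reach C through B, losing B leaves A without a 
majority, just as in the two-site case.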

thank you!

>> Given that a 
>> failed fence action can not be assumed to be a success, then the only 
>> safe option is to block until a human intervenes. This makes your 
>> cluster as reliable as your WAN between the sites, which is to say, not 
>> very reliable. In any case, the destruction of a site will require 
>> manual failover, which can be complicated if insufficient nodes remain 
>> to form quorum.
>>
>> Generally, I'd recommend two different clusters, one per site, with 
>> manual/service-level failover in the case of a disaster.
>>
>> In any case; A good fencing setup should have two fence methods. 
>> Personally, I always use IPMI as a primary fence method (routed through 
>> one switch) and a pair of switched PDUs as backup (via a backup switch). 
>> This way, when IPMI is available, a confirmed fence is 100% certain to 
>> be good. However, if the node is totally disabled/destroyed, IPMI will 
>> be lost and the cluster will switch to the switched PDUs, cutting the 
>> power outlets feeding the node.
>>
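
As I understand the two-method idea, it amounts to trying the fence 
methods in order and only treating the node as fenced once one of them 
confirms success. A minimal sketch under that assumption follows; 
fence_node and the stub actions are hypothetical and do not call any 
real fence agent.

```python
from typing import Callable, List, Tuple

# A fence method is a label plus an action returning True on confirmed success.
FenceMethod = Tuple[str, Callable[[str], bool]]


def fence_node(node: str, methods: List[FenceMethod]) -> str:
    """Try each method in order; return the name of the first that succeeds.

    A failed or erroring method is never assumed to have worked. If every
    method fails, raise so the caller can block until a human intervenes.
    """
    for name, action in methods:
        try:
            if action(node):
                return name
        except Exception:
            # An unreachable device counts as a failed attempt, not a success.
            continue
    raise RuntimeError(f"all fence methods failed for {node}")


def ipmi_off(node: str) -> bool:
    # Stub: pretend the node lost all power, so its IPMI BMC is unreachable.
    raise ConnectionError(f"IPMI on {node} unreachable")


def pdu_off(node: str) -> bool:
    # Stub: pretend the switched PDUs cut the outlets feeding the node.
    return True


print(fence_node("node2", [("ipmi", ipmi_off), ("pdu", pdu_off)]))
```

With these stubs the IPMI attempt fails and the PDU method is the one 
that confirms the fence. If the PDU stub failed as well, fence_node 
would raise instead of reporting success.
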
>> I've got a block diagram of how I do this:
>>
>> https://alteeve.ca/w/AN!Cluster_Tutorial_2#A_Map.21
>>
>> It's trivial to scale the idea up to multiple node clusters.
>>
>> Cheers
>>
>> -- 
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person without 
>> access to education?
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
