[Pacemaker] Stonith: How to avoid deathmatch cluster partitioning

Andreas Kurz andreas at hastexo.com
Fri May 17 16:30:54 EDT 2013


On 2013-05-16 11:31, Klaus Darilion wrote:
> Hi Andreas!
> 
> On 15.05.2013 22:55, Andreas Kurz wrote:
>> On 2013-05-15 15:34, Klaus Darilion wrote:
>>> On 15.05.2013 14:51, Digimer wrote:
>>>> On 05/15/2013 08:37 AM, Klaus Darilion wrote:
>>>>> primitive st-pace1 stonith:external/xen0 \
>>>>>           params hostlist="pace1" dom0="xentest1" \
>>>>>           op start start-delay="15s" interval="0"
>>>>
>>>> Try:
>>>>
>>>> primitive st-pace1 stonith:external/xen0 \
>>>>           params hostlist="pace1" dom0="xentest1" delay="15" \
>>>>           op start start-delay="15s" interval="0"
>>>>
>>>> The idea here is that, when both nodes lose contact and initiate a
>>>> fence, 'st-pace1' will get a 15 second reprieve. That is, 'st-pace2'
>>>> will wait 15 seconds before trying to fence 'st-pace1'. If st-pace1 is
>>>> still alive, it will fence 'st-pace2' without delay, so pace2 will be
>>>> dead before its timer expires, preventing a dual-fence. However, if
>>>> pace1 really is dead, pace2 will fence it and recover, just with a 15
>>>> second delay.
>>>
>>> Sounds good, but pacemaker does not accept the parameter:
>>>
>>>     ERROR: st-pace1: parameter delay does not exist
>>
>> start-delay is an option of the monitor operation ... in effect it means
>> "don't trust that the start was successful, wait some more time before
>> the initial monitor"
>>
>> The problem is, this would only make sense for one single stonith
>> resource that can fence multiple nodes. In case of a split-brain that
>> would delay the start on the node where the stonith resource was not
>> running before, and so give that node a "penalty".
> 
> Thanks for the clarification. I already thought that the start-delay
> workaround is not useful in my setup.
> 
>> In your example with two stonith resources running all the time,
>> Digimer's suggestion is a good idea: use one of the redhat fencing
>> agents, most of them have some sort of "stonith-delay" parameter that
>> you can use with one instance.
> 
> I found it somewhat confusing that a generic parameter (delay is useful
> for all stonith agents) is implemented in the agent, not in pacemaker.
> Further, downloading the RH source RPMs and extracting the agents is
> also quite cumbersome.

If you are on Ubuntu >= 12.04 or Debian Wheezy, the fence-agents
package is available ... so no need for extra work ;-)
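
As a rough sketch of what that could look like (untested against your
setup): the fence-agents scripts generally accept a "delay" parameter, so
the resource that fences pace1 can be slowed down while the other one
fires immediately. The agent (fence_virsh, which assumes the dom0 is
reachable via libvirt over ssh), the login, the key path and the timeout
value are all assumptions for illustration, not something taken from
your configuration.

```
# Sketch only: agent choice, login, identity_file and timeouts are assumed.
# st-pace1 is the resource used to fence pace1; its delay gives pace1 a
# head start in a fence race. st-pace2 would be defined the same way,
# but without the delay parameter.
primitive st-pace1 stonith:fence_virsh \
          params ipaddr="xentest1" login="root" \
                 identity_file="/root/.ssh/id_rsa" \
                 port="pace1" pcmk_host_list="pace1" delay="15" \
          op monitor interval="60s"
# The fencing timeout has to cover the delay plus the agent's run time.
property stonith-timeout="90s"
```

"crm ra info stonith:fence_virsh" should list the parameters your
installed version really supports, so check there before relying on any
of the names above.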

> 
> I think I will add the delay parameter to the relevant fencing agent
> myself. I guess I also have to increase the stonith-timeout by the
> configured delay.
> 
> Do you know how to submit patches for the stonith agents?

Sending them to the linux-ha-dev mailing list, for example, is an option.

Best regards,
Andreas

> 
> Thanks
> Klaus


-- 
Need help with Pacemaker?
http://www.hastexo.com/now

