[Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

Andrew Beekhof andrew at beekhof.net
Mon Jul 1 07:26:24 EDT 2013


On 01/07/2013, at 5:17 PM, Florian Crouzat <gentoo at floriancrouzat.net> wrote:

> On 29/06/2013 01:22, Andrew Beekhof wrote:
>> 
>> On 29/06/2013, at 12:22 AM, Digimer <lists at alteeve.ca> wrote:
>> 
>>> On 06/28/2013 06:21 AM, Andrew Beekhof wrote:
>>>> 
>>>> On 28/06/2013, at 5:22 PM, Lars Marowsky-Bree <lmb at suse.com> wrote:
>>>> 
>>>>> On 2013-06-27T12:53:01, Digimer <lists at alteeve.ca> wrote:
>>>>> 
>>>>>> primitive fence_n01_psu1_off stonith:fence_apc_snmp \
>>>>>>       params ipaddr="an-p01" pcmk_reboot_action="off" port="1" \
>>>>>>       pcmk_host_list="an-c03n01.alteeve.ca"
>>>>>> primitive fence_n01_psu1_on stonith:fence_apc_snmp \
>>>>>>       params ipaddr="an-p01" pcmk_reboot_action="on" port="1" \
>>>>>>       pcmk_host_list="an-c03n01.alteeve.ca"
>>>>> 
>>>>> So every device twice, including location constraints? I see potential
>>>>> for optimization by improving how the fence code handles this ... That's
>>>>> abhorrently complex. (And I'm not sure the 'action' parameter ought to
>>>>> be overwritten.)
>>>> 
>>>> I'm not crazy about it either because it means the device is tied to a specific command.
>>>> But it seems to be something all the RHCS people try to do...
>>> 
>>> Maybe something in the rhcs water cooler made us all mad... ;)
>>> 
>>>>> Glad you got it working, though.
>>>>> 
>>>>>> location loc_fence_n01_ipmi fence_n01_ipmi -inf: an-c03n01.alteeve.ca
>>>>> [...]
>>>>> 
>>>>> I'm not sure you need any of these location constraints, by the way. Did
>>>>> you test if it works without them?
>>>>> 
>>>>>> Again, this is after just one test. I will want to test it several more
>>>>>> times before I consider it reliable. Ideally, I would love to hear
>>>>>> Andrew or others confirm this looks sane/correct.
>>>>> 
>>>>> It looks correct, but not quite sane. ;-) That seems not to be
>>>>> something you can address, though. I'm thinking that fencing topology
>>>>> should be smart enough, if multiple fencing devices are specified, to
>>>>> know how to expand them to "first all off (if off fails anywhere, it's a
>>>>> failure), then all on (if on fails, it is not a failure)". That'd
>>>>> greatly simplify the syntax.
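
The ordering Lars proposes above can be sketched in a few lines. This is only an illustration of the rule "all off must succeed, on failures are tolerated"; it is not code from stonith-ng or from any fence agent. The device model (a dict of callables) and the function name fence_all_off_then_on are hypothetical, and stopping at the first failed "off" is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical device model: each device maps an action name ("off" / "on")
# to a callable that returns True on success. Real fence agents differ.
Device = Dict[str, Callable[[], bool]]


def fence_all_off_then_on(devices: List[Device]) -> Tuple[bool, List[str]]:
    """Return (fenced, log). Fencing succeeds only if every 'off' succeeds."""
    log: List[str] = []

    # Phase 1: cut power on every device. A single failure means the node
    # may still be powered through another rail, so the operation fails.
    for i, dev in enumerate(devices):
        ok = dev["off"]()
        log.append(f"device {i} off: {'ok' if ok else 'FAILED'}")
        if not ok:
            return False, log

    # Phase 2: restore power. A failure here is logged but does not change
    # the result, because the node has already been fenced.
    for i, dev in enumerate(devices):
        ok = dev["on"]()
        log.append(f"device {i} on: {'ok' if ok else 'FAILED'}")

    return True, log


if __name__ == "__main__":
    psu1 = {"off": lambda: True, "on": lambda: True}
    psu2 = {"off": lambda: True, "on": lambda: False}  # "on" fails
    fenced, log = fence_all_off_then_on([psu1, psu2])
    print(fenced)
    print("\n".join(log))
```

With two PDU outlets feeding one node, this reports the node as fenced even when the second outlet fails to power back on, which is the asymmetry Lars describes.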
>>>> 
>>>> The RH agents have apparently already been updated to support multiple ports.
>>>> I'm really not keen on having stonith-ng do this.
>>> 
>>> This doesn't help people who have dual power rails/PDUs for power
>>> redundancy.
>> 
>> I'm yet to be convinced that having two PDUs is helping those people in the first place.
>> If it were actually useful, I suspect more than two/three people would have asked for it in the last decade.
> 
> Well, it's probably because many people are still toying around with Pacemaker, and I assume that not many advanced RHCS users have yet tried to translate their current RHCS cluster to Pacemaker. Digimer and I did, and we both failed to reproduce the equivalent <device> configuration we had in our RHCS setup.

Yes, but RHEL isn't the only Enterprise distro out there.
It's not like Pacemaker has never been deployed in critical environments during the last decade.

German Air Traffic Control (http://www.novell.com/success/dfs.html) for example.
Will planes fall out of the sky if your cluster fails?

> 
> I suspect more and more people will hit this issue sooner or later.
> 
> Anyway, whatever follows in terms of configuration primitives or API, thanks to Digimer's tests we now have something working (even if inelegant) :)
> 
> 
> -- 
> Cheers,
> Florian Crouzat
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org




