[Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

Fri Jun 28 23:27:19 UTC 2013

On 06/28/2013 07:22 PM, Andrew Beekhof wrote:
> 
> On 29/06/2013, at 12:22 AM, Digimer <lists at alteeve.ca> wrote:
> 
>> On 06/28/2013 06:21 AM, Andrew Beekhof wrote:
>>>
>>> On 28/06/2013, at 5:22 PM, Lars Marowsky-Bree <lmb at suse.com> wrote:
>>>
>>>> On 2013-06-27T12:53:01, Digimer <lists at alteeve.ca> wrote:
>>>>
>>>>> primitive fence_n01_psu1_off stonith:fence_apc_snmp \
>>>>>       params ipaddr="an-p01" pcmk_reboot_action="off" port="1" \
>>>>>       pcmk_host_list="an-c03n01.alteeve.ca"
>>>>> primitive fence_n01_psu1_on stonith:fence_apc_snmp \
>>>>>       params ipaddr="an-p01" pcmk_reboot_action="on" port="1" \
>>>>>       pcmk_host_list="an-c03n01.alteeve.ca"
>>>>
>>>> So every device twice, including location constraints? I see potential
>>>> for optimization by improving how the fence code handles this ... That's
>>>> abhorrently complex. (And I'm not sure the 'action' parameter ought to
>>>> be overwritten.)
>>>
>>> I'm not crazy about it either because it means the device is tied to a specific command.
>>> But it seems to be something all the RHCS people try to do...
>>
>> Maybe something in the rhcs water cooler made us all mad... ;)
>>
>>>> Glad you got it working, though.
>>>>
>>>>> location loc_fence_n01_ipmi fence_n01_ipmi -inf: an-c03n01.alteeve.ca
>>>> [...]
>>>>
>>>> I'm not sure you need any of these location constraints, by the way. Did
>>>> you test if it works without them?
>>>>
>>>>> Again, this is after just one test. I will want to test it several more
>>>>> times before I consider it reliable. Ideally, I would love to hear
>>>>> Andrew or others confirm this looks sane/correct.
>>>>
>>>> It looks correct, but not quite sane. ;-) That seems not to be
>>>> something you can address, though. I'm thinking that fencing topology
>>>> should be smart enough, if multiple fencing devices are specified, to
>>>> know how to expand them to "first all off (if off fails anywhere, it's a
>>>> failure), then all on (if on fails, it is not a failure)". That'd
>>>> greatly simplify the syntax.
>>>
>>> The RH agents have apparently already been updated to support multiple ports.
>>> I'm really not keen on having stonith-ng do this.
>>
>> This doesn't help people who have dual power rails/PDUs for power
>> redundancy.
> 
> I'm yet to be convinced that having two PDUs is helping those people in the first place.
> If it were actually useful, I suspect more than two/three people would have asked for it in the last decade.

Step 1. Use one PDU
Step 2. Kill PDU

Your node is dead and cannot be fenced.

Using two separate UPSes and two separate PDUs to feed either PSU in
each node (and either switch in a two-switch configuration with bonded
network links) means that you can lose a power rail and not have an
interruption.

I can't say why it's not a more common configuration, but I can say that
I do not see another way to provide redundant power. For me, an HA
cluster is not truly HA until all single points of failure have been
removed.
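
For anyone following along, the "first all off, then all on" ordering
quoted above can be sketched in a few lines. This is only an illustration
of the logic, not how stonith-ng or the fence agents implement it. The
names fence_all_rails and make_fake_agent, and the second PDU "an-p02",
are hypothetical; the fake agent stands in for a real call such as
fence_apc_snmp.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical stand-in for a fence agent call: (pdu, action) -> success.
FenceCall = Callable[[str, str], bool]


def fence_all_rails(pdus: List[str], call: FenceCall) -> bool:
    """Power-cycle a node whose PSUs are fed by several PDUs.

    Every outlet must be off before any is turned back on; otherwise the
    node never actually loses power.
    """
    # Phase 1: all rails off. A single failure means the node may still
    # be powered, so the whole fence attempt is reported as failed.
    for pdu in pdus:
        if not call(pdu, "off"):
            return False
    # Phase 2: all rails on. Failures are ignored here because the node
    # has already been fenced by phase 1.
    for pdu in pdus:
        call(pdu, "on")
    return True


def make_fake_agent(failing: Set[Tuple[str, str]],
                    log: List[Tuple[str, str]]) -> FenceCall:
    """Build a fake agent that fails only for the given (pdu, action) pairs."""
    def call(pdu: str, action: str) -> bool:
        log.append((pdu, action))
        return (pdu, action) not in failing
    return call


if __name__ == "__main__":
    log: List[Tuple[str, str]] = []
    agent = make_fake_agent(set(), log)
    # "an-p02" is an assumed name for the second PDU.
    print(fence_all_rails(["an-p01", "an-p02"], agent), log)
```

Note that a failed "off" on either rail stops the sequence before any
outlet is switched back on, which is the behaviour the separate _off and
_on primitives are meant to achieve.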

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?