[Pacemaker] Problem with dual-PDU fencing node with redundant PSUs

Thu Jun 27 10:56:40 EDT 2013

On 06/27/2013 10:52 AM, Dejan Muhamedagic wrote:
> On Thu, Jun 27, 2013 at 09:54:13AM -0400, Digimer wrote:
>> On 06/27/2013 07:02 AM, Dejan Muhamedagic wrote:
>>> Hi,
>>>
>>> On Wed, Jun 26, 2013 at 03:52:00PM -0400, Digimer wrote:
>>>> This question appears to be the same issue asked here:
>>>>
>>>> http://oss.clusterlabs.org/pipermail/pacemaker/2013-June/018650.html
>>>>
>>>> In my case, I have two fence methods per node; IPMI first with
>>>> action="reboot" and, if that fails, two PDUs (one backing each side of
>>>> the node's redundant PSUs).
>>>>
>>>> Initially I set up the PDUs with action="reboot", figuring that the
>>>> fencing_topology tied them together, so pacemaker would call "pdu1:port1;
>>>> off -> pdu2:port1; off; (verify both are off) -> pdu1:port1; on ->
>>>> pdu2:port1; on".
>>>>
>>>> This didn't happen though. It called "pdu1:port1; reboot" then
>>>> "pdu2:port1; reboot", so the first PSU in the node had its power back
>>>> before the second PSU lost power, meaning the node never powered off.
>>>
>>> I'm not sure if that's supported.
>>
>> Unless I misunderstood, beekhof indicated that it is/should be.
> 
> I'm pretty sure that it's not, but perhaps things changed in the
> meantime. At least it wasn't when we discussed the
> implementation.
> 
>>>> So next I tried:
>>>>
>>>> pdu1:port1; off -> pdu2:port1; off -> pdu1:port1; on -> pdu2:port1; on
>>>>
>>>> However, this seems to have actually done:
>>>>
>>>> pdu1:port1; reboot -> pdu2:port1; reboot -> pdu1:port1; reboot ->
>>>> pdu1:port1; reboot
>>>>
>>>> So again, the node never lost power to both PSUs at the same time, so
>>>> the node didn't power off.
>>>>
>>>> This makes PDU fencing unreliable. I know beekhof said:
>>>>
>>>>   "My point would be that action=off is not the correct way to configure
>>>> what you're trying to do."
>>>>
>>>> in the other thread, but there was no elaborating on what *is* the right
>>>> way. So if neither approach works, what is the proper way to configure
>>>> PDU fencing when you have two different PDUs, one backing each PSU?
>>>
>>> The fence action needs to be defined in the cluster properties
>>> (crm_config/cluster_property_set in XML):
>>>
>>> # crm configure property stonith-action=off
>>>
>>> See the output of:
>>>
>>> $ crm ra info pengine
>>>
>>> for the PE metadata and explanation of properties.
>>
>> On IRC last night, beekhof mentioned that action="..." is ignored and
>> replaced. However, it would appear that pcmk_reboot_action="..." should
>> force the issue. I'm planning to test this today.
> 
> Yes, true, though it's a bit of a kludge
> (pcmk_reboot_action="off" if I got that right).
> 
>>>>   I don't want to disable "reboot" globally because I still want the
>>>> IPMI based fencing to do action="reboot".
>>>
>>> I don't think it is possible to define a per-resource fencing
>>> action.
>>>
>>>> If I just do "off", then the
>>>> node will not power back on after a successful fence. This is better
>>>> than nothing, but still quite sub-optimal.
>>>
>>> Yes, if you want to start the cluster stack automatically on
>>> reboot. Anyway, I think that it would be preferred to let a human
>>> check why the node got fenced before letting it join the cluster
>>> again. In that case, one just needs to boot the host manually.
>>>
>>> Thanks,
>>>
>>> Dejan
>>
>> I don't want the cluster stack to start on boot, so I disable
>> pacemaker/corosync. However, I do want the node to power back on so that
>> I can log into it when the alarms go off. Yes, I could log into the good
>> node, manually unfence/boot it and then log in, but this adds minutes to
>> the MTTR that I would realllly like to avoid.
> 
> Certainly it adds a bit of time, but only to the node's MTTR,
> not the cluster's MTTR. Anyway, if pacemaker can turn off the
> node, then a short script can also turn it on.
> 
> Cheers,
> 
> Dejan

If I need to write a script, I will instead write a new fence agent that
handles multiple PDUs in a sensible fashion. I'm already thinking of
"fence_apc_multi" that takes a string of addresses and ports and does a
clean "off" on all, verifies all are off, then an "on" on all. This
would make the pacemaker config a lot simpler and cleaner and allow
"reboot" to remain the default action.
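
To make the idea concrete, here is a rough sketch of the ordering such an
agent would need. It is not a working fence agent: "fence_apc_multi" does
not exist yet, and the names parse_outlets, multi_reboot, set_power and
get_power are all hypothetical. The two callables stand in for whatever
would actually talk to the PDUs (SNMP, telnet, an existing agent), so the
sketch only shows the "off on all, verify all, then on on all" sequence.

```python
from typing import Callable, List, Tuple

Outlet = Tuple[str, str]  # (pdu_address, port); an assumed representation


def parse_outlets(spec: str) -> List[Outlet]:
    """Parse a string such as 'pdu1:1,pdu2:1' into (address, port) pairs."""
    outlets = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        address, sep, port = item.rpartition(":")
        if not sep or not address or not port:
            raise ValueError(f"bad outlet spec: {item!r}")
        outlets.append((address, port))
    return outlets


def multi_reboot(
    outlets: List[Outlet],
    set_power: Callable[[Outlet, str], None],  # hypothetical PDU backend
    get_power: Callable[[Outlet], str],        # returns "on" or "off"
) -> bool:
    """Cut power on every outlet, confirm it, and only then restore power."""
    # Phase 1: switch every outlet off before touching any "on".
    for outlet in outlets:
        set_power(outlet, "off")

    # Phase 2: the fence only counts if ALL outlets report off at once.
    if not all(get_power(outlet) == "off" for outlet in outlets):
        return False

    # Phase 3: the node has lost power on both PSUs, so restore power.
    for outlet in outlets:
        set_power(outlet, "on")
    return True
```

Called as multi_reboot(parse_outlets("pdu1:1,pdu2:1"), set_power, get_power),
this issues both "off" calls before either "on" call, which is the part
that two independent "reboot" calls cannot guarantee. A real agent would
also need timeouts, retries and a decision on whether a failed "on" after
a verified "off" should still be reported as a successful fence.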

However, this feels like a really bad solution. It's not uncommon to
have two separate power rails, one feeding each of the node's PSUs,
particularly in HA environments. RHCS has supported this for a very long
time and I expect many users will run into this problem as they try to
migrate to RHEL 7. I see no reason why this can't be properly handled in
pacemaker directly.

I'm hoping it is and I am just too new to pacemaker to realize my mistake.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?