[Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs
Digimer
lists at alteeve.ca
Mon Jul 1 16:58:25 UTC 2013
On 07/01/2013 12:43 PM, Lars Marowsky-Bree wrote:
> On 2013-07-01T11:53:29, Digimer <lists at alteeve.ca> wrote:
>
>> You are right, of course. Imagine though that the IPMI BMC's network
>> port or cable could have silently failed some time before the node
>> failed.
>
> Pacemaker can monitor the fencing device if you configure a monitor
> action for it, for exactly this reason.
My *very* initial testing of op monitor="30" didn't detect the failure
or recovery of the fence device. I may very well have screwed something
up though... I still have a lot to learn.
As an aside, RHEL 6.4 introduced 'fence_check', which will do the same if
you cron/script it.
>> Yes, this is two simultaneous failures, so not an overall SPoF, but
>> likely enough that it should be addressed.
>
> Yes ;-)
>
> While it's conceivable that the *fencing* network switch doesn't have a
> dual power supply and thus is affected by the outage (and very very few
> management boards have two network ports so that you could connect them
> to two), the answer here could be to - at least for two node scenarios -
> just connect the management ports to a dedicated NIC on the other node.
> (A ring topology for multiple nodes is conceivable.)
>
> Then a single power failure could well cause both methods to fail.
>
> Still, it's a double failure that we, officially, don't protect against
> in all scenarios (the power failure + whatever causes the fence).
I protect against this scenario by using two switches and plugging IPMI
into the first switch and the PDUs into the second switch. All nodes use
bonded links with a leg in either switch. So the failure of an entire
switch will not cause an interruption or the loss of fencing capabilities.
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?