[Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs
Andrew Beekhof
andrew at beekhof.net
Mon Jul 1 22:56:06 UTC 2013
On 02/07/2013, at 2:58 AM, Digimer <lists at alteeve.ca> wrote:
> On 07/01/2013 12:43 PM, Lars Marowsky-Bree wrote:
>> On 2013-07-01T11:53:29, Digimer <lists at alteeve.ca> wrote:
>>
>>> You are right, of course. Imagine though that the IPMI BMC's network
>>> port or cable could have silently failed some time before the node
>>> failed.
>>
>> Pacemaker can monitor the fencing device if you configure a monitor
>> action for it, for exactly this reason.
>
> My *very* initial testing of op monitor="30" didn't detect the failure
> or recovery of the fence device.
That might come down to the quality of the monitor action in the agent though.
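
As a rough illustration of that point (a sketch only, not how any real fence agent is written): the FenceDevice class, both monitor functions, and the 192.0.2.10 address below are hypothetical. A monitor that only checks its own configuration keeps reporting success after the device drops off the network, whereas one that actually queries the device notices.

```python
from dataclasses import dataclass


@dataclass
class FenceDevice:
    """Hypothetical stand-in for a fence device such as an IPMI BMC."""
    address: str            # assumed address, for illustration only
    reachable: bool = True  # flips to False when the port or cable fails


def shallow_monitor(device: FenceDevice) -> bool:
    """Only checks that the device is configured; never contacts it."""
    return bool(device.address)


def deep_monitor(device: FenceDevice) -> bool:
    """Simulates querying the device's status over the network."""
    return bool(device.address) and device.reachable


bmc = FenceDevice(address="192.0.2.10")
bmc.reachable = False  # silent failure of the BMC's network port

shallow_result = shallow_monitor(bmc)
deep_result = deep_monitor(bmc)
print("shallow monitor reports OK:", shallow_result)
print("deep monitor reports OK:", deep_result)
```

Whether a given agent behaves like the first or the second function is something to check per agent; the sketch makes no claim about any particular one.
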
> I may very well have screwed something
> up though... I still have a lot to learn.
>
> As an aside, RHEL 6.4 introduced 'fence_check', which will do the same if
> you cron/script it.
>
>>> Yes, this is two simultaneous failures so not an overall SPoF, but
>>> likely enough that it should be addressed.
>>
>> Yes ;-)
>>
>> While it's conceivable that the *fencing* network switch doesn't have a
>> dual power supply and thus is affected by the outage (and very very few
>> management boards have two network ports so that you could connect them
>> to two), the answer here could be to - at least for two node scenarios -
>> just connect the management ports to a dedicated NIC on the other node.
>> (A ring topology for multiple nodes is conceivable.)
>>
>> Then a single power failure could well cause both methods to fail.
>>
>> Still, it's a double failure that we, officially, don't protect against
>> in all scenarios (the power failure + whatever causes the fence).
>
> I protect against this scenario by using two switches and plugging IPMI
> into the first switch and the PDUs into the second switch. All nodes use
> bonded links with a leg in either switch. So the failure of an entire
> switch will not cause an interruption or the loss of fencing capabilities.
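
The layout described above can be checked with a small model. This is a minimal sketch under stated assumptions: the names switch1/switch2, the dictionary layout, and the usable_fence_methods helper are all hypothetical, and the model only considers switch failures (not power, cabling, or the devices themselves).

```python
# Which switch each fence device is cabled to (per the description above).
FENCE_DEVICE_SWITCH = {"ipmi": "switch1", "pdu": "switch2"}
# Each node has a bonded link with one leg in each switch.
NODE_BOND_LEGS = {"switch1", "switch2"}
ALL_SWITCHES = {"switch1", "switch2"}


def usable_fence_methods(failed_switches):
    """Return the fence methods still reachable after the given switches fail."""
    live = ALL_SWITCHES - set(failed_switches)
    # A node reaches a device if one of its bond legs shares a live switch with it.
    return sorted(
        method
        for method, switch in FENCE_DEVICE_SWITCH.items()
        if switch in live and switch in NODE_BOND_LEGS
    )


for failed in ([], ["switch1"], ["switch2"], ["switch1", "switch2"]):
    print(failed, "->", usable_fence_methods(failed))
```

In this model, losing either switch alone still leaves one fence method reachable; only losing both leaves none.
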
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org