[Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs
Dejan Muhamedagic
dejanmm at fastmail.fm
Mon Jul 1 08:52:59 UTC 2013
Hi,
On Sat, Jun 29, 2013 at 10:15:57PM +0200, Lars Ellenberg wrote:
> On Fri, Jun 28, 2013 at 07:27:19PM -0400, Digimer wrote:
> > On 06/28/2013 07:22 PM, Andrew Beekhof wrote:
> > >
> > > On 29/06/2013, at 12:22 AM, Digimer <lists at alteeve.ca> wrote:
> > >
> > >> On 06/28/2013 06:21 AM, Andrew Beekhof wrote:
> > >>>
> > >>> On 28/06/2013, at 5:22 PM, Lars Marowsky-Bree <lmb at suse.com> wrote:
> > >>>
> > >>>> On 2013-06-27T12:53:01, Digimer <lists at alteeve.ca> wrote:
> > >>>>
> > >>>>> primitive fence_n01_psu1_off stonith:fence_apc_snmp \
> > >>>>> params ipaddr="an-p01" pcmk_reboot_action="off" port="1"
> > >>>>> pcmk_host_list="an-c03n01.alteeve.ca"
> > >>>>> primitive fence_n01_psu1_on stonith:fence_apc_snmp \
> > >>>>> params ipaddr="an-p01" pcmk_reboot_action="on" port="1"
> > >>>>> pcmk_host_list="an-c03n01.alteeve.ca"
> > >>>>
> > >>>> So every device twice, including location constraints? I see potential
> > >>>> for optimization by improving how the fence code handles this ... That's
> > >>>> abhorrently complex. (And I'm not sure the 'action' parameter ought to
> > >>>> be overwritten.)
> > >>>
> > >>> I'm not crazy about it either because it means the device is tied to a specific command.
> > >>> But it seems to be something all the RHCS people try to do...
> > >>
> > >> Maybe something in the rhcs water cooler made us all mad... ;)
> > >>
> > >>>> Glad you got it working, though.
> > >>>>
> > >>>>> location loc_fence_n01_ipmi fence_n01_ipmi -inf: an-c03n01.alteeve.ca
> > >>>> [...]
> > >>>>
> > >>>> I'm not sure you need any of these location constraints, by the way. Did
> > >>>> you test if it works without them?
> > >>>>
> > >>>>> Again, this is after just one test. I will want to test it several more
> > >>>>> times before I consider it reliable. Ideally, I would love to hear
> > >>>>> Andrew or others confirm this looks sane/correct.
> > >>>>
> > >>>> It looks correct, but not quite sane. ;-) That seems not to be
> > >>>> something you can address, though. I'm thinking that fencing topology
> > >>>> should be smart enough, if multiple fencing devices are specified, to
> > >>>> know how to expand them to "first all off (if off fails anywhere, it's a
> > >>>> failure), then all on (if on fails, it is not a failure)". That'd
> > >>>> greatly simplify the syntax.
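> > >>>>
> > >>>> With today's syntax that expansion has to be spelled out by hand.
> > >>>> An untested sketch, reusing Digimer's primitives (and assuming
> > >>>> matching psu2_off/psu2_on primitives exist; devices listed in one
> > >>>> level run in order and must all succeed):
> > >>>>
> > >>>>     fencing_topology \
> > >>>>         an-c03n01.alteeve.ca: \
> > >>>>             fence_n01_psu1_off,fence_n01_psu2_off,fence_n01_psu1_on,fence_n01_psu2_on
> > >>>>
> > >>>> Though even then the "if on fails, it is not a failure" half can't
> > >>>> be expressed.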
> > >>>
> > >>> The RH agents have apparently already been updated to support multiple ports.
> > >>> I'm really not keen on having the stonith-ng doing this.
> > >>
> > >> This doesn't help people who have dual power rails/PDUs for power
> > >> redundancy.
> > >
> > > I'm yet to be convinced that having two PDUs is helping those people in the first place.
> > > If it were actually useful, I suspect more than two/three people would have asked for it in the last decade.
> >
> > Step 1. Use one PDU
> > Step 2. Kill PDU
> >
> > Your node is dead and cannot be fenced.
>
> I have multiple independent cluster communication channels.
> I don't see the node on any of them,
> I cannot reach its IPMI or equivalent,
> I cannot reach its PDU.
>
> I'd argue that a failure mode where all of the above is true,
> yet the node is still alive, is "sufficiently unlikely"
> that we can just conclude it is in fact dead.
>
> Better that than a fencing method that returns "yes, I rebooted that
> node" when in fact that node did not even notice...
>
> > Using two separate UPSes and two separate PDUs to feed either PSU in
> > each node (and either switch in a two-switch configuration with bonded
> > network links) means that you can lose a power rail and not have an
> > interruption.
> >
> > I can't say why it's not a more common configuration, but I can say that
> > I do not see another way to provide redundant power. For me, an HA
> > cluster is not truly HA until all single points of failure have been
> > removed.
>
> If I do have two independent UPSes and PDUs and PSUs,
> (yes, that is a common setup)
> and I want a second fencing method to fall back to when IPMI fails,
> then yes, it would be nice to have some clean and easy way
> to tell pacemaker to do that.
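>
> Fencing levels can at least approximate that today; an untested
> sketch with hypothetical device names:
>
>     fencing_topology \
>         node-1: fence_node1_ipmi fence_node1_pdu1,fence_node1_pdu2
>
> i.e. try IPMI first, and only fall back to cutting both PDU ports
> if that fails.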
>
> But not having that fallback fencing method does not introduce a SPOF.
> Both mainboard (or kernel or resource stop failure or whatever)
> and BMC would have to fail at the same time for the cluster to block...
Right. It is often missed that more than one failure is actually
required for that setup to fail. In the case of dual PDU/PSU/UPS,
IPMI-based fencing is sufficient.
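
For reference, a minimal IPMI-only fencing primitive (hypothetical
address and credentials) can be as simple as:

    primitive fence_n01_ipmi stonith:fence_ipmilan \
        params ipaddr="10.20.0.1" login="admin" passwd="secret" \
            lanplus="true" pcmk_host_list="an-c03n01.alteeve.ca"
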
Thanks,
Dejan
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org