[Pacemaker] Two-Nodes Cluster fencing : Best Practices
Jake Smith
jsmith at argotec.com
Thu Jul 25 15:33:51 UTC 2013
----- Original Message -----
> From: "Digimer" <lists at alteeve.ca>
> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> Sent: Thursday, July 25, 2013 10:53:27 AM
> Subject: Re: [Pacemaker] Two-Nodes Cluster fencing : Best Practices
>
> With two-node clusters, quorum can't be used. This is fine *if* you have
> good fencing. If the nodes partition (i.e., a network failure), both will
> try to fence the other. In theory, the faster node will power off the
> other node before the slower node can kill the faster node. In practice,
> this isn't always the case.
>
> IPMI (and iDRAC, etc.) are independent devices. So it is possible for
> both nodes to initiate a power-down on the other before either dies. To
> avoid this, you will want to set a delay for the primary/active node's
> fence primitive.
>
> Say "node1" is your active node and "node2" is your backup. You would
> set a delay of, say, 15 seconds against "node1". Now if there is a
> partition, node1 would look up how to fence node2 and immediately
> initiate power off. node2, however, would look up how to fence node1,
> see a 15 second delay, and start a timer before calling the power-off.
> Of course, node2 will die before the timer expires.
>
> You can also disable acpid on the nodes. With that disabled,
> "pressing the power button" will result in a near-instant off. If you do
> this, reducing your delay to 5 seconds would probably be plenty.
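For what it's worth, a minimal sketch of turning acpid off, assuming a SysV-init system where `service` and `chkconfig` are available (RHEL/CentOS 6 style); other distributions use different tooling. The function name `disable_acpid` is made up for illustration, and it has to be run as root on each node:

```shell
#!/bin/sh
# Hypothetical helper: stop acpid now and keep it from starting at boot.
# Assumes SysV-style 'service' and 'chkconfig'; adjust for your distribution.
disable_acpid() {
    service acpid stop      # stop the running daemon
    chkconfig acpid off     # do not start it on the next boot
}

# Nothing runs until the function is called explicitly, as root:
#   disable_acpid
```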
>
> There is another issue to be aware of: "fence loops". The problem with
> two-node clusters that do not use quorum is that a single node can fence
> the other. So let's continue our example above...
>
> node2 will eventually reboot. If you have pacemaker set to start on
> boot, it will start, wait to connect to node1 (which it can't because
> the network failure remains), call a fence to put node1 into a known
> state, pause for 15 seconds and then initiate a power off. node1 dies
> and the services recover on node2. Now node1 boots back up, starts
> its pacemaker... an endless loop of fence -> recover until the network
> is fixed.
>
> To avoid this, simply do not start pacemaker on boot.
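A minimal sketch of that, again assuming SysV-style `chkconfig` (on systemd-based systems the equivalent would be `systemctl disable`). The helper name is made up for illustration, and whether corosync should also stay off depends on your stack:

```shell
#!/bin/sh
# Hypothetical helper: keep the cluster stack from starting at boot, so a
# fenced node stays out of the cluster until someone has checked the network.
disable_cluster_autostart() {
    chkconfig pacemaker off
    chkconfig corosync off   # assumption: corosync is the messaging layer in use
}

# Run as root on each node:
#   disable_cluster_autostart
# Once the fault is fixed, start the stack by hand, e.g.:
#   service corosync start && service pacemaker start
```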
>
> As to the specifics, you can test fencing configurations easily by
> directly calling the fence agent at the command line. I do not use DRAC,
> so I can't speak to specifics. I think you need to set lanplus and
> possibly define the console prompt to expect.
>
> Using a generic IPMI device as an example:
>
> fence_ipmilan -a 192.168.100.1 -l ipmiuser -p ipmipwd -o status
> fence_ipmilan -a 192.168.100.2 -l ipmiuser -p ipmipwd -o status
>
> If this returns the power state, then it is simple to convert to a
> pacemaker config.
>
> configure primitive pStN1 stonith:fence_ipmilan params \
> ipaddr=192.168.100.1 login=ipmiuser passwd=ipmipwd delay=15 \
> op monitor interval=60s
> configure primitive pStN2 stonith:fence_ipmilan params \
> ipaddr=192.168.100.2 login=ipmiuser passwd=ipmipwd \
> op monitor interval=60s
>
> Again, I *think* you need to set a couple of extra options for DRAC.
> Experiment at the command line before moving to the pacemaker config.
> Once you have the command line version working, you should be able to
> set it up in pacemaker. If you have trouble though, share the CLI call
> and we can help with the pacemaker config.
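In case it helps with the conversion, here is a rough sketch of what the DRAC variant might look like. These are assumptions on my part, not tested against an iDRAC6: the `-P` switch and `lanplus=1` parameter to select lanplus, `pcmk_host_list` to tell pacemaker which node each device fences, and `-inf:` location rules so that a node never runs its own fence device. Addresses, credentials and the file name `fencing.crm` are placeholders. The script only writes the file; nothing is applied to the cluster:

```shell
#!/bin/sh
# Step 1 (manual): check the agent from the command line first.
# -P asks fence_ipmilan to use lanplus.
#   fence_ipmilan -P -a 192.168.100.1 -l ipmiuser -p ipmipwd -o status
#   fence_ipmilan -P -a 192.168.100.2 -l ipmiuser -p ipmipwd -o status

# Step 2: write a candidate configuration to a file for review.
# The quoted 'EOF' keeps the trailing backslashes literal.
cat > fencing.crm <<'EOF'
primitive pStN1 stonith:fence_ipmilan \
    params ipaddr=192.168.100.1 login=ipmiuser passwd=ipmipwd \
        lanplus=1 pcmk_host_list=node1 delay=15 \
    op monitor interval=60s
primitive pStN2 stonith:fence_ipmilan \
    params ipaddr=192.168.100.2 login=ipmiuser passwd=ipmipwd \
        lanplus=1 pcmk_host_list=node2 \
    op monitor interval=60s
location lStN1 pStN1 -inf: node1
location lStN2 pStN2 -inf: node2
property stonith-enabled=true
EOF

# Step 3 (manual): load it once it looks right.
#   crm configure load update fencing.crm
```

The delay stays on pStN1 only, so node1 keeps the head start described above.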
>
I use external/ipmi with my iDRACs (5's and 6's) with the following pacemaker config:
primitive p_ipmilan_condor stonith:external/ipmi \
params hostname="Condor" ipaddr="192.168.x.x" userid="root" passwd="XXXXXX" \
The iDRAC needs the following settings for this to work:
IPMI over LAN – ON
Security setup – root as the user, set the BMC/iDRAC password
It sounds like you will need to convert this to one of the provided fence agents, but hopefully it helps some.
HTH
Jake
> On 25/07/13 05:39, Bruno MACADRÉ wrote:
> > Some corrections to my first mail:
> >
> > After some research I found that external/ipmi isn't available on my
> > system, so I must use fence-agents.
> >
> > My second question must be modified to reflect this change, like this:
> >
> > configure primitive pStN1 stonith:fence_ipmilan params
> > ipaddr=192.168.100.1 login=ipmiuser passwd=ipmipwd
> > configure primitive pStN2 stonith:fence_ipmilan params
> > ipaddr=192.168.100.2 login=ipmiuser passwd=ipmipwd
> >
> > Regards,
> > Bruno
> >
> > On 25/07/2013 10:39, Bruno MACADRÉ wrote:
> >> Hi,
> >>
> >> I've just built a two-node Active/Passive cluster to provide an iSCSI
> >> failover SAN.
> >>
> >> Some details about my configuration:
> >>
> >> - I have two nodes with 2 bonds: 1 for DRBD replication and 1 for
> >>   communication
> >> - iSCSI Target, iSCSI LUN and VirtualIP are constrained together to
> >>   start on the DRBD Master node
> >>
> >> Everything works fine, but now I need to configure fencing. I have 2
> >> DELL PowerEdge servers with iDRAC6.
> >>
> >> First question: is 'external/drac5' compatible with iDRAC6 (I've read
> >> everything I could find and found nothing about this)?
> >>
> >> Second question: is this configuration sufficient (with IPMI)?
> >>
> >> configure primitive pStN1 stonith:external/ipmi
> >> hostname=node1
> >> ipaddr=192.168.100.1 userid=ipmiuser passwd=ipmipwd interface=lan
> >> configure primitive pStN2 stonith:external/ipmi
> >> hostname=node2
> >> ipaddr=192.168.100.2 userid=ipmiuser passwd=ipmipwd interface=lan
> >> location lStN1 pStN1 inf: node1
> >> location lStN2 pStN2 inf: node2
> >>
> >> And finally:
> >> configure property stonith-enabled=true
> >> configure property stonith-action=poweroff
> >>
> >> Third (and last) question: what about quorum? At the moment I have
> >> 'no-quorum-policy="ignore"', but that's a risk, isn't it?
> >>
> >> Don't hesitate to ask me for more information if needed.
> >>
> >> Regards,
> >> Bruno.
> >>
> >
>
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person
> without
> access to education?
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>