[Pacemaker] Two-Nodes Cluster fencing : Best Practices

Thu Jul 25 17:33:51 CEST 2013

----- Original Message -----
> From: "Digimer" <lists at alteeve.ca>
> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> Sent: Thursday, July 25, 2013 10:53:27 AM
> Subject: Re: [Pacemaker] Two-Nodes Cluster fencing : Best Practices
> 
> With two-node clusters, quorum can't be used. This is fine *if* you have
> good fencing. If the nodes partition (i.e. a network failure), both will
> try to fence the other. In theory, the faster node will power off the
> other node before the slower node can kill the faster node. In practice,
> this isn't always the case.
> 
> IPMI (and iDRAC, etc.) are independent devices, so it is possible for
> both nodes to initiate a power-down on the other before either dies. To
> avoid this, you will want to set a delay for the primary/active node's
> fence primitive.
> 
> Say "node1" is your active node and "node2" is your backup. You would
> set a delay of, say, 15 seconds against "node1". Now if there is a
> partition, node1 would look up how to fence node2 and immediately
> initiate power off. Node2, however, would look up how to fence node1,
> see a 15 second delay, and start a timer before calling the power-off.
> Of course, node2 will die before the timer expires.
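
To make the timing argument above concrete, here is a toy model in shell. It is only a sketch: `fence_winner` and its two arguments are hypothetical names, and it assumes both nodes notice the partition at the same instant and that a power-off takes effect immediately, which real BMCs do not guarantee.

```shell
#!/bin/sh
# Toy model of the fence race in a two-node cluster (illustrative only).
# Arguments: the delay (seconds) set on the fence primitive that targets
# node1, then the delay set on the primitive that targets node2.
# A delay on a node's primitive slows down whoever tries to fence that node.
fence_winner() {
    delay_on_node1=$1   # node2 waits this long before powering off node1
    delay_on_node2=$2   # node1 waits this long before powering off node2
    if [ "$delay_on_node1" -gt "$delay_on_node2" ]; then
        echo "node1 survives"
    elif [ "$delay_on_node1" -lt "$delay_on_node2" ]; then
        echo "node2 survives"
    else
        echo "tie: both nodes may be powered off"
    fi
}

fence_winner 15 0   # the example above: delay=15 against node1 only
fence_winner 0 0    # no delay anywhere: the dual-fence case
```

The first call models the 15 second delay against node1; the second shows why leaving both primitives without a delay is the risky configuration.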
> 
> You can also disable acpid on the nodes. With that disabled,
> "pressing the power button" will result in a near-instant off. If you do
> this, reducing your delay to 5 seconds would probably be plenty.
> 
> There is another issue to be aware of: "fence loops". The problem with
> two-node clusters that do not use quorum is that a single node can fence
> the other. So let's continue our example above...
> 
> Node 2 will eventually reboot. If you have pacemaker set to start on
> boot, it will start, wait to connect to node1 (which it can't because
> the network failure remains), call a fence to put node1 into a known
> state, pause for 15 seconds and then initiate a power off. Node 1 dies
> and the services recover on Node 2. Now node1 boots back up, starts
> its pacemaker... Endless loop of fence -> recover until the network is
> fixed.
> 
> To avoid this, simply do not start pacemaker on boot.
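
How to keep pacemaker from starting at boot depends on the init system. The sketch below only prints the command for review rather than running it. `boot_disable_cmd` is a hypothetical helper, and the two cases assume either a systemd host or a SysV host that ships `chkconfig` (Red Hat or SUSE style); other distributions use different tools.

```shell
#!/bin/sh
# Print (do not run) the command that keeps a service from starting at boot.
# The init system is passed explicitly; "systemd" and "sysv" are the only
# values this sketch knows about.
boot_disable_cmd() {
    init_system=$1
    service=$2
    case "$init_system" in
        systemd) echo "systemctl disable $service" ;;
        sysv)    echo "chkconfig $service off" ;;
        *)       echo "unknown init system: $init_system" >&2; return 1 ;;
    esac
}

boot_disable_cmd sysv pacemaker
boot_disable_cmd systemd pacemaker
```

With pacemaker disabled at boot, a fenced node comes back up but stays out of the cluster until someone has checked the network and started the service by hand.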
> 
> As to the specifics, you can test fencing configurations easily by
> directly calling the fence agent at the command line. I do not use DRAC,
> so I can't speak to specifics. I think you need to set lanplus and
> possibly define the console prompt to expect.
> 
> Using generic IPMI as an example:
> 
> fence_ipmilan -a 192.168.100.1 -l ipmiuser -p ipmipwd -o status
> fence_ipmilan -a 192.168.100.2 -l ipmiuser -p ipmipwd -o status
> 
> If this returns the power state, then it is simple to convert to a
> pacemaker config.
> 
> configure primitive pStN1 stonith:fence_ipmilan params \
>   ipaddr=192.168.100.1 login=ipmiuser passwd=ipmipwd delay=15 \
>   op monitor interval=60s
> configure primitive pStN2 stonith:fence_ipmilan params \
>   ipaddr=192.168.100.2 login=ipmiuser passwd=ipmipwd \
>   op monitor interval=60s
> 
> Again, I *think* you need to set a couple of extra options for DRAC.
> Experiment at the command line before moving to the pacemaker config.
> Once you have the command line version working, you should be able to
> set it up in pacemaker. If you have trouble though, share the CLI call
> and we can help with the pacemaker config.
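
The command-line experiment suggested above can be sketched as a small helper that builds the `fence_ipmilan` status call for each BMC, with or without lanplus. `fence_status_cmd` and the `USE_LANPLUS` variable are hypothetical names; `-P` is the agent's lanplus switch, and whether an iDRAC actually needs it is the open question from the message above. The commands are printed rather than executed so they can be reviewed first.

```shell
#!/bin/sh
# Build the fence_ipmilan status command for one BMC address.
# Set USE_LANPLUS=1 to add -P (lanplus). Nothing is executed here;
# the command line is only printed for review.
fence_status_cmd() {
    bmc_addr=$1
    login=$2
    password=$3
    lanplus_flag=""
    if [ "${USE_LANPLUS:-0}" = "1" ]; then
        lanplus_flag=" -P"
    fi
    echo "fence_ipmilan${lanplus_flag} -a $bmc_addr -l $login -p $password -o status"
}

# Addresses and credentials are the placeholders used in this thread.
for addr in 192.168.100.1 192.168.100.2; do
    fence_status_cmd "$addr" ipmiuser ipmipwd
done
```

If the plain call fails against the iDRAC, printing the command again with `USE_LANPLUS=1` set and running that version is a reasonable next thing to try before touching the pacemaker config.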
> 

I use external/ipmi with my iDRACs (5's and 6's) with the following pacemaker config:

primitive p_ipmilan_condor stonith:external/ipmi \
        params hostname="Condor" ipaddr="192.168.x.x" userid="root" passwd="XXXXXX" \

The iDRAC needs the following settings for this to work:
	IPMI over LAN – ON
	Security setup – root as the user, set the BMC/iDRAC password

Sounds like you will need to convert to a provided fence agent but hopefully this helps some.

HTH

Jake

> On 25/07/13 05:39, Bruno MACADRÉ wrote:
> > Some modifications to my first mail:
> >
> > After some research I found that external/ipmi isn't available on my
> > system, so I must use fence-agents.
> >
> > My second question must be modified to reflect this change, like this:
> >
> >      configure primitive pStN1 stonith:fence_ipmilan params \
> >        ipaddr=192.168.100.1 login=ipmiuser passwd=ipmipwd
> >      configure primitive pStN2 stonith:fence_ipmilan params \
> >        ipaddr=192.168.100.2 login=ipmiuser passwd=ipmipwd
> >
> > Regards,
> > Bruno
> >
> > On 25/07/2013 10:39, Bruno MACADRÉ wrote:
> >> Hi,
> >>
> >>     I've just built a two-node Active/Passive cluster to provide an
> >>     iSCSI failover SAN.
> >>
> >>     Some details about my configuration:
> >>
> >>         - I have two nodes with 2 bonds: 1 for DRBD replication and
> >>           1 for communication
> >>         - iSCSI Target, iSCSI LUN and VirtualIP are constrained
> >>           together to start on the DRBD Master node
> >>
> >>     Everything works fine, but now I need to configure fencing. I have
> >>     2 DELL PowerEdge servers with iDRAC6.
> >>
> >>     First question: is 'external/drac5' compatible with iDRAC6? (I've
> >>     read everything and found nothing about this.)
> >>
> >>     Second question: is this configuration sufficient (with ipmi)?
> >>
> >>         configure primitive pStN1 stonith:external/ipmi \
> >>           hostname=node1 ipaddr=192.168.100.1 userid=ipmiuser \
> >>           passwd=ipmipwd interface=lan
> >>         configure primitive pStN2 stonith:external/ipmi \
> >>           hostname=node2 ipaddr=192.168.100.2 userid=ipmiuser \
> >>           passwd=ipmipwd interface=lan
> >>         location lStN1 pStN1 inf: node1
> >>         location lStN2 pStN2 inf: node2
> >>
> >>         And finally:
> >>         configure property stonith-enabled=true
> >>         configure property stonith-action=poweroff
> >>
> >>     Third (and last) question: what about quorum? At the moment I have
> >>     'no-quorum-policy="ignore"', but that's a risk, isn't it?
> >>
> >>     Don't hesitate to ask me for more information if needed.
> >>
> >>     Regards,
> >>     Bruno.
> >>
> >
> 
> 
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person
> without
> access to education?
> 