[Pacemaker] Fwd: Re: How can I make the secondary machine elect itself owner of the floating IP address?

Mon Sep 24 07:22:55 CEST 2012

Forwarding to the list for posterity (i.e. Google). I believe my reply
did solve the problem, BTW.

The crm config in question is:

node scc-bak
node scc-pri
primitive ClusterIP ocf:heartbeat:IPaddr2 \
	params ip="10.1.1.180" cidr_netmask="24" \
	op monitor interval="30s"
primitive drbd_r0 ocf:linbit:drbd \
	params drbd_resource="r0" \
	op monitor interval="15" role="Master" \
	op monitor interval="30" role="Slave"
primitive fs_r0 ocf:heartbeat:Filesystem \
	params device="/dev/drbd1" directory="/home/scc" fstype="ext3" \
	op monitor interval="10s"
primitive scc-stonith stonith:meatware \
	operations $id="scc-stonith-operations" \
	op monitor interval="3600" timeout="20" start-delay="15" \
	params hostlist="10.1.1.32 10.1.1.31"
group r0 fs_r0 ClusterIP
ms ms_drbd_r0 drbd_r0 \
	meta master-max="1" master-node-max="1" clone-max="2" \
	clone-node-max="1" notify="true"
colocation r0_on_drbd inf: r0 ms_drbd_r0:Master
order r0_after_drbd inf: ms_drbd_r0:promote r0:start
property $id="cib-bootstrap-options" \
	dc-version="1.1.6-b988976485d15cb702c9307df55512d323831a5e" \
	cluster-infrastructure="openais" \
	expected-quorum-votes="2" \
	no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
	resource-stickiness="200"

I probably should have noted that "scc-pri" and "scc-bak" aren't really
the best choice of names, because "pri" and "bak" are kind of
meaningless assuming identical nodes (and the nomenclature gets
confusing when you start talking about masters and slaves on top of that).

Anyway...

-------- Original Message --------
Subject: Re: How can I make the secondary machine elect itself owner of
the floating IP address?
Date: Thu, 20 Sep 2012 12:36:03 +1000
From: Tim Serong
To: Epps, Josh

Hi Josh,

On 09/20/2012 10:47 AM, Epps, Josh wrote:
> Hi Tim,
> 
> I saw one of your Gossamer threads and I really need some help.
> 
> I have a two-node cluster running on SLES 11 SP2 with Pacemaker and DRBD.
> When I shut down the primary with "shutdown -h now", the
> ocf:heartbeat:IPaddr2 transfers nicely to the backup server.
> But when I simulate a failure on the primary node by killing the power,
> neither the floating IP address nor the mount transfers to the secondary
> machine.

What's probably happening is:

- When you do a clean shutdown of one node, the surviving node knows the
first has gone away, and it can safely take over those resources.
- When you cut power, the surviving node doesn't know what state the
first node is in, so will do nothing until the first node is fenced.
- You're using the meatware STONITH plugin (which probably doesn't need
a monitor op, BTW), which means you should see a CRIT message in syslog
on the surviving node, telling you it expects the first node to be fenced.
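
A rough sketch of that syslog check, to be run on the surviving node. The
helper name fence_requests is made up for illustration, and the default log
path /var/log/messages is an assumption (adjust it to wherever your syslog
writes). It only filters for CRIT-level lines that mention the node name; the
exact message wording depends on the plugin version.

```shell
# Hypothetical helper: print CRIT-level log lines that mention a given node.
# $1 = node name, $2 = log file (optional; the default path is an assumption).
fence_requests() {
    node="$1"
    logfile="${2:-/var/log/messages}"
    # First keep CRIT lines, then keep only those naming the node (fixed string).
    grep 'CRIT' "$logfile" | grep -F -- "$node"
}
```

Usage would be something like "fence_requests scc-pri". No output means no
CRIT line naming that node was found in that file.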

> 
> How can I make the secondary machine elect itself owner of the floating
> IP address?

Assuming the first machine is really down :) you should be able to tell
the cluster this is so by running "meatclient -c scc-pri" on the
surviving node (but do check syslog to see if you're really getting
warnings about a node needing to be fenced).
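
The manual confirmation could be wrapped up roughly like this. It is only a
sketch: the function name confirm_fenced and the RUN dry-run switch are
invented for illustration, and "crm_mon -1" is added just to show a one-shot
status afterwards. By default it prints the commands instead of running them,
because confirming a fence for a node that is actually still alive risks data
corruption.

```shell
# Hypothetical wrapper: confirm to the cluster that a node has been reset.
# Only do this AFTER verifying the node really is powered off.
# Dry run by default (commands are echoed); set RUN to empty to execute them.
confirm_fenced() {
    node="$1"
    # Tell the meatware plugin the node has been power-cycled.
    ${RUN-echo} meatclient -c "$node"
    # One-shot cluster status, to see where the resources ended up.
    ${RUN-echo} crm_mon -1
}
```

"confirm_fenced scc-pri" just shows what would be run, and
"RUN= confirm_fenced scc-pri" actually runs it (meatclient will still ask you
to confirm).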

> Suse support today said that it can’t be done with just two nodes but we
> just require a one-way failover.

Two-node clusters should work fine; they're just more annoying than
three-node ones. See for example "STONITH Deathmatch Explained" at
http://ourobengr.com/ha/

If the above doesn't solve it for you, do you mind if we take this to
the linux-ha or pacemaker public mailing list?  More eyes on a problem
never hurts, and then a solution becomes googlable :)

Regards,

Tim
-- 
Tim Serong
Senior Clustering Engineer
SUSE
tserong at suse.com