[Pacemaker] Trouble setting up IP failover with ping resource
Anlu Wang
anlu at mixpanel.com
Fri Feb 17 03:57:14 UTC 2012
I'm trying to get IP failover working on three machines named anlutest1,
anlutest2, and anlutest3. I'm using heartbeat for the messaging layer, and
everything works great when a machine goes down. But I would also like to
fail over an IP when EITHER the eth0 or eth1 network interface fails. From
reading
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch09s03s03.html
it seems the right way to do this is to add a ping resource.
Here is my XML configuration:
http://pastebin.com/05z7eB2s
This config doesn't work for me. Using the showscores.sh script found at:
http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg00410.html
I see that my scores are:
Resource   Score     Node       Stickiness  #Fail  Migration-Threshold
address01  0         anlutest3  0           0
address01  1006      anlutest1  0           5
address01  50        anlutest2  0           157
address02  0         anlutest3  0           0
address02  1050      anlutest2  0           2
address02  6         anlutest1  0           0
address03  1000      anlutest3  0           7
address03  50        anlutest2  0
address03  6         anlutest1  0           0
ping:0     0         anlutest1  0           6
ping:0     0         anlutest2  0           14
ping:0     0         anlutest3  0           0
ping:1     0         anlutest2  0
ping:1     0         anlutest3  0           28
ping:1     -1000000  anlutest1  0           0
ping:2     0         anlutest3  0           13
ping:2     -1000000  anlutest1  0           0
ping:2     -1000000  anlutest2  0
which make no sense to me at all. I don't see how I could be getting these
scores of 50 and 1006. When I take down an interface on anlutest3, I see
scores of 4 and 1004, which sort of make sense, except that the multiplier
of 100 isn't being applied. I was experimenting with changing values, so
maybe it's caching old values. If so, how do I force the new values to take
effect?
Furthermore, shouldn't there be no scores of 0? If all 6 IPs I am pinging
return successfully, shouldn't my scores be either 600 or 1600?
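
To spell out the arithmetic behind that expectation, here is a minimal
sketch. It assumes the ping agent publishes pingd = reachable hosts x
multiplier, and that the location rule adds that value to a fixed
preference of 1000 on the preferred node. The 1000 is inferred from the
scores above, and the function names are made up for illustration.

```python
# Sketch of the expected score arithmetic; names here are illustrative only.

def pingd_value(reachable_hosts: int, multiplier: int) -> int:
    """Value the ping agent is assumed to publish: reachable hosts x multiplier."""
    return reachable_hosts * multiplier


def location_score(base_preference: int, reachable_hosts: int, multiplier: int) -> int:
    """Assumes the location rule adds the pingd value to a fixed node preference."""
    return base_preference + pingd_value(reachable_hosts, multiplier)


HOSTS = 6  # number of addresses being pinged

# What I expect with multiplier=100 and all hosts reachable.
expected_preferred = location_score(1000, HOSTS, 100)
expected_other = location_score(0, HOSTS, 100)

# What the same formula gives with multiplier=1.
with_multiplier_one = location_score(1000, HOSTS, 1)

print(expected_preferred, expected_other, with_multiplier_one)  # 1600 600 1006
```

Under these assumptions the observed 1006 and 6 match a multiplier of 1
rather than 100, while the scores of 50 and 1050 do not fit either value
with 6 hosts.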
In my syslog I also see a ton of messages like
Feb 17 03:54:47 anlutest2 lrmd: [1137]: info: perform_op:2877: operations
on resource address01 already delayed
Feb 17 03:54:48 anlutest2 lrmd: [1137]: info: perform_op:2873: operation
monitor[419] on ocf::ping::ping:1 for client 1140, its parameters:
CRM_meta_clone=[1] host_list=[10.54.130.6 10.54.130.8 10.54.130.7
50.97.196.101 50.97.196.103 50.9CRM_meta_clone_max=[3] dampen=[60s]
crm_feature_set=[3.0.1] CRM_meta_globally_unique=[false] multiplier=[10000]
CRM_meta_name=[monitor] CRM_meta_timeout=[60000] CRM_meta_interval=[5000]
for rsc is already running.
Feb 17 03:54:48 anlutest2 lrmd: [1137]: info: perform_op:2883: postponing
all ops on resource ping:1 by 1000 ms
Feb 17 03:54:48 anlutest2 lrmd: [1137]: info: perform_op:2873: operation
monitor[171] on ocf::ping::ping:2 for client 1140, its parameters:
CRM_meta_clone=[2] host_list=[10.54.130.6 10.54.130.8 10.54.130.7
50.97.196.101 50.97.196.103 50.9CRM_meta_clone_max=[3] dampen=[60s]
crm_feature_set=[3.0.1] CRM_meta_globally_unique=[false] multiplier=[1]
CRM_meta_name=[monitor] CRM_meta_timeout=[30000] CRM_meta_interval=[5000]
for rsc is already running.
Feb 17 03:54:48 anlutest2 lrmd: [1137]: info: perform_op:2883: postponing
all ops on resource ping:2 by 1000 ms
and occasionally
Feb 17 03:54:33 anlutest2 attrd: [1139]: info: attrd_trigger_update:
Sending flush op to all hosts for: pingd (4000)
Feb 17 03:54:33 anlutest2 attrd: [1139]: info: attrd_ha_callback: flush
message from anlutest2
Feb 17 03:54:33 anlutest2 attrd: [1139]: WARN: find_nvpair_attr: Multiple
attributes match name=pingd
Feb 17 03:54:33 anlutest2 attrd: [1139]: info: find_nvpair_attr: Value:
50 #011(id=status-d619a94e-ebba-4ed0-8e0f-89837dd7506b-pingd)
Feb 17 03:54:33 anlutest2 attrd: [1139]: info: find_nvpair_attr: Value: 3
#011(id=status-ab3c1a25-9471-48f7-9c0b-c76238abd402-pingd)
Feb 17 03:54:33 anlutest2 attrd: [1139]: info: attrd_perform_update: Sent
update -40: pingd=4000
Feb 17 03:54:33 anlutest2 attrd: [1139]: ERROR: attrd_cib_callback: Update
-40 for pingd=4000 failed: Required data for this CIB API call not found
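
The two lrmd monitor messages above list different multiplier values for
ping:1 and ping:2, which may mean the instances are still running with
parameters from my earlier experiments. A minimal sketch for pulling those
values out of the log follows. The log lines are shortened copies of the
ones above, and the regular expression and function name are my own.

```python
import re

# Shortened copies of the two lrmd messages quoted above.
LOG_LINES = [
    "operation monitor[419] on ocf::ping::ping:1 for client 1140, its parameters: "
    "CRM_meta_clone=[1] dampen=[60s] multiplier=[10000] CRM_meta_timeout=[60000]",
    "operation monitor[171] on ocf::ping::ping:2 for client 1140, its parameters: "
    "CRM_meta_clone=[2] dampen=[60s] multiplier=[1] CRM_meta_timeout=[30000]",
]

# Capture the resource instance name and the multiplier it was started with.
PATTERN = re.compile(r"on ocf::ping::(?P<rsc>\S+) .*?multiplier=\[(?P<mult>\d+)\]")


def multipliers(lines):
    """Map each ping instance named in the log lines to its multiplier."""
    found = {}
    for line in lines:
        match = PATTERN.search(line)
        if match:
            found[match.group("rsc")] = int(match.group("mult"))
    return found


result = multipliers(LOG_LINES)
print(result)  # {'ping:1': 10000, 'ping:2': 1}
```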
Could someone just take a look at my config and let me know what I'm doing
wrong? Or if there's a better way to do what I want to do...
Thanks in advance,
Anlu