[Pacemaker] Trouble setting up IP failover with ping resource
Dejan Muhamedagic
dejanmm at fastmail.fm
Fri Feb 17 11:26:44 UTC 2012
Hi,
On Thu, Feb 16, 2012 at 07:57:14PM -0800, Anlu Wang wrote:
> I have three machines named anlutest1, anlutest2, and anlutest3 that I'm
> trying to get IP failover working on. I'm using heartbeat for the messaging
> layer, and everything works great when a machine goes down. But I would
> also like to fail over an IP when EITHER the eth0 or the eth1 network
> interface fails. From reading
>
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch09s03s03.html
>
> it seems the right way to do this is to add a ping resource.
>
> Here is my XML configuration:
>
> http://pastebin.com/05z7eB2s
The configuration seems OK, though the monitors are evidently being
scheduled back-to-back (see the "postponing all ops" messages from
lrmd below). I guess you should either increase the monitor intervals
or reduce the dampen period. Which version of Pacemaker are you
running? It may also be worth taking a look at this thread:
http://oss.clusterlabs.org/pipermail/pacemaker/2011-April/009942.html
Thanks,
Dejan
> This config doesn't work for me. Using the showscores.sh script found at:
>
> http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg00410.html
>
> I see that my scores are:
>
> Resource   Score     Node       Stickiness  #Fail  Migration-Threshold
> address01  0         anlutest3  0           0
> address01  1006      anlutest1  0           5
> address01  50        anlutest2  0           157
> address02  0         anlutest3  0           0
> address02  1050      anlutest2  0           2
> address02  6         anlutest1  0           0
> address03  1000      anlutest3  0           7
> address03  50        anlutest2  0
> address03  6         anlutest1  0           0
> ping:0     0         anlutest1  0           6
> ping:0     0         anlutest2  0           14
> ping:0     0         anlutest3  0           0
> ping:1     0         anlutest2  0
> ping:1     0         anlutest3  0           28
> ping:1     -1000000  anlutest1  0           0
> ping:2     0         anlutest3  0           13
> ping:2     -1000000  anlutest1  0           0
> ping:2     -1000000  anlutest2  0
>
> which make no sense at all. I don't see how I could be getting scores
> of 50 and 1006. When I take down an interface on anlutest3, I see scores of
> 4 and 1004, which sort of make sense, except that the multiplier of 100
> isn't being applied. I was experimenting with different values, so maybe
> it's caching old ones. If so, how do I force the new values to take effect?
>
> Furthermore, shouldn't there be no scores of 0? If all 6 IPs I am pinging
> return successfully, shouldn't my scores be either 600 or 1600?
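
The arithmetic behind that expectation can be sketched as follows. This
is only an illustration: it assumes the ping agent sets the pingd
attribute to (reachable hosts x multiplier) and that a location rule adds
that attribute to a base preference of 1000 on the preferred node. The
1000 is inferred from the 600/1600 figures above, not read from the
actual configuration, and the function names are hypothetical.

```python
def pingd_value(reachable_hosts: int, multiplier: int) -> int:
    """Value the ping agent would publish: reachable hosts times multiplier."""
    return reachable_hosts * multiplier


def expected_score(reachable_hosts: int, multiplier: int, base_preference: int = 0) -> int:
    """Score for a node, assuming the pingd value is added to a base preference.

    base_preference is an assumption (1000 on the preferred node, 0 elsewhere).
    """
    return base_preference + pingd_value(reachable_hosts, multiplier)


def implied_multipliers(observed_pingd: int, max_hosts: int = 6) -> dict:
    """Map each host count that divides observed_pingd to the multiplier it implies."""
    return {
        n: observed_pingd // n
        for n in range(1, max_hosts + 1)
        if observed_pingd % n == 0
    }


print(expected_score(6, 100))        # 600
print(expected_score(6, 100, 1000))  # 1600
print(implied_multipliers(6))        # {1: 6, 2: 3, 3: 2, 6: 1}
print(implied_multipliers(50))       # {1: 50, 2: 25, 5: 10}
```

Under these assumptions, neither 6 nor 50 is a multiple of 100, so the
observed values cannot come from a multiplier of 100. They would be
consistent with, for example, 6 hosts at multiplier 1 and 5 hosts at
multiplier 10, which may point to values left over from earlier settings.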
>
> In my syslog I also see a ton of messages like
>
> Feb 17 03:54:47 anlutest2 lrmd: [1137]: info: perform_op:2877: operations
> on resource address01 already delayed
> Feb 17 03:54:48 anlutest2 lrmd: [1137]: info: perform_op:2873: operation
> monitor[419] on ocf::ping::ping:1 for client 1140, its parameters:
> CRM_meta_clone=[1] host_list=[10.54.130.6 10.54.130.8 10.54.130.7
> 50.97.196.101 50.97.196.103 50.9CRM_meta_clone_max=[3] dampen=[60s]
> crm_feature_set=[3.0.1] CRM_meta_globally_unique=[false] multiplier=[10000]
> CRM_meta_name=[monitor] CRM_meta_timeout=[60000] CRM_meta_interval=[5000]
> for rsc is already running.
> Feb 17 03:54:48 anlutest2 lrmd: [1137]: info: perform_op:2883: postponing
> all ops on resource ping:1 by 1000 ms
> Feb 17 03:54:48 anlutest2 lrmd: [1137]: info: perform_op:2873: operation
> monitor[171] on ocf::ping::ping:2 for client 1140, its parameters:
> CRM_meta_clone=[2] host_list=[10.54.130.6 10.54.130.8 10.54.130.7
> 50.97.196.101 50.97.196.103 50.9CRM_meta_clone_max=[3] dampen=[60s]
> crm_feature_set=[3.0.1] CRM_meta_globally_unique=[false] multiplier=[1]
> CRM_meta_name=[monitor] CRM_meta_timeout=[30000] CRM_meta_interval=[5000]
> for rsc is already running.
> Feb 17 03:54:48 anlutest2 lrmd: [1137]: info: perform_op:2883: postponing
> all ops on resource ping:2 by 1000 ms
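
One way to see what each clone instance is actually running with is to
pull the key=[value] pairs out of these lrmd lines and compare them. The
sketch below does that for the two monitor lines quoted above (rejoined
from their wrapped form). The regular expressions and function names are
hypothetical helpers, not part of Pacemaker, and the truncated host_list
is left exactly as it appears in the log, so it is not recovered.

```python
import re

# key=[value] pairs; values containing brackets are skipped, so the
# truncated host_list entry is ignored (the key right after it is mangled).
PARAM = re.compile(r"(\w+)=\[([^\[\]]*)\]")
RESOURCE = re.compile(r"on ocf::ping::(\S+) for client")

LINES = [
    "operation monitor[419] on ocf::ping::ping:1 for client 1140, its parameters: "
    "CRM_meta_clone=[1] host_list=[10.54.130.6 10.54.130.8 10.54.130.7 "
    "50.97.196.101 50.97.196.103 50.9CRM_meta_clone_max=[3] dampen=[60s] "
    "crm_feature_set=[3.0.1] CRM_meta_globally_unique=[false] multiplier=[10000] "
    "CRM_meta_name=[monitor] CRM_meta_timeout=[60000] CRM_meta_interval=[5000] "
    "for rsc is already running.",
    "operation monitor[171] on ocf::ping::ping:2 for client 1140, its parameters: "
    "CRM_meta_clone=[2] host_list=[10.54.130.6 10.54.130.8 10.54.130.7 "
    "50.97.196.101 50.97.196.103 50.9CRM_meta_clone_max=[3] dampen=[60s] "
    "crm_feature_set=[3.0.1] CRM_meta_globally_unique=[false] multiplier=[1] "
    "CRM_meta_name=[monitor] CRM_meta_timeout=[30000] CRM_meta_interval=[5000] "
    "for rsc is already running.",
]


def parse_monitor_line(line):
    """Return (resource name, {parameter: value}) for one lrmd monitor line."""
    match = RESOURCE.search(line)
    resource = match.group(1) if match else None
    return resource, dict(PARAM.findall(line))


def differing_params(lines):
    """Return {parameter: {resource: value}} for parameters whose values differ."""
    parsed = dict(parse_monitor_line(line) for line in lines)
    keys = set().union(*(params.keys() for params in parsed.values()))
    return {
        key: {res: params.get(key) for res, params in parsed.items()}
        for key in sorted(keys)
        if len({params.get(key) for params in parsed.values()}) > 1
    }


for key, values in differing_params(LINES).items():
    print(key, values)
```

On this excerpt, apart from the clone index itself, multiplier (10000
vs 1) and CRM_meta_timeout (60000 vs 30000) differ between ping:1 and
ping:2, while dampen (60s) and the monitor interval (5000) are the same.
That may suggest the instances were started under different versions of
the configuration.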
>
> and occasionally
>
> Feb 17 03:54:33 anlutest2 attrd: [1139]: info: attrd_trigger_update:
> Sending flush op to all hosts for: pingd (4000)
> Feb 17 03:54:33 anlutest2 attrd: [1139]: info: attrd_ha_callback: flush
> message from anlutest2
> Feb 17 03:54:33 anlutest2 attrd: [1139]: WARN: find_nvpair_attr: Multiple
> attributes match name=pingd
> Feb 17 03:54:33 anlutest2 attrd: [1139]: info: find_nvpair_attr: Value:
> 50 #011(id=status-d619a94e-ebba-4ed0-8e0f-89837dd7506b-pingd)
> Feb 17 03:54:33 anlutest2 attrd: [1139]: info: find_nvpair_attr: Value: 3
> #011(id=status-ab3c1a25-9471-48f7-9c0b-c76238abd402-pingd)
> Feb 17 03:54:33 anlutest2 attrd: [1139]: info: attrd_perform_update: Sent
> update -40: pingd=4000
> Feb 17 03:54:33 anlutest2 attrd: [1139]: ERROR: attrd_cib_callback: Update
> -40 for pingd=4000 failed: Required data for this CIB API call not found
>
> Could someone just take a look at my config and let me know what I'm doing
> wrong? Or if there's a better way to do what I want to do...
>
> Thanks in advance,
> Anlu