[Pacemaker] Problems with Pacemaker 1.1.8 on F17

David Vossel dvossel at redhat.com
Mon Feb 25 22:23:41 UTC 2013


----- Original Message -----
> From: "Lars Kellogg-Stedman" <lars at oddbit.com>
> To: "Andrew Beekhof" <andrew at beekhof.net>
> Cc: pacemaker at clusterlabs.org, "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> Sent: Thursday, February 21, 2013 11:11:11 PM
> Subject: Re: [Pacemaker] Problems with Pacemaker 1.1.8 on F17
> > You'd think that would help, but
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=880035 suggests
> > otherwise.
> > I have one remaining Fedora machine where KVM clusters still work;
> > I don't think I'll ever update it now.
> > 
> 
> 
> Well, that was a fascinating read.
>
>
> Using the udpu transport seems to have stabilized corosync. If I
> understand that bug report correctly I should also see better
> multicast behavior if I enable the multicast_querier

No, nothing that I am aware of fixes the bridged-device multicast bug.  I have tried everything.  It is one of the most frustrating scenarios I've ever encountered.  It appears to be fixed when you start messing around with settings (multicast_querier), only to fall apart in a different way later on, after you think everything is working again.  I have wasted days, probably an entire week, trying to get multicast working in my virtual machines again.  Eventually I had to downgrade to something RHEL 6.3-based.  I'm a little bitter about the whole thing at this point.

-- Vossel


>, but I'm happy
> with udpu for now. This lets me focus on the other things that are
> acting oddly:
> 
> Trying to add a monitor to a systemd: resource, like this:
> 
> 
> pcs resource create httpd systemd:httpd op monitor interval=30s
> 
> 
> Which generates this in the cib:
> 
> 
> -- <cib admin_epoch="0" epoch="7" num_updates="25" />
> ++ <primitive class="systemd" id="httpd" type="httpd" >
> ++   <instance_attributes id="httpd-instance_attributes" />
> ++   <operations >
> ++     <op id="httpd-monitor-interval-30s" interval="30s" name="monitor" />
> ++   </operations>
> ++ </primitive>
> 
> Results in the service never successfully starting:
> 
> 
> notice: process_lrm_event: LRM operation httpd_monitor_0 (call=10, rc=7, cib-update=30, confirmed=true) not running
> notice: process_lrm_event: LRM operation httpd_start_0 (call=13, rc=0, cib-update=31, confirmed=true) ok
> notice: process_lrm_event: LRM operation httpd_monitor_30000 (call=16, rc=7, cib-update=32, confirmed=false) not running
> warning: status_from_rc: Action 11 (httpd_monitor_30000) on puppet0 failed (target: 0 vs. rc: 7): Error
> warning: update_failcount: Updating failcount for httpd on puppet0 after failed monitor: rc=7 (update=value++, time=1361503742)
> notice: run_graph: Transition 2 (Complete=7, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2278.bz2): Complete
> notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-httpd (1)
> notice: attrd_perform_update: Sent update 11: fail-count-httpd=1
> notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-httpd (1361503742)
> notice: attrd_perform_update: Sent update 14: last-failure-httpd=1361503742
> warning: unpack_rsc_op: Processing failed op monitor for httpd on puppet0: not running (7)
> notice: LogActions: Recover httpd#011(Started puppet0)
> notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-2279.bz2
> warning: unpack_rsc_op: Processing failed op monitor for httpd on puppet0: not running (7)
> notice: LogActions: Recover httpd#011(Started puppet0)
> notice: process_pe_message: Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-2280.bz2
> warning: unpack_rsc_op: Processing failed op monitor for httpd on puppet0: not running (7)
> 
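The sequence in the quoted log is easier to follow with the return codes pulled out. The following is a minimal Python sketch, not a Pacemaker tool: the helper name `parse_lrm_events` and the regular expression are illustrative assumptions, and the three log lines are copied from the excerpt above. The only outside facts used are the OCF meanings of rc=0 (success) and rc=7 (not running), and that a monitor with interval 0 is the initial probe.

```python
import re

# The three process_lrm_event entries from the quoted log, copied verbatim.
LOG = """\
notice: process_lrm_event: LRM operation httpd_monitor_0 (call=10, rc=7, cib-update=30, confirmed=true) not running
notice: process_lrm_event: LRM operation httpd_start_0 (call=13, rc=0, cib-update=31, confirmed=true) ok
notice: process_lrm_event: LRM operation httpd_monitor_30000 (call=16, rc=7, cib-update=32, confirmed=false) not running
"""

# OCF return codes that appear in this log.
OCF_RC = {0: "OCF_SUCCESS", 7: "OCF_NOT_RUNNING"}

# Operation keys have the form <resource>_<action>_<interval in ms>.
# This pattern is a hypothetical helper written for this excerpt only.
LRM_EVENT = re.compile(
    r"LRM operation (?P<rsc>\w+?)_(?P<action>[a-z]+)_(?P<interval_ms>\d+) "
    r"\(call=(?P<call>\d+), rc=(?P<rc>\d+),"
)


def parse_lrm_events(text):
    """Return (resource, action, interval_ms, call, rc) tuples in log order."""
    return [
        (m["rsc"], m["action"], int(m["interval_ms"]), int(m["call"]), int(m["rc"]))
        for m in LRM_EVENT.finditer(text)
    ]


for rsc, action, interval_ms, call, rc in parse_lrm_events(LOG):
    # A monitor with interval 0 is the one-off probe run before start.
    kind = "probe" if action == "monitor" and interval_ms == 0 else action
    print(f"call={call} {rsc} {kind} interval={interval_ms}ms "
          f"rc={rc} ({OCF_RC.get(rc, 'other')})")
```

Read this way, the probe returning 7 is expected (the service was not running yet), the start returns 0, and it is the first recurring 30-second monitor that reports "not running" and triggers the recovery loop.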
> 
> This will continue until pacemaker declares the service FAILED, even
> though httpd (in this example) starts up manually (with "systemctl
> start httpd") without a problem. For what it's worth, the dbus
> method call to get the ActiveState property appears to work:
> 
> 
> # systemctl start httpd
> # gdbus call --system --dest org.freedesktop.systemd1 \
>     --object-path /org/freedesktop/systemd1/unit/httpd_2eservice \
>     -m org.freedesktop.DBus.Properties.Get \
>     org.freedesktop.systemd1.Unit ActiveState
> (<'active'>,)
> 
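The `httpd_2eservice` component in the object path above is the unit name `httpd.service` with the dot escaped. Here is a small Python sketch of that escaping, assuming the usual systemd rule: ASCII letters are kept, digits are kept except in the first position, and every other byte becomes `_` followed by two lowercase hex digits. The function names are illustrative only and are not part of any systemd or Pacemaker API.

```python
import string

_LETTERS = set(string.ascii_letters.encode("ascii"))
_DIGITS = set(string.digits.encode("ascii"))

# Prefix under which systemd exposes unit objects, as seen in the gdbus call.
UNIT_PATH_PREFIX = "/org/freedesktop/systemd1/unit/"


def bus_label_escape(name: str) -> str:
    """Escape a name so it can be used as one D-Bus object path component."""
    if not name:
        # Assumed special case: the empty string is encoded as a lone underscore.
        return "_"
    out = []
    for i, byte in enumerate(name.encode("utf-8")):
        if byte in _LETTERS or (byte in _DIGITS and i > 0):
            out.append(chr(byte))
        else:
            # Anything else (including '.', '-', '_', '@') is hex-escaped.
            out.append(f"_{byte:02x}")
    return "".join(out)


def unit_object_path(unit: str) -> str:
    """Build the object path for a unit name such as 'httpd.service'."""
    return UNIT_PATH_PREFIX + bus_label_escape(unit)


print(unit_object_path("httpd.service"))
```

This is only useful for building the `--object-path` argument by hand when checking other units the same way; it says nothing about why the cluster's own monitor sees the service as not running.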
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 



