[Pacemaker] resource agent starting out-of-order
Tim Serong
tserong at novell.com
Thu Mar 3 15:45:06 CET 2011
On 3/3/2011 at 05:05 PM, AP <pacemaker at inml.weebeastie.net> wrote:
> Hi,
>
> Having deep issues with my cluster setup. Everything works ok until
> I add a VirtualDomain RA in. Then things go pear-shaped in that it seems
> to ignore the "order" crm config for it and starts as soon as it can.
>
> The crm config is provided below. Basically p-vd_vg.test1 attempts to
> start despite p-libvirtd not being started and p-drbd_vg.test1 not
> being master (or slave for that matter - ie it's not configured at all).
>
> Eventually p-libvirtd and p-drbd_vg.test1 start and p-vd_vg.test1 attempts
> to start again. pengine on the node where p-vd_vg.test1 is already running
> then complains with:
>
> Mar 3 16:49:16 breadnut pengine: [2097]: ERROR: native_create_actions:
> Resource p-vd_vg.test1 (ocf::VirtualDomain) is active on 2 nodes attempting
> recovery
> Mar 3 16:49:16 breadnut pengine: [2097]: WARN: See
> http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
>
> Then mass slaughter occurs and p-vd_vg.test1 is restarted where it was
> running previously whilst the other node gets an error for it.
>
> Essentially I cannot restart the 2nd node without it breaking the 1st.
>
> Now, as I understand it, a lone primitive will run once on any node - this
> is just fine by me.
>
> colo-vd_vg.test1 indicates that p-vd_vg.test1 should run where ms-drbd_vg.test1
> is master. ms-drbd_vg.test1 should only be master where clone-libvirtd is
> started.
>
> order-vg.test1 indicates that ms-drbd_vg.test1 should start after clone-lvm_gh
> is started (successfully). (This used to have a promote for ms-drbd_vg.test1
> but then ms-drbd_vg.test1 would be demoted and not stopped on shutdown which
> would cause clone-lvm_gh to error out on stop)
>
> order-vd_vg.test1 indicates p-vd_vg.test1 should only start where
> ms-drbd_vg.test1 and clone-libvirtd have both successfully started (the
> order of their starting being irrelevant).
>
> cli-standby-p-vd_vg.test1 was put there by my migrating p-vd_vg.test1
> about the place.
>
> This happens with or without fencing and with fencing configured as below
> or as just a single primitive with both nodes in the hostlist.
>
> Help with this would be awesome and appreciated. I do not know what I am
> missing here. The config makes sense to me so I don't even know where
> to start poking and prodding. I be flailing.
>
> Config and s/w version list is below:
>
> OS: Debian Squeeze
> Kernel: 2.6.37.2
>
> PACKAGES:
>
> ii  cluster-agents    1:1.0.4-0ubuntu1~custom1     The reusable cluster components for Linux HA
> ii  cluster-glue      1.0.7-3ubuntu1~custom1       The reusable cluster components for Linux HA
> ii  corosync          1.3.0-1ubuntu1~custom1       Standards-based cluster framework (daemon and modules)
> ii  libccs3           3.1.0-0ubuntu1~custom1       Red Hat cluster suite - cluster configuration libraries
> ii  libcib1           1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - CIB
> ii  libcman3          3.1.0-0ubuntu1~custom1       Red Hat cluster suite - cluster manager libraries
> ii  libcorosync4      1.3.0-1ubuntu1~custom1       Standards-based cluster framework (libraries)
> ii  libcrmcluster1    1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - CRM
> ii  libcrmcommon2     1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - common CRM
> ii  libfence4         3.1.0-0ubuntu1~custom1       Red Hat cluster suite - fence client library
> ii  liblrm2           1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- liblrm2
> ii  libpe-rules2      1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - rules for P-Engine
> ii  libpe-status3     1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - status for P-Engine
> ii  libpengine3       1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - P-Engine
> ii  libpils2          1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- libpils2
> ii  libplumb2         1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- libplumb2
> ii  libplumbgpl2      1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- libplumbgpl2
> ii  libstonith1       1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- libstonith1
> ii  libstonithd1      1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - stonith
> ii  libtransitioner1  1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - transitioner
> ii  pacemaker         1.1.5-0ubuntu1~ppa1~custom1  HA cluster resource manager
>
> CONFIG:
>
> node breadnut
> node breadnut2 \
>     attributes standby="off"
> primitive fencing-bn stonith:meatware \
>     params hostlist="breadnut" \
>     op start interval="0" timeout="60s" \
>     op stop interval="0" timeout="70s" \
>     op monitor interval="10" timeout="60s"
> primitive fencing-bn2 stonith:meatware \
>     params hostlist="breadnut2" \
>     op start interval="0" timeout="60s" \
>     op stop interval="0" timeout="70s" \
>     op monitor interval="10" timeout="60s"
> primitive p-drbd_vg.test1 ocf:linbit:drbd \
>     params drbd_resource="vg.test1" \
>     operations $id="ops-drbd_vg.test1" \
>     op start interval="0" timeout="240s" \
>     op stop interval="0" timeout="100s" \
>     op monitor interval="20" role="Master" timeout="20s" \
>     op monitor interval="30" role="Slave" timeout="20s"
> primitive p-libvirtd ocf:local:libvirtd \
>     meta allow-migrate="off" \
>     op start interval="0" timeout="200s" \
>     op stop interval="0" timeout="100s" \
>     op monitor interval="10" timeout="200s"
> primitive p-lvm_gh ocf:heartbeat:LVM \
>     params volgrpname="gh" \
>     meta allow-migrate="off" \
>     op start interval="0" timeout="90s" \
>     op stop interval="0" timeout="100s" \
>     op monitor interval="10" timeout="100s"
> primitive p-vd_vg.test1 ocf:heartbeat:VirtualDomain \
>     params config="/etc/libvirt/qemu/vg.test1.xml" \
>     params migration_transport="tcp" \
>     meta allow-migrate="true" is-managed="true" \
>     op start interval="0" timeout="120s" \
>     op stop interval="0" timeout="120s" \
>     op migrate_to interval="0" timeout="120s" \
>     op migrate_from interval="0" timeout="120s" \
>     op monitor interval="10s" timeout="120s"
> ms ms-drbd_vg.test1 p-drbd_vg.test1 \
>     meta resource-stickines="100" notify="true" master-max="2" target-role="Master"
> clone clone-libvirtd p-libvirtd \
>     meta interleave="true"
> clone clone-lvm_gh p-lvm_gh \
>     meta interleave="true"
> location cli-standby-p-vd_vg.test1 p-vd_vg.test1 \
>     rule $id="cli-standby-rule-p-vd_vg.test1" -inf: #uname eq breadnut2
> location loc-fencing-bn fencing-bn -inf: breadnut
> location loc-fencing-bn2 fencing-bn2 -inf: breadnut2
> colocation colo-vd_vg.test1 inf: p-vd_vg.test1:Started ms-drbd_vg.test1:Master clone-libvirtd:Started
> order order-vd_vg.test1 inf: ( ms-drbd_vg.test1:start clone-libvirtd:start ) p-vd_vg.test1:start

Does it behave any differently if you remove the above two constraints, and
replace them with something like the following (more verbose, but should do
what you want, assuming I didn't make any typos):

  colocation vd-with-libvirt inf: p-vd_vg.test1 clone-libvirtd
  order vd-after-libvirt inf: clone-libvirtd p-vd_vg.test1
  colocation vd-with-drbd inf: p-vd_vg.test1:Started ms-drbd_vg.test1:Master
  order vd-after-drbd inf: ms-drbd_vg.test1:promote p-vd_vg.test1:start

Note I've got MS:promote before VD:start, rather than MS:start.

I assume/believe that the condensed syntax as you have it is meant to
work, but I've never quite managed to stick the nuances of that form into
my brain permanently.
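
One possible way to swap the constraints in with the crm shell is sketched
below. This is only a sketch under assumptions: the scratch file path and the
APPLY guard variable are made up for illustration, and the cluster is only
touched when APPLY=1 is set and the crm shell is installed. The constraint IDs
being deleted are the two from the quoted config.

```shell
#!/bin/sh
# Scratch file for the four replacement constraints (path is an assumption).
constraints_file="${TMPDIR:-/tmp}/vd-constraints.crm"

cat > "$constraints_file" <<'EOF'
colocation vd-with-libvirt inf: p-vd_vg.test1 clone-libvirtd
order vd-after-libvirt inf: clone-libvirtd p-vd_vg.test1
colocation vd-with-drbd inf: p-vd_vg.test1:Started ms-drbd_vg.test1:Master
order vd-after-drbd inf: ms-drbd_vg.test1:promote p-vd_vg.test1:start
EOF

# APPLY is a hypothetical guard so the script changes nothing by default.
if [ "${APPLY:-0}" = "1" ] && command -v crm >/dev/null 2>&1; then
    # Drop the two condensed constraints from the original config.
    crm configure delete colo-vd_vg.test1 order-vd_vg.test1
    # Merge the replacements into the live configuration.
    crm configure load update "$constraints_file"
    # Ask Pacemaker to check the live CIB and report problems verbosely.
    crm_verify -L -V
fi
```

Run it once without APPLY to inspect the generated file, then with APPLY=1 on
one node if the contents look right.
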
Regards,
Tim
--
Tim Serong <tserong at novell.com>
Senior Clustering Engineer, OPS Engineering, Novell Inc.