[Pacemaker] chicken-egg-problem with libvirtd and a VM within cluster
Andrew Beekhof
andrew at beekhof.net
Fri Oct 12 03:18:40 CEST 2012
This has been a topic that has popped up occasionally over the years.
Unfortunately we still don't have a good answer for you.
The "least worst" practice has been to have the RA return OCF_NOT_RUNNING
for non-recurring monitor operations (aka startup probes) IFF its
prerequisites (i.e. binaries, or things that might be on a cluster
file system) are not available.
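A rough sketch of that practice, for illustration only. It is not taken from any shipped RA. The helper names (is_probe, prereqs_available, vm_monitor) and the HYPERVISOR_BIN variable are made up for this example; the return code values and the __OCF_ACTION / OCF_RESKEY_CRM_meta_interval variables follow the OCF resource agent conventions, where "stopped" is reported as OCF_NOT_RUNNING (7).

```shell
#!/bin/sh
# OCF return codes (numeric values as defined by the OCF RA API).
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5
OCF_NOT_RUNNING=7

# A startup probe is a monitor operation with interval 0.
# (Hypothetical helper, written out here so the sketch is self-contained.)
is_probe() {
    [ "${__OCF_ACTION:-}" = "monitor" ] &&
        [ "${OCF_RESKEY_CRM_meta_interval:-0}" = "0" ]
}

# Hypothetical prerequisite check: is the management binary present?
# HYPERVISOR_BIN is an assumed variable, not a real RA parameter.
prereqs_available() {
    [ -x "${HYPERVISOR_BIN:-/usr/bin/virsh}" ]
}

vm_monitor() {
    if ! prereqs_available; then
        if is_probe; then
            # Prerequisites missing during a probe: report "not running"
            # instead of an error, so the cluster does not record a failure.
            return $OCF_NOT_RUNNING
        fi
        # For a recurring monitor, missing prerequisites are a real error.
        return $OCF_ERR_INSTALLED
    fi
    # The actual status check of the resource would go here.
    return $OCF_SUCCESS
}
```

The trade-off is that a probe can then report "not running" for a resource that is in fact active but cannot be inspected, which is why this is only the "least worst" option.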
Possibly we need to begin using the ordering constraints (normally
used for ordering start operations) for the startup probes too.
Ie. order(A, B) ==> A.start before B.(monitor_0, start)
I had been resisting that move, but perhaps it's time.
(It would also help avoid slamming the cluster with a bazillion
operations in parallel when several nodes start up together)
Lars? Florian? Comments?
On Fri, Oct 12, 2012 at 3:09 AM, Tom Fernandes <anyaddress at gmx.net> wrote:
> Hi all,
>
> I have a 2-node-cluster running DRBD, libvirtd and a virtual machine.
>
> I observed that when I stop and start corosync on one of the nodes, pacemaker
> (when starting corosync again) wants to check the status of the vm before
> starting libvirtd. This check fails as libvirtd needs to be running for this
> check. After trying for 20s libvirtd starts. The vm gets restarted after those
> 20s and then runs on one of the nodes. I am left with a monitoring error to
> clean up and my vm has rebooted.
>
> One solution seems to be to run libvirtd outside the cluster, being managed by
> the OS.
>
> I followed the ha-kvm.pdf guide and other people's advice with my setup and
> wonder if either the guide is wrong / untested or if I'm missing something?
>
> This was also discussed with some of the folks on #linux-ha a couple of hours
> back.
>
>
> warm regards,
>
>
> Tom
>
>
>
> node pcmk-1 \
>     attributes standby="off"
> node pcmk-2 \
>     attributes standby="off"
> primitive vm1 ocf:heartbeat:VirtualDomain \
>     params config="/etc/libvirt/qemu/vm1.xml" \
>     meta allow-migrate="false" target-role="Started" \
>     op monitor interval="60" timeout="30" \
>     op start interval="0" timeout="90" \
>     op stop interval="0" timeout="120"
> primitive drbd_vm1 ocf:linbit:drbd \
>     params drbd_resource="vm1"
> primitive libvirtd lsb:libvirt-bin
> ms ms-drbd_vm1 drbd_vm1 \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> clone cl-libvirtd libvirtd \
>     meta interleave="true" clone-max="2"
> colocation vm1_on_drbd inf: vm1 ms-drbd_vm1:Master
> order cl-libvirtd_before_vm1 inf: cl-libvirtd:start vm1:start
> order drbd_before_vm1 inf: ms-drbd_vm1:promote vm1:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore" \
>     last-lrm-refresh="1349962834"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="100"
>