[Pacemaker] reboot of non-vm host results in VM restart -- of chickens and eggs and VMs

Thu Dec 19 15:30:29 EST 2013

Remove libvirtd from Pacemaker and run "chkconfig libvirtd on" on every node,
so that the cluster manages only the VM. Maybe I'm wrong, but I don't see any
reason to put libvirtd in Pacemaker as a primitive.
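
For reference, the steps suggested above could be sketched roughly as follows. This is only an illustration under assumptions: the resource name p_libvirtd is taken from Bob's configuration, while the helper names (run, libvirt_socket_ready, enable_libvirtd_at_boot, remove_libvirtd_primitive) and the DRY_RUN switch are made up for this sketch. crmsh may also refuse the delete while the resource is running or still referenced by constraints, so that step may need adjusting for a given setup. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: hand libvirtd over from Pacemaker to the init system.
# DRY_RUN defaults to 1, so commands are printed, not executed.
# Set DRY_RUN=0 to really run them (as root, at your own risk).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Succeeds when the libvirt control socket exists. The VirtualDomain
# monitor in the quoted log fails because this socket is missing.
libvirt_socket_ready() {
    [ -S "${1:-/var/run/libvirt/libvirt-sock}" ]
}

# Per-node step: let init start libvirtd at boot, and start it now.
enable_libvirtd_at_boot() {
    run chkconfig libvirtd on
    run service libvirtd start
}

# Cluster-wide step, needed once: drop the lsb:libvirtd primitive.
# Assumption: any constraints or groups that reference p_libvirtd
# have already been removed from the configuration.
remove_libvirtd_primitive() {
    run crm configure delete p_libvirtd
}

enable_libvirtd_at_boot
remove_libvirtd_primitive

if libvirt_socket_ready; then
    echo "libvirt socket present"
else
    echo "libvirt socket missing"
fi
```

Running it once with the default DRY_RUN=1 shows what would be changed before anything is touched; the socket check at the end is the same condition that the VirtualDomain probe depends on.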


2013/12/19 Bob Haxo <bhaxo at sgi.com>

>  Hi Emmanuel,
>
> Thanks for the suggestions. It is pretty clear what the problem is; it's
> just not clear what the fix or the workaround is.
>
> Search the Pacemaker email archive for the email of Andrew Beekhof, 12 Oct
> 2012, "Re: [Pacemaker] chicken-egg-problem with libvirtd and a VM within
> cluster", and the email to which he is responding (from Tom Fernandes).
>
> The status/monitor function of VirtualDomain fails because the
> /var/run/libvirt/libvirt-sock has not been created.  This socket is
> created by the lsb:libvirtd, but that is not started (as a resource) until
> Pacemaker has heard back from heartbeat:VirtualDomain, which will never
> happen until /var/run/libvirt/libvirt-sock has been created ("service
> libvirtd start" during this wait period does enable Pacemaker to continue
> starting resources).  After the VirtualDomain monitor function timeout,
> Pacemaker deals with the failing logic loop, resulting in a re-start of the
> VM.
>
> I am hoping that "Unfortunately we still don't have a good answer for you."
> is no longer the case, and that there is a fix or a community-accepted
> workaround for the issue.
>
>
> Regards,
> Bob Haxo
>
>
>
>
>
>
> On Thu, 2013-12-19 at 19:48 +0100, emmanuel segura wrote:
>
> Maybe the problem is this: the cluster tries to start the VM while libvirtd
> isn't started yet.
>
>
>
>  2013/12/19 emmanuel segura <emi2fast at gmail.com>
>
> If you don't set your VM to start at boot time, you don't need to put libvirtd
> in the cluster. Maybe that isn't the problem, but why put OS services in the
> cluster (crond, for example)? :)
>
>
>
>   2013/12/19 Bob Haxo <bhaxo at sgi.com>
>
>   Hello,
>
> Earlier emails related to this topic:
> [pacemaker] chicken-egg-problem with libvirtd and a VM within cluster
> [pacemaker] VirtualDomain problem after reboot of one node
>
>
> My configuration:
>
> RHEL6.5/CMAN/gfs2/Pacemaker/crmsh
>
> pacemaker-libs-1.1.10-14.el6_5.1.x86_64
> pacemaker-cli-1.1.10-14.el6_5.1.x86_64
> pacemaker-1.1.10-14.el6_5.1.x86_64
> pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64
>
> Two node HA VM cluster using real shared drive, not drbd.
>
> Resources (relevant to this discussion):
> primitive p_fs_images ocf:heartbeat:Filesystem \
> primitive p_libvirtd lsb:libvirtd \
> primitive virt ocf:heartbeat:VirtualDomain \
>
> services chkconfig on: cman, clvmd, pacemaker
> services chkconfig off: corosync, gfs2, libvirtd
>
> Observation:
>
> Rebooting the NON-host system results in the restart of the VM merrily
> running on the host system.
>
> Apparent cause:
>
> Upon startup, Pacemaker apparently checks the status of configured
> resources. However, the status request for the virt
> (ocf:heartbeat:VirtualDomain) resource fails with:
>
> Dec 18 12:19:30 [4147] mici-admin2       lrmd:  warning: child_timeout_callback:        virt_monitor_0 process (PID 4158) timed out
> Dec 18 12:19:30 [4147] mici-admin2       lrmd:  warning: operation_finished:    virt_monitor_0:4158 - timed out after 200000ms
> Dec 18 12:19:30 [4147] mici-admin2       lrmd:   notice: operation_finished:    virt_monitor_0:4158:stderr [ error: Failed to reconnect to the hypervisor ]
> Dec 18 12:19:30 [4147] mici-admin2       lrmd:   notice: operation_finished:    virt_monitor_0:4158:stderr [ error: no valid connection ]
> Dec 18 12:19:30 [4147] mici-admin2       lrmd:   notice: operation_finished:    virt_monitor_0:4158:stderr [ error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory ]
>
> This failure then snowballs into an "orphan" situation in which the
> running VM is restarted.
>
> There was the suggestion of chkconfig on libvirtd (and presumably deleting
> the resource) so that /var/run/libvirt/libvirt-sock is created at boot by
> service libvirtd. With libvirtd started by the system, there is no
> unneeded reboot of the VM.
>
> However, it may be that removing libvirtd from Pacemaker control leaves
> the VM vdisk filesystem susceptible to corruption during a reboot-induced
> failover.
>
> Question:
>
> Is there an accepted Pacemaker configuration such that the unneeded
> restart of the VM does not occur when the non-host system is rebooted?
>
> Regards,
> Bob Haxo
>
>
>
>
>
>
>
>
>
>    _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


-- 
This is my life, and I live it for as long as God wills