[Pacemaker] A caveat in the VirtualDomain resource agent
Cédric Dufour - Idiap Research Institute
cedric.dufour at idiap.ch
Fri Aug 22 14:56:35 UTC 2014
Hello,
On 22/08/14 15:32, Dejan Muhamedagic wrote:
> Hi,
>
> On Fri, Aug 22, 2014 at 10:23:29AM +0200, Cédric Dufour - Idiap Research Institute wrote:
>> Hello,
>>
>> Is this the right place to report this issue? (please redirect me if not)
> Yes. Though bugs/issues/fixes are nowadays mostly handled at
> github.com/ClusterLabs/resource-agents and reports there have
> certainly more visibility.
>
>> As we were experimenting with and demonstrating our new cluster yesterday, we stumbled on a caveat in our LibvirtQemu resource agent (derived from VirtualDomain). Since the VirtualDomain resource agent has the same caveat, I thought I had better report it. Please see the patch below (for LibvirtQemu), whose comments should explain where the problem lies.
> Perhaps I missed something, but may I ask why you decided to
> create a new RA instead of improving the existing one? Was there
> anything in VirtualDomain making it unsuitable for your use
> case?
Long story:
[1] http://oss.clusterlabs.org/pipermail/pacemaker/2014-August/022432.html
[2] http://oss.clusterlabs.org/pipermail/pacemaker/2014-August/022477.html
In short:
[1] "I sized it [CIB] down from 444 to 277 resources by merging 'VirtualDomain' and 'MailTo' RA/primitives into a custom/single 'LibvirtQemu' one."
[2] "any error in the "MailTo" primitive would be considered "critical" [...] by Pacemaker, resulting in node fencing" and "I simplified the code by assuming a local qemu hypervisor [...]; I did so because I experienced strange delays when the "VirtualDomain" RA was running the "virsh ... uri" command for the sake of acquiring a sensible default value for the "hypervisor" parameter (which always resulted in "qemu:///system")."
The modifications I made are thus quite (?) specific (?) to my use case.
Also, I've been using "custom" RAs since Heartbeat V.1, then V.2 and now Pacemaker 1.1, so as to rely on RAs that have been thoroughly tested in my setup rather than ones that may change at the distribution's whim, possibly in ways incompatible with my setup (my HA experience being: set up, test, test, test, test, freeze... don't touch anything!).
>
>> --- LibvirtQemu.orig 2014-08-22 09:39:21.997201000 +0200
>> +++ LibvirtQemu 2014-08-22 09:50:32.440969000 +0200
>> @@ -154,11 +154,10 @@
>> local virsh_output
>> local domain_name
>>
>> - # Note: passing in the domain name from outside the script is
>> - # intended for testing and debugging purposes only. Don't do this
>> - # in production, instead let the script figure out the domain name
>> - # from the config file. You have been warned.
>> - if [ -z "${DOMAIN_NAME}" ]; then
>> + # NOTE: Re-defining an already defined domain is dangerous! It should only be done
>> + # if we can reasonably assume the configuration file hasn't changed since the last
>> + # time the domain was defined.
>> + if [ -z "${DOMAIN_NAME}" ] || [ "${OCF_RESKEY_config}" -ot "${STATEFILE}" ]; then
>> # Spin until we have a domain name
>> while true; do
>> virsh_output="$(virsh ${VIRSH_OPTIONS} define ${OCF_RESKEY_config})"
>> @@ -170,7 +169,7 @@
>> echo "${domain_name}" > "${STATEFILE}"
>> ocf_log info "Domain name '${domain_name}' saved to state file '${STATEFILE}'."
>> else
>> - ocf_log warn "Domain name '${DOMAIN_NAME}' already defined; overriding configuration file '${OCF_RESKEY_config}' (this should NOT ne done in production!)."
>> + ocf_log warn "Domain name '${DOMAIN_NAME}' already defined; NOT re-defining it from the newer configuration file '${OCF_RESKEY_config}'!"
>> fi
>> }
> Under which circumstances did you run into these issues?
1. Stop the resource
2. Undefine the corresponding libvirt domain on all nodes (without deleting the state file)
3. Start the resource
As I said, this is an edge case, which I stumbled on while demonstrating the cluster; I would most likely never have reason to execute steps 2 and 3 otherwise... but one never knows.
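The startup guard from the patch, and how the three steps above slip past it, can be sketched in bash. This is a minimal illustration, not the RA itself: the temporary files stand in for OCF_RESKEY_config and STATEFILE, and only the test expressions (`! -e`, `-nt`, `-ot`) are taken from the patch.

```shell
#!/usr/bin/env bash
# Sketch of the RA's startup guard and of how stop/undefine/start defeats it.
# File names are hypothetical stand-ins; only the test expressions mirror the patch.
set -u

workdir="$(mktemp -d)"
trap 'rm -rf "${workdir}"' EXIT
config="${workdir}/vm.xml"        # stands in for OCF_RESKEY_config
statefile="${workdir}/vm.state"   # stands in for STATEFILE

# Startup guard: (re-)define only if the state file is missing or the
# configuration file is newer than it.
needs_define() {
    [ ! -e "${statefile}" ] || [ "${config}" -nt "${statefile}" ]
}

# Patched guard inside the define function (domain name already known):
# re-define only if the config has not changed since the last define.
may_redefine() {
    [ "${config}" -ot "${statefile}" ]
}

touch -t 201408010000 "${config}"      # config written on 2014-08-01
if needs_define; then
    echo "first start: define"
fi
touch -t 201408020000 "${statefile}"   # define succeeded, state file saved

# Steps 1-3: stop, 'virsh undefine' on all nodes, start again. On a node
# that was not running the resource, the state file survived and is newer
# than the config, so the guard skips the define: the domain stays undefined.
if needs_define; then
    echo "restart: define"
else
    echo "restart: no define (stale state file)"
fi

# The patched monitor falls back to the define function, whose own guard
# accepts the re-define because the config is unchanged.
if may_redefine; then
    echo "monitor fallback: re-define allowed"
fi
```

Running it prints `first start: define`, `restart: no define (stale state file)` and `monitor fallback: re-define allowed`: the startup guard alone never notices that the domain has vanished, hence the extra fallback in the monitor operation.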
> There were some recent additions which enable saving the changes
> back to the configuration file. Would that help?
I just had a look at https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/VirtualDomain
I see VirtualDomain has evolved quite a lot compared to the Debian/Wheezy version on which I based my custom RA. You now rely on 'virsh undefine' and 'virsh create' rather than 'virsh define' and 'virsh start' to manage/start VMs. From what I quickly gathered, the latest VirtualDomain should be immune to the issue/circumstances at hand (so the changes introduced around 'save_config_on_stop' are irrelevant here).
PS: Being under deadline pressure, I had no time to investigate why the 'virsh uri' command took 20 seconds to complete (as mentioned in my reply above). This would also be a non-issue with the latest VirtualDomain, provided the "hypervisor" parameter is set by the user.
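A minimal sketch of why setting "hypervisor" sidesteps the delay: the default URI is only looked up when the parameter is empty. The `virsh` function is a hypothetical stub (so the sketch runs without libvirt), and `resolve_hypervisor` is an illustrative name, not code from the actual VirtualDomain RA.

```shell
#!/usr/bin/env bash
# Resolve the libvirt URI lazily: only call 'virsh uri' when the user has
# not set the "hypervisor" parameter.
set -u

# Hypothetical stub standing in for the real (in our case ~20 s) command.
virsh() {
    echo "qemu:///system"
}

resolve_hypervisor() {
    if [ -n "${OCF_RESKEY_hypervisor:-}" ]; then
        # Explicit parameter: no libvirt round-trip at all.
        hypervisor="${OCF_RESKEY_hypervisor}"
        hypervisor_source="parameter"
    else
        # No parameter: ask libvirt for its default URI (the slow path).
        hypervisor="$(virsh --quiet uri)"
        hypervisor_source="virsh"
    fi
}

unset OCF_RESKEY_hypervisor
resolve_hypervisor
echo "${hypervisor} (from ${hypervisor_source})"   # slow path

OCF_RESKEY_hypervisor="qemu:///system"
resolve_hypervisor
echo "${hypervisor} (from ${hypervisor_source})"   # fast path
```

Both calls yield `qemu:///system`, but only the first one goes through `virsh`, which is the call that was taking 20 seconds here.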
So, apparently, all the problems I experienced are SOLVED in the latest VirtualDomain.
Thanks for your comments, which pointed me in the right direction.
Unfortunately, the VirtualDomain distributed by Debian/Wheezy (current stable) is prone to the issue/circumstances at hand.
But that is another story (not yours to deal with)... :-)
Cédric
>
> Cheers,
>
> Dejan
>
>> @@ -205,12 +204,12 @@
>> ;;
>> ''|'no state')
>> # Empty string may be returned when virsh does not
>> - # receive a reply from libvirtd.
>> + # receive a reply from libvirtd or after the domain has
>> + # been undefined.
>> # "no state" may occur when the domain is currently
>> # being migrated (on the migration target only), or
>> # whenever virsh can't reliably obtain the domain
>> # state.
>> - status='no state'
>> if [ "${__OCF_ACTION}" == 'stop' ] && [ ${try} -ge 3 ]; then
>> # During the stop operation, we want to bail out
>> # quickly, so as to be able to force-stop (destroy)
>> @@ -224,6 +223,17 @@
>> ocf_log info "Domain '${DOMAIN_NAME}' currently has no state; retrying."
>> sleep 1
>> fi
>> + if [ "${status}" == '' ] && [ $(( ${try} % 10 )) -eq 0 ]; then
>> + # Could it be that libvirtd is running healthily but the domain
>> + # has been undefined? In that case, let's attempt to re-define it.
>> + # If libvirtd IS running, it cannot hurt (given the safeguards in
>> + # LibvirtQemu_Define). If libvirtd is NOT running, then something is
>> + # definitely wrong (and the monitor operation will time-out in
>> + # LibvirtQemu_Define the same way as it would here).
>> + ocf_log warn "Has domain '${DOMAIN_NAME}' been undefined? attempting to re-define it."
>> + LibvirtQemu_Define
>> + fi
>> + status='no state'
>> ;;
>> *)
>> # any other output is unexpected.
>> @@ -487,6 +497,11 @@
>>
>> # Define the domain on startup, and re-define whenever someone deleted
>> # the state file, or touched the config.
>> +# WARNING: There is a caveat here! When the resource is stopped, the state file
>> +# is deleted ONLY on the node where it was running. If the domain is then
>> +# undefined (in libvirtd) on all nodes, we end up with a state file but no
>> +# domain definition on the nodes that were not running the resource. The monitor
>> +# operation MUST handle that situation, should the resource be restarted.
>> if [ ! -e "${STATEFILE}" ] || [ "${OCF_RESKEY_config}" -nt "${STATEFILE}" ]; then
>> LibvirtQemu_Define
>> fi
>>
>> One could ask "why undefine a libvirt domain and then restart it?". The answer is two-fold: 1. experience has shown us that we should undefine a decommissioned domain in libvirt to prevent a potential UUID conflict when defining a new domain (which is likely in our setup, since UUIDs are built from the domain's IP address); 2. the "demo effect" (or possibly legitimate reasons), where one "decommissions" a domain and restarts it right afterwards ( :-/ ).
>>
>> PS: We now also make sure to delete the VirtualDomain/LibvirtQemu state file when undefining the domain. But it is best to have multiple safeguards where this caveat is concerned (hence the patch above).
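Undefining the domain and deleting the state file together might look like the hypothetical helper below, meant to be run on every cluster node. The function name is illustrative, and the state file path is passed in because it depends on the RA and its version. When `virsh` is not installed, a dry-run stub is used instead.

```shell
#!/usr/bin/env bash
# Hypothetical decommissioning helper: remove the libvirt definition AND the
# RA state file, so a later start re-defines the domain instead of trusting
# a stale state file.
set -u

# Dry-run stub when libvirt is not installed (e.g. on a workstation).
if ! command -v virsh >/dev/null 2>&1; then
    virsh() { echo "(dry run) virsh $*"; }
fi

decommission_domain() {
    local domain="$1" statefile="$2"
    # 'virsh undefine' fails on nodes where the domain is not defined;
    # that is acceptable, the goal is that no definition is left behind.
    if ! virsh undefine "${domain}"; then
        echo "note: '${domain}' was not defined on this node" >&2
    fi
    # Removing the state file re-arms the RA's "no state file" startup guard.
    rm -f "${statefile}"
}

# Example (placeholder arguments; use your RA's actual state file path):
# decommission_domain vm01 /path/to/LibvirtQemu-vm01.state
```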
>>
>> Hope it helps,
>>
>> Cédric
>>
>> --
>>
>> Cédric Dufour @ Idiap Research Institute
>>
>