[Pacemaker] How to use monitor action in VirtualDomain resource agent

Andreas Kurz andreas at hastexo.com
Fri Dec 9 13:31:30 UTC 2011


Hello Fil,

On 12/07/2011 07:41 AM, Fil wrote:
> Hi Andreas,
> 
> below is the grep you requested. Also, while looking into this problem I
> came across some interesting issues with the VirtualDomain resource agent.
> Since my /etc/libvirt/qemu directory is an NFS share, VirtualDomain
> sometimes complains it can't read the /etc/libvirt/qemu/test.xml file.

That is an interesting detail because, as you already saw, reading the
config file is an essential test ... and it is also the only situation in
which you will see such an error from the VirtualDomain RA.

Have you already tried putting the config files into a local directory? Do
the errors still occur? ... The next question would be how you mounted your
NFS share and why you encounter sporadic read timeouts on the config file.
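
Regarding the -r test in the agent code quoted below: [ -r ] does succeed
for a readable directory as well as for a regular file. A minimal sketch of
a stricter check follows. The helper name config_is_readable_file and the
temporary paths are made up for illustration and are not part of the
VirtualDomain agent:

```shell
#!/bin/sh
# Hypothetical helper (not from the VirtualDomain agent):
# succeeds only if $1 is a regular file AND is readable.
config_is_readable_file() {
    [ -f "$1" ] && [ -r "$1" ]
}

# Scratch directory with a dummy config file, for demonstration only.
tmpdir=$(mktemp -d)
touch "$tmpdir/test.xml"

# -r on its own does not distinguish a directory from a file.
if [ -r "$tmpdir" ]; then
    echo "-r alone accepts the directory"
fi

# The combined check rejects the directory ...
if config_is_readable_file "$tmpdir"; then
    echo "combined check accepts the directory"
else
    echo "combined check rejects the directory"
fi

# ... and still accepts the regular file.
if config_is_readable_file "$tmpdir/test.xml"; then
    echo "combined check accepts the file"
fi

rm -rf "$tmpdir"
```

This only illustrates the difference between the two tests. If the config
path already points to a regular file, adding -f changes nothing, so it
would not explain sporadic read failures on an NFS mount.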

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now


> This is a bit puzzling. Looking at the test logic inside VirtualDomain
> file I ran into this code:
> 
> 
>     # check if we can read the config file (otherwise we're unable to
>     # deduce $DOMAIN_NAME from it, see below)
>     if [ ! -r $OCF_RESKEY_config ]; then
>         if ocf_is_probe; then
>             ocf_log info "Configuration file $OCF_RESKEY_config not readable during probe."
>         else
>             ocf_log error "Configuration file $OCF_RESKEY_config does not exist or is not readable."
>             return $OCF_ERR_INSTALLED
>         fi
>     fi
> 
> The problem here is that the -r operator returns true whether
> $OCF_RESKEY_config is a regular file or a directory. Shouldn't this be
> a -f check followed by the -r check?
> 
> thanks
> fil
> 
> Dec 07 01:25:53 server01.adriaticsolutions.com pengine: [5297]: info:
> native_print: vm_test	(ocf::adriatic:VirtualDomain):	Started
> server01.adriaticsolutions.com
> Dec 07 01:25:53 server01.adriaticsolutions.com lrmd: [5295]: info:
> cancel_op: operation monitor[10] on ocf::VirtualDomain::vm_test for
> client 5298, its parameters: CRM_meta_timeout=[30000] depth=[0]
> CRM_meta_name=[monitor] crm_feature_set=[3.0.5]
> config=[/etc/libvirt/qemu/test.xml] CRM_meta_interval=[10000]
> hypervisor=[qemu:///system] CRM_meta_depth=[0] migration_transport=[tcp]
>  cancelled
> Dec 07 01:25:53 server01.adriaticsolutions.com lrmd: [5295]: debug:
> on_msg_perform_op: add an operation operation migrate_to[11] on
> ocf::VirtualDomain::vm_test for client 5298, its parameters:
> CRM_meta_timeout=[120000] CRM_meta_name=[migrate_to]
> crm_feature_set=[3.0.5] config=[/etc/libvirt/qemu/test.xml]
> CRM_meta_migrate_source=[server01.adriaticsolutions.com]
> CRM_meta_migrate_target=[server02.adriaticsolutions.com]
> hypervisor=[qemu:///system] migration_transport=[tcp]  to the operation
> list.
> Dec 07 01:25:57 server01.adriaticsolutions.com lrmd: [5295]: debug:
> on_msg_perform_op: add an operation operation stop[12] on
> ocf::VirtualDomain::vm_test for client 5298, its parameters:
> crm_feature_set=[3.0.5]  to the operation list.
> Dec 07 01:25:58 server01.adriaticsolutions.com pengine: [5297]: info:
> native_print: vm_test	(ocf::adriatic:VirtualDomain):	Started
> server02.adriaticsolutions.com FAILED
> Dec 07 01:26:10 server01.adriaticsolutions.com lrmd: [5295]: debug:
> on_msg_perform_op: add an operation operation start[13] on
> ocf::VirtualDomain::vm_test for client 5298, its parameters:
> crm_feature_set=[3.0.5] CRM_meta_name=[start]
> config=[/etc/libvirt/qemu/test.xml] migration_transport=[tcp]
> CRM_meta_timeout=[120000] hypervisor=[qemu:///system]  to the operation
> list.
> Dec 07 01:26:11 server01.adriaticsolutions.com lrmd: [5295]: debug:
> on_msg_perform_op: add an operation operation monitor[14] on
> ocf::VirtualDomain::vm_test for client 5298, its parameters:
> CRM_meta_timeout=[30000] depth=[0] CRM_meta_name=[monitor]
> crm_feature_set=[3.0.5] config=[/etc/libvirt/qemu/test.xml]
> CRM_meta_interval=[10000] hypervisor=[qemu:///system] CRM_meta_depth=[0]
> migration_transport=[tcp]  to the operation list.
> 
> 
> 
> Dec  7 01:25:53 server01 pengine: [5297]: info: native_print:
> vm_test#011(ocf::adriatic:VirtualDomain):#011Started
> server01.adriaticsolutions.com
> Dec  7 01:25:53 server01 lrmd: [5295]: info: cancel_op: operation
> monitor[10] on ocf::VirtualDomain::vm_test for client 5298, its
> parameters: CRM_meta_timeout=[30000] depth=[0] CRM_meta_name=[monitor]
> crm_feature_set=[3.0.5] config=[/etc/libvirt/qemu/test.xml]
> CRM_meta_interval=[10000] hypervisor=[qemu:///system] CRM_meta_depth=[0]
> migration_transport=[tcp]  cancelled
> Dec  7 01:25:53 server01 VirtualDomain[8680]: INFO: test: Starting live
> migration to server02.adriaticsolutions.com (using remote hypervisor URI
> qemu+tcp://server02.adriaticsolutions.com/system ).
> Dec  7 01:25:57 server01 VirtualDomain[8680]: INFO: test: live migration
> to server02.adriaticsolutions.com succeeded.
> Dec  7 01:25:57 server01 VirtualDomain[8725]: INFO: Domain name "test"
> saved to /var/run/heartbeat/rsctmp/VirtualDomain-vm_test.state.
> Dec  7 01:25:58 server01 VirtualDomain[8725]: INFO: Domain test already
> stopped.
> Dec  7 01:25:58 server01 pengine: [5297]: info: native_print:
> vm_test#011(ocf::adriatic:VirtualDomain):#011Started
> server02.adriaticsolutions.com FAILED
> 
> 
> 
> On 12/06/2011 07:56 PM, Andreas Kurz wrote:
>> Hello,
>>
>> On 12/05/2011 05:27 AM, Fil wrote:
>>> Hi,
>>>
>>> I have a 2 node cluster (corosync 1.4.2 pacemaker 1.1.6). I need to
>>> control a couple of virtual machines in this cluster and be able to live
>>> migrate them between nodes. Up until now all my tests worked, but as
>>> soon as I started using the monitor action of VirtualDomain my virtual
>>> machines fail to migrate and sometimes they don't even start
>>> cleanly. Every time, I need to manually clean up the resource group and
>>> then it seems to work. Could you please explain whether I need the
>>> monitor action and how I can make it work?
>>>
>>> thanks
>>> fil
>>>
>>> Here are the error messages I get:
>>>
>>>     vm_test_monitor_10000 (node=server02.adriaticsolutions.com, call=46,
>>> rc=5, status=complete): not installed
>>>     vm_test_start_0 (node=server01.adriaticsolutions.com, call=52, rc=5,
>>> status=complete): not installed
>>
>> Any results when doing a grep for "VirtualDomain"? It would be interesting
>> to see what the resource agent is telling us ...
>>
>> Regards,
>> Andreas
>>
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
> 


