[Pacemaker] Trouble with KVM Resource

Cliff Massey cliffm101 at cliffmassey.com
Tue Nov 1 00:53:17 EDT 2011


I looked at /usr/lib/ocf/resource.d/heartbeat/VirtualDomain and also checked the permissions of the state file; they were the same as on the working node. The file was empty on only one node.
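For anyone comparing nodes the same way, here is a rough sketch (not taken from the resource agent) that prints the mode, owner and size of the state file and of each parent directory. It assumes GNU coreutils stat (for -c) and defaults to the state file path from this thread. The function name inspect_state_path is made up for illustration.

```shell
#!/bin/sh
# Hypothetical diagnostic: print mode, owner:group and size for the
# VirtualDomain state file and every parent directory, so the output from two
# nodes can be compared. Assumes GNU coreutils stat (for -c).
STATE_FILE="${STATE_FILE:-/var/run/heartbeat/rsctmp/VirtualDomain-convirt-kvm.state}"

inspect_state_path() {
    target="${1:-$STATE_FILE}"
    path="$target"
    # Walk from the file up to the root, one component at a time.
    while :; do
        if [ -e "$path" ]; then
            stat -c '%A %U:%G %s bytes %n' "$path"
        else
            echo "missing: $path"
        fi
        # Stop at the root (or "." for a relative path).
        case "$path" in
            /|.) break ;;
        esac
        path=$(dirname "$path")
    done
    # An existing but empty file is the condition VirtualDomain logged as unexpected.
    if [ -f "$target" ] && [ ! -s "$target" ]; then
        echo "EMPTY: $target"
        return 2
    fi
    return 0
}
```

Run it on both nodes (for example inspect_state_path with no argument to use the default path) and diff the output. A return code of 2 means the file exists but is empty.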

I ran echo Convirt > /var/run/heartbeat/rsctmp/VirtualDomain-convirt-kvm.state on the node with the empty file, restarted Pacemaker, and the resource came up. I have since restarted it a few times without editing the state file by hand, and it still starts.
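The same workaround, wrapped so it only touches the file when it is empty, might look like the sketch below. It assumes the state file holds nothing but the libvirt domain name (which is what the "Cannot determine domain name" error suggests); Convirt is the name for this particular domain, so substitute your own. The function name repair_state_file is hypothetical, and the directory check only reflects the permissions guess from Tim's reply.

```shell
#!/bin/sh
# Hypothetical helper: repopulate an empty VirtualDomain state file.
# Assumption: the file holds only the libvirt domain name. Defaults below are
# the path and name from this thread; override via the environment.
STATE_FILE="${STATE_FILE:-/var/run/heartbeat/rsctmp/VirtualDomain-convirt-kvm.state}"
DOMAIN_NAME="${DOMAIN_NAME:-Convirt}"

repair_state_file() {
    # [ -s ] is true only if the file exists and is non-empty: leave it alone.
    if [ -s "$STATE_FILE" ]; then
        echo "already populated: $(cat "$STATE_FILE")"
        return 0
    fi
    # Permissions were one suspect, so check the parent directory first.
    dir=$(dirname "$STATE_FILE")
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "cannot write to $dir" >&2
        return 1
    fi
    # Equivalent of: echo Convirt > <state file>
    if ! printf '%s\n' "$DOMAIN_NAME" > "$STATE_FILE"; then
        echo "write to $STATE_FILE failed" >&2
        return 1
    fi
    echo "wrote '$DOMAIN_NAME' to $STATE_FILE"
}
```

After it succeeds, restart Pacemaker on that node as before. Skipping non-empty files means running it twice is harmless.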

The issue was very strange indeed. Thanks for the speedy help.


On Nov 1, 2011, at 12:04 AM, Tim Serong wrote:

> On 11/01/2011 02:23 PM, Cliff Massey wrote:
>> 
>>  I am having a problem with my kvm resource. It was working until I decided to re-install the kvm machine. The libvirt xml file and the pacemaker configuration did not change. I can start the kvm outside of pacemaker just fine. When I check the libvirt log, it shows no attempt to start the kvm machine from pacemaker.
>> 
>> crm_mon -1 shows:
>> 
>> Online: [ admin01 admin02 ]
>> 
>>  convirt-kvm	(ocf::heartbeat:VirtualDomain):	Started admin01 (unmanaged) FAILED
>>  Master/Slave Set: ms-convirt [convirt-drbd]
>>      Masters: [ admin02 ]
>>      Slaves: [ admin01 ]
>>  sitescope-kvm	(ocf::heartbeat:VirtualDomain):	Started admin02
>>  Master/Slave Set: ms-sitescope [sitescope-drbd]
>>      Masters: [ admin02 ]
>>      Slaves: [ admin01 ]
>> 
>> Failed actions:
>>     convirt-kvm_monitor_0 (node=admin01, call=2, rc=1, status=complete): unknown error
>>     convirt-kvm_stop_0 (node=admin01, call=6, rc=1, status=complete): unknown error
>> 
>> My other kvm machine with the same config works just fine.
> 
> I can't tell you why it doesn't work anymore, but...
> 
>> 
>> my logs are at:   http://pastebin.com/peFw5KKp
> 
> The relevant bit of that log is (pardon the formatting):
> 
> Nov  1 03:14:37 admin01 crmd: [15349]: info: te_rsc_command: Initiating action 4: monitor convirt-kvm_monitor_0 on admin01 (local)
> ...
> Nov  1 03:14:38 admin01 VirtualDomain[15370]: ERROR: /var/run/heartbeat/rsctmp/VirtualDomain-convirt-kvm.state is empty. This is unexpected. Cannot determine domain name.
> ...
> Nov  1 03:14:38 admin01 lrmd: [15346]: WARN: Managed convirt-kvm:monitor process 15370 exited with return code 1.
> ...
> Nov  1 03:14:38 admin01 crmd: [15349]: info: process_lrm_event: LRM operation convirt-kvm_monitor_0 (call=2, rc=1, cib-update=29, confirmed=true) unknown error
> 
> So the probe (and presumably the subsequent stop) for that resource failed, hence no attempt to start it.  As for how the state file came to be empty, I'm not sure.  Look at VirtualDomain_Define() in /usr/lib/ocf/resource.d/heartbeat/VirtualDomain (line ~200 onwards); by my reading, it shouldn't be possible for that state file to be empty. Unless, somehow (wild guess), permissions on the state file or on some parent directory prohibit writing?
> 
> Regards,
> 
> Tim
> -- 
> Tim Serong
> Senior Clustering Engineer
> SUSE
> tserong at suse.com
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




