[Pacemaker] Why Did Pacemaker Restart this VirtualDomain Resource?

Andrew Martin amartin at xes-inc.com
Tue Jun 19 12:12:46 EDT 2012


Hi Emmanuel, 


Thanks for the idea. I looked through the rest of the log, and these "return code 8" results on the ocf:linbit:drbd resources also occur at other times (e.g. today) when the VirtualDomain resource is unaffected. That seems to indicate that these soft errors do not trigger a restart of the VirtualDomain resource. Is there anything else in the log that could indicate what caused the restart, or is there somewhere else I can look? 
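
For reference, one way to trace a decision like this is through the Policy Engine's logs and its saved transition inputs. The sketch below is only a starting point and assumes a Pacemaker 1.1 setup like the one described here (logging to /var/log/daemon.log, PE files in the default /var/lib/pengine directory); the pe-input file number is a placeholder for whichever file the matching transition references. 

========================================================== 
# The pengine records each decision with a "LogActions:" prefix; 
# look for the entry that stopped/restarted the VM resource. 
grep "LogActions:" /var/log/daemon.log | grep p_vm_myvm 

# Each transition also saves its input; replaying it shows why the 
# Policy Engine chose those actions (-s shows placement scores, 
# -S simulates the transition's execution). 
crm_simulate -x /var/lib/pengine/pe-input-123.bz2 -s -S 
========================================================== 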


Thanks, 


Andrew 

----- Original Message -----

From: "emmanuel segura" < emi2fast at gmail.com > 
To: "The Pacemaker cluster resource manager" < pacemaker at oss.clusterlabs.org > 
Sent: Tuesday, June 19, 2012 9:57:19 AM 
Subject: Re: [Pacemaker] Why Did Pacemaker Restart this VirtualDomain Resource? 

I didn't see any error in your config; the only thing I noticed is this: 
========================================================== 
Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: rsc:p_drbd_vmstore:0 
monitor[55] (pid 12323) 
Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: rsc:p_drbd_mount2:0 monitor[53] 
(pid 12324) 
Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: operation monitor[55] on 
p_drbd_vmstore:0 for client 3856: pid 12323 exited with return code 8 
Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: operation monitor[53] on 
p_drbd_mount2:0 for client 3856: pid 12324 exited with return code 8 
Jun 14 15:35:31 vmhost1 lrmd: [3853]: info: rsc:p_drbd_mount1:0 monitor[54] 
(pid 12396) 
========================================================= 
It could be a DRBD problem, but to tell you the truth I'm not sure. 

====================================================== 
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html 
========================================================= 
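
For what it's worth, return code 8 in that table is OCF_RUNNING_MASTER, which a monitor is expected to return when the resource is running as Master, so by itself it is a success for a Master-role monitor, not a failure. These are the standard OCF return codes listed on that page (values as defined by the OCF resource agent API): 

========================================================== 
OCF_SUCCESS=0            # action completed / resource is active 
OCF_ERR_GENERIC=1        # generic soft error 
OCF_ERR_ARGS=2           # invalid arguments 
OCF_ERR_UNIMPLEMENTED=3  # action not implemented 
OCF_ERR_PERM=4           # insufficient permissions (hard error) 
OCF_ERR_INSTALLED=5      # required components missing (hard error) 
OCF_ERR_CONFIGURED=6     # invalid configuration (fatal error) 
OCF_NOT_RUNNING=7        # resource is cleanly stopped 
OCF_RUNNING_MASTER=8     # resource is running in Master mode 
OCF_FAILED_MASTER=9      # resource is failed in Master mode 
========================================================== 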

2012/6/19 Andrew Martin <amartin at xes-inc.com> 

> Hello, 
> 
> I have a 3 node Pacemaker+Heartbeat cluster (two real nodes and one 
> "standby" quorum node) with Ubuntu 10.04 LTS on the nodes and using the 
> Pacemaker+Heartbeat packages from the Ubuntu HA Team PPA ( 
> https://launchpad.net/~ubuntu-ha-maintainers/+archive/ppa ). 
> I have configured 3 DRBD resources, a filesystem mount, and a KVM-based 
> virtual machine (using the VirtualDomain resource). I have constraints in 
> place so that the DRBD devices must become primary and the filesystem must 
> be mounted before the VM can start: 
> node $id="1ab0690c-5aa0-4d9c-ae4e-b662e0ca54e5" vmhost1 
> node $id="219e9bf6-ea99-41f4-895f-4c2c5c78484a" quorumnode \ 
> attributes standby="on" 
> node $id="645e09b4-aee5-4cec-a241-8bd4e03a78c3" vmhost2 
> primitive p_drbd_mount2 ocf:linbit:drbd \ 
> params drbd_resource="mount2" \ 
> op start interval="0" timeout="240" \ 
> op stop interval="0" timeout="100" \ 
> op monitor interval="10" role="Master" timeout="30" \ 
> op monitor interval="20" role="Slave" timeout="30" 
> primitive p_drbd_mount1 ocf:linbit:drbd \ 
> params drbd_resource="mount1" \ 
> op start interval="0" timeout="240" \ 
> op stop interval="0" timeout="100" \ 
> op monitor interval="10" role="Master" timeout="30" \ 
> op monitor interval="20" role="Slave" timeout="30" 
> primitive p_drbd_vmstore ocf:linbit:drbd \ 
> params drbd_resource="vmstore" \ 
> op start interval="0" timeout="240" \ 
> op stop interval="0" timeout="100" \ 
> op monitor interval="10" role="Master" timeout="30" \ 
> op monitor interval="20" role="Slave" timeout="30" 
> primitive p_fs_vmstore ocf:heartbeat:Filesystem \ 
> params device="/dev/drbd0" directory="/mnt/storage/vmstore" 
> fstype="ext4" \ 
> op start interval="0" timeout="60" \ 
> op stop interval="0" timeout="60" \ 
> op monitor interval="20" timeout="40" 
> primitive p_ping ocf:pacemaker:ping \ 
> params name="p_ping" host_list="192.168.1.25 192.168.1.26" 
> multiplier="1000" \ 
> op start interval="0" timeout="60" \ 
> op monitor interval="20s" timeout="60" 
> primitive p_sysadmin_notify ocf:heartbeat:MailTo \ 
> params email="alert at example.com" \ 
> params subject="Pacemaker Change" \ 
> op start interval="0" timeout="10" \ 
> op stop interval="0" timeout="10" \ 
> op monitor interval="10" timeout="10" 
> primitive p_vm_myvm ocf:heartbeat:VirtualDomain \ 
> params config="/mnt/storage/vmstore/config/myvm.xml" \ 
> meta allow-migrate="false" target-role="Started" is-managed="true" \ 
> op start interval="0" timeout="180" \ 
> op stop interval="0" timeout="180" \ 
> op monitor interval="10" timeout="30" 
> primitive stonithquorumnode stonith:external/webpowerswitch \ 
> params wps_ipaddr="192.168.3.100" wps_port="x" wps_username="xxx" 
> wps_password="xxx" hostname_to_stonith="quorumnode" 
> primitive stonithvmhost1 stonith:external/webpowerswitch \ 
> params wps_ipaddr="192.168.3.100" wps_port="x" wps_username="xxx" 
> wps_password="xxx" hostname_to_stonith="vmhost1" 
> primitive stonithvmhost2 stonith:external/webpowerswitch \ 
> params wps_ipaddr="192.168.3.100" wps_port="x" wps_username="xxx" 
> wps_password="xxx" hostname_to_stonith="vmhost2" 
> group g_vm p_fs_vmstore p_vm_myvm 
> ms ms_drbd_mount2 p_drbd_mount2 \ 
> meta master-max="1" master-node-max="1" clone-max="2" 
> clone-node-max="1" notify="true" 
> ms ms_drbd_mount1 p_drbd_mount1 \ 
> meta master-max="1" master-node-max="1" clone-max="2" 
> clone-node-max="1" notify="true" 
> ms ms_drbd_vmstore p_drbd_vmstore \ 
> meta master-max="1" master-node-max="1" clone-max="2" 
> clone-node-max="1" notify="true" 
> clone cl_ping p_ping \ 
> meta interleave="true" 
> clone cl_sysadmin_notify p_sysadmin_notify 
> location loc_run_on_most_connected g_vm \ 
> rule $id="loc_run_on_most_connected-rule" p_ping: defined p_ping 
> location loc_st_nodescan stonithquorumnode -inf: vmhost1 
> location loc_st_vmhost1 stonithvmhost1 -inf: vmhost1 
> location loc_st_vmhost2 stonithvmhost2 -inf: vmhost2 
> colocation c_drbd_libvirt_vm inf: g_vm ms_drbd_vmstore:Master 
> ms_drbd_mount1:Master ms_drbd_mount2:Master 
> order o_drbd-fs-vm inf: ms_drbd_vmstore:promote ms_drbd_mount1:promote 
> ms_drbd_mount2:promote g_vm:start 
> property $id="cib-bootstrap-options" \ 
> dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \ 
> cluster-infrastructure="Heartbeat" \ 
> stonith-enabled="true" \ 
> no-quorum-policy="freeze" \ 
> last-lrm-refresh="1337746179" 
> 
> This has been working well; however, last week Pacemaker suddenly 
> stopped the p_vm_myvm resource and then started it up again. I have 
> attached the relevant section of /var/log/daemon.log, but I am unable to 
> determine what caused Pacemaker to restart this resource. Based on the log, 
> can you tell me what event triggered this? 
> 
> Thanks, 
> 
> Andrew 
> 


-- 
this is my life and I live it as long as God wills 

_______________________________________________ 
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org 
http://oss.clusterlabs.org/mailman/listinfo/pacemaker 

Project Home: http://www.clusterlabs.org 
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
Bugs: http://bugs.clusterlabs.org 
