[Pacemaker] Xen/DRBD cluster issue when putting a node in standby mode

Mon Jul 26 21:32:34 UTC 2010

Hi all,

I am running a simple two-node cluster with Xen on top of DRBD in
primary/primary mode (required for live migration). My configuration
is:

primitive appyul1 ocf:heartbeat:Xen \
         params xmfile="/etc/xen/appyul1.cfg" shutdown_timeout="299" \
         op monitor interval="10s" timeout="300s" \
         op start interval="0s" timeout="180s" \
         op stop interval="0s" timeout="300s" \
         op migrate_from interval="0s" timeout="180s" \
         op migrate_to interval="0s" timeout="180s" \
         meta target-role="Started" allow-migrate="true" is-managed="true"
primitive appyul1slash-DRBD ocf:linbit:drbd \
         params drbd_resource="appyul1slash" \
         operations $id="appyul1slash-DRBD-ops" \
         op monitor interval="20s" role="Master" timeout="300s" \
         op monitor interval="30s" role="Slave" timeout="300s"
primitive appyul1swap-DRBD ocf:linbit:drbd \
         params drbd_resource="appyul1swap" \
         operations $id="appyul1swap-DRBD-ops" \
         op monitor interval="20s" role="Master" timeout="300s" \
         op monitor interval="30s" role="Slave" timeout="300s"
ms appyul1slash-MS appyul1slash-DRBD \
         meta master-max="2" notify="true" interleave="true" \
         target-role="Started" is-managed="true"
ms appyul1swap-MS appyul1swap-DRBD \
         meta master-max="2" notify="true" interleave="true" \
         target-role="Started" is-managed="true"
order appyul1-after-drbd inf: appyul1slash-MS:promote \
         appyul1swap-MS:promote appyul1:start

To summarize:
- A resource for the Xen VM.
- Two Master/Slave DRBD resources for the VM's disks (/ and swap).
master-max is set to 2 so that both nodes are in the DRBD primary state.
- An order constraint that starts the VM after DRBD has been promoted.

Node startup works: the VM is started after DRBD has been promoted.

Node shutdown is problematic. Assuming the Xen VM runs on node A:
- When putting node A in standby while node B is active, a live migration
is started, BUT within the same second Pacemaker tries to demote the DRBD
volumes on A (while the live migration is still in progress).
- When putting node A in standby while node B is also in standby, the VM
is stopped, BUT within the same second Pacemaker tries to demote the DRBD
volumes on A (while the shutdown is still in progress).

All this results in "failed actions" in the CRM and causes unwanted
STONITH actions (when STONITH is enabled). I tried adding
symmetrical="false" to the order constraint, but it did not help.

I do not understand why Pacemaker does not wait until the Xen VM has been
stopped or migrated before demoting the DRBD volumes.
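To make that expectation concrete, here is a small Python model of what I understand Pacemaker's symmetrical ordering to imply. It is a hypothetical sketch, not Pacemaker code: the `shutdown_order` function and the `INVERSE` table are my own illustration. With symmetrical ordering (the default), the start chain is reversed and each action is replaced by its inverse (start becomes stop, promote becomes demote), so the VM should be stopped before either DRBD resource is demoted. With symmetrical="false" no reverse ordering is implied at all, so that setting cannot enforce the shutdown order.

```python
# Hypothetical model of Pacemaker's "symmetrical" ordering (not Pacemaker code).
# An order constraint "A:action B:action C:action" is modelled as a list of
# (resource, action) pairs that must run left to right on the way up.

INVERSE = {"start": "stop", "promote": "demote"}


def shutdown_order(chain, symmetrical=True):
    """Return the reverse ordering implied for the way down.

    symmetrical=True: reverse the chain and invert each action.
    symmetrical=False: no ordering is implied, so return an empty list.
    """
    if not symmetrical:
        return []
    return [(rsc, INVERSE[action]) for rsc, action in reversed(chain)]


# The constraint from my configuration, as a start chain.
start_chain = [
    ("appyul1slash-MS", "promote"),
    ("appyul1swap-MS", "promote"),
    ("appyul1", "start"),
]

for rsc, action in shutdown_order(start_chain):
    print(f"{action} {rsc}")
```

Running it prints "stop appyul1" first, followed by the two demote steps. With allow-migrate="true" and node B available, I would expect the live migration away from node A to take the place of that first stop step, but still to complete before any demote.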

The setup uses the corosync and pacemaker packages shipped with a
standard Ubuntu Lucid install (corosync 1.2.0 and pacemaker 1.0.8).

Thanks for your help,

Pierre
