[Pacemaker] Resource ordering/colocating question (DRBD + LVM + FS)
Heikki Manninen <hma at iki.fi>
Thu Sep 5 12:08:07 UTC 2013

Hello,
I'm having a bit of trouble understanding what's going on with my simple two-node demo cluster. My resources come up correctly after restarting the whole cluster, but the LVM and Filesystem resources fail to start after a single node is restarted, or is put into standby and then taken out of it. The failure happens once that node comes back online, which raises a second question: why do these resources stop and start at all when the second node rejoins?
OS: CentOS 6.4 (cman stack)
Pacemaker: pacemaker-1.1.8-7.el6.x86_64
DRBD: drbd84-utils-8.4.3-1.el6.elrepo.x86_64
Everything is configured using pcs: pcs-0.9.26-10.el6_4.1.noarch
Two DRBD resources configured and working: data01 & data02
Two nodes: pgdbsrv01.cl1.local & pgdbsrv02.cl1.local
Configuration:
node pgdbsrv01.cl1.local
node pgdbsrv02.cl1.local
primitive DRBD_data01 ocf:linbit:drbd \
     params drbd_resource="data01" \
     op monitor interval="30s"
primitive DRBD_data02 ocf:linbit:drbd \
     params drbd_resource="data02" \
     op monitor interval="30s"
primitive FS_data01 ocf:heartbeat:Filesystem \
     params device="/dev/mapper/vgdata01-lvdata01" directory="/data01" fstype="ext4" \
     op monitor interval="30s"
primitive FS_data02 ocf:heartbeat:Filesystem \
     params device="/dev/mapper/vgdata02-lvdata02" directory="/data02" fstype="ext4" \
     op monitor interval="30s"
primitive LVM_vgdata01 ocf:heartbeat:LVM \
     params volgrpname="vgdata01" exclusive="true" \
     op monitor interval="30s"
primitive LVM_vgdata02 ocf:heartbeat:LVM \
     params volgrpname="vgdata02" exclusive="true" \
     op monitor interval="30s"
group GRP_data01 LVM_vgdata01 FS_data01
group GRP_data02 LVM_vgdata02 FS_data02
ms DRBD_ms_data01 DRBD_data01 \
     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
ms DRBD_ms_data02 DRBD_data02 \
     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation colocation-GRP_data01-DRBD_ms_data01-INFINITY inf: GRP_data01 DRBD_ms_data01:Master
colocation colocation-GRP_data02-DRBD_ms_data02-INFINITY inf: GRP_data02 DRBD_ms_data02:Master
order order-DRBD_data01-GRP_data01-mandatory : DRBD_data01:promote GRP_data01:start
order order-DRBD_data02-GRP_data02-mandatory : DRBD_data02:promote GRP_data02:start
property $id="cib-bootstrap-options" \
     dc-version="1.1.8-7.el6-394e906" \
     cluster-infrastructure="cman" \
     stonith-enabled="false" \
     no-quorum-policy="ignore" \
     migration-threshold="1"
rsc_defaults $id="rsc_defaults-options" \
     resource-stickiness="100"
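One difference in the configuration that may be worth checking: the colocation constraints refer to the master/slave sets (DRBD_ms_data01, DRBD_ms_data02), while the order constraints refer to the DRBD primitives inside them (DRBD_data01, DRBD_data02). The Pacemaker examples in the DRBD user's guide order against the master/slave resource instead. Below is a sketch of the two order constraints written that way, in the same crm syntax. The constraint ids are hypothetical, and this is untested against this particular cluster:

```
order order-DRBD_ms_data01-GRP_data01-mandatory inf: DRBD_ms_data01:promote GRP_data01:start
order order-DRBD_ms_data02-GRP_data02-mandatory inf: DRBD_ms_data02:promote GRP_data02:start
```

Together with the existing colocation constraints, this would tie each group both to the node where its DRBD set is Master and to the moment that promotion completes.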
1) After starting the cluster, everything runs happily:
Last updated: Tue Sep  3 00:11:13 2013
Last change: Tue Sep  3 00:05:15 2013 via cibadmin on pgdbsrv01.cl1.local
Stack: cman
Current DC: pgdbsrv02.cl1.local - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, unknown expected votes
9 Resources configured.
Online: [ pgdbsrv01.cl1.local pgdbsrv02.cl1.local ]
Full list of resources:
Master/Slave Set: DRBD_ms_data01 [DRBD_data01]
     Masters: [ pgdbsrv01.cl1.local ]
     Slaves: [ pgdbsrv02.cl1.local ]
Master/Slave Set: DRBD_ms_data02 [DRBD_data02]
     Masters: [ pgdbsrv01.cl1.local ]
     Slaves: [ pgdbsrv02.cl1.local ]
Resource Group: GRP_data01
     LVM_vgdata01 (ocf::heartbeat:LVM): Started pgdbsrv01.cl1.local
     FS_data01 (ocf::heartbeat:Filesystem): Started pgdbsrv01.cl1.local
Resource Group: GRP_data02
     LVM_vgdata02 (ocf::heartbeat:LVM): Started pgdbsrv01.cl1.local
     FS_data02 (ocf::heartbeat:Filesystem): Started pgdbsrv01.cl1.local
2) After putting node #1 into standby mode, everything runs happily on node pgdbsrv02.cl1.local:
# pcs cluster standby pgdbsrv01.cl1.local
# pcs status
Last updated: Tue Sep  3 00:16:01 2013
Last change: Tue Sep  3 00:15:55 2013 via crm_attribute on pgdbsrv02.cl1.local
Stack: cman
Current DC: pgdbsrv02.cl1.local - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, unknown expected votes
9 Resources configured.
Node pgdbsrv01.cl1.local: standby
Online: [ pgdbsrv02.cl1.local ]
Full list of resources:
 IP_database     (ocf::heartbeat:IPaddr2):     Started pgdbsrv02.cl1.local
 Master/Slave Set: DRBD_ms_data01 [DRBD_data01]
     Masters: [ pgdbsrv02.cl1.local ]
     Stopped: [ DRBD_data01:1 ]
 Master/Slave Set: DRBD_ms_data02 [DRBD_data02]
     Masters: [ pgdbsrv02.cl1.local ]
     Stopped: [ DRBD_data02:1 ]
 Resource Group: GRP_data01
     LVM_vgdata01     (ocf::heartbeat:LVM):     Started pgdbsrv02.cl1.local
     FS_data01     (ocf::heartbeat:Filesystem):     Started pgdbsrv02.cl1.local
 Resource Group: GRP_data02
     LVM_vgdata02     (ocf::heartbeat:LVM):     Started pgdbsrv02.cl1.local
     FS_data02     (ocf::heartbeat:Filesystem):     Started pgdbsrv02.cl1.local
3) After putting node #1 back online, it seems that all the resources stop (?). DRBD is then promoted successfully on node #2, but the LVM and FS resources never start:
# pcs cluster unstandby pgdbsrv01.cl1.local
# pcs status
Last updated: Tue Sep  3 00:17:00 2013
Last change: Tue Sep  3 00:16:56 2013 via crm_attribute on pgdbsrv02.cl1.local
Stack: cman
Current DC: pgdbsrv02.cl1.local - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, unknown expected votes
9 Resources configured.
Online: [ pgdbsrv01.cl1.local pgdbsrv02.cl1.local ]
Full list of resources:
 IP_database     (ocf::heartbeat:IPaddr2):     Started pgdbsrv02.cl1.local
 Master/Slave Set: DRBD_ms_data01 [DRBD_data01]
     Masters: [ pgdbsrv02.cl1.local ]
     Slaves: [ pgdbsrv01.cl1.local ]
 Master/Slave Set: DRBD_ms_data02 [DRBD_data02]
     Masters: [ pgdbsrv02.cl1.local ]
     Slaves: [ pgdbsrv01.cl1.local ]
 Resource Group: GRP_data01
     LVM_vgdata01     (ocf::heartbeat:LVM):     Stopped
     FS_data01     (ocf::heartbeat:Filesystem):     Stopped
 Resource Group: GRP_data02
     LVM_vgdata02     (ocf::heartbeat:LVM):     Stopped
     FS_data02     (ocf::heartbeat:Filesystem):     Stopped
Any ideas why this is happening, or what could be wrong in the resource configuration? The same thing happens when the resources start out on the other node. Also, if I stop and start one of the nodes, the same thing happens once that node comes back online.
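To check a larger configuration for order constraints that name a DRBD primitive instead of the master/slave set wrapping it, a small script along these lines could help. This is a minimal sketch. It assumes crm-shell-style text like the configuration above, where each `ms` definition begins with `ms <ms_id> <primitive_id>` and each order constraint fits on one line. The function name `find_primitive_orders` is hypothetical and not part of any Pacemaker tool:

```python
from __future__ import annotations


def find_primitive_orders(config: str) -> list[tuple[str, str, str]]:
    """Return (order_id, primitive, ms_resource) for every order constraint
    that names a primitive wrapped by an ms (master/slave) resource.

    Assumption: crm-shell-style text, "ms <ms_id> <primitive_id> ..." for
    master/slave sets and single-line "order <id> <score>: <rsc>[:<action>] ...".
    """
    lines = config.splitlines()

    # Map each wrapped primitive to the ms resource that contains it.
    wrapped: dict[str, str] = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "ms":
            wrapped[parts[2]] = parts[1]

    findings: list[tuple[str, str, str]] = []
    for line in lines:
        parts = line.split()
        # parts[2] is the score/kind token, e.g. ":" or "inf:"
        if len(parts) < 4 or parts[0] != "order" or not parts[2].endswith(":"):
            continue
        for ref in parts[3:]:
            if "=" in ref:  # skip options such as symmetrical=false
                continue
            name = ref.split(":", 1)[0]  # drop the ":promote"/":start" action
            if name in wrapped:
                findings.append((parts[1], name, wrapped[name]))
    return findings
```

Given the configuration from this post as a string, it would flag order-DRBD_data01-GRP_data01-mandatory and order-DRBD_data02-GRP_data02-mandatory. Multi-line order constraints and clone resources are deliberately out of scope for this sketch.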
-- 
Heikki Manninen <hma at iki.fi>