[Pacemaker] Master/Slave resource cannot start

Diego Remolina diego.remolina at physics.gatech.edu
Fri Aug 7 12:09:20 UTC 2009


Hi,

I am fairly new to Pacemaker. I had things working correctly for a
while, but in testing failovers and playing with my machines I got
them into a state where one resource instance cannot start
(drbd_export:1, the second instance of ms-drbd_export).

============
Last updated: Fri Aug  7 07:27:52 2009
Stack: Heartbeat
Current DC: phys-file02.physics.gatech.edu (db786ace-4c9b-4ba1-b272-95b4d81b40a9) - partition with quorum
Version: 1.0.4-6dede86d6105786af3a5321ccf66b44b6914f0aa
2 Nodes configured, unknown expected votes
4 Resources configured.
============

Online: [ phys-file01.physics.gatech.edu phys-file02.physics.gatech.edu ]

Master/Slave Set: ms-drbd_export
         Masters: [ phys-file01.physics.gatech.edu ]
         Stopped: [ drbd_export:1 ]
Master/Slave Set: ms-drbd_scratch
         Masters: [ phys-file01.physics.gatech.edu ]
         Slaves: [ phys-file02.physics.gatech.edu ]
pingd   (ocf::pacemaker:pingd): Started phys-file01.physics.gatech.edu

No matter what I try, the instance drbd_export:1 of ms-drbd_export never
starts, so there is no slave. I have tried cleaning up and refreshing
the resource with both "crm resource cleanup" and "crm_resource -C",
with no luck.

The log files show:

Aug  7 07:21:47 phys-file02 pengine: [4334]: WARN: native_color: Resource drbd_export:1 cannot run anywhere

I am not sure how to find out why the cluster has decided that the
resource cannot run anywhere.

Fail-count also does not seem to be the problem, or maybe I am not 
querying it correctly:

[root@phys-file01 ~]# crm_failcount -r drbd_export
scope=status  name=fail-count-drbd_export value=0
[root@phys-file01 ~]# crm_failcount -r drbd_export:0
scope=status  name=fail-count-drbd_export:0 value=0
[root@phys-file01 ~]# crm_failcount -r drbd_export:1
scope=status  name=fail-count-drbd_export:1 value=0
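
Fail-counts are stored per node in the status section, so as far as I
know a query that names no node only reports on the node where it is
run (phys-file01 above). The sketch below repeats the same query for
both instances on both nodes. It assumes crm_failcount accepts -G
(query) and -U (node name) as in Pacemaker 1.0, and failcount_value is
a hypothetical helper that only parses the output format shown above.

```shell
#!/bin/sh
# Hypothetical helper: pull the value out of one line of crm_failcount output,
# e.g. "scope=status  name=fail-count-drbd_export:1 value=0" gives "0".
failcount_value() {
    printf '%s\n' "$1" | sed -n 's/.*value=\([^ ]*\).*/\1/p'
}

# Assumption: -G queries the value and -U selects the node (Pacemaker 1.0 options).
# The guard lets the script run harmlessly on a machine without the cluster tools.
if command -v crm_failcount >/dev/null 2>&1; then
    for node in phys-file01.physics.gatech.edu phys-file02.physics.gatech.edu; do
        for rsc in drbd_export:0 drbd_export:1; do
            line=$(crm_failcount -G -U "$node" -r "$rsc")
            echo "$node $rsc fail-count=$(failcount_value "$line")"
        done
    done
fi
```

A non-zero or INFINITY value reported for phys-file02 would point to a
failure recorded on that node. If every value is 0 on both nodes, the
fail-count is probably not what keeps drbd_export:1 stopped.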

----------------------crm configuration--------------------------
node $id="55db20ff-ccf4-4797-b0cb-a4f6bceed32d" phys-file01.physics.gatech.edu \
         attributes standby="off"
node $id="db786ace-4c9b-4ba1-b272-95b4d81b40a9" phys-file02.physics.gatech.edu \
         attributes standby="off"
primitive drbd_export ocf:heartbeat:drbd83 \
         params drbd_resource="export" \
         op monitor interval="59s" role="Master" timeout="30s" \
         op monitor interval="60s" role="Slave" timeout="30s" \
         meta target-role="Started"
primitive drbd_scratch ocf:heartbeat:drbd83 \
         params drbd_resource="scratch" \
         op monitor interval="59s" role="Master" timeout="30s" \
         op monitor interval="60s" role="Slave" timeout="30s" \
         meta target-role="Started"
primitive fs_export ocf:heartbeat:Filesystem \
         params type="ext4dev" device="/dev/drbd0" directory="/export/data" options="rw,user_xattr,acl,usrquota,grpquota" \
         meta target-role="Stopped"
primitive fs_scratch ocf:heartbeat:Filesystem \
         params type="ext4dev" device="/dev/drbd1" directory="/export/scratch" options="rw,user_xattr,acl,usrquota,grpquota" \
         meta target-role="Stopped"
primitive nfs lsb:nfs \
         op monitor interval="30s" timeout="10s" \
         meta target-role="Started"
primitive pingd ocf:pacemaker:pingd \
         params host_list="130.207.139.1" multiplier="100" \
         op monitor interval="15s" timeout="5s" \
         meta target-role="Started"
primitive samba lsb:smb \
         op monitor interval="30s" timeout="10s" \
         meta target-role="Started"
primitive virtual-ip-1 ocf:heartbeat:IPaddr2 \
         params ip="130.207.139.20" cidr_netmask="24" \
         op monitor interval="20s" timeout="5s" \
         meta target-role="Started"
group fileserver fs_export fs_scratch virtual-ip-1 nfs samba
ms ms-drbd_export drbd_export \
         meta clone_max="2" clone_node_max="1" master_max="1" master_node_max="1" notify="yes" globally-unique="false" target-role="started"
ms ms-drbd_scratch drbd_scratch \
         meta clone_max="2" clone_node_max="1" master_max="1" master_node_max="1" notify="yes" globally-unique="false" target-role="started"
colocation fileserver-on-ms-drbd_export inf: fileserver ms-drbd_export:Master
colocation ms-drbd_export-on-pingd inf: ms-drbd_export pingd
colocation ms-drbd_scratch-on-ms-drbd_export inf: ms-drbd_scratch:Master ms-drbd_export:Master
order ms-drbd_export-before-fileserver inf: ms-drbd_export:promote fileserver:start
order ms-drbd_scratch-before-fileserver inf: ms-drbd_scratch:promote fileserver:start
property $id="cib-bootstrap-options" \
         dc-version="1.0.4-6dede86d6105786af3a5321ccf66b44b6914f0aa" \
         cluster-infrastructure="Heartbeat" \
         last-lrm-refresh="1249584027"
-----------------------------------------------------------------

If you could help me clear the problem so that the resource can be 
started again, I would greatly appreciate it.

libpacemaker3-1.0.4-23.1
pacemaker-1.0.4-23.1
heartbeat-common-2.99.2-8.1
heartbeat-resources-2.99.2-8.1
heartbeat-2.99.2-8.1
libheartbeat2-2.99.2-8.1

Thanks,

Diego



