[Pacemaker] Nodes will not promote DRBD resources to master on failover

Hi Andreas, 

Thanks, I've updated the colocation rule to be in the correct order. I also enabled the STONITH resource (it had been temporarily disabled for some additional testing). DRBD has its own network connection over the br1 interface ( network), a direct crossover cable between node1 and node2:

global { usage-count no; }
common {
  syncer { rate 110M; }
}
resource vmstore {
  protocol C;
  startup {
    wfc-timeout 15;
    degr-wfc-timeout 60;
  }
  handlers {
    #fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    fence-peer "/usr/local/bin/fence-peer";
    split-brain "/usr/lib/drbd/notify-split-brain.sh me@example.com";
  }
  net {
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
    cram-hmac-alg md5;
    shared-secret "xxxxx";
  }
  disk {
    fencing resource-only;
  }
  on node1 {
    device /dev/drbd0;
    disk /dev/sdb1;
    meta-disk internal;
  }
  on node2 {
    device /dev/drbd0;
    disk /dev/sdf1;
    meta-disk internal;
  }
}
# and similar for mount1 and mount2
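Because `fencing resource-only` lets the fence-peer handler mark the peer's data as Outdated, and DRBD in single-primary mode will not promote a node that has no UpToDate data (locally or from a connected peer) or whose peer is still Primary, it may help to capture /proc/drbd on node2 right around the failed promotion. The Python sketch below assumes the DRBD 8.x /proc/drbd line format; `parse_proc_drbd` and `promotion_blockers` are hypothetical helper names, and the check is a rough heuristic, not DRBD's actual promotion logic.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Per-device status line of DRBD 8.x /proc/drbd (assumed format), e.g.
#   " 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----"
# Some older releases print "st:" instead of "ro:" for the roles.
_STATUS = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+(?:ro|st):(?P<role>[^/\s]+)/(?P<peer_role>\S+)"
    r"\s+ds:(?P<disk>[^/\s]+)/(?P<peer_disk>\S+)"
)


@dataclass
class DrbdState:
    minor: int
    cs: str          # connection state, e.g. Connected, WFConnection
    role: str        # local role: Primary / Secondary
    peer_role: str   # "Unknown" while disconnected
    disk: str        # local disk state, e.g. UpToDate, Outdated
    peer_disk: str   # "DUnknown" while disconnected


def parse_proc_drbd(text: str) -> Dict[int, DrbdState]:
    """Map each DRBD minor number to its parsed state; other lines are ignored."""
    states: Dict[int, DrbdState] = {}
    for line in text.splitlines():
        m = _STATUS.match(line)
        if m:
            minor = int(m["minor"])
            states[minor] = DrbdState(minor, m["cs"], m["role"], m["peer_role"],
                                      m["disk"], m["peer_disk"])
    return states


def promotion_blockers(s: DrbdState) -> List[str]:
    """Heuristic: conditions that commonly stop promotion in single-primary mode."""
    reasons = []
    if s.peer_role == "Primary":
        reasons.append("peer is still Primary")
    # Promotion needs UpToDate data somewhere: on the local disk or a connected peer.
    if s.disk != "UpToDate" and s.peer_disk != "UpToDate":
        reasons.append(f"no UpToDate disk (local {s.disk}, peer {s.peer_disk})")
    return reasons
```

On node2 this could be run as `parse_proc_drbd(open("/proc/drbd").read())` immediately after a failed promote, then `promotion_blockers` applied to each entry.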

Also, here is my ha.cf. It uses both the direct link between the nodes (br1) and the shared LAN on br0 for cluster communication:

autojoin none 
mcast br0 694 1 0 
bcast br1 
warntime 5 
deadtime 15 
initdead 60 
keepalive 2 
node node1 
node node2 
node quorumnode 
crm respawn 
respawn hacluster /usr/lib/heartbeat/dopd 
apiauth dopd gid=haclient uid=hacluster 
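As a quick sanity check of the heartbeat timing directives above, here is a small Python sketch that tests two rules of thumb: keepalive < warntime < deadtime, and initdead at least twice deadtime (the recommendation given in the comments of heartbeat's sample ha.cf). `parse_timings` and `check_timings` are hypothetical helpers, and the sketch assumes every value is a plain number of seconds.

```python
from typing import Dict, List

TIMING_KEYS = ("keepalive", "warntime", "deadtime", "initdead")


def parse_timings(ha_cf: str) -> Dict[str, float]:
    """Collect the four timing directives (assumed to be plain seconds)."""
    timings: Dict[str, float] = {}
    for raw in ha_cf.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        parts = line.split()
        if len(parts) == 2 and parts[0] in TIMING_KEYS:
            timings[parts[0]] = float(parts[1])
    return timings


def check_timings(t: Dict[str, float]) -> List[str]:
    """Return human-readable problems; an empty list means both checks pass."""
    missing = [k for k in TIMING_KEYS if k not in t]
    if missing:
        return [f"missing: {', '.join(missing)}"]
    problems = []
    if not t["keepalive"] < t["warntime"] < t["deadtime"]:
        problems.append("expected keepalive < warntime < deadtime")
    if t["initdead"] < 2 * t["deadtime"]:
        problems.append("initdead should be at least twice deadtime")
    return problems
```

With the values above (keepalive 2, warntime 5, deadtime 15, initdead 60), both checks pass, so the heartbeat timings themselves are an unlikely cause of the failover problems.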

I am thinking of making the following changes to the CIB (as per the official DRBD guide http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html ) in order to add the DRBD lsb service and require that it start before the ocf:linbit:drbd resources. Does this look correct? 
primitive p_drbd-init lsb:drbd op monitor interval="30" 
colocation c_drbd_together inf: p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master ms_drbd_mount2:Master 
order drbd_init_first inf: ms_drbd_vmstore:promote ms_drbd_mount1:promote ms_drbd_mount2:promote p_drbd-init:start 
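One way to check the proposed `order` line: crm shell applies an order constraint that lists several resources in sequence, left to right. The Python sketch below (a hypothetical `order_steps` helper that handles only the simple `order <id> <score>: <rsc>[:<action>] ...` form, without parenthesised resource sets or options) lists the steps in the order they would run.

```python
from typing import List, Tuple


def order_steps(line: str) -> List[Tuple[str, str]]:
    """Split a simple crm shell 'order' constraint into (resource, action) steps.

    Handles only 'order <id> <score>: <rsc>[:<action>] ...' (no parenthesised
    resource sets, no options); action is "" when none is given.
    """
    head, sep, body = line.partition(": ")
    words = head.split()
    if not sep or not words or words[0] != "order":
        raise ValueError("not a simple order constraint")
    steps = []
    for token in body.split():
        rsc, _, action = token.partition(":")
        steps.append((rsc, action))
    return steps
```

Applied to the `drbd_init_first` line above, `p_drbd-init:start` is the last of the four steps, i.e. it would run after the three promotes, which is the reverse of "start before the ocf:linbit:drbd resources".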

This doesn't seem to require that DRBD also be running on the node where the ocf:linbit:drbd resources are slaves (which it would need to be, in order to act as a DRBD SyncTarget). How can I ensure that DRBD is running everywhere? (clone cl_drbd p_drbd-init?)


>> Perhaps this is best described through the output of crm_mon: 
>> Online: [ node1 node2 ] 
>> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] (unmanaged) 
>> p_drbd_mount1:0 (ocf::linbit:drbd): Started node2 (unmanaged) 
>> p_drbd_mount1:1 (ocf::linbit:drbd): Started node1 (unmanaged) FAILED
>> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>> p_drbd_mount2:0 (ocf::linbit:drbd): Master node1 (unmanaged) FAILED
>> Slaves: [ node2 ] 
>> Resource Group: g_core 
>> p_fs_mount1 (ocf::heartbeat:Filesystem): Started node1 
>> p_fs_mount2 (ocf::heartbeat:Filesystem): Started node1 
>> p_ip_nfs (ocf::heartbeat:IPaddr2): Started node1 
>> Resource Group: g_apache 
>> p_fs_mountbind1 (ocf::heartbeat:Filesystem): Started node1 
>> p_fs_mountbind2 (ocf::heartbeat:Filesystem): Started node1 
>> p_fs_mountbind3 (ocf::heartbeat:Filesystem): Started node1 
>> p_fs_varwww (ocf::heartbeat:Filesystem): Started node1 
>> p_apache (ocf::heartbeat:apache): Started node1 
>> Resource Group: g_fileservers 
>> p_lsb_smb (lsb:smbd): Started node1 
>> p_lsb_nmb (lsb:nmbd): Started node1 
>> p_lsb_nfsserver (lsb:nfs-kernel-server): Started node1 
>> p_exportfs_mount1 (ocf::heartbeat:exportfs): Started node1 
>> p_exportfs_mount2 (ocf::heartbeat:exportfs): Started node1 
>> I have read through the Pacemaker Explained documentation
>> <http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained>,
>> but could not find a way to debug these problems further. First, I put
>> node1 into standby mode to attempt failover to the other node (node2).
>> Node2 appeared to start the transition to master, but it failed to
>> promote the DRBD resources to master (the first step). I have attached
>> a copy of this session in commands.log, along with excerpts from
>> /var/log/syslog covering the important steps. I have tried everything
>> I can think of to start the DRBD resource (e.g.
>> start/stop/promote/manage/cleanup under crm resource, restarting
>> heartbeat), but I cannot bring it out of the slave state. However, if
>> I set it to unmanaged and then run drbdadm primary all in the
>> terminal, Pacemaker is satisfied and continues starting the rest of
>> the resources. It then failed when attempting to mount the filesystem
>> for mount2 (the p_fs_mount2 resource). I was able to mount the
>> filesystem manually; I then unmounted it, ran cleanup on p_fs_mount2,
>> and Pacemaker mounted it. The rest of the resources started as
>> expected until p_exportfs_mount2, which failed as follows:
>> p_exportfs_mount2 (ocf::heartbeat:exportfs): started node2 (unmanaged) FAILED
>> I ran cleanup on this and it started; however, when I ran this test
>> earlier today, no command could start this exportfs resource.
>> How can I configure Pacemaker to handle these problems better and
>> bring the node up successfully on its own? What can I check to
>> determine why these failures are occurring? /var/log/syslog did not
>> seem to contain much useful information about why the failures
>> occurred.
>> Thanks, 
>> Andrew 
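Since crm_mon's status lines are the quickest place to spot trouble, here is a Python sketch (a hypothetical `problem_resources` helper) that pulls out resources marked (unmanaged) or FAILED from one-shot output such as that of `crm_mon -1`. It assumes each resource sits on a single line of the form `<id> (<agent>): <State> <node> [flags]`, so lines wrapped by a mail client would need to be rejoined first.

```python
import re
from typing import List, NamedTuple


class RscStatus(NamedTuple):
    resource: str
    agent: str
    state: str
    node: str
    unmanaged: bool
    failed: bool


# e.g. "p_drbd_mount1:1 (ocf::linbit:drbd): Started node1 (unmanaged) FAILED"
_RSC = re.compile(
    r"^\s*(?P<rsc>\S+)\s+\((?P<agent>[^)]+)\):\s+(?P<state>\w+)\s+(?P<node>\S+)(?P<rest>.*)$"
)


def problem_resources(crm_mon_text: str) -> List[RscStatus]:
    """Return resource lines flagged (unmanaged) and/or FAILED; headers are skipped."""
    found = []
    for line in crm_mon_text.splitlines():
        m = _RSC.match(line)
        if not m:
            continue  # group / set headers, "Online:" lines, etc.
        rest = m["rest"]
        status = RscStatus(m["rsc"], m["agent"], m["state"], m["node"],
                           "(unmanaged)" in rest, "FAILED" in rest)
        if status.unmanaged or status.failed:
            found.append(status)
    return found
```

Feeding it the output of `crm_mon -1` taken right after a failed failover gives a short list of exactly which resources need a cleanup and which node they were on.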
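Because syslog mixes Pacemaker's operation results with a lot of noise, a filter like the following may help narrow down which operations actually failed. It assumes crmd logs results in the Pacemaker 1.0/1.1 style `LRM operation <rsc>_<action>_<interval> (call=..., rc=..., ...)`; check that against your own /var/log/syslog and adjust the pattern if it differs. `failed_ops` is a hypothetical helper. For monitor operations, OCF return codes 7 (not running) and 8 (running as master) are treated as normal, since probes and master monitors return them routinely.

```python
import re
from typing import Iterable, List, NamedTuple


class FailedOp(NamedTuple):
    resource: str
    action: str
    rc: int
    line: str


# Assumed shape of crmd's result line, e.g.
#   "... process_lrm_event: LRM operation <rsc>_<action>_<interval> (call=.., rc=.., ...) ..."
_OP = re.compile(
    r"LRM operation (?P<rsc>\S+)_(?P<action>start|stop|monitor|promote|demote|notify"
    r"|migrate_to|migrate_from)_\d+ \(.*?\brc=(?P<rc>\d+)"
)

# OCF codes that are not failures for a monitor: 7 = not running, 8 = running as master.
_MONITOR_OK = {0, 7, 8}


def failed_ops(lines: Iterable[str]) -> List[FailedOp]:
    """Return operations whose rc indicates failure, in log order."""
    out = []
    for line in lines:
        m = _OP.search(line)
        if not m:
            continue
        rc = int(m["rc"])
        ok_codes = _MONITOR_OK if m["action"] == "monitor" else {0}
        if rc not in ok_codes:
            out.append(FailedOp(m["rsc"], m["action"], rc, line.rstrip()))
    return out
```

For example, iterating `failed_ops(open("/var/log/syslog"))` and printing each resource, action and rc should show whether the promote of the DRBD resources or the start of p_exportfs_mount2 returned an error code, which can then be matched against the resource agent's own log messages around the same timestamp.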
