[Pacemaker] Nodes will not promote DRBD resources to master on failover

Andreas Kurz andreas at hastexo.com
Wed Mar 28 12:27:34 UTC 2012


On 03/28/2012 12:13 AM, Andrew Martin wrote:
> Hi Andreas,
> 
> Thanks, I've updated the colocation rule to be in the correct order. I
> also enabled the STONITH resource (this was temporarily disabled before
> for some additional testing). DRBD has its own network connection over
> the br1 interface (192.168.5.0/24 network), a direct crossover cable
> between node1 and node2:
> global { usage-count no; }
> common {
>         syncer { rate 110M; }
> }
> resource vmstore {
>         protocol C;
>         startup {
>                 wfc-timeout  15;
>                 degr-wfc-timeout 60;
>         }
>         handlers {
>                 #fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
>                 fence-peer "/usr/local/bin/fence-peer";

hmm ... what is that fence-peer script doing? If you want to use
resource-level fencing with the help of dopd, activate the
drbd-peer-outdater script in the commented line above ... and double-check
that the path is correct
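
For comparison, a dopd-based setup would just use the shipped outdater
script together with resource-level fencing; roughly like this (a sketch --
verify the drbd-peer-outdater path first, on some installations it lives
under /usr/lib/drbd/ instead):

```
resource vmstore {
        handlers {
                fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
        }
        disk {
                fencing resource-only;
        }
        # ... rest of the resource definition unchanged
}
```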

>                 split-brain "/usr/lib/drbd/notify-split-brain.sh
> me at example.com";
>         }
>         net {
>                 after-sb-0pri discard-zero-changes;
>                 after-sb-1pri discard-secondary;
>                 after-sb-2pri disconnect;
>                 cram-hmac-alg md5;
>                 shared-secret "xxxxx";
>         }
>         disk {
>                 fencing resource-only;
>         }
>         on node1 {
>                 device /dev/drbd0;
>                 disk /dev/sdb1;
>                 address 192.168.5.10:7787;
>                 meta-disk internal;
>         }
>         on node2 {
>                 device /dev/drbd0;
>                 disk /dev/sdf1;
>                 address 192.168.5.11:7787;
>                 meta-disk internal;
>         }
> }
> # and similar for mount1 and mount2
> 
> Also, here is my ha.cf. It uses both the direct link between the nodes
> (br1) and the shared LAN network on br0 for communicating:
> autojoin none
> mcast br0 239.0.0.43 694 1 0
> bcast br1
> warntime 5
> deadtime 15
> initdead 60
> keepalive 2
> node node1
> node node2
> node quorumnode
> crm respawn
> respawn hacluster /usr/lib/heartbeat/dopd
> apiauth dopd gid=haclient uid=hacluster
> 
> I am thinking of making the following changes to the CIB (as per the
> official DRBD
> guide http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html) in
> order to add the DRBD lsb service and require that it start before the
> ocf:linbit:drbd resources. Does this look correct?

Where did you read that? No, deactivate the startup of DRBD on system
boot and let Pacemaker manage it completely.
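
On Ubuntu 10.04 that boils down to taking the LSB init script out of the
boot sequence, something like this (a sketch, run as root):

```
/etc/init.d/drbd stop
update-rc.d -f drbd remove
```

The ocf:linbit:drbd master/slave resources should then load the module and
bring the devices up on their own.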

> primitive p_drbd-init lsb:drbd op monitor interval="30"
> colocation c_drbd_together inf:
> p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master
> ms_drbd_mount2:Master
> order drbd_init_first inf: ms_drbd_vmstore:promote
> ms_drbd_mount1:promote ms_drbd_mount2:promote p_drbd-init:start
> 
> This doesn't seem to require that drbd be also running on the node where
> the ocf:linbit:drbd resources are slave (which it would need to do to be
> a DRBD SyncTarget) - how can I ensure that drbd is running everywhere?
> (clone cl_drbd p_drbd-init ?)

This is really not needed.
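
Your existing master/slave definitions already take care of that: with
clone-max="2" Pacemaker runs an instance of the drbd RA on both cluster
nodes, one as Master (Primary) and one as Slave (Secondary), e.g.:

```
ms ms_drbd_vmstore p_drbd_vmstore \
        meta master-max="1" master-node-max="1" \
        clone-max="2" clone-node-max="1" notify="true"
```

So there is nothing left to clone.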

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> Thanks,
> 
> Andrew
> ------------------------------------------------------------------------
> *From: *"Andreas Kurz" <andreas at hastexo.com>
> *To: *pacemaker at oss.clusterlabs.org
> *Sent: *Monday, March 26, 2012 5:56:22 PM
> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
> master on failover
> 
> On 03/24/2012 08:15 PM, Andrew Martin wrote:
>> Hi Andreas,
>>
>> My complete cluster configuration is as follows:
>> ============
>> Last updated: Sat Mar 24 13:51:55 2012
>> Last change: Sat Mar 24 13:41:55 2012
>> Stack: Heartbeat
>> Current DC: node2 (9100538b-7a1f-41fd-9c1a-c6b4b1c32b18) - partition
>> with quorum
>> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
>> 3 Nodes configured, unknown expected votes
>> 19 Resources configured.
>> ============
>>
>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): OFFLINE (standby)
>> Online: [ node2 node1 ]
>>
>>  Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>>      Masters: [ node2 ]
>>      Slaves: [ node1 ]
>>  Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
>>      Masters: [ node2 ]
>>      Slaves: [ node1 ]
>>  Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>>      Masters: [ node2 ]
>>      Slaves: [ node1 ]
>>  Resource Group: g_vm
>>      p_fs_vmstore(ocf::heartbeat:Filesystem):Started node2
>>      p_vm(ocf::heartbeat:VirtualDomain):Started node2
>>  Clone Set: cl_daemons [g_daemons]
>>      Started: [ node2 node1 ]
>>      Stopped: [ g_daemons:2 ]
>>  Clone Set: cl_sysadmin_notify [p_sysadmin_notify]
>>      Started: [ node2 node1 ]
>>      Stopped: [ p_sysadmin_notify:2 ]
>>  stonith-node1(stonith:external/tripplitepdu):Started node2
>>  stonith-node2(stonith:external/tripplitepdu):Started node1
>>  Clone Set: cl_ping [p_ping]
>>      Started: [ node2 node1 ]
>>      Stopped: [ p_ping:2 ]
>>
>> node $id="6553a515-273e-42fe-ab9e-00f74bd582c3" node1 \
>>         attributes standby="off"
>> node $id="9100538b-7a1f-41fd-9c1a-c6b4b1c32b18" node2 \
>>         attributes standby="off"
>> node $id="c4bf25d7-a6b7-4863-984d-aafd937c0da4" quorumnode \
>>         attributes standby="on"
>> primitive p_drbd_mount2 ocf:linbit:drbd \
>>         params drbd_resource="mount2" \
>>         op monitor interval="15" role="Master" \
>>         op monitor interval="30" role="Slave"
>> primitive p_drbd_mount1 ocf:linbit:drbd \
>>         params drbd_resource="mount1" \
>>         op monitor interval="15" role="Master" \
>>         op monitor interval="30" role="Slave"
>> primitive p_drbd_vmstore ocf:linbit:drbd \
>>         params drbd_resource="vmstore" \
>>         op monitor interval="15" role="Master" \
>>         op monitor interval="30" role="Slave"
>> primitive p_fs_vmstore ocf:heartbeat:Filesystem \
>>         params device="/dev/drbd0" directory="/vmstore" fstype="ext4" \
>>         op start interval="0" timeout="60s" \
>>         op stop interval="0" timeout="60s" \
>>         op monitor interval="20s" timeout="40s"
>> primitive p_libvirt-bin upstart:libvirt-bin \
>>         op monitor interval="30"
>> primitive p_ping ocf:pacemaker:ping \
>>         params name="p_ping" host_list="192.168.1.10 192.168.1.11"
>> multiplier="1000" \
>>         op monitor interval="20s"
>> primitive p_sysadmin_notify ocf:heartbeat:MailTo \
>>         params email="me at example.com" \
>>         params subject="Pacemaker Change" \
>>         op start interval="0" timeout="10" \
>>         op stop interval="0" timeout="10" \
>>         op monitor interval="10" timeout="10"
>> primitive p_vm ocf:heartbeat:VirtualDomain \
>>         params config="/vmstore/config/vm.xml" \
>>         meta allow-migrate="false" \
>>         op start interval="0" timeout="120s" \
>>         op stop interval="0" timeout="120s" \
>>         op monitor interval="10" timeout="30"
>> primitive stonith-node1 stonith:external/tripplitepdu \
>>         params pdu_ipaddr="192.168.1.12" pdu_port="1" pdu_username="xxx"
>> pdu_password="xxx" hostname_to_stonith="node1"
>> primitive stonith-node2 stonith:external/tripplitepdu \
>>         params pdu_ipaddr="192.168.1.12" pdu_port="2" pdu_username="xxx"
>> pdu_password="xxx" hostname_to_stonith="node2"
>> group g_daemons p_libvirt-bin
>> group g_vm p_fs_vmstore p_vm
>> ms ms_drbd_mount2 p_drbd_mount2 \
>>         meta master-max="1" master-node-max="1" clone-max="2"
>> clone-node-max="1" notify="true"
>> ms ms_drbd_mount1 p_drbd_mount1 \
>>         meta master-max="1" master-node-max="1" clone-max="2"
>> clone-node-max="1" notify="true"
>> ms ms_drbd_vmstore p_drbd_vmstore \
>>         meta master-max="1" master-node-max="1" clone-max="2"
>> clone-node-max="1" notify="true"
>> clone cl_daemons g_daemons
>> clone cl_ping p_ping \
>>         meta interleave="true"
>> clone cl_sysadmin_notify p_sysadmin_notify
>> location l-st-node1 stonith-node1 -inf: node1
>> location l-st-node2 stonith-node2 -inf: node2
>> location l_run_on_most_connected p_vm \
>>         rule $id="l_run_on_most_connected-rule" p_ping: defined p_ping
>> colocation c_drbd_libvirt_vm inf: ms_drbd_vmstore:Master
>> ms_drbd_mount1:Master ms_drbd_mount2:Master g_vm
> 
> As Emmanuel already said, g_vm has to come first in this colocation
> constraint ... g_vm must be colocated with the DRBD masters.
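> 
> The corrected constraint, with the same resources just reordered, would be:
> 
> ```
> colocation c_drbd_libvirt_vm inf: g_vm ms_drbd_vmstore:Master \
>         ms_drbd_mount1:Master ms_drbd_mount2:Master
> ```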
> 
>> order o_drbd-fs-vm inf: ms_drbd_vmstore:promote ms_drbd_mount1:promote
>> ms_drbd_mount2:promote cl_daemons:start g_vm:start
>> property $id="cib-bootstrap-options" \
>>         dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>>         cluster-infrastructure="Heartbeat" \
>>         stonith-enabled="false" \
>>         no-quorum-policy="stop" \
>>         last-lrm-refresh="1332539900" \
>>         cluster-recheck-interval="5m" \
>>         crmd-integration-timeout="3m" \
>>         shutdown-escalation="5m"
>>
>> The STONITH plugin is a custom plugin I wrote for the Tripp-Lite
>> PDUMH20ATNET that I'm using as the STONITH device:
>> http://www.tripplite.com/shared/product-pages/en/PDUMH20ATNET.pdf
> 
> And why aren't you using it? ... stonith-enabled="false"
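> 
> Turning it on is a single property change once the stonith resources are
> in place:
> 
> ```
> crm configure property stonith-enabled="true"
> ```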
> 
>>
>> As you can see, I left the DRBD service to be started by the operating
>> system (as an lsb script at boot time) however Pacemaker controls
>> actually bringing up/taking down the individual DRBD devices.
> 
> Don't start drbd on system boot, give Pacemaker the full control.
> 
>> The behavior I observe is as follows: I issue "crm resource migrate p_vm" on
>> node1 and failover successfully to node2. During this time, node2 fences
>> node1's DRBD devices (using dopd) and marks them as Outdated. Meanwhile
>> node2's DRBD devices are UpToDate. I then shutdown both nodes and then
>> bring them back up. They reconnect to the cluster (with quorum), and
>> node1's DRBD devices are still Outdated as expected and node2's DRBD
>> devices are still UpToDate, as expected. At this point, DRBD starts on
>> both nodes, however node2 will not set DRBD as master:
>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): OFFLINE (standby)
>> Online: [ node2 node1 ]
>>
>>  Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>>      Slaves: [ node1 node2 ]
>>  Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
>>      Slaves: [ node1 node2 ]
>>  Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>>      Slaves: [ node1 node2 ]
> 
> There should really be no interruption of the DRBD replication on VM
> migration that triggers dopd ... does DRBD have its own direct network
> connection?
> 
> Please share your ha.cf file and your drbd configuration. Watch out for
> drbd messages in your kernel log file, that should give you additional
> information when/why the drbd connection was lost.
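> 
> To cut down the noise, grepping the kernel log for DRBD state changes
> usually helps, e.g. (the path is the Ubuntu default, adjust if needed):
> 
> ```
> grep -i 'drbd' /var/log/kern.log | grep -iE 'conn|pdsk|disk|role'
> ```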
> 
> Regards,
> Andreas
> 
> -- 
> Need help with Pacemaker?
> http://www.hastexo.com/now
> 
>>
>> I am having trouble sorting through the logging information because
>> there is so much of it in /var/log/daemon.log, but I can't find an
>> error message printed about why it will not promote node2. At this point
>> the DRBD devices are as follows:
>> node2: cstate = WFConnection dstate=UpToDate
>> node1: cstate = StandAlone dstate=Outdated
>>
>> I don't see any reason why node2 can't become DRBD master, or am I
>> missing something? If I do "drbdadm connect all" on node1, then the
>> cstate on both nodes changes to "Connected" and node2 immediately
>> promotes the DRBD resources to master. Any ideas on why I'm observing
>> this incorrect behavior?
>>
>> Any tips on how I can better filter through the pacemaker/heartbeat logs
>> or how to get additional useful debug information?
>>
>> Thanks,
>>
>> Andrew
>>
>> ------------------------------------------------------------------------
>> *From: *"Andreas Kurz" <andreas at hastexo.com>
>> *To: *pacemaker at oss.clusterlabs.org
>> *Sent: *Wednesday, 1 February, 2012 4:19:25 PM
>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
>> master on failover
>>
>> On 01/25/2012 08:58 PM, Andrew Martin wrote:
>>> Hello,
>>>
>>> Recently I finished configuring a two-node cluster with pacemaker 1.1.6
>>> and heartbeat 3.0.5 on nodes running Ubuntu 10.04. This cluster includes
>>> the following resources:
>>> - primitives for DRBD storage devices
>>> - primitives for mounting the filesystem on the DRBD storage
>>> - primitives for some mount binds
>>> - primitive for starting apache
>>> - primitives for starting samba and nfs servers (following instructions
>>> here <http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf>)
>>> - primitives for exporting nfs shares (ocf:heartbeat:exportfs)
>>
>> not enough information ... please share at least your complete cluster
>> configuration
>>
>> Regards,
>> Andreas
>>
>> --
>> Need help with Pacemaker?
>> http://www.hastexo.com/now
>>
>>>
>>> Perhaps this is best described through the output of crm_mon:
>>> Online: [ node1 node2 ]
>>>
>>>  Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] (unmanaged)
>>>      p_drbd_mount1:0     (ocf::linbit:drbd):     Started node2 (unmanaged)
>>>      p_drbd_mount1:1     (ocf::linbit:drbd):     Started node1 (unmanaged) FAILED
>>>  Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>>>      p_drbd_mount2:0       (ocf::linbit:drbd):     Master node1 (unmanaged) FAILED
>>>      Slaves: [ node2 ]
>>>  Resource Group: g_core
>>>      p_fs_mount1 (ocf::heartbeat:Filesystem):    Started node1
>>>      p_fs_mount2   (ocf::heartbeat:Filesystem):    Started node1
>>>      p_ip_nfs   (ocf::heartbeat:IPaddr2):       Started node1
>>>  Resource Group: g_apache
>>>      p_fs_mountbind1    (ocf::heartbeat:Filesystem):    Started node1
>>>      p_fs_mountbind2    (ocf::heartbeat:Filesystem):    Started node1
>>>      p_fs_mountbind3    (ocf::heartbeat:Filesystem):    Started node1
>>>      p_fs_varwww        (ocf::heartbeat:Filesystem):    Started node1
>>>      p_apache   (ocf::heartbeat:apache):        Started node1
>>>  Resource Group: g_fileservers
>>>      p_lsb_smb  (lsb:smbd):     Started node1
>>>      p_lsb_nmb  (lsb:nmbd):     Started node1
>>>      p_lsb_nfsserver    (lsb:nfs-kernel-server):        Started node1
>>>      p_exportfs_mount1   (ocf::heartbeat:exportfs):      Started node1
>>>      p_exportfs_mount2     (ocf::heartbeat:exportfs):      Started node1
>>>
>>> I have read through the Pacemaker Explained
>>> <http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained>
>>> documentation, however could not find a way to further debug these
>>> problems. First, I put node1 into standby mode to attempt failover to
>>> the other node (node2). Node2 appeared to start the transition to
>>> master, however it failed to promote the DRBD resources to master (the
>>> first step). I have attached a copy of this session in commands.log and
>>> additional excerpts from /var/log/syslog during important steps. I have
>>> attempted everything I can think of to try and start the DRBD resource
>>> (e.g. start/stop/promote/manage/cleanup under crm resource, restarting
>>> heartbeat) but cannot bring it out of the slave state. However, if I set
>>> it to unmanaged and then run drbdadm primary all in the terminal,
>>> pacemaker is satisfied and continues starting the rest of the resources.
>>> It then failed when attempting to mount the filesystem for mount2, the
>>> p_fs_mount2 resource. I attempted to mount the filesystem myself and was
>>> successful. I then unmounted it and ran cleanup on p_fs_mount2 and then
>>> it mounted. The rest of the resources started as expected until the
>>> p_exportfs_mount2 resource, which failed as follows:
>>> p_exportfs_mount2     (ocf::heartbeat:exportfs):      started node2
>>> (unmanaged) FAILED
>>>
>>> I ran cleanup on this and it started, however when running this test
>>> earlier today no command could successfully start this exportfs resource.
>>>
>>> How can I configure pacemaker to better resolve these problems and be
>>> able to bring the node up successfully on its own? What can I check to
>>> determine why these failures are occurring? /var/log/syslog did not seem
>>> to contain very much useful information regarding why the failures
>> occurred.
>>>
>>> Thanks,
>>>
>>> Andrew
>>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org

