[Pacemaker] Pacemaker/DRBD troubles

David Parker dparker at utica.edu
Mon Sep 23 18:50:21 UTC 2013


Hello,

I'm attempting to set up a simple NFS failover test using Pacemaker and
DRBD on two nodes.  The goal is for one node to be the DRBD master, with
the volume mounted, the NFS server running, and a virtual IP address up.
The other node is the DRBD slave, with no NFS services or virtual IP
running.  The DRBD resource is configured in master-slave (not dual-master)
mode and seems to work fine when it's not being controlled by Pacemaker.
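
For reference, the DRBD resource itself is defined along these lines (the
backing disk and peer addresses below are placeholders, not my exact
values); note that there is no allow-two-primaries option, so the resource
is single-primary:

# Sketch of /etc/drbd.d/r0.res -- disk and addresses are illustrative.
resource r0 {
    device    /dev/drbd1;
    disk      /dev/sdb1;                 # backing device (placeholder)
    meta-disk internal;
    on test-vm-1 {
        address 192.168.25.201:7788;     # placeholder address/port
    }
    on test-vm-2 {
        address 192.168.25.202:7788;     # placeholder address/port
    }
}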

The problem is that both nodes start out as DRBD slaves, and neither node
gets promoted:

# crm_mon -1
============
Last updated: Mon Sep 23 14:39:12 2013
Last change: Mon Sep 23 14:26:15 2013 via cibadmin on test-vm-1
Stack: openais
Current DC: test-vm-1 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
5 Resources configured.
============

Online: [ test-vm-1 test-vm-2 ]

 Master/Slave Set: ms-drbd_r0 [drbd_r0]
     Slaves: [ test-vm-1 test-vm-2 ]

If I try to force a promotion with "crm resource promote ms-drbd_r0", I get
no output, and I see this line in the log:

cib: [27320]: info: cib_process_request: Operation complete: op cib_modify for section resources (origin=local/crm_resource/4, version=0.65.43): ok (rc=0)

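As far as I know, "crm resource promote" just sets the target-role meta
attribute on the resource (which would match the cib_modify entry above),
and that attribute can be queried directly, e.g.:

# query the target-role meta attribute on the master/slave resource
crm_resource --resource ms-drbd_r0 --meta --get-parameter target-role
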
However, "crm_mon -1" still shows that both nodes are slaves.  I have
constraints such that the NFS resources will only run on the DRBD master,
and a node will only be promoted to master once the virtual IP is started
on it.  I suspect that the IP is not starting and that's holding up the
promotion, but I can't figure out why the IP won't start.  Looking in the
log, I see a bunch of pending actions to start the IP, but they never
actually fire:

# grep 'nfs_ip' /var/log/cluster/corosync.log
Sep 23 14:28:24 test-vm-1 pengine: [27324]: notice: LogActions: Start nfs_ip  (test-vm-1 - blocked)
Sep 23 14:28:24 test-vm-1 crmd: [27325]: info: te_rsc_command: Initiating action 6: monitor nfs_ip_monitor_0 on test-vm-1 (local)
Sep 23 14:28:24 test-vm-1 lrmd: [27322]: info: rsc:nfs_ip probe[4] (pid 27398)
Sep 23 14:28:25 test-vm-1 lrmd: [27322]: info: operation monitor[4] on nfs_ip for client 27325: pid 27398 exited with return code 7
Sep 23 14:28:25 test-vm-1 crmd: [27325]: info: process_lrm_event: LRM operation nfs_ip_monitor_0 (call=4, rc=7, cib-update=28, confirmed=true) not running
Sep 23 14:28:27 test-vm-1 pengine: [27324]: notice: LogActions: Start nfs_ip  (test-vm-1)
Sep 23 14:28:27 test-vm-1 crmd: [27325]: WARN: print_elem:      * [Input 7]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0)
Sep 23 14:28:27 test-vm-1 crmd: [27325]: WARN: print_elem:     [Action 8]: Pending (id: nfs_ip_monitor_10000, loc: test-vm-1, priority: 0)
Sep 23 14:28:27 test-vm-1 crmd: [27325]: WARN: print_elem:      * [Input 7]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0)
Sep 23 14:28:27 test-vm-1 crmd: [27325]: WARN: print_elem:     [Action 7]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0)
Sep 23 14:28:27 test-vm-1 crmd: [27325]: WARN: print_elem:      * [Input 7]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0)
Sep 23 14:28:33 test-vm-1 pengine: [27324]: notice: LogActions: Start nfs_ip  (test-vm-1)
Sep 23 14:28:33 test-vm-1 crmd: [27325]: info: te_rsc_command: Initiating action 7: monitor nfs_ip_monitor_0 on test-vm-2
Sep 23 14:28:36 test-vm-1 pengine: [27324]: notice: LogActions: Start nfs_ip  (test-vm-1)
Sep 23 14:28:37 test-vm-1 pengine: [27324]: notice: LogActions: Start nfs_ip  (test-vm-1)
Sep 23 14:28:37 test-vm-1 crmd: [27325]: WARN: print_elem:      * [Input 8]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0)
Sep 23 14:28:37 test-vm-1 crmd: [27325]: WARN: print_elem:     [Action 9]: Pending (id: nfs_ip_monitor_10000, loc: test-vm-1, priority: 0)
Sep 23 14:28:37 test-vm-1 crmd: [27325]: WARN: print_elem:      * [Input 8]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0)
Sep 23 14:28:37 test-vm-1 crmd: [27325]: WARN: print_elem:     [Action 8]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0)
Sep 23 14:28:37 test-vm-1 crmd: [27325]: WARN: print_elem:      * [Input 8]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0)
Sep 23 14:43:37 test-vm-1 pengine: [27324]: notice: LogActions: Start nfs_ip  (test-vm-1)
Sep 23 14:43:37 test-vm-1 crmd: [27325]: WARN: print_elem:      * [Input 8]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0)
Sep 23 14:43:37 test-vm-1 crmd: [27325]: WARN: print_elem:     [Action 9]: Pending (id: nfs_ip_monitor_10000, loc: test-vm-1, priority: 0)
Sep 23 14:43:37 test-vm-1 crmd: [27325]: WARN: print_elem:      * [Input 8]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0)
Sep 23 14:43:37 test-vm-1 crmd: [27325]: WARN: print_elem:     [Action 8]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0)
Sep 23 14:43:37 test-vm-1 crmd: [27325]: WARN: print_elem:      * [Input 8]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0)
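
If it would help, I can also replay the transition against the live CIB and
post the output; a sketch of the commands I'd run (crm_simulate ships with
pacemaker 1.1):

# replay the pending transition against the live cluster
crm_simulate --simulate --live-check

# show allocation scores, filtered to the virtual IP resource
crm_simulate --show-scores --live-check | grep nfs_ip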

Any help will be greatly appreciated.

The relevant portion of my CIB is below:

  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure"
name="cluster-infrastructure" value="openais"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes"
name="expected-quorum-votes" value="2"/>
        <nvpair id="cib-bootstrap-options-stonith-enabled"
name="stonith-enabled" value="false"/>
        <nvpair id="cib-bootstrap-options-maintenance-mode"
name="maintenance-mode" value="false"/>
        <nvpair id="cib-bootstrap-options-no-quorum-policy"
name="no-quorum-policy" value="ignore"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="test-vm-1" type="normal" uname="test-vm-1"/>
      <node id="test-vm-2" type="normal" uname="test-vm-2"/>
    </nodes>
    <resources>
      <group id="nfs_resources">
        <meta_attributes id="nfs_resources-meta_attributes">
          <nvpair id="nfs_resources-meta_attributes-target-role"
name="target-role" value="Started"/>
        </meta_attributes>
        <primitive class="ocf" id="nfs_fs" provider="heartbeat"
type="Filesystem">
          <instance_attributes id="nfs_fs-instance_attributes">
            <nvpair id="nfs_fs-instance_attributes-device" name="device"
value="/dev/drbd1"/>
            <nvpair id="nfs_fs-instance_attributes-directory"
name="directory" value="/mnt/data/"/>
            <nvpair id="nfs_fs-instance_attributes-fstype" name="fstype"
value="ext3"/>
            <nvpair id="nfs_fs-instance_attributes-options" name="options"
value="noatime,nodiratime"/>
          </instance_attributes>
          <operations>
            <op id="nfs_fs-start-0" interval="0" name="start" timeout="60"/>
            <op id="nfs_fs-stop-0" interval="0" name="stop" timeout="120"/>
          </operations>
        </primitive>
        <primitive class="lsb" id="nfs" type="nfs-kernel-server">
          <operations>
            <op id="nfs-monitor-5s" interval="5s" name="monitor"/>
          </operations>
        </primitive>
        <primitive class="ocf" id="nfs_ip" provider="heartbeat"
type="IPaddr2">
          <instance_attributes id="nfs_ip-instance_attributes">
            <nvpair id="nfs_ip-instance_attributes-ip" name="ip"
value="192.168.25.205"/>
            <nvpair id="nfs_ip-instance_attributes-cidr_netmask"
name="cidr_netmask" value="32"/>
          </instance_attributes>
          <operations>
            <op id="nfs_ip-monitor-10s" interval="10s" name="monitor"/>
          </operations>
          <meta_attributes id="nfs_ip-meta_attributes">
            <nvpair id="nfs_ip-meta_attributes-is-managed"
name="is-managed" value="true"/>
          </meta_attributes>
        </primitive>
      </group>
      <master id="ms-drbd_r0">
        <meta_attributes id="ms-drbd_r0-meta_attributes">
          <nvpair id="ms-drbd_r0-meta_attributes-clone-max"
name="clone-max" value="2"/>
          <nvpair id="ms-drbd_r0-meta_attributes-notify" name="notify"
value="true"/>
          <nvpair id="ms-drbd_r0-meta_attributes-globally-unique"
name="globally-unique" value="false"/>
          <nvpair id="ms-drbd_r0-meta_attributes-target-role"
name="target-role" value="Master"/>
        </meta_attributes>
        <primitive class="ocf" id="drbd_r0" provider="heartbeat"
type="drbd">
          <instance_attributes id="drbd_r0-instance_attributes">
            <nvpair id="drbd_r0-instance_attributes-drbd_resource"
name="drbd_resource" value="r0"/>
          </instance_attributes>
          <operations>
            <op id="drbd_r0-monitor-59s" interval="59s" name="monitor"
role="Master" timeout="30s"/>
            <op id="drbd_r0-monitor-60s" interval="60s" name="monitor"
role="Slave" timeout="30s"/>
          </operations>
        </primitive>
      </master>
    </resources>
    <constraints>
      <rsc_colocation id="drbd-nfs-ha" rsc="ms-drbd_r0" rsc-role="Master"
score="INFINITY" with-rsc="nfs_resources"/>
      <rsc_order first="nfs_ip" first-action="start" id="ip-before-drbd"
score="INFINITY" then="ms-drbd_r0" then-action="promote"/>
      <rsc_order first="ms-drbd_r0" first-action="promote"
id="drbd-before-nfs" score="INFINITY" then="nfs_fs" then-action="start"/>
      <rsc_order first="nfs_fs" first-action="start" id="fs-before-nfs"
score="INFINITY" then="nfs" then-action="start"/>
    </constraints>
    <rsc_defaults>
      <meta_attributes id="rsc-options">
        <nvpair id="rsc-options-resource-stickiness"
name="resource-stickiness" value="100"/>
      </meta_attributes>
    </rsc_defaults>
    <op_defaults/>
  </configuration>
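
For readability, I believe the constraints above are equivalent to the
following crm shell syntax:

colocation drbd-nfs-ha inf: ms-drbd_r0:Master nfs_resources
order ip-before-drbd inf: nfs_ip:start ms-drbd_r0:promote
order drbd-before-nfs inf: ms-drbd_r0:promote nfs_fs:start
order fs-before-nfs inf: nfs_fs:start nfs:start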

-- 
Dave Parker
Systems Administrator
Utica College
Integrated Information Technology Services
(315) 792-3229
Registered Linux User #408177