[Pacemaker] DRBD active/passive on Pacemaker+CMAN cluster unexpectedly performs STONITH when promoting
Giuseppe Ragusa
giuseppe.ragusa at hotmail.com
Thu Jul 3 02:05:36 UTC 2014
Hi all,
I deployed a two-node (physical) RHCS/Pacemaker cluster on CentOS 6.5 x86_64 (fully up to date) with:
cman-3.0.12.1-59.el6_5.2.x86_64
pacemaker-1.1.10-14.el6_5.3.x86_64
pcs-0.9.90-2.el6.centos.3.noarch
qemu-kvm-0.12.1.2-2.415.el6_5.10.x86_64
qemu-kvm-tools-0.12.1.2-2.415.el6_5.10.x86_64
drbd-utils-8.9.0-1.el6.x86_64
drbd-udev-8.9.0-1.el6.x86_64
drbd-rgmanager-8.9.0-1.el6.x86_64
drbd-bash-completion-8.9.0-1.el6.x86_64
drbd-pacemaker-8.9.0-1.el6.x86_64
drbd-8.9.0-1.el6.x86_64
drbd-km-2.6.32_431.20.3.el6.x86_64-8.4.5-1.x86_64
kernel-2.6.32-431.20.3.el6.x86_64
The aim is to run KVM virtual machines backed by DRBD (8.4.5) in active/passive mode (no dual-primary, and therefore no live migration).
To err on the side of consistency over availability (and to pave the way for a possible dual-primary, live-migration-capable setup), I configured DRBD for resource-and-stonith fencing with rhcs_fence as the fence-peer handler (that is why drbd-rgmanager is installed) and with STONITH devices configured in Pacemaker (pcmk-redirect in cluster.conf).
The setup "almost" works (everything looks fine in "pcs status", "crm_mon -Arf1", "corosync-cfgtool -s" and "corosync-objctl | grep member"), but every time a resource promotion is needed (to Master, i.e. becoming DRBD primary) it either fails or fences the other node (the one supposed to become Slave, i.e. secondary) and only then succeeds.
This happens, for example, both on initial resource definition (when the first start is attempted) and when a node enters standby (when the cluster tries to move the resources automatically by stopping and then restarting them).
I have collected a full "pcs cluster report" and can provide a CIB dump, but I will initially paste an excerpt of my configuration here, just in case it is a simple configuration error that someone can spot on the fly ;> (hoping...)
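For anyone digging into the report, the fence events should show up with something along these lines (standard CentOS 6 log locations, assuming nothing has been redirected elsewhere):

grep -iE 'stonith|fence' /var/log/messages
grep -iE 'stonith|fence' /var/log/cluster/corosync.log
grep -i drbd /var/log/messages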
Keep in mind that the setup has separate redundant network connections for the LAN (1 Gbit/s, LACP to the switches), Corosync (1 Gbit/s, round-robin, back-to-back) and DRBD (10 Gbit/s, round-robin, back-to-back), and that all FQDNs are correctly resolved through /etc/hosts.
DRBD:
/etc/drbd.d/global_common.conf:
------------------------------------------------------------------------------------------------------
global {
    usage-count no;
}
common {
    protocol C;
    disk {
        on-io-error detach;
        fencing resource-and-stonith;
        disk-barrier no;
        disk-flushes no;
        al-extents 3389;
        c-plan-ahead 200;
        c-fill-target 15M;
        c-max-rate 100M;
        c-min-rate 10M;
    }
    net {
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        csums-alg sha1;
        data-integrity-alg sha1;
        max-buffers 8000;
        max-epoch-size 8000;
        unplug-watermark 16;
        sndbuf-size 0;
        verify-alg sha1;
    }
    startup {
        wfc-timeout 300;
        outdated-wfc-timeout 80;
        degr-wfc-timeout 120;
    }
    handlers {
        fence-peer "/usr/lib/drbd/rhcs_fence";
    }
}
------------------------------------------------------------------------------------------------------
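As a sanity check that these options are actually in effect (and identical on both nodes), the configuration can be dumped like this; just a quick verification step, nothing cluster-specific:

drbdadm dump all      # effective configuration after includes and defaults
drbdsetup show        # options the kernel module is currently using (resources must be up)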
Sample DRBD resource (there are others, similar)
/etc/drbd.d/dc_vm.res:
------------------------------------------------------------------------------------------------------
resource dc_vm {
    device /dev/drbd1;
    disk /dev/VolGroup00/dc_vm;
    meta-disk internal;
    on cluster1.verolengo.privatelan {
        address ipv4 172.16.200.1:7790;
    }
    on cluster2.verolengo.privatelan {
        address ipv4 172.16.200.2:7790;
    }
}
------------------------------------------------------------------------------------------------------
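For what it's worth, outside of Pacemaker the resource can be exercised manually roughly along these lines (with the cluster not managing it), just to rule out a DRBD-level problem with promotion itself:

drbdadm up dc_vm          # attach and connect (run on both nodes)
cat /proc/drbd            # wait for Connected / UpToDate on both sides
drbdadm primary dc_vm     # manual promotion on one node only
drbdadm secondary dc_vm   # demote again before handing control back to the cluster
drbdadm down dc_vm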
RHCS:
/etc/cluster/cluster.conf
------------------------------------------------------------------------------------------------------
<?xml version="1.0"?>
<cluster name="vclu" config_version="14">
  <cman two_node="1" expected_votes="1" keyfile="/etc/corosync/authkey" transport="udpu" port="5405"/>
  <totem consensus="60000" join="6000" token="100000" token_retransmits_before_loss_const="20" rrp_mode="passive" secauth="on"/>
  <clusternodes>
    <clusternode name="cluster1.verolengo.privatelan" votes="1" nodeid="1">
      <altname name="clusterlan1.verolengo.privatelan" port="6405"/>
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="cluster1.verolengo.privatelan"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="cluster2.verolengo.privatelan" votes="1" nodeid="2">
      <altname name="clusterlan2.verolengo.privatelan" port="6405"/>
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="cluster2.verolengo.privatelan"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
  <fence_daemon clean_start="0" post_fail_delay="30" post_join_delay="30"/>
  <logging debug="on"/>
  <rm disabled="1">
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
------------------------------------------------------------------------------------------------------
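These are the CMAN-level checks I would rely on for the above (CentOS 6 tooling), mainly to confirm that the configuration validates and that both nodes are in the fence domain:

ccs_config_validate       # syntax/schema check of /etc/cluster/cluster.conf
cman_tool status          # quorum and two_node flags as seen by CMAN
cman_tool nodes           # both nodes should be listed as members
fence_tool ls             # fence domain membership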
Pacemaker:
PROPERTIES:
pcs property set default-resource-stickiness=100
pcs property set no-quorum-policy=ignore
STONITH:
pcs stonith create ilocluster1 fence_ilo2 action="off" delay="10" \
ipaddr="ilocluster1.verolengo.privatelan" login="cluster2" passwd="test" power_wait="4" \
pcmk_host_check="static-list" pcmk_host_list="cluster1.verolengo.privatelan" op monitor interval=60s
pcs stonith create ilocluster2 fence_ilo2 action="off" \
ipaddr="ilocluster2.verolengo.privatelan" login="cluster1" passwd="test" power_wait="4" \
pcmk_host_check="static-list" pcmk_host_list="cluster2.verolengo.privatelan" op monitor interval=60s
pcs stonith create pdu1 fence_apc action="off" \
ipaddr="pdu1.verolengo.privatelan" login="cluster" passwd="test" \
pcmk_host_map="cluster1.verolengo.privatelan:3,cluster1.verolengo.privatelan:4,cluster2.verolengo.privatelan:6,cluster2.verolengo.privatelan:7" \
pcmk_host_check="static-list" pcmk_host_list="cluster1.verolengo.privatelan,cluster2.verolengo.privatelan" op monitor interval=60s
pcs stonith level add 1 cluster1.verolengo.privatelan ilocluster1
pcs stonith level add 2 cluster1.verolengo.privatelan pdu1
pcs stonith level add 1 cluster2.verolengo.privatelan ilocluster2
pcs stonith level add 2 cluster2.verolengo.privatelan pdu1
pcs property set stonith-enabled=true
pcs property set stonith-action=off
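For completeness, the fencing devices can be exercised by hand more or less like this (parameters as above; note that the last command really powers the target node off):

fence_ilo2 -a ilocluster1.verolengo.privatelan -l cluster2 -p test -o status
fence_apc -a pdu1.verolengo.privatelan -l cluster -p test -n 3 -o status
stonith_admin --list-registered
stonith_admin --fence cluster2.verolengo.privatelan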
SAMPLE RESOURCE:
pcs cluster cib dc_cfg
pcs -f dc_cfg resource create DCVMDisk ocf:linbit:drbd \
drbd_resource=dc_vm op monitor interval="31s" role="Master" \
op monitor interval="29s" role="Slave" \
op start interval="0" timeout="120s" \
op stop interval="0" timeout="180s"
pcs -f dc_cfg resource master DCVMDiskClone DCVMDisk \
master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
notify=true target-role=Started is-managed=true
pcs -f dc_cfg resource create DCVM ocf:heartbeat:VirtualDomain \
config=/etc/libvirt/qemu/dc.xml migration_transport=tcp migration_network_suffix=-10g \
hypervisor=qemu:///system meta allow-migrate=false target-role=Started is-managed=true \
op start interval="0" timeout="120s" \
op stop interval="0" timeout="120s" \
op monitor interval="60s" timeout="120s"
pcs -f dc_cfg constraint colocation add DCVM DCVMDiskClone INFINITY with-rsc-role=Master
pcs -f dc_cfg constraint order promote DCVMDiskClone then start DCVM
pcs -f dc_cfg constraint location DCVM prefers cluster2.verolengo.privatelan=50
pcs cluster cib-push dc_cfg
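After the push, this is roughly the sequence I use to watch the promotion and reproduce the unexpected fencing (the standby/unstandby pair forces the resources to move):

pcs status
crm_mon -Arf1
crm_simulate -sL                                     # placement and promotion scores from the live CIB
pcs cluster standby cluster1.verolengo.privatelan
crm_mon -Arf                                         # watch the demote/promote sequence
pcs cluster unstandby cluster1.verolengo.privatelan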
Since I know that pcs still has some rough edges, I also installed crmsh, but I have never actually used it.
Many thanks in advance for your attention.
Kind regards,
Giuseppe Ragusa