[ClusterLabs] Service pacemaker start kills my cluster and other NFS HA issues

Tue Aug 30 15:49:26 UTC 2016

Hello,

I have set up a DRBD-Corosync-Pacemaker cluster following the instructions at https://wiki.ubuntu.com/ClusterStack/Natty, adapting them to CentOS 7 (e.g. using systemd). It seemed to work fine when tested in virtual machines, so it is now deployed on physical machines. I have noticed that failover works as long as I kill the master by pulling the AC cable, but not if I issue the halt, reboot or shutdown commands. Those leave the cluster in a state like this:

Last updated: Tue Aug 30 16:55:58 2016          Last change: Tue Aug 23 11:49:43 2016 by hacluster via crmd on nfsha2
Stack: corosync
Current DC: nfsha2 (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
2 nodes and 9 resources configured

Online: [ nfsha1 nfsha2 ]

 Master/Slave Set: ms_drbd_export [res_drbd_export]
     Masters: [ nfsha2 ]
     Slaves: [ nfsha1 ]
 Resource Group: rg_export
     res_fs     (ocf::heartbeat:Filesystem):    Started nfsha2
     res_exportfs_export1    (ocf::heartbeat:exportfs):    FAILED nfsha2 (unmanaged)
     res_ip     (ocf::heartbeat:IPaddr2):    Stopped
 Clone Set: cl_nfsserver [res_nfsserver]
     Started: [ nfsha1 ]
 Clone Set: cl_exportfs_root [res_exportfs_root]
     res_exportfs_root  (ocf::heartbeat:exportfs):    FAILED nfsha2
     Started: [ nfsha1 ]

Migration Summary:
* Node 2:
   res_exportfs_export1: migration-threshold=1000000 fail-count=1000000    last-failure='Tue Aug 30 16:55:50 2016'
   res_exportfs_root: migration-threshold=1000000 fail-count=1 last-failure='Tue Aug 30 16:55:48 2016'
* Node 1:

Failed Actions:
* res_exportfs_export1_stop_0 on nfsha2 'unknown error' (1): call=134, status=Timed Out, exitreason='none',
    last-rc-change='Tue Aug 30 16:55:30 2016', queued=0ms, exec=20001ms
* res_exportfs_root_monitor_30000 on nfsha2 'not running' (7): call=126, status=complete, exitreason='none',
    last-rc-change='Tue Aug 30 16:55:48 2016', queued=0ms, exec=0ms

This of course blocks the cluster, because the IP and the NFS exports are down. It does not even recognize that the other node is down. I am then forced to run "crm_resource -P" to get it back to a working state.
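
For anyone comparing the failed actions across the outputs in this post, they can be pulled into structured form with a small script. This is only a minimal sketch: the regular expression is an assumption based on the crm_mon 1.1.13 output format shown here, it expects each wrapped entry to be joined onto one line first, and the names FAILED_ACTION and parse_failed_actions are made up for illustration.

```python
import re

# Assumed layout of one "Failed Actions" entry, taken from the output above.
# Each entry must already be on a single line (terminal wrapping undone).
FAILED_ACTION = re.compile(
    r"\* (?P<resource>\S+)_(?P<op>[a-z]+)_(?P<interval>\d+) on (?P<node>\S+) "
    r"'(?P<reason>[^']*)' \((?P<rc>\d+)\): call=(?P<call>\d+), "
    r"status=(?P<status>[^,]+), .*?exec=(?P<exec_ms>\d+)ms"
)


def parse_failed_actions(text):
    """Return one dict per failed-action line found in the given text."""
    actions = []
    for line in text.splitlines():
        match = FAILED_ACTION.search(line)
        if match is None:
            continue
        entry = match.groupdict()
        # Convert the numeric fields so they can be compared and sorted.
        for key in ("interval", "rc", "call", "exec_ms"):
            entry[key] = int(entry[key])
        actions.append(entry)
    return actions


# The two entries from the first status output, joined onto single lines.
SAMPLE = """\
* res_exportfs_export1_stop_0 on nfsha2 'unknown error' (1): call=134, status=Timed Out, exitreason='none', last-rc-change='Tue Aug 30 16:55:30 2016', queued=0ms, exec=20001ms
* res_exportfs_root_monitor_30000 on nfsha2 'not running' (7): call=126, status=complete, exitreason='none', last-rc-change='Tue Aug 30 16:55:48 2016', queued=0ms, exec=0ms
"""

if __name__ == "__main__":
    for action in parse_failed_actions(SAMPLE):
        print(action["resource"], action["op"], action["node"],
              action["status"], action["exec_ms"])
```

Seen this way, the stop of res_exportfs_export1 is the only action that ran into a timeout (exec=20001ms), while the monitor failure on res_exportfs_root completed immediately with 'not running'.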

Even after unplugging the master and booting it up again, bringing it back into the cluster by running "service pacemaker start" on the node that was unplugged will sometimes just cause the exportfs_root resource on the slave to fail (although the service stays up):

 Master/Slave Set: ms_drbd_export [res_drbd_export]
     Masters: [ nfsha1 ]
     Slaves: [ nfsha2 ]
 Resource Group: rg_export
     res_fs     (ocf::heartbeat:Filesystem):    Started nfsha1
     res_exportfs_export1    (ocf::heartbeat:exportfs):    Started nfsha1
     res_ip     (ocf::heartbeat:IPaddr2):    Started nfsha1
 Clone Set: cl_nfsserver [res_nfsserver]
     Started: [ nfsha1 nfsha2 ]
 Clone Set: cl_exportfs_root [res_exportfs_root]
     Started: [ nfsha1 nfsha2 ]

Migration Summary:
* Node nfsha2:
   res_exportfs_root: migration-threshold=1000000 fail-count=1 last-failure='Tue Aug 30 17:18:17 2016'
* Node nfsha1:

Failed Actions:
* res_exportfs_root_monitor_30000 on nfsha2 'not running' (7): call=34, status=complete, exitreason='none',
    last-rc-change='Tue Aug 30 17:18:17 2016', queued=0ms, exec=33ms

BTW, I notice that the node attributes have changed:

Node Attributes:
* Node nfsha1:
    + master-res_drbd_export            : 10000
* Node nfsha2:
    + master-res_drbd_export            : 1000

Usually both nodes have the same weight (10000); running "crm_resource -P" restores that.

Other times it instead causes a service disruption:

Online: [ nfsha1 nfsha2 ]

 Master/Slave Set: ms_drbd_export [res_drbd_export]
     Masters: [ nfsha2 ]
     Slaves: [ nfsha1 ]
 Resource Group: rg_export
     res_fs     (ocf::heartbeat:Filesystem):    Started nfsha2
     res_exportfs_export1    (ocf::heartbeat:exportfs):    FAILED (unmanaged)[ nfsha2 nfsha1 ]
     res_ip     (ocf::heartbeat:IPaddr2):    Stopped
 Clone Set: cl_nfsserver [res_nfsserver]
     Started: [ nfsha1 nfsha2 ]
 Clone Set: cl_exportfs_root [res_exportfs_root]
     Started: [ nfsha1 nfsha2]

Migration Summary:
* Node nfsha2:
   res_exportfs_export1: migration-threshold=1000000 fail-count=1000000    last-failure='Tue Aug 30 17:31:01 2016'
* Node nfsha1:
   res_exportfs_export1: migration-threshold=1000000 fail-count=1000000    last-failure='Tue Aug 30 17:31:01 2016'
   res_exportfs_root: migration-threshold=1000000 fail-count=1 last-failure='Tue Aug 30 17:31:11 2016'

Failed Actions:
* res_exportfs_export1_stop_0 on nfsha2 'unknown error' (1): call=86, status=Timed Out, exitreason='none',
    last-rc-change='Tue Aug 30 17:30:41 2016', queued=0ms, exec=20002ms
* res_exportfs_export1_stop_0 on nfsha1 'unknown error' (1): call=32, status=Timed Out, exitreason='none',
    last-rc-change='Tue Aug 30 17:30:41 2016', queued=0ms, exec=20002ms
* res_exportfs_root_monitor_30000 on nfsha1 'not running' (7): call=29, status=complete, exitreason='none',
    last-rc-change='Tue Aug 30 17:31:11 2016', queued=0ms, exec=0ms

Executing "crm_resource -P" then brings it back to life. If that command is not executed, the cluster remains blocked for around 10 minutes, after which it sometimes recovers on its own (as if crm_resource -P had been run automatically).

In case it helps, the CRM configuration is this one:

node 1: nfsha1
node 2: nfsha2 \
        attributes standby=off
primitive res_drbd_export ocf:linbit:drbd \
        params drbd_resource=export
primitive res_exportfs_export1 exportfs \
        params fsid=1 directory="/mnt/export/export1" options="rw,root_squash,mountpoint" clientspec="*.0/255.255.255.0" wait_for_leasetime_on_stop=true \
        op monitor interval=30s \
        meta target-role=Started
primitive res_exportfs_root exportfs \
        params fsid=0 directory="/mnt/export" options="rw,crossmnt" clientspec="*.0/255.255.255.0" \
        op monitor interval=30s \
        meta target-role=Started
primitive res_fs Filesystem \
        params device="/dev/drbd0" directory="/mnt/export" fstype=ext3 \
        meta target-role=Started
primitive res_ip IPaddr2 \
        params ip=*.46 cidr_netmask=24 nic=eno1
primitive res_nfsserver systemd:nfs-server \
        op monitor interval=30s
group rg_export res_fs res_exportfs_export1 res_ip
ms ms_drbd_export res_drbd_export \
        meta notify=true master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
clone cl_exportfs_root res_exportfs_root
clone cl_nfsserver res_nfsserver
colocation c_export_on_drbd inf: rg_export ms_drbd_export:Master
colocation c_nfs_on_root inf: rg_export cl_exportfs_root
order o_drbd_before_nfs inf: ms_drbd_export:promote rg_export:start
order o_root_before_nfs inf: cl_exportfs_root rg_export:start
property cib-bootstrap-options: \
        maintenance-mode=false \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        have-watchdog=false \
        dc-version=1.1.13-10.el7_2.4-44eb2dd \
        cluster-infrastructure=corosync \
        cluster-name=nfsha
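
One thing that can be read off this configuration is which operations each primitive declares explicitly. The sketch below lists them. It is a rough illustration only: it assumes the crm shell text format shown above (statements continued with a trailing backslash), and the function name primitive_ops is made up. Any operation that is not listed, such as stop, would presumably fall back to the cluster's default settings; whether that explains the timed-out stop actions is not established here.

```python
import re


def primitive_ops(crm_text):
    """Map each primitive name to the operation names it declares."""
    # The crm shell continues a statement with a trailing backslash;
    # join those continuation lines into one logical line per statement.
    joined = re.sub(r"\\\s*\n\s*", " ", crm_text)
    ops = {}
    for line in joined.splitlines():
        words = line.split()
        if len(words) >= 2 and words[0] == "primitive":
            # "op <name>" introduces an operation definition.
            ops[words[1]] = re.findall(r"\bop (\w+)", line)
    return ops


# Two primitives copied from the configuration above.
SAMPLE = r"""primitive res_exportfs_export1 exportfs \
        params fsid=1 directory="/mnt/export/export1" options="rw,root_squash,mountpoint" clientspec="*.0/255.255.255.0" wait_for_leasetime_on_stop=true \
        op monitor interval=30s \
        meta target-role=Started
primitive res_fs Filesystem \
        params device="/dev/drbd0" directory="/mnt/export" fstype=ext3 \
        meta target-role=Started
"""

if __name__ == "__main__":
    for name, declared in primitive_ops(SAMPLE).items():
        print(name, declared if declared else "(no explicit operations)")
```

Run against the full configuration, this would show that only monitor operations are declared and that no primitive sets its own start or stop timeout.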

And the corosync.conf:

totem {
version: 2
# Corosync itself works without a cluster name, but DLM needs one.
# The cluster name is also written into the VG metadata of newly
# created shared LVM volume groups, if lvmlockd uses DLM locking.
# It is also used for computing mcastaddr, unless overridden below.
cluster_name: nfsha
# How long before declaring a token lost (ms)
token: 3000
# How many token retransmits before forming a new configuration
token_retransmits_before_loss_const: 10
# Limit generated nodeids to 31-bits (positive signed integers)
clear_node_high_bit: yes
# crypto_cipher and crypto_hash: Used for mutual node authentication.
# If you choose to enable this, then do remember to create a shared
# secret with "corosync-keygen".
# enabling crypto_cipher, requires also enabling of crypto_hash.
# crypto_cipher and crypto_hash should be used instead of deprecated
# secauth parameter.
# Valid values for crypto_cipher are none (no encryption), aes256, aes192,
# aes128 and 3des. Enabling crypto_cipher, requires also enabling of
# crypto_hash.
crypto_cipher: none
# Valid values for crypto_hash are none (no authentication), md5, sha1,
# sha256, sha384 and sha512.
crypto_hash: none
# Optionally assign a fixed node id (integer)
# nodeid: 1234
transport: udpu
}
nodelist {
node {
ring0_addr: *.50
nodeid: 1
}
node {
ring0_addr:*.51
nodeid: 2
}
}
logging {
to_syslog: yes
}

quorum {
# Enable and configure quorum subsystem (default: off)
# see also corosync.conf.5 and votequorum.5
provider: corosync_votequorum
expected_votes: 2
}
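
For context on the quorum section, the simple majority rule usually described for votequorum can be worked through for expected_votes: 2. This is a sketch of the arithmetic only. It ignores options such as two_node and wait_for_all, the function names are made up, and the actual behaviour of this cluster also depends on no-quorum-policy=ignore in the CRM configuration.

```python
def quorum_threshold(expected_votes):
    """Votes needed for a simple majority: floor(expected_votes / 2) + 1."""
    return expected_votes // 2 + 1


def is_quorate(votes_present, expected_votes):
    """True if the partition holds at least a simple majority of votes."""
    return votes_present >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    expected = 2  # expected_votes from the quorum section above
    print("threshold:", quorum_threshold(expected))
    print("both nodes up:", is_quorate(2, expected))
    print("one node up:", is_quorate(1, expected))
```

Under this rule a single surviving node in a two-node cluster does not hold a majority on its own, which is the usual reason two-node setups carry either a quorum override in Pacemaker or a two-node setting in corosync.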

So, as you can imagine, I am really puzzled by all this and would welcome any help on what might be wrong with the current configuration.

Thank you very much, kind regards

Pablo
