[ClusterLabs] Failed actions .. constraint confusion
Ken Gaillot
kgaillot at redhat.com
Tue Nov 7 10:10:02 EST 2017
On Mon, 2017-11-06 at 19:55 -0800, Aaron Cody wrote:
> Hello
> I have set up an active/passive HA NFS/DRBD cluster on RHEL 7.2, and
> I keep getting this 'Failed Action' message (not always, but
> sometimes):
>
> Stack: corosync
> Current DC: ha-nfs2.lan.aaroncody.com (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
> Last updated: Mon Nov 6 22:52:28 2017
> Last change: Mon Nov 6 22:47:20 2017 by hacluster via crmd on ha-nfs2.lan.aaroncody.com
>
> 2 nodes configured
> 8 resources configured
>
> Online: [ ha-nfs1.lan.aaroncody.com ha-nfs2.lan.aaroncody.com ]
>
> Full list of resources:
>
> Master/Slave Set: nfs-drbd-clone [nfs-drbd]
>     Masters: [ ha-nfs2.lan.aaroncody.com ]
>     Slaves: [ ha-nfs1.lan.aaroncody.com ]
> nfs-filesystem (ocf::heartbeat:Filesystem): Started ha-nfs2.lan.aaroncody.com
> nfs-root (ocf::heartbeat:exportfs): Started ha-nfs2.lan.aaroncody.com
> nfs-export1 (ocf::heartbeat:exportfs): Started ha-nfs2.lan.aaroncody.com
> nfs-server (ocf::heartbeat:nfsserver): Started ha-nfs2.lan.aaroncody.com
> nfs-ip (ocf::heartbeat:IPaddr2): Started ha-nfs2.lan.aaroncody.com
> nfs-notify (ocf::heartbeat:nfsnotify): Started ha-nfs2.lan.aaroncody.com
>
> Failed Actions:
> * nfs-server_start_0 on ha-nfs1.lan.aaroncody.com 'unknown error' (1): call=40, status=complete, exitreason='Failed to start NFS server locking daemons',
>     last-rc-change='Mon Nov 6 22:47:25 2017', queued=0ms, exec=202ms
>
>
>
> So, even though I have all my constraints set up to bring everything
> up on the DRBD master, the cluster still insists on trying to start
> the NFS server on the slave...
>
> Here are my constraints:
>
> Location Constraints:
> Ordering Constraints:
>   promote nfs-drbd-clone then start nfs-filesystem (kind:Mandatory)
>   start nfs-filesystem then start nfs-ip (kind:Mandatory)
>   start nfs-ip then start nfs-server (kind:Mandatory)
>   start nfs-server then start nfs-notify (kind:Mandatory)
>   start nfs-server then start nfs-root (kind:Mandatory)
>   start nfs-server then start nfs-export1 (kind:Mandatory)
> Colocation Constraints:
>   nfs-filesystem with nfs-drbd-clone (score:INFINITY) (with-rsc-role:Master)
>   nfs-ip with nfs-filesystem (score:INFINITY)
>   nfs-server with nfs-ip (score:INFINITY)
>   nfs-root with nfs-filesystem (score:INFINITY)
>   nfs-export1 with nfs-filesystem (score:INFINITY)
>   nfs-notify with nfs-server (score:INFINITY)
>
>
> any ideas what I'm doing wrong here? Did I mess up my constraints?
>
> TIA
>
The constraints look good to me. To debug this sort of thing, I would
grab the pe-input file from the transition that scheduled the start in
the wrong place, and run crm_simulate against it to get more
information. crm_simulate is not very user-friendly, so if you can
attach the pe-input file, I can take a look at it. (The pe-input file
will be listed at the end of the transition in the logs on the node
that was DC at the time; you'll see a bunch of "pengine:" messages,
including one showing that the resource was scheduled for a start on
that particular node.)
--
Ken Gaillot <kgaillot at redhat.com>