[Pacemaker] NFS resource isn't completely working
Lonni J Friedman
netllama at gmail.com
Tue Oct 16 21:30:55 UTC 2012
Greetings,
I'm trying to get an NFS server export correctly monitored and
managed by Pacemaker, alongside pre-existing IP, DRBD, and filesystem
resources (which are all working correctly). While NFS is up on the
primary node (along with the other services), its monitor operation
keeps showing up as a failed action, reported as 'not running'.
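As far as I understand, the systemd resource class just asks systemd
for the unit's active state, so in theory I should be able to
reproduce what the monitor sees by hand on each node (the unit name
comes from the systemd:nfs-server primitive below):
################
# roughly what the systemd-class monitor checks for this unit
systemctl is-active nfs-server.service
systemctl status nfs-server.service
################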
Here's my current configuration:
################
node farm-ljf0 \
    attributes standby="off"
node farm-ljf1
primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip="10.31.97.100" cidr_netmask="22" nic="eth1" \
    op monitor interval="10s" \
    meta target-role="Started"
primitive FS0 ocf:linbit:drbd \
    params drbd_resource="r0" \
    op monitor interval="10s" role="Master" \
    op monitor interval="30s" role="Slave"
primitive FS0_drbd ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/mnt/sdb1" fstype="xfs" \
    meta target-role="Started"
primitive FS0_nfs systemd:nfs-server \
    op monitor interval="10s" \
    meta target-role="Started"
group g_services ClusterIP FS0_drbd FS0_nfs
ms FS0_Clone FS0 \
    meta master-max="1" master-node-max="1" clone-max="2" \
    clone-node-max="1" notify="true"
colocation fs0_on_drbd inf: g_services FS0_Clone:Master
order FS0_drbd-after-FS0 inf: FS0_Clone:promote g_services:start
property $id="cib-bootstrap-options" \
    dc-version="1.1.8-2.fc16-394e906" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore"
################
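In case it's useful, this is how I dump and sanity-check the running
configuration (crm_verify -L validates the live CIB):
################
# validate the live CIB and show the full configuration
crm_verify -LV
crm configure show
################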
Here's the output from 'crm status':
################
Last updated: Tue Oct 16 14:26:22 2012
Last change: Tue Oct 16 14:23:18 2012 via cibadmin on farm-ljf1
Stack: openais
Current DC: farm-ljf1 - partition with quorum
Version: 1.1.8-2.fc16-394e906
2 Nodes configured, 2 expected votes
5 Resources configured.

Online: [ farm-ljf0 farm-ljf1 ]

 Master/Slave Set: FS0_Clone [FS0]
     Masters: [ farm-ljf1 ]
     Slaves: [ farm-ljf0 ]
 Resource Group: g_services
     ClusterIP  (ocf::heartbeat:IPaddr2):    Started farm-ljf1
     FS0_drbd   (ocf::heartbeat:Filesystem): Started farm-ljf1
     FS0_nfs    (systemd:nfs-server):        Started farm-ljf1

Failed actions:
    FS0_nfs_monitor_10000 (node=farm-ljf1, call=54357, rc=7, status=complete): not running
    FS0_nfs_monitor_10000 (node=farm-ljf0, call=131365, rc=7, status=complete): not running
################
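(Side note: rc=7 is OCF_NOT_RUNNING, and the fail-count on FS0_nfs is
enormous by now, 11940 and climbing. My understanding is that once
the underlying problem is fixed, the history can be reset with
something like the following, though I haven't done it yet:)
################
# clear failed actions and fail-counts for the NFS resource
crm resource cleanup FS0_nfs
# or inspect/reset the per-node fail-count directly (crmsh syntax)
crm resource failcount FS0_nfs show farm-ljf0
crm resource failcount FS0_nfs delete farm-ljf0
################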
When I check the cluster log, I see a lot of entries like these:
#############
Oct 16 14:23:17 [924] farm-ljf0 attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-FS0_nfs (11939)
Oct 16 14:23:17 [924] farm-ljf0 attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Oct 16 14:23:17 [924] farm-ljf0 attrd: notice: attrd_ais_dispatch: Update relayed from farm-ljf1
Oct 16 14:23:17 [924] farm-ljf0 attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-FS0_nfs (11940)
Oct 16 14:23:17 [924] farm-ljf0 attrd: notice: attrd_perform_update: Sent update 25471: fail-count-FS0_nfs=11940
Oct 16 14:23:17 [924] farm-ljf0 attrd: notice: attrd_ais_dispatch: Update relayed from farm-ljf1
Oct 16 14:23:20 [923] farm-ljf0 lrmd: info: cancel_recurring_action: Cancelling operation FS0_nfs_status_10000
Oct 16 14:23:20 [926] farm-ljf0 crmd: info: process_lrm_event: LRM operation FS0_nfs_monitor_10000 (call=131365, status=1, cib-update=0, confirmed=false) Cancelled
Oct 16 14:23:20 [923] farm-ljf0 lrmd: info: systemd_unit_exec_done: Call to stop passed: type '(o)' /org/freedesktop/systemd1/job/1062961
Oct 16 14:23:20 [926] farm-ljf0 crmd: notice: process_lrm_event: LRM operation FS0_nfs_stop_0 (call=131369, rc=0, cib-update=35842, confirmed=true) ok
#############
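In case it helps, I can also watch the failures recur from the
cluster side (if I'm reading crm_mon's options right, -f shows the
per-resource fail counts):
################
# one-shot cluster status including per-resource fail counts
crm_mon -1 -f
################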
I'm not sure what any of that means. I'd appreciate some guidance.
Thanks!