[Pacemaker] NFS resource isn't completely working

Thu Oct 25 18:17:07 UTC 2012

On Wed, Oct 24, 2012 at 5:59 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
> On Wed, Oct 17, 2012 at 8:30 AM, Lonni J Friedman <netllama at gmail.com> wrote:
>> Greetings,
>> I'm trying to get an NFS server export to be correctly monitored &
>> managed by pacemaker, along with pre-existing IP, drbd and filesystem
>> mounts (which are working correctly).  While NFS is up on the primary
>> node (along with the other services), the monitoring portion keeps
>> showing up as a failed action, reported as 'not running'.
>>
>> Here's my current configuration:
>> ################
>> node farm-ljf0 \
>>         attributes standby="off"
>> node farm-ljf1
>> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>>         params ip="10.31.97.100" cidr_netmask="22" nic="eth1" \
>>         op monitor interval="10s" \
>>         meta target-role="Started"
>> primitive FS0 ocf:linbit:drbd \
>>         params drbd_resource="r0" \
>>         op monitor interval="10s" role="Master" \
>>         op monitor interval="30s" role="Slave"
>> primitive FS0_drbd ocf:heartbeat:Filesystem \
>>         params device="/dev/drbd0" directory="/mnt/sdb1" fstype="xfs" \
>>         meta target-role="Started"
>> primitive FS0_nfs systemd:nfs-server \
>>         op monitor interval="10s" \
>>         meta target-role="Started"
>> group g_services ClusterIP FS0_drbd FS0_nfs
>> ms FS0_Clone FS0 \
>>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>> colocation fs0_on_drbd inf: g_services FS0_Clone:Master
>> order FS0_drbd-after-FS0 inf: FS0_Clone:promote g_services:start
>> property $id="cib-bootstrap-options" \
>>         dc-version="1.1.8-2.fc16-394e906" \
>>         cluster-infrastructure="openais" \
>>         expected-quorum-votes="2" \
>>         stonith-enabled="false" \
>>         no-quorum-policy="ignore"
>> ################
>>
>> Here's the output from 'crm status'
>> ################
>> Last updated: Tue Oct 16 14:26:22 2012
>> Last change: Tue Oct 16 14:23:18 2012 via cibadmin on farm-ljf1
>> Stack: openais
>> Current DC: farm-ljf1 - partition with quorum
>> Version: 1.1.8-2.fc16-394e906
>> 2 Nodes configured, 2 expected votes
>> 5 Resources configured.
>>
>>
>> Online: [ farm-ljf0 farm-ljf1 ]
>>
>>  Master/Slave Set: FS0_Clone [FS0]
>>      Masters: [ farm-ljf1 ]
>>      Slaves: [ farm-ljf0 ]
>>  Resource Group: g_services
>>      ClusterIP  (ocf::heartbeat:IPaddr2):       Started farm-ljf1
>>      FS0_drbd   (ocf::heartbeat:Filesystem):    Started farm-ljf1
>>      FS0_nfs    (systemd:nfs-server):   Started farm-ljf1
>>
>> Failed actions:
>>     FS0_nfs_monitor_10000 (node=farm-ljf1, call=54357, rc=7, status=complete): not running
>>     FS0_nfs_monitor_10000 (node=farm-ljf0, call=131365, rc=7, status=complete): not running
>> ################
>>
>> When I check the cluster log, I'm seeing a bunch of this stuff:
>
> Your logs start too late I'm afraid.
> We need the earlier entries that show the job FS0_nfs_monitor_10000 failing.
> Be sure to also check the system log file, since that will hopefully
> have some information directly from systemd and/or nfs-server.

Hopefully this is what you need:

Oct 16 12:40:54 farm-ljf1 crmd[31139]:   notice: process_lrm_event: LRM operation FS0_nfs_monitor_0 (call=52, rc=7, cib-update=23, confirmed=true) not running
Oct 16 13:24:48 farm-ljf1 crmd[7610]:   notice: process_lrm_event: LRM operation FS0_nfs_monitor_0 (call=42, rc=7, cib-update=18, confirmed=true) not running
Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback: Update fail-count-FS0_nfs=(null) failed: No such device or address
Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback: Update last-failure-FS0_nfs=(null) failed: No such device or address
Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback: Update fail-count-FS0_nfs=(null) failed: No such device or address
Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback: Update fail-count-FS0_nfs=(null) failed: No such device or address
Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback: Update fail-count-FS0_nfs=(null) failed: No such device or address
Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback: Update fail-count-FS0_nfs=(null) failed: No such device or address
Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback: Update fail-count-FS0_nfs=(null) failed: No such device or address
Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback: Update fail-count-FS0_nfs=(null) failed: No such device or address
Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback: Update fail-count-FS0_nfs=(null) failed: No such device or address
Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback: Update last-failure-FS0_nfs=(null) failed: No such device or address
Oct 16 13:24:49 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback: Update fail-count-FS0_nfs=(null) failed: No such device or address
Oct 16 13:24:49 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback: Update fail-count-FS0_nfs=(null) failed: No such device or address
Oct 16 13:24:49 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback: Update fail-count-FS0_nfs=(null) failed: No such device or address
Oct 16 13:24:49 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback: Update fail-count-FS0_nfs=(null) failed: No such device or address
Oct 16 13:24:49 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback: Update fail-count-FS0_nfs=(null) failed: No such device or address
Oct 16 13:24:49 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback: Update fail-count-FS0_nfs=(null) failed: No such device or address
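
In case it helps anyone wading through excerpts like the ones above, here is a rough sketch of a script that summarizes them. It extracts each LRM operation with its return code, translates the code using the standard OCF return code names (rc=7 is OCF_NOT_RUNNING), and counts the repeated attrd update failures per attribute. The function names unwrap() and summarize() are made up for illustration, and the regular expressions assume only the line shapes seen in this thread, so they may need adjusting for other Pacemaker versions. It should handle entries whether or not the mailer has wrapped them across lines.

```python
import re
from collections import Counter

# Standard OCF resource agent return codes.
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

# Assumed line shapes, taken from the log excerpts in this thread.
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d ")
LRM_OP = re.compile(r"LRM operation (\S+) \(call=(\d+), rc=(\d+)")
ATTRD_FAIL = re.compile(r"attrd_cib_callback:\s+Update (\S+)=\S* failed: (.+)$")


def unwrap(lines):
    """Join wrapped continuation lines back onto their syslog entry."""
    entries = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if TIMESTAMP.match(text) or not entries:
            entries.append(text)      # a new entry starts with a timestamp
        else:
            entries[-1] += " " + text  # anything else continues the previous entry
    return entries


def summarize(lines):
    """Return (operations, attrd failure counts) for a list of log lines."""
    ops = []
    attrd_failures = Counter()
    for entry in unwrap(lines):
        op = LRM_OP.search(entry)
        if op:
            rc = int(op.group(3))
            ops.append((op.group(1), int(op.group(2)), rc, OCF_RC.get(rc, "unknown")))
        failed = ATTRD_FAIL.search(entry)
        if failed:
            attrd_failures[failed.group(1)] += 1
    return ops, attrd_failures


if __name__ == "__main__":
    import sys

    operations, failures = summarize(sys.stdin)
    for name, call, rc, meaning in operations:
        print(f"{name} call={call} rc={rc} ({meaning})")
    for attribute, count in failures.items():
        print(f"{attribute}: {count} failed update(s)")
```

Saved under a name of your choosing, it reads the log on stdin. Note that the two crmd entries here are for FS0_nfs_monitor_0, which as far as I know is the initial probe rather than the recurring 10s monitor, so the FS0_nfs_monitor_10000 failures that Andrew asked about would still need to be found elsewhere in the logs.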