[Pacemaker] NFS resource isn't completely working

Andrew Beekhof andrew at beekhof.net
Thu Oct 25 19:59:50 EDT 2012


On Fri, Oct 26, 2012 at 5:17 AM, Lonni J Friedman <netllama at gmail.com> wrote:
> On Wed, Oct 24, 2012 at 5:59 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
>> On Wed, Oct 17, 2012 at 8:30 AM, Lonni J Friedman <netllama at gmail.com> wrote:
>>> Greetings,
>>> I'm trying to get an NFS server export correctly monitored and
>>> managed by Pacemaker, along with pre-existing IP, DRBD, and filesystem
>>> mounts (which are working correctly).  While NFS is up on the primary
>>> node (along with the other services), the monitor operation keeps
>>> showing up as a failed action, reported as 'not running'.
>>>
>>> Here's my current configuration:
>>> ################
>>> node farm-ljf0 \
>>>         attributes standby="off"
>>> node farm-ljf1
>>> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>>>         params ip="10.31.97.100" cidr_netmask="22" nic="eth1" \
>>>         op monitor interval="10s" \
>>>         meta target-role="Started"
>>> primitive FS0 ocf:linbit:drbd \
>>>         params drbd_resource="r0" \
>>>         op monitor interval="10s" role="Master" \
>>>         op monitor interval="30s" role="Slave"
>>> primitive FS0_drbd ocf:heartbeat:Filesystem \
>>>         params device="/dev/drbd0" directory="/mnt/sdb1" fstype="xfs" \
>>>         meta target-role="Started"
>>> primitive FS0_nfs systemd:nfs-server \
>>>         op monitor interval="10s" \
>>>         meta target-role="Started"
>>> group g_services ClusterIP FS0_drbd FS0_nfs
>>> ms FS0_Clone FS0 \
>>>         meta master-max="1" master-node-max="1" clone-max="2" \
>>>         clone-node-max="1" notify="true"
>>> colocation fs0_on_drbd inf: g_services FS0_Clone:Master
>>> order FS0_drbd-after-FS0 inf: FS0_Clone:promote g_services:start
>>> property $id="cib-bootstrap-options" \
>>>         dc-version="1.1.8-2.fc16-394e906" \
>>>         cluster-infrastructure="openais" \
>>>         expected-quorum-votes="2" \
>>>         stonith-enabled="false" \
>>>         no-quorum-policy="ignore"
>>> ################
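
For reference (not part of the original post): a crm group is shorthand for
colocation plus ordering of its members, so g_services above already starts
FS0_nfs only after FS0_drbd is mounted on the same node.  A minimal sketch of
the equivalent explicit constraints; the constraint IDs below are illustrative,
not from the posted configuration:

################
# Hedged sketch: what the g_services group implies, written out
# explicitly.  Constraint IDs are hypothetical.
colocation fs_with_ip inf: FS0_drbd ClusterIP
colocation nfs_with_fs inf: FS0_nfs FS0_drbd
order ip_then_fs inf: ClusterIP:start FS0_drbd:start
order fs_then_nfs inf: FS0_drbd:start FS0_nfs:start
################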
>>>
>>> Here's the output from 'crm status'
>>> ################
>>> Last updated: Tue Oct 16 14:26:22 2012
>>> Last change: Tue Oct 16 14:23:18 2012 via cibadmin on farm-ljf1
>>> Stack: openais
>>> Current DC: farm-ljf1 - partition with quorum
>>> Version: 1.1.8-2.fc16-394e906
>>> 2 Nodes configured, 2 expected votes
>>> 5 Resources configured.
>>>
>>>
>>> Online: [ farm-ljf0 farm-ljf1 ]
>>>
>>>  Master/Slave Set: FS0_Clone [FS0]
>>>      Masters: [ farm-ljf1 ]
>>>      Slaves: [ farm-ljf0 ]
>>>  Resource Group: g_services
>>>      ClusterIP  (ocf::heartbeat:IPaddr2):       Started farm-ljf1
>>>      FS0_drbd   (ocf::heartbeat:Filesystem):    Started farm-ljf1
>>>      FS0_nfs    (systemd:nfs-server):   Started farm-ljf1
>>>
>>> Failed actions:
>>>     FS0_nfs_monitor_10000 (node=farm-ljf1, call=54357, rc=7,
>>> status=complete): not running
>>>     FS0_nfs_monitor_10000 (node=farm-ljf0, call=131365, rc=7,
>>> status=complete): not running
>>> ################
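
rc=7 in those failed actions is OCF_NOT_RUNNING, i.e. the monitor found the
service inactive.  For a systemd-class resource the monitor essentially asks
systemd for the unit's state, so comparing the two views directly on the
active node is a quick sanity check (a hedged sketch using standard systemctl
commands; the unit name comes from the systemd:nfs-server primitive above):

################
# Compare Pacemaker's view with systemd's own, on farm-ljf1.
systemctl is-active nfs-server.service   # prints "active" if the unit is up
systemctl status nfs-server.service      # full unit state plus recent journal lines
################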
>>>
>>> When I check the cluster log, I'm seeing a bunch of this stuff:
>>
>> Your logs start too late, I'm afraid.
>> We need the earlier entries that show the job FS0_nfs_monitor_10000 failing.
>> Be sure to also check the system log file, since that will hopefully
>> have some information directly from systemd and/or nfs-server.
>
> Hopefully this is what you need:

No.  Posting log fragments is very unreliable; can you please run
crm_report for the period between when you first started the cluster and
Tue Oct 16 14:26:22 2012 instead?
That will have everything we need to help.
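
A minimal sketch of such an invocation: crm_report takes -f and -t for the
start and end of the window.  The cluster start time isn't stated in the
thread, so the -f value below is illustrative only, as is the report name.

################
# Hedged example; adjust -f to the actual cluster start time.
crm_report -f "2012-10-16 00:00:00" -t "2012-10-16 14:26:22" nfs-monitor-failure
################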

>
> Oct 16 12:40:54 farm-ljf1 crmd[31139]:   notice: process_lrm_event:
> LRM operation FS0_nfs_monitor_0 (call=52, rc=7, cib-update=23,
> confirmed=true) not running
> Oct 16 13:24:48 farm-ljf1 crmd[7610]:   notice: process_lrm_event: LRM
> operation FS0_nfs_monitor_0 (call=42, rc=7, cib-update=18,
> confirmed=true) not running
> Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback:
> Update fail-count-FS0_nfs=(null) failed: No such device or address
> Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback:
> Update last-failure-FS0_nfs=(null) failed: No such device or address
> Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback:
> Update fail-count-FS0_nfs=(null) failed: No such device or address
> Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback:
> Update fail-count-FS0_nfs=(null) failed: No such device or address
> Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback:
> Update fail-count-FS0_nfs=(null) failed: No such device or address
> Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback:
> Update fail-count-FS0_nfs=(null) failed: No such device or address
> Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback:
> Update fail-count-FS0_nfs=(null) failed: No such device or address
> Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback:
> Update fail-count-FS0_nfs=(null) failed: No such device or address
> Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback:
> Update fail-count-FS0_nfs=(null) failed: No such device or address
> Oct 16 13:24:48 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback:
> Update last-failure-FS0_nfs=(null) failed: No such device or address
> Oct 16 13:24:49 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback:
> Update fail-count-FS0_nfs=(null) failed: No such device or address
> Oct 16 13:24:49 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback:
> Update fail-count-FS0_nfs=(null) failed: No such device or address
> Oct 16 13:24:49 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback:
> Update fail-count-FS0_nfs=(null) failed: No such device or address
> Oct 16 13:24:49 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback:
> Update fail-count-FS0_nfs=(null) failed: No such device or address
> Oct 16 13:24:49 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback:
> Update fail-count-FS0_nfs=(null) failed: No such device or address
> Oct 16 13:24:49 farm-ljf1 attrd[7608]:  warning: attrd_cib_callback:
> Update fail-count-FS0_nfs=(null) failed: No such device or address
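
Those attrd warnings indicate the fail-count and last-failure attribute
updates for FS0_nfs could not be written to the CIB.  Once the underlying
nfs-server problem is resolved, the recorded failures still have to be
cleared by hand; a hedged sketch of the usual cleanup (standard commands,
not from the thread):

################
# Clear the failed-action history and fail counts for the resource.
crm resource cleanup FS0_nfs
# Equivalent lower-level form:
crm_resource --cleanup --resource FS0_nfs
################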