[Pacemaker] NFS resource isn't completely working
Andrew Beekhof
andrew at beekhof.net
Thu Oct 25 00:59:34 UTC 2012
On Wed, Oct 17, 2012 at 8:30 AM, Lonni J Friedman <netllama at gmail.com> wrote:
> Greetings,
> I'm trying to get an NFS server export to be correctly monitored &
> managed by pacemaker, along with pre-existing IP, drbd and filesystem
> mounts (which are working correctly). Although NFS is up on the
> primary node (along with the other services), its monitor operation
> keeps showing up as a failed action, reported as 'not running'.
>
> Here's my current configuration:
> ################
> node farm-ljf0 \
>     attributes standby="off"
> node farm-ljf1
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>     params ip="10.31.97.100" cidr_netmask="22" nic="eth1" \
>     op monitor interval="10s" \
>     meta target-role="Started"
> primitive FS0 ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op monitor interval="10s" role="Master" \
>     op monitor interval="30s" role="Slave"
> primitive FS0_drbd ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/mnt/sdb1" fstype="xfs" \
>     meta target-role="Started"
> primitive FS0_nfs systemd:nfs-server \
>     op monitor interval="10s" \
>     meta target-role="Started"
> group g_services ClusterIP FS0_drbd FS0_nfs
> ms FS0_Clone FS0 \
>     meta master-max="1" master-node-max="1" clone-max="2" \
>     clone-node-max="1" notify="true"
> colocation fs0_on_drbd inf: g_services FS0_Clone:Master
> order FS0_drbd-after-FS0 inf: FS0_Clone:promote g_services:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.8-2.fc16-394e906" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore"
> ################
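Worth noting for later: since FS0_nfs is a systemd-class resource, its
recurring monitor simply asks systemd whether the nfs-server unit is
active. A rough manual equivalent (a sketch only; Pacemaker actually
queries the unit state over DBus rather than calling systemctl) is:

    # sketch: roughly what the FS0_nfs monitor asks systemd, done by hand
    # (unit name taken from the config above)
    systemctl is-active nfs-server.service
    # anything other than "active" comes back to the cluster as
    # OCF_NOT_RUNNING (rc=7), i.e. the "not running" seen below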
>
> Here's the output from 'crm status'
> ################
> Last updated: Tue Oct 16 14:26:22 2012
> Last change: Tue Oct 16 14:23:18 2012 via cibadmin on farm-ljf1
> Stack: openais
> Current DC: farm-ljf1 - partition with quorum
> Version: 1.1.8-2.fc16-394e906
> 2 Nodes configured, 2 expected votes
> 5 Resources configured.
>
>
> Online: [ farm-ljf0 farm-ljf1 ]
>
> Master/Slave Set: FS0_Clone [FS0]
>     Masters: [ farm-ljf1 ]
>     Slaves: [ farm-ljf0 ]
> Resource Group: g_services
>     ClusterIP (ocf::heartbeat:IPaddr2): Started farm-ljf1
>     FS0_drbd (ocf::heartbeat:Filesystem): Started farm-ljf1
>     FS0_nfs (systemd:nfs-server): Started farm-ljf1
>
> Failed actions:
>     FS0_nfs_monitor_10000 (node=farm-ljf1, call=54357, rc=7, status=complete): not running
>     FS0_nfs_monitor_10000 (node=farm-ljf0, call=131365, rc=7, status=complete): not running
> ################
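Those two failed ops are the same recurring 10-second monitor
returning OCF_NOT_RUNNING (rc=7) on each node; status=complete means
the monitor itself ran fine and genuinely found the service stopped.
A quick way to see the current failure state in one shot (crm_mon
ships with pacemaker):

    # one-shot cluster status including per-resource fail counts
    crm_mon -1 -f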
>
> When I check the cluster log, I'm seeing a bunch of this stuff:
Your logs start too late, I'm afraid.
We need the earlier entries that show the job FS0_nfs_monitor_10000 failing.
Be sure to also check the system log file, since that will hopefully
have some information directly from systemd and/or nfs-server.
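For example, something like the following on farm-ljf1 (the cluster
log path is an assumption for this Fedora 16 setup; adjust to wherever
corosync/pacemaker actually log on these nodes):

    # earlier cluster log entries around the first monitor failure
    grep FS0_nfs /var/log/cluster/corosync.log | tail -n 100
    # systemd's own view of the unit, including why it last stopped
    systemctl status nfs-server.service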
> #############
> Oct 16 14:23:17 [924] farm-ljf0 attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-FS0_nfs (11939)
> Oct 16 14:23:17 [924] farm-ljf0 attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> Oct 16 14:23:17 [924] farm-ljf0 attrd: notice: attrd_ais_dispatch: Update relayed from farm-ljf1
> Oct 16 14:23:17 [924] farm-ljf0 attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-FS0_nfs (11940)
> Oct 16 14:23:17 [924] farm-ljf0 attrd: notice: attrd_perform_update: Sent update 25471: fail-count-FS0_nfs=11940
> Oct 16 14:23:17 [924] farm-ljf0 attrd: notice: attrd_ais_dispatch: Update relayed from farm-ljf1
> Oct 16 14:23:20 [923] farm-ljf0 lrmd: info: cancel_recurring_action: Cancelling operation FS0_nfs_status_10000
> Oct 16 14:23:20 [926] farm-ljf0 crmd: info: process_lrm_event: LRM operation FS0_nfs_monitor_10000 (call=131365, status=1, cib-update=0, confirmed=false) Cancelled
> Oct 16 14:23:20 [923] farm-ljf0 lrmd: info: systemd_unit_exec_done: Call to stop passed: type '(o)' /org/freedesktop/systemd1/job/1062961
> Oct 16 14:23:20 [926] farm-ljf0 crmd: notice: process_lrm_event: LRM operation FS0_nfs_stop_0 (call=131369, rc=0, cib-update=35842, confirmed=true) ok
> #############
>
> I'm not sure what any of that means. I'd appreciate some guidance.
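A rough reading of the excerpt above: the FS0_nfs monitor has been
failing over and over (the fail-count has climbed to 11940), so the
cluster cancels the recurring monitor and stops the resource via
systemd, which succeeds (rc=0). The excerpt alone doesn't say why the
unit keeps ending up inactive; that's what the earlier log entries
should show. Once the underlying nfs-server problem is found and
fixed, the accumulated failure history can be cleared with something
like:

    # clear FS0_nfs's fail-count and failed-action history
    crm resource cleanup FS0_nfs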
>
> thanks!
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org