[Pacemaker] Two node lsb:nfs failing starting second node
Dejan Muhamedagic
dejanmm at fastmail.fm
Mon Aug 2 16:25:00 UTC 2010
Hi,
On Wed, Jul 28, 2010 at 06:48:49AM -0400, Rick Day wrote:
> I am setting up a two node cluster on RHEL 5.5 with Pacemaker
> 1.0.9.1-1. I have a resource set up to start NFS with lsb. I
> bring up my first node and everything is fine. All the
> resources start up. When I bring up the second node, it appears
> that the NFS resource tries to fail over and then it just stops.
> Why would it even try to fail over just because I bring the
> second node up? I have another two-node cluster set up on
> Centos with a slightly different version of pacemaker and it
> works fine. Please see configuration and a couple of things
> out of the log file below. Please help.
>
> node SPDLFILE01 \
> attributes standby="off"
> node SPDLFILE02 \
> attributes standby="off"
> primitive drbd_nfs ocf:heartbeat:drbd \
> params drbd_resource="r0" ignore_deprecation="true" \
> op monitor interval="15s" \
> op start interval="0" timeout="240" \
> op stop interval="0" timeout="100"
> primitive fs_nfs ocf:heartbeat:Filesystem \
> params device="/dev/drbd1" directory="/var/nfs" fstype="ext3" \
> op start interval="0" timeout="60" \
> op stop interval="0" timeout="60"
> primitive ip_nfs ocf:heartbeat:IPaddr2 \
> params ip="192.168.104.60" cidr_netmask="32" \
> op monitor interval="30s"
> primitive nfs lsb:nfs \
> meta target-role="Started"
> group nfs_group fs_nfs ip_nfs
> ms ms_drbd_nfs drbd_nfs \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> location cli-standby-nfs nfs \
> rule $id="cli-standby-rule-nfs" -inf: #uname eq SPDLFILE02
Why do you want to prevent nfs from running on this node? It
won't help with failover.
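If that constraint was left over from an earlier "crm resource
migrate", you can drop it again, for example (constraint id taken
from your config above):

  crm resource unmigrate nfs
  # or remove the constraint directly
  crm configure delete cli-standby-nfs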
> colocation nfs_on_drbd inf: fs_nfs ms_drbd_nfs:Master
> order nfs_after_drbd inf: ms_drbd_nfs:promote fs_nfs:start
> property $id="cib-bootstrap-options" \
> dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> no-quorum-policy="ignore" \
> last-lrm-refresh="1280277692"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="100"
>
>
> This is what I see in crm_mon when the error occurs.......
>
> Resource Group: nfs_group
> fs_nfs (ocf::heartbeat:Filesystem): Started SPDLFILE01
> ip_nfs (ocf::heartbeat:IPaddr2): Started SPDLFILE01
>
> Failed actions:
> nfs_monitor_0 (node=SPDLFILE01, call=14, rc=5, status=complete): not installed
>
>
>
> Here are some warnings from the log file........
>
> lrmd: [8129]: WARN: For LSB init script, no additional parameters are needed.
> Jul 27 21:13:26 SPDLFILE01 crmd: [2850]: WARN: status_from_rc: Action 8 (nfs_monitor_0) on SPDLFILE02 failed (target: 7 vs. rc: 0): Error
Was nfs started on boot?
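If so, the initial probe finds nfs already running outside the
cluster (that's the "Resource is Too Active" warning below). On RHEL
you can check and take it out of the boot sequence with something
like:

  chkconfig --list nfs
  chkconfig nfs off
  service nfs stop

and let the cluster manage it from then on.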
> Jul 27 21:13:27 SPDLFILE01 pengine: [2849]: WARN: See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
> Jul 27 21:13:27 SPDLFILE01 pengine: [2849]: WARN: native_create_actions: Attempting recovery of resource nfs
> Jul 27 21:13:27 SPDLFILE01 lrmd: [8366]: WARN: For LSB init script, no additional parameters are needed.
> Jul 27 21:13:27 SPDLFILE01 lrmd: [8399]: WARN: For LSB init script, no additional parameters are needed.
> Jul 27 21:13:27 SPDLFILE01 crmd: [2850]: WARN: status_from_rc: Action 42 (nfs_start_0) on SPDLFILE01 failed (target: 0 vs. rc: 1): Error
nfs failed to start on node 1. The system logs should have a
clue. BTW, you should probably use the ocf:heartbeat:nfsserver RA
instead of lsb:nfs.
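
Untested sketch of what that could look like (parameter names are
from the nfsserver RA, paths are placeholders; check "crm ra info
ocf:heartbeat:nfsserver" for the exact parameters on your version):

  primitive nfs ocf:heartbeat:nfsserver \
      params nfs_init_script="/etc/init.d/nfs" \
          nfs_shared_infodir="/var/nfs/nfsinfo" \
          nfs_ip="192.168.104.60" \
      op monitor interval="30s"

The nfs_shared_infodir is usually put on the drbd-backed filesystem
so the NFS state moves together with the data.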
Thanks,
Dejan
> Jul 27 21:13:27 SPDLFILE01 crmd: [2850]: WARN: update_failcount: Updating failcount for nfs on SPDLFILE01 after failed start: rc=1 (update=INFINITY, time=1280279607)
> Jul 27 21:13:28 SPDLFILE01 pengine: [2849]: WARN: unpack_rsc_op: Processing failed op nfs_start_0 on SPDLFILE01: unknown error (1)
> Jul 27 21:13:28 SPDLFILE01 pengine: [2849]: WARN: common_apply_stickiness: Forcing nfs away from SPDLFILE01 after 1000000 failures (max=1000000)
> Jul 27 21:13:28 SPDLFILE01 lrmd: [8460]: WARN: For LSB init script, no additional parameters are needed.
> Jul 27 21:13:28 SPDLFILE01 pengine: [2849]: WARN: unpack_rsc_op: Processing failed op nfs_start_0 on SPDLFILE01: unknown error (1)
> Jul 27 21:13:28 SPDLFILE01 pengine: [2849]: WARN: common_apply_stickiness: Forcing nfs away from SPDLFILE01 after 1000000 failures (max=1000000)
> Jul 27 21:28:28 SPDLFILE01 pengine: [2849]: WARN: unpack_rsc_op: Processing failed op nfs_start_0 on SPDLFILE01: unknown error (1)
> Jul 27 21:28:28 SPDLFILE01 pengine: [2849]: WARN: common_apply_stickiness: Forcing nfs away from SPDLFILE01 after 1000000 failures (max=1000000)
> Jul 27 21:32:42 SPDLFILE01 pengine: [2849]: WARN: unpack_rsc_op: Processing failed op nfs_start_0 on SPDLFILE01: unknown error (1)
> Jul 27 21:32:42 SPDLFILE01 pengine: [2849]: WARN: common_apply_stickiness: Forcing nfs away from SPDLFILE01 after 1000000 failures (max=1000000)
> Jul 27 21:32:42 SPDLFILE01 pengine: [2849]: WARN: unpack_rsc_op: Processing failed op nfs_start_0 on SPDLFILE01: unknown error (1)
> Jul 27 21:32:42 SPDLFILE01 pengine: [2849]: WARN: common_apply_stickiness: Forcing nfs away from SPDLFILE01 after 1000000 failures (max=1000000)
> Jul 27 21:41:03 SPDLFILE01 pengine: [2849]: WARN: unpack_rsc_op: Processing failed op nfs_start_0 on SPDLFILE01: unknown error (1)
> Jul 27 21:41:03 SPDLFILE01 pengine: [2849]: WARN: common_apply_stickiness: Forcing nfs away from SPDLFILE01 after 1000000 failures (max=1000000)
> Jul 27 21:41:03 SPDLFILE01 pengine: [2849]: WARN: unpack_rsc_op: Processing failed op nfs_start_0 on SPDLFILE01: unknown error (1)
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker