[Pacemaker] OCFS2 fails to mount file system on node reboot sometimes

Andrew Beekhof andrew at beekhof.net
Wed Mar 9 03:54:27 EST 2011


On Tue, Feb 22, 2011 at 7:56 PM, Jake Smith <jsmith at argotec.com> wrote:
> I sometimes get the following error after a reboot when mounting the
> ocfs2 file system.  If I manually stop and restart corosync it mounts
> fine, but if I just run a cleanup or "crm resource start" it fails.  I
> don't understand how I can be getting "no local IP address has been
> set" when both the bonded links for the DRBD sync and the bonded links
> for the network are up.

I'd suggest starting with why scsi_hostadapter is no longer loaded,
since that appears to be the first error.
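
That FATAL line comes from the Filesystem agent's modprobe call, so the
first thing I'd do on the affected node is see what the name resolves
to (a rough sketch; on most distros scsi_hostadapter is a modprobe
alias rather than a real module, so the alias definition is what to
look for):

  # which SCSI-related modules are actually loaded right now?
  lsmod | grep -i scsi
  # dry run: what, if anything, does the alias resolve to?
  modprobe -n -v scsi_hostadapter
  # where is the alias (supposed to be) defined?
  grep -rs scsi_hostadapter /etc/modprobe.conf /etc/modprobe.d

If the alias is simply missing on this box the message may be harmless
noise, in which case the mount failure below is the real problem.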

>
> corosync.log:
>
> Feb 22 13:12:12 Condor crmd: [1246]: info: do_lrm_rsc_op: Performing key=66:4:0:927e853c-e0ee-4f67-a9e7-7cbda27cd316 op=resFS:1_start_0 )
> Feb 22 13:12:12 Condor lrmd: [1242]: info: rsc:resFS:1:26: start
> Feb 22 13:12:12 Condor lrmd: [1242]: info: RA output: (resFS:1:start:stderr) FATAL: Module scsi_hostadapter not found.
> Feb 22 13:12:12 Condor lrmd: [1242]: info: RA output: (resFS:1:start:stderr) mount.ocfs2: Transport endpoint is not connected
> Feb 22 13:12:12 Condor lrmd: [1242]: info: RA output: (resFS:1:start:stderr) while mounting /dev/drbd0 on /srv. Check 'dmesg' for more information on this error.
> Feb 22 13:12:12 Condor crmd: [1246]: info: process_lrm_event: LRM operation resFS:1_start_0 (call=26, rc=1, cib-update=33, confirmed=true) unknown error
> Feb 22 13:12:12 Condor attrd: [1243]: info: find_hash_entry: Creating hash entry for fail-count-resFS:1
> Feb 22 13:12:12 Condor attrd: [1243]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-resFS:1 (INFINITY)
> Feb 22 13:12:12 Condor attrd: [1243]: info: attrd_perform_update: Sent update 21: fail-count-resFS:1=INFINITY
> Feb 22 13:12:12 Condor attrd: [1243]: info: find_hash_entry: Creating hash entry for last-failure-resFS:1
> Feb 22 13:12:12 Condor attrd: [1243]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-resFS:1 (1298398314)
> Feb 22 13:12:12 Condor attrd: [1243]: info: attrd_perform_update: Sent update 24: last-failure-resFS:1=1298398314
> Feb 22 13:12:12 Condor crmd: [1246]: info: do_lrm_rsc_op: Performing key=5:5:0:927e853c-e0ee-4f67-a9e7-7cbda27cd316 op=resFS:1_stop_0 )
> Feb 22 13:12:12 Condor lrmd: [1242]: info: rsc:resFS:1:27: stop
>
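
Also note the fail-count going to INFINITY in there: after that first
failed start the cluster won't retry resFS:1 on this node until the
failure is cleared, which is why a plain "crm resource start" appears
to do nothing. Something like this should show and clear it (assuming
the crm shell that ships with 1.0):

  # what is the current fail count for the clone instance on Condor?
  crm resource failcount resFS:1 show Condor
  # forget the failure so the cluster will attempt the start again
  crm resource cleanup resFS:1

Cleanup only helps once the underlying dlm problem is gone, of course.
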
> dmesg:
>
> [   23.896124] DLM (built Jan 11 2011 00:00:14) installed
> [   23.917418] block drbd0: role( Secondary -> Primary )
> [   24.118912] bond1: no IPv6 routers present
> [   25.117097] ocfs2: Registered cluster interface user
> [   25.144884] OCFS2 Node Manager 1.5.0
> [   25.166762] OCFS2 1.5.0
> [   27.085394] bond0: no IPv6 routers present
> [   27.305886] dlm: no local IP address has been set
> [   27.306168] dlm: cannot start dlm lowcomms -107
> [   27.306589] (2370,0):ocfs2_dlm_init:2963 ERROR: status = -107
> [   27.306959] (2370,0):ocfs2_mount_volume:1792 ERROR: status = -107
> [   27.307289] ocfs2: Unmounting device (147,0) on (node 0)
>
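
Status -107 is ENOTCONN, the same "Transport endpoint is not connected"
that mount.ocfs2 printed: the kernel dlm refused to start because no
local comms address had been configured yet. Configuring those
addresses is dlm_controld's job (your resDLM/controld resource), so at
the moment of the mount the daemon was presumably either not running or
not finished joining. Worth checking right after a failed boot (paths
assume the stock fs/dlm configfs layout):

  # are the userspace daemons up when the mount is attempted?
  ps ax | grep -E '[d]lm_controld|[o]cfs2_controld'
  # dlm_controld feeds node addresses to the kernel via configfs;
  # if nothing is listed here, the dlm has no comms configured
  ls /sys/kernel/config/dlm/cluster/comms/
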
> crm_config:
>
> node Condor \
>         attributes standby="off"
> node Vulture \
>         attributes standby="off"
> primitive resDLM ocf:pacemaker:controld \
>         op monitor interval="120s"
> primitive resDRBD ocf:linbit:drbd \
>         params drbd_resource="srv" \
>         operations $id="resDRBD-operations" \
>         op monitor interval="20" role="Master" timeout="20" \
>         op monitor interval="30" role="Slave" timeout="20"
> primitive resFS ocf:heartbeat:Filesystem \
>         params device="/dev/drbd/by-res/srv" directory="/srv" fstype="ocfs2" \
>         op monitor interval="120s"
> primitive resIDRAC-CONDOR stonith:ipmilan \
>         params hostname="Condor" ipaddr="192.168.2.61" port="623" auth="md5" priv="admin" login="xxxx" password="xxxx" \
>         meta target-role="Started"
> primitive resIDRAC-VULTURE stonith:ipmilan \
>         params hostname="Vulture" ipaddr="192.168.2.62" port="623" auth="md5" priv="admin" login="xxxx" password="xxxx" \
>         meta target-role="Started"
> primitive resO2CB ocf:pacemaker:o2cb \
>         op monitor interval="120s"
> primitive resSAMBAVIP ocf:heartbeat:IPaddr2 \
>         params ip="192.168.2.200" cidr_netmask="32" nic="bond0" clusterip_hash="sourceip" \
>         op monitor interval="30s" \
>         meta resource-stickiness="0"
> ms msDRBD resDRBD \
>         meta resource-stickiness="100" notify="true" master-max="2" clone-max="2" clone-node-max="1" interleave="true" target-role="Started"
> clone cloneDLM resDLM \
>         meta globally-unique="false" interleave="true" target-role="Started"
> clone cloneFS resFS \
>         meta interleave="true" ordered="true" target-role="Started"
> clone cloneO2CB resO2CB \
>         meta globally-unique="false" interleave="true" target-role="Started"
> clone cloneSAMBAVIP resSAMBAVIP \
>         meta globally-unique="true" clone-max="2" clone-node-max="2" target-role="Started"
> location locIDRAC-CONDOR resIDRAC-CONDOR -inf: Condor
> location locIDRAC-VULTURE resIDRAC-VULTURE -inf: Vulture
> colocation colDLMDRBD inf: cloneDLM msDRBD:Master
> colocation colFSO2CB inf: cloneFS cloneO2CB
> colocation colFSSAMBAVIP inf: cloneFS cloneSAMBAVIP
> colocation colO2CBDLM inf: cloneO2CB cloneDLM
> order ordDLMO2CB 0: cloneDLM cloneO2CB
> order ordDRBDDLM 0: msDRBD:promote cloneDLM
> order ordFSSAMBAVIP 0: cloneFS cloneSAMBAVIP
> order ordO2CBFS 0: cloneO2CB cloneFS
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="true" \
>         no-quorum-policy="ignore" \
>         last-lrm-refresh="1298398491"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="100"
>
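
One more thing that jumps out in the config (a guess, not a confirmed
diagnosis): all of the order constraints use a score of 0, which makes
them advisory, so the DRBD -> DLM -> O2CB -> FS sequence is only
enforced when the actions happen to be scheduled in the same
transition. If the mount should always wait for the controld/o2cb
stack, the orderings can be made mandatory:

  order ordDRBDDLM inf: msDRBD:promote cloneDLM
  order ordDLMO2CB inf: cloneDLM cloneO2CB
  order ordO2CBFS inf: cloneO2CB cloneFS
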
> Thanks!
>
> Jake Smith
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>



