[Pacemaker] OCFS2 fails to mount file system on node reboot sometimes

Jake Smith jsmith at argotec.com
Wed Mar 9 16:03:36 UTC 2011


I will look into that error and report back.
However, I believe the scsi_hostadapter error has been there all along without causing a problem.

Thanks, 

Jake Smith 

----- Original Message -----
From: "Andrew Beekhof" <andrew at beekhof.net> 
To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org> 
Cc: "Jake Smith" <jsmith at argotec.com> 
Sent: Wednesday, March 9, 2011 3:54:27 AM 
Subject: Re: [Pacemaker] OCFS2 fails to mount file system on node reboot sometimes 

On Tue, Feb 22, 2011 at 7:56 PM, Jake Smith <jsmith at argotec.com> wrote: 
> After a reboot I sometimes get the following error when mounting the OCFS2
> file system. If I manually stop and restart corosync, it mounts fine, but if
> I only run a cleanup or crm resource start, it fails. I don't understand
> how no local IP address is set when both the bonded links for DRBD sync
> and the bonded links for the network are up.

I'd suggest starting with why scsi_hostadapter is no longer loaded - 
since that appears to be the first error. 
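
As an aid to reading the quoted output below: the -107 status in dmesg and the
"Transport endpoint is not connected" message from mount.ocfs2 appear to be the
same error, ENOTCONN, assuming the usual Linux errno numbering. A minimal
sketch that decodes the trailing codes; the lookup table and the function name
decode_status are illustrative only and not part of any cluster tool:

```python
import re

# Assumption: Linux errno numbering (other platforms use different values).
LINUX_ERRNO = {107: ("ENOTCONN", "Transport endpoint is not connected")}

# Lines copied from the dmesg excerpt quoted in this thread.
DMESG = """\
[ 27.305886] dlm: no local IP address has been set
[ 27.306168] dlm: cannot start dlm lowcomms -107
[ 27.306589] (2370,0):ocfs2_dlm_init:2963 ERROR: status = -107
[ 27.306959] (2370,0):ocfs2_mount_volume:1792 ERROR: status = -107
"""


def decode_status(text):
    """Return (line, errno name, message) for lines ending in a negative code."""
    found = []
    for line in text.splitlines():
        match = re.search(r"-(\d+)\s*$", line)
        if match:
            code = int(match.group(1))
            name, message = LINUX_ERRNO.get(code, ("unknown", "unknown"))
            found.append((line, name, message))
    return found


for line, name, message in decode_status(DMESG):
    print(f"{name}: {message} <- {line}")
```

If that reading is right, the mount failure follows from dlm having no local
address at that moment, which can be checked independently of the
scsi_hostadapter message.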

> 
> 
> 
> corosync.log: 
> 
> Feb 22 13:12:12 Condor crmd: [1246]: info: do_lrm_rsc_op: Performing key=66:4:0:927e853c-e0ee-4f67-a9e7-7cbda27cd316 op=resFS:1_start_0 )
> Feb 22 13:12:12 Condor lrmd: [1242]: info: rsc:resFS:1:26: start
> Feb 22 13:12:12 Condor lrmd: [1242]: info: RA output: (resFS:1:start:stderr) FATAL: Module scsi_hostadapter not found.
> Feb 22 13:12:12 Condor lrmd: [1242]: info: RA output: (resFS:1:start:stderr) mount.ocfs2: Transport endpoint is not connected
> Feb 22 13:12:12 Condor lrmd: [1242]: info: RA output: (resFS:1:start:stderr) while mounting /dev/drbd0 on /srv. Check 'dmesg' for more information on this error.
> Feb 22 13:12:12 Condor crmd: [1246]: info: process_lrm_event: LRM operation resFS:1_start_0 (call=26, rc=1, cib-update=33, confirmed=true) unknown error
> Feb 22 13:12:12 Condor attrd: [1243]: info: find_hash_entry: Creating hash entry for fail-count-resFS:1
> Feb 22 13:12:12 Condor attrd: [1243]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-resFS:1 (INFINITY)
> Feb 22 13:12:12 Condor attrd: [1243]: info: attrd_perform_update: Sent update 21: fail-count-resFS:1=INFINITY
> Feb 22 13:12:12 Condor attrd: [1243]: info: find_hash_entry: Creating hash entry for last-failure-resFS:1
> Feb 22 13:12:12 Condor attrd: [1243]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-resFS:1 (1298398314)
> Feb 22 13:12:12 Condor attrd: [1243]: info: attrd_perform_update: Sent update 24: last-failure-resFS:1=1298398314
> Feb 22 13:12:12 Condor crmd: [1246]: info: do_lrm_rsc_op: Performing key=5:5:0:927e853c-e0ee-4f67-a9e7-7cbda27cd316 op=resFS:1_stop_0 )
> Feb 22 13:12:12 Condor lrmd: [1242]: info: rsc:resFS:1:27: stop
> 
> 
> 
> dmesg: 
> 
> [ 23.896124] DLM (built Jan 11 2011 00:00:14) installed
> [ 23.917418] block drbd0: role( Secondary -> Primary )
> [ 24.118912] bond1: no IPv6 routers present
> [ 25.117097] ocfs2: Registered cluster interface user
> [ 25.144884] OCFS2 Node Manager 1.5.0
> [ 25.166762] OCFS2 1.5.0
> [ 27.085394] bond0: no IPv6 routers present
> [ 27.305886] dlm: no local IP address has been set
> [ 27.306168] dlm: cannot start dlm lowcomms -107
> [ 27.306589] (2370,0):ocfs2_dlm_init:2963 ERROR: status = -107
> [ 27.306959] (2370,0):ocfs2_mount_volume:1792 ERROR: status = -107
> [ 27.307289] ocfs2: Unmounting device (147,0) on (node 0)
> 
> 
> 
> crm_config: 
> 
> node Condor \
>     attributes standby="off"
> node Vulture \
>     attributes standby="off"
> primitive resDLM ocf:pacemaker:controld \
>     op monitor interval="120s"
> primitive resDRBD ocf:linbit:drbd \
>     params drbd_resource="srv" \
>     operations $id="resDRBD-operations" \
>     op monitor interval="20" role="Master" timeout="20" \
>     op monitor interval="30" role="Slave" timeout="20"
> primitive resFS ocf:heartbeat:Filesystem \
>     params device="/dev/drbd/by-res/srv" directory="/srv" fstype="ocfs2" \
>     op monitor interval="120s"
> primitive resIDRAC-CONDOR stonith:ipmilan \
>     params hostname="Condor" ipaddr="192.168.2.61" port="623" auth="md5" priv="admin" login="xxxx" password="xxxx" \
>     meta target-role="Started"
> primitive resIDRAC-VULTURE stonith:ipmilan \
>     params hostname="Vulture" ipaddr="192.168.2.62" port="623" auth="md5" priv="admin" login="xxxx" password="xxxx" \
>     meta target-role="Started"
> primitive resO2CB ocf:pacemaker:o2cb \
>     op monitor interval="120s"
> primitive resSAMBAVIP ocf:heartbeat:IPaddr2 \
>     params ip="192.168.2.200" cidr_netmask="32" nic="bond0" clusterip_hash="sourceip" \
>     op monitor interval="30s" \
>     meta resource-stickiness="0"
> ms msDRBD resDRBD \
>     meta resource-stickiness="100" notify="true" master-max="2" clone-max="2" clone-node-max="1" interleave="true" target-role="Started"
> clone cloneDLM resDLM \
>     meta globally-unique="false" interleave="true" target-role="Started"
> clone cloneFS resFS \
>     meta interleave="true" ordered="true" target-role="Started"
> clone cloneO2CB resO2CB \
>     meta globally-unique="false" interleave="true" target-role="Started"
> clone cloneSAMBAVIP resSAMBAVIP \
>     meta globally-unique="true" clone-max="2" clone-node-max="2" target-role="Started"
> location locIDRAC-CONDOR resIDRAC-CONDOR -inf: Condor
> location locIDRAC-VULTURE resIDRAC-VULTURE -inf: Vulture
> colocation colDLMDRBD inf: cloneDLM msDRBD:Master
> colocation colFSO2CB inf: cloneFS cloneO2CB
> colocation colFSSAMBAVIP inf: cloneFS cloneSAMBAVIP
> colocation colO2CBDLM inf: cloneO2CB cloneDLM
> order ordDLMO2CB 0: cloneDLM cloneO2CB
> order ordDRBDDLM 0: msDRBD:promote cloneDLM
> order ordFSSAMBAVIP 0: cloneFS cloneSAMBAVIP
> order ordO2CBFS 0: cloneO2CB cloneFS
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="true" \
>     no-quorum-policy="ignore" \
>     last-lrm-refresh="1298398491"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="100"
> 
> 
> 
> Thanks! 
> 
> 
> 
> Jake Smith 
> 
> 
> 
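
One more thing that may be worth checking in the quoted configuration: all four
order constraints use a score of 0. As far as I know, Pacemaker 1.0 treats an
order constraint whose score is not greater than zero as advisory rather than
mandatory, which would allow resFS to be started before the DLM is fully up.
That is an assumption to verify against the documentation for the installed
version, not a diagnosis. A minimal sketch that lists such constraints; the
function name advisory_orders is made up for illustration:

```python
# Order constraints copied from the quoted crm configuration.
CONFIG = """\
order ordDLMO2CB 0: cloneDLM cloneO2CB
order ordDRBDDLM 0: msDRBD:promote cloneDLM
order ordFSSAMBAVIP 0: cloneFS cloneSAMBAVIP
order ordO2CBFS 0: cloneO2CB cloneFS
"""

# Score spellings treated here as "mandatory" (assumption for this sketch).
INFINITE = {"inf", "+inf", "infinity", "+infinity"}


def advisory_orders(config):
    """Return ids of order constraints whose score is 0 or negative."""
    ids = []
    for line in config.splitlines():
        parts = line.split()
        if len(parts) < 4 or parts[0] != "order":
            continue
        ident, score = parts[1], parts[2].rstrip(":").lower()
        if score in INFINITE:
            continue
        try:
            value = int(score)
        except ValueError:
            continue  # unrecognised score format, skip rather than guess
        # Assumption: a score not greater than zero means "advisory".
        if value <= 0:
            ids.append(ident)
    return ids


print(advisory_orders(CONFIG))
```

If the assumption holds, replacing 0: with inf: on the DRBD, DLM, O2CB and FS
chain would be the thing to try.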
