[Pacemaker] cannot mount gfs2 filesystem
Soni Maula Harriz
soni.harriz at sangkuriang.co.id
Wed Oct 31 05:15:25 UTC 2012
On Tue, Oct 30, 2012 at 12:20 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
> On Mon, Oct 29, 2012 at 4:22 PM, Soni Maula Harriz
> <soni.harriz at sangkuriang.co.id> wrote:
> > Dear all,
> > I configured Pacemaker and Corosync on two CentOS 6.3 servers by
> > following the instructions in 'Cluster from Scratch'. At first I
> > followed 'Cluster from Scratch' edition 5 but, since I am using CentOS,
> > I switched to edition 3 to configure active/active servers.
> > Now, on the first server (cluster1), the Filesystem resource cannot
> > start: the gfs2 filesystem can't be mounted.
> >
> > This is the crm configuration:
> > [root at cluster2 ~]# crm configure show
> > node cluster1 \
> >         attributes standby="off"
> > node cluster2 \
> >         attributes standby="off"
> > primitive ClusterIP ocf:heartbeat:IPaddr2 \
> >         params ip="xxx.xxx.xxx.229" cidr_netmask="32" clusterip_hash="sourceip" \
> >         op monitor interval="30s"
> > primitive WebData ocf:linbit:drbd \
> >         params drbd_resource="wwwdata" \
> >         op monitor interval="60s"
> > primitive WebFS ocf:heartbeat:Filesystem \
> >         params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2"
> > primitive WebSite ocf:heartbeat:apache \
> >         params configfile="/etc/httpd/conf/httpd.conf" statusurl="http://localhost/server-status" \
> >         op monitor interval="1min"
> > ms WebDataClone WebData \
> >         meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> > clone WebFSClone WebFS
> > clone WebIP ClusterIP \
> >         meta globally-unique="true" clone-max="2" clone-node-max="1" interleave="false"
> > clone WebSiteClone WebSite \
> >         meta interleave="false"
> > colocation WebSite-with-WebFS inf: WebSiteClone WebFSClone
> > colocation colocation-WebSite-ClusterIP-INFINITY inf: WebSiteClone WebIP
> > colocation fs_on_drbd inf: WebFSClone WebDataClone:Master
> > order WebFS-after-WebData inf: WebDataClone:promote WebFSClone:start
> > order WebSite-after-WebFS inf: WebFSClone WebSiteClone
> > order order-ClusterIP-WebSite-mandatory : WebIP:start WebSiteClone:start
> > property $id="cib-bootstrap-options" \
> >         dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
> >         cluster-infrastructure="cman" \
> >         expected-quorum-votes="2" \
> >         stonith-enabled="false" \
> >         no-quorum-policy="ignore"
> > rsc_defaults $id="rsc-options" \
> >         resource-stickiness="100"
> >
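One thing stands out in the configuration above: stonith-enabled="false" combined with a GFS2 clone. GFS2 through lock_dlm depends on working fencing, and DLM operations (including mounts) can block indefinitely when fencing is missing or fails. A minimal sketch of what enabling STONITH could look like; the fence_ipmilan agent and every parameter value below are placeholders, not values from this cluster:

  # sketch only -- agent choice and parameter values are placeholders for this cluster
  crm configure primitive ipmi-fencing stonith:fence_ipmilan \
          params pcmk_host_list="cluster1 cluster2" ipaddr="x.x.x.x" \
                 login="testuser" passwd="abc123" \
          op monitor interval="60s"
  crm configure property stonith-enabled="true"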
> > When I try to mount the filesystem manually, this message appears:
> > [root at cluster1 ~]# mount /dev/drbd1 /mnt/
> > mount point already used or other mount in progress
> > error mounting lockproto lock_dlm
> >
> > But when I check the mounts, there is no mount from DRBD.
>
>
> what does "ps axf" say? Is there another mount process running?
>
This is what the system showed me:
[root at cluster2 ~]# ps axf
PID TTY STAT TIME COMMAND
2 ? S 0:00 [kthreadd]
3 ? S 0:00 \_ [migration/0]
4 ? S 0:00 \_ [ksoftirqd/0]
5 ? S 0:00 \_ [migration/0]
6 ? S 0:00 \_ [watchdog/0]
7 ? S 0:03 \_ [events/0]
8 ? S 0:00 \_ [cgroup]
9 ? S 0:00 \_ [khelper]
10 ? S 0:00 \_ [netns]
11 ? S 0:00 \_ [async/mgr]
12 ? S 0:00 \_ [pm]
13 ? S 0:00 \_ [sync_supers]
14 ? S 0:00 \_ [bdi-default]
15 ? S 0:00 \_ [kintegrityd/0]
16 ? S 0:03 \_ [kblockd/0]
17 ? S 0:00 \_ [kacpid]
18 ? S 0:00 \_ [kacpi_notify]
19 ? S 0:00 \_ [kacpi_hotplug]
20 ? S 0:00 \_ [ata/0]
21 ? S 0:00 \_ [ata_aux]
22 ? S 0:00 \_ [ksuspend_usbd]
23 ? S 0:00 \_ [khubd]
24 ? S 0:00 \_ [kseriod]
25 ? S 0:00 \_ [md/0]
26 ? S 0:00 \_ [md_misc/0]
27 ? S 0:00 \_ [khungtaskd]
28 ? S 0:00 \_ [kswapd0]
29 ? SN 0:00 \_ [ksmd]
30 ? SN 0:00 \_ [khugepaged]
31 ? S 0:00 \_ [aio/0]
32 ? S 0:00 \_ [crypto/0]
37 ? S 0:00 \_ [kthrotld/0]
39 ? S 0:00 \_ [kpsmoused]
40 ? S 0:00 \_ [usbhid_resumer]
71 ? S 0:00 \_ [kstriped]
188 ? S 0:00 \_ [scsi_eh_0]
190 ? S 0:00 \_ [scsi_eh_1]
220 ? S 0:00 \_ [scsi_eh_2]
272 ? S 0:00 \_ [kdmflush]
273 ? S 0:00 \_ [kdmflush]
293 ? S 0:00 \_ [jbd2/dm-0-8]
294 ? S 0:00 \_ [ext4-dio-unwrit]
853 ? S 0:00 \_ [kdmflush]
877 ? S 0:00 \_ [flush-253:0]
890 ? S 0:00 \_ [jbd2/sda1-8]
891 ? S 0:00 \_ [ext4-dio-unwrit]
949 ? S 0:00 \_ [kauditd]
1602 ? S 0:00 \_ [rpciod/0]
2344 ? S 0:00 \_ [cqueue]
2456 ? S 0:00 \_ [drbd1_worker]
2831 ? S 0:00 \_ [glock_workqueue]
2832 ? S 0:00 \_ [delete_workqueu]
2833 ? S< 0:00 \_ [kslowd001]
2834 ? S< 0:00 \_ [kslowd000]
2846 ? S 0:00 \_ [dlm_astd]
2847 ? S 0:00 \_ [dlm_scand]
2848 ? S 0:00 \_ [dlm_recv/0]
2849 ? S 0:00 \_ [dlm_send]
2850 ? S 0:00 \_ [dlm_recoverd]
1 ? Ss 0:01 /sbin/init
377 ? S<s 0:00 /sbin/udevd -d
840 ? S< 0:00 \_ /sbin/udevd -d
842 ? S< 0:00 \_ /sbin/udevd -d
1182 ? S<sl 0:00 auditd
1208 ? Sl 0:00 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
1250 ? Ss 0:00 rpcbind
1351 ? SLsl 0:06 corosync -f
1394 ? Ssl 0:00 fenced
1420 ? Ssl 0:00 dlm_controld
1467 ? Ssl 0:00 gfs_controld
1539 ? Ss 0:00 dbus-daemon --system
1550 ? S 0:00 avahi-daemon: running [cluster2.local]
1551 ? Ss 0:00 \_ avahi-daemon: chroot helper
1568 ? Ss 0:00 rpc.statd
1606 ? Ss 0:00 rpc.idmapd
1616 ? Ss 0:00 cupsd -C /etc/cups/cupsd.conf
1641 ? Ss 0:00 /usr/sbin/acpid
1650 ? Ss 0:00 hald
1651 ? S 0:00 \_ hald-runner
1692 ? S 0:00 \_ hald-addon-input: Listening on
/dev/input/event3 /dev/input/event1 /dev/input/event0
1695 ? S 0:00 \_ hald-addon-acpi: listening on acpid
socket /var/run/acpid.socket
1715 ? Ssl 0:00 automount --pid-file /var/run/autofs.pid
1740 ? Ss 0:00 /usr/sbin/sshd
1979 ? Ss 0:00 \_ sshd: root at pts/0
2207 pts/0 Ss 0:00 \_ -bash
8528 pts/0 R+ 0:00 \_ ps axf
1748 ? Ss 0:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
1828 ? Ss 0:00 /usr/libexec/postfix/master
1834 ? S 0:00 \_ pickup -l -t fifo -u
1835 ? S 0:00 \_ qmgr -l -t fifo -u
1852 ? Ss 0:00 /usr/sbin/abrtd
1860 ? Ss 0:00 abrt-dump-oops -d /var/spool/abrt -rwx
/var/log/messages
1890 ? Ss 0:00 crond
1901 ? Ss 0:00 /usr/sbin/atd
1913 ? Ss 0:00 /usr/sbin/certmonger -S -p
/var/run/certmonger.pid
1939 ? S 0:00 pacemakerd
1943 ? Ss 0:02 \_ /usr/libexec/pacemaker/cib
1944 ? Ss 0:00 \_ /usr/libexec/pacemaker/stonithd
1945 ? Ss 0:01 \_ /usr/lib64/heartbeat/lrmd
1946 ? Ss 0:00 \_ /usr/libexec/pacemaker/attrd
1947 ? Ss 0:00 \_ /usr/libexec/pacemaker/pengine
1948 ? Ss 0:01 \_ /usr/libexec/pacemaker/crmd
2005 ? Ss 0:00 /usr/sbin/gdm-binary -nodaemon
2136 ? S 0:00 \_ /usr/libexec/gdm-simple-slave --display-id
/org/gnome/DisplayManager/Display1 --force-active-vt
2157 tty1 Ss+ 0:02 \_ /usr/bin/Xorg :0 -nr -verbose -audit 4
-auth /var/run/gdm/auth-for-gdm-nrpPGF/database -nolisten tcp vt1
2485 ? Ssl 0:00 \_ /usr/bin/gnome-session
--autostart=/usr/share/gdm/autostart/LoginWindow/
2595 ? S 0:00 | \_ /usr/libexec/at-spi-registryd
2683 ? S 0:00 | \_ metacity
2705 ? S 0:00 | \_ gnome-power-manager
2706 ? S 0:00 | \_ /usr/libexec/gdm-simple-greeter
2708 ? S 0:00 | \_
/usr/libexec/polkit-gnome-authentication-agent-1
2788 ? S 0:00 \_ pam: gdm-password
2028 tty2 Ss+ 0:00 /sbin/mingetty /dev/tty2
2037 tty3 Ss+ 0:00 /sbin/mingetty /dev/tty3
2050 tty4 Ss+ 0:00 /sbin/mingetty /dev/tty4
2062 tty5 Ss+ 0:00 /sbin/mingetty /dev/tty5
2071 tty6 Ss+ 0:00 /sbin/mingetty /dev/tty6
2346 ? Sl 0:00 /usr/sbin/console-kit-daemon --no-daemon
2474 ? S 0:00 /usr/bin/dbus-launch --exit-with-session
2482 ? Ss 0:00 /bin/dbus-daemon --fork --print-pid 5
--print-address 7 --session
2527 ? S 0:00 /usr/libexec/devkit-power-daemon
2546 ? S 0:00 /usr/libexec/gconfd-2
2609 ? Ssl 0:00 /usr/libexec/gnome-settings-daemon
--gconf-prefix=/apps/gdm/simple-greeter/settings-manager-plugins
2615 ? Ssl 0:00 /usr/libexec/bonobo-activation-server
--ac-activate --ior-output-fd=12
2672 ? S 0:00 /usr/libexec/gvfsd
2728 ? S 0:00 /usr/libexec/polkit-1/polkitd
2744 ? S<sl 0:00 /usr/bin/pulseaudio --start --log-target=syslog
2748 ? SNl 0:00 /usr/libexec/rtkit-daemon
2843 ? D 0:00 /sbin/mount.gfs2 /dev/drbd1 /var/www/html -o rw
3049 ? D 0:00 blockdev --flushbufs /dev/drbd/by-res/wwwdata
7881 ? Ss 0:00 /usr/sbin/anacron -s
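The two uninterruptible (D state) processes near the bottom of that listing, /sbin/mount.gfs2 on /dev/drbd1 (PID 2843) and blockdev --flushbufs on the DRBD backing device, are the "other mount in progress" that the manual mount complains about. A rough diagnostic sketch, assuming stock CentOS 6 tooling and the PID taken from the listing above:

  # what is the stuck mount helper blocked on? (PID taken from the ps output above; /proc/<pid>/stack if available)
  cat /proc/2843/stack
  # or dump every blocked (D state) task to the kernel log and read it back
  echo w > /proc/sysrq-trigger
  dmesg | tail -n 60
  # list the DLM lockspaces registered by the gfs2/lock_dlm mount
  dlm_tool ls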
> Did crm_mon report any errors?
[root at cluster2 ~]# crm status
============
Last updated: Wed Oct 31 12:10:31 2012
Last change: Mon Oct 29 17:01:09 2012 via cibadmin on cluster1
Stack: cman
Current DC: cluster2 - partition with quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
2 Nodes configured, 2 expected votes
8 Resources configured.
============
Online: [ cluster1 cluster2 ]
 Master/Slave Set: WebDataClone [WebData]
     Masters: [ cluster1 cluster2 ]
 Clone Set: WebIP [ClusterIP] (unique)
     ClusterIP:0 (ocf::heartbeat:IPaddr2): Started cluster1
     ClusterIP:1 (ocf::heartbeat:IPaddr2): Started cluster2
 Clone Set: WebFSClone [WebFS]
     WebFS:0 (ocf::heartbeat:Filesystem): Started cluster2 (unmanaged) FAILED
     Stopped: [ WebFS:1 ]

Failed actions:
    WebFS:1_start_0 (node=cluster1, call=14, rc=-2, status=Timed Out): unknown exec error
    WebFS:0_stop_0 (node=cluster2, call=16, rc=-2, status=Timed Out): unknown exec error
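Both the start on cluster1 and the subsequent stop on cluster2 timed out, which is why WebFS:0 ends up unmanaged/FAILED. Once the underlying mount problem is resolved, the failed state has to be cleared before Pacemaker will manage the clone again; a sketch using the standard tools:

  # clear the failcount and failed operations for the clone (run on either node)
  crm resource cleanup WebFSClone
  # lower-level equivalent
  crm_resource --resource WebFSClone --cleanup
  # then confirm the unmanaged/FAILED flags are gone
  crm_mon -1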
> Did you check the system logs?
>
[root at cluster2 ~]# crm_verify -L -V
warning: unpack_rsc_op: Processing failed op WebFS:1_last_failure_0 on
cluster1: unknown exec error (-2)
warning: unpack_rsc_op: Processing failed op WebFS:0_last_failure_0 on
cluster2: unknown exec error (-2)
warning: common_apply_stickiness: Forcing WebFSClone away from
cluster1 after 1000000 failures (max=1000000)
warning: common_apply_stickiness: Forcing WebFSClone away from
cluster1 after 1000000 failures (max=1000000)
warning: common_apply_stickiness: Forcing WebFSClone away from
cluster2 after 1000000 failures (max=1000000)
warning: common_apply_stickiness: Forcing WebFSClone away from
cluster2 after 1000000 failures (max=1000000)
warning: should_dump_input: Ignoring requirement that WebFS:0_stop_0
comeplete before WebFSClone_stopped_0: unmanaged failed resources cannot
prevent clone shutdown
[root at cluster2 ~]# grep -i error /var/log/messages
Oct 31 11:12:25 cluster2 kernel: block drbd1: error receiving ReportState,
l: 4!
Oct 31 11:12:29 cluster2 kernel: block drbd1: error receiving ReportState,
l: 4!
Oct 31 11:12:56 cluster2 crmd[1948]: error: process_lrm_event: LRM
operation WebFS:0_start_0 (15) Timed Out (timeout=20000ms)
Oct 31 11:13:17 cluster2 crmd[1948]: error: process_lrm_event: LRM
operation WebFS:0_stop_0 (16) Timed Out (timeout=20000ms)
Oct 31 11:15:51 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing
failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 11:16:16 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing
failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 11:31:16 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing
failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 11:39:05 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing
failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 11:39:30 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing
failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 11:39:42 cluster2 kernel: block drbd1: error receiving ReportState,
l: 4!
Oct 31 11:39:44 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing
failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 11:39:44 cluster2 kernel: block drbd1: error receiving ReportState,
l: 4!
Oct 31 11:39:53 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing
failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 11:40:13 cluster2 crmd[1948]: warning: status_from_rc: Action 49
(WebFS:1_start_0) on cluster1 failed (target: 0 vs. rc: -2): Error
Oct 31 11:40:13 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing
failed op WebFS:1_last_failure_0 on cluster1: unknown exec error (-2)
Oct 31 11:40:13 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing
failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 11:40:14 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing
failed op WebFS:1_last_failure_0 on cluster1: unknown exec error (-2)
Oct 31 11:40:14 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing
failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 11:55:15 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing
failed op WebFS:1_last_failure_0 on cluster1: unknown exec error (-2)
Oct 31 11:55:15 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing
failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
Oct 31 12:10:15 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing
failed op WebFS:1_last_failure_0 on cluster1: unknown exec error (-2)
Oct 31 12:10:15 cluster2 pengine[1947]: warning: unpack_rsc_op: Processing
failed op WebFS:0_last_failure_0 on cluster2: unknown exec error (-2)
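The LRM entries show both start and stop hitting the 20000ms default operation timeout, which is consistent with mount.gfs2 blocking inside the DLM rather than the resource agent simply being slow. Two things worth checking, sketched here under the assumption of the stock cman stack on CentOS 6: whether the fence domain and DLM are healthy, and whether the Filesystem operations need more than 20 seconds:

  # is the cman fence domain / DLM healthy? gfs2 mounts hang if fencing is not operational
  cman_tool status
  fence_tool ls
  dlm_tool ls

  # optionally give the Filesystem resource explicit, longer start/stop timeouts (values are examples)
  crm configure edit WebFS
  #   op start timeout="60s" op stop timeout="60s"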
> >
> > There is another strange thing: the first server (cluster1) cannot
> > reboot. It hangs with the message 'please standby while rebooting the
> > system'. During the reboot process there are two failed actions related
> > to fencing, even though I haven't configured any fencing yet. One of the
> > failed actions is:
> > 'stopping cluster
> > leaving fence domain .... found dlm lockspace /sys/kernel/dlm/web
> > fence_tool : cannot leave due to active system [FAILED]'
> >
> > Please help me with this problem.
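The shutdown hang fits the same picture: cman cannot leave the fence domain while the gfs2/DLM lockspace from the stuck mount is still active. A sketch of the usual service ordering on a CentOS 6 cman + pacemaker node (assuming the stock init scripts), once the cluster is healthy again:

  # stop Pacemaker first so it unmounts gfs2 and demotes DRBD cleanly,
  # then stop cman (fenced, dlm_controld, gfs_controld, corosync)
  service pacemaker stop
  service cman stop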
--
Best Regards,
Soni Maula Harriz
Database Administrator
PT. Data Aksara Sangkuriang