[Pacemaker] kernel BUG at fs/dlm/lock.c:242! after sync of GFS2 (2 node - active/active)

Vladislav Bogdanov bubble at hoster-ok.com
Wed Sep 8 00:00:36 EDT 2010


08.09.2010 05:19, Alisson Landim wrote:
> After setting up a 2-node cluster following the Clusters from Scratch
> guide for Fedora 13, I have to say that the GFS2 filesystem (active/active)
> doesn't work!
> 
> If the kernel bug described below is not caused by one of the modifications
> I HAD to make to keep following the guide, then Fedora 13 has no
> working GFS2 cluster system at all!
> The modifications were:
> 
> 1 - Changed "/dev/drbd/by-res/wwwdata" to "/dev/drbd1" in WebFS.
> 2 - Did not execute the command "mkfs.gfs2 -p lock_dlm -j 2 -t pcmk:web
> /dev/drbd1" on the second node, because when drbd is not loaded it says
> "Could not stat device", and when it is loaded it says "Device is
> busy/Read only file system", as I described in a previous post.
> 
> But no matter what I did, that does not justify a kernel bug.

Could you paste the output from drbd-overview? The drbd device should be
in the Primary state on the node where you issue the mount.
And...
Did you start corosync or openais? The latter is required, instead of
"plain" corosync, to support GFS2/OCFS2.
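For reference, here is a minimal sketch of the Primary-state check mentioned above. It assumes the DRBD 8.3-style /proc/drbd layout (resource lines such as " 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate"). Other DRBD versions format this file differently. The helper names device_states and safe_to_mount are hypothetical. The check reports whether the local side of a given minor is Primary with an UpToDate disk.

```python
import re

# ASSUMPTION: DRBD 8.3-style /proc/drbd resource lines, for example
#   " 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r----"
# Other DRBD versions lay this file out differently.
RESOURCE_LINE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)"
    r"\s+ro:(?P<local_role>\w+)/(?P<peer_role>\w+)"
    r"\s+ds:(?P<local_disk>\w+)/(?P<peer_disk>\w+)"
)


def device_states(proc_drbd_text):
    """Map each DRBD minor number to its connection, role and disk states."""
    states = {}
    for line in proc_drbd_text.splitlines():
        match = RESOURCE_LINE.match(line)
        if match:
            states[int(match.group("minor"))] = match.groupdict()
    return states


def safe_to_mount(proc_drbd_text, minor):
    """Hypothetical pre-mount check: is the local side of /dev/drbd<minor>
    Primary with an UpToDate disk?"""
    state = device_states(proc_drbd_text).get(minor)
    if state is None:
        # Minor not listed with role/disk fields: module not loaded,
        # or the resource is not configured/up.
        return False
    return state["local_role"] == "Primary" and state["local_disk"] == "UpToDate"
```

On a node you would pass it the contents of /proc/drbd, for example safe_to_mount(open("/proc/drbd").read(), 1) for /dev/drbd1. In a dual-primary (active/active) setup, you would expect this to be true on each node before GFS2 is mounted there.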

Andrew, shouldn't the openais requirement for GFS2 be mentioned in the docs?
GFS2 needs CKPT, and complains about it when started under plain corosync.
For now I just start openais instead of corosync, but the current
direction in openais is to remove that initscript.

BTW, dlm seems to be really broken on F13 (and F12), at least for me. I
constantly get kernel panics with those kernels.

http://lists.linbit.com/pipermail/drbd-user/2010-August/014596.html

Only downgrading to the F11 kernel got it all working for me. I don't have
the capacity to investigate further, but I can provide more information if
someone is willing to fix it.

Best,
Vladislav

> 
> The problem occurs after the filesystem sync, once
> "cat /proc/drbd" reaches 100%.
> 
> Any hints on how to make it work?
> 
> 
> Clusters from Scratch guide followed:
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch02.html
> 
> Kernel Bug Description from "dmesg":
> 
> ------------[ cut here ]------------
> kernel BUG at fs/dlm/lock.c:242!
> invalid opcode: 0000 [#1] SMP
> last sysfs file: /sys/kernel/dlm/web/event_done
> CPU 3
> Modules linked in: gfs2 drbd lru_cache ipt_CLUSTERIP dlm configfs sunrpc
> ipv6 cpufreq_ondemand acpi_cpufreq freq_table uinput iTCO_wdt
> iTCO_vendor_support e1000 i2c_i801 shpchp microcode sky2 i2c_core
> e752x_edac i6300esb edac_core raid1 [last unloaded: scsi_wait_scan]
> 
> Pid: 2866, comm: mount.gfs2 Not tainted 2.6.34.6-47.fc13.x86_64 #1
> SEP7320VP2D2                            /       
> RIP: 0010:[<ffffffffa01244d6>]  [<ffffffffa01244d6>] is_remote+0x73/0x81 [dlm]
> RSP: 0018:ffff8801371a1938  EFLAGS: 00010296
> RAX: 0000000000000004 RBX: ffff8801380f9900 RCX: 0000000000005192
> RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
> RBP: ffff8801371a1948 R08: 00000000ffffffff R09: 0000000000000073
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801380490e8
> R13: ffff880138b7b000 R14: 0000000000000000 R15: ffff8801380f99e0
> FS:  00007fddfa6ac700(0000) GS:ffff880002180000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007fe4193da000 CR3: 0000000138053000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process mount.gfs2 (pid: 2866, threadinfo ffff8801371a0000, task ffff880137119770)
> Stack:
>  00000000c700a8c0 ffff8801380f9900 ffff8801371a19b8 ffffffffa0129111
> <0> ffff880138b7b604 c700a8c000000c30 ffff8801371a19e0 ffff88013a218200
> <0> 0000000000000000 aa00a8c000000246 ffff8801371a19b8 ffff8801380490e8
> Call Trace:
>  [<ffffffffa0129111>] _request_lock+0x22e/0x274 [dlm]
>  [<ffffffffa01291d5>] request_lock+0x7e/0xa7 [dlm]
>  [<ffffffffa012676f>] ? create_lkb+0x126/0x14e [dlm]
>  [<ffffffffa0129b89>] dlm_lock+0xf7/0x14d [dlm]
>  [<ffffffffa01998bd>] ? gdlm_bast+0x0/0x43 [gfs2]
>  [<ffffffffa019998a>] ? gdlm_ast+0x0/0x116 [gfs2]
>  [<ffffffffa01998bd>] ? gdlm_bast+0x0/0x43 [gfs2]
>  [<ffffffffa01998a5>] gdlm_lock+0xef/0x107 [gfs2]
>  [<ffffffffa019998a>] ? gdlm_ast+0x0/0x116 [gfs2]
>  [<ffffffffa01998bd>] ? gdlm_bast+0x0/0x43 [gfs2]
>  [<ffffffffa01816e5>] do_xmote+0xed/0x14f [gfs2]
>  [<ffffffffa0181853>] run_queue+0x10c/0x14a [gfs2]
>  [<ffffffffa0182782>] gfs2_glock_nq+0x282/0x2a6 [gfs2]
>  [<ffffffffa01827f1>] gfs2_glock_nq_num+0x4b/0x73 [gfs2]
>  [<ffffffffa018c814>] init_locking+0x85/0x162 [gfs2]
>  [<ffffffffa018deb6>] gfs2_get_sb+0x6e3/0x9ad [gfs2]
>  [<ffffffffa01827e9>] ? gfs2_glock_nq_num+0x43/0x73 [gfs2]
>  [<ffffffff811d8b40>] ? selinux_sb_copy_data+0x196/0x1af
>  [<ffffffff8110fe29>] vfs_kern_mount+0xbd/0x19b
>  [<ffffffff8110ff6f>] do_kern_mount+0x4d/0xed
>  [<ffffffff811256c3>] do_mount+0x753/0x7c9
>  [<ffffffff810f8373>] ? alloc_pages_current+0x95/0x9e
>  [<ffffffff811257c1>] sys_mount+0x88/0xc2
>  [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
> Code: 8b e0 00 00 00 8b 73 3c 44 8b 83 d0 00 00 00 48 c7 c7 38 57 13 a0
> 31 c0 e8 85 67 32 e1 48 c7 c7 6d 57 13 a0 31 c0 e8 77 67 32 e1 <0f> 0b
> eb fe 59 0f 95 c0 0f b6 c0 5b c9 c3 55 48 89 e5 53 48 83
> RIP  [<ffffffffa01244d6>] is_remote+0x73/0x81 [dlm]
>  RSP <ffff8801371a1938>
> ---[ end trace 7c9ca33705dbca8d ]---