[Pacemaker] kernel BUG at fs/dlm/lock.c:242! after sync of GFS2 (2 node - active/active)

Andrew Beekhof andrew at beekhof.net
Mon Sep 13 13:21:06 UTC 2010


On Wed, Sep 8, 2010 at 6:00 AM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
> 08.09.2010 05:19, Alisson Landim wrote:
>> After setting up a 2-node cluster following the Clusters from Scratch
>> guide for Fedora 13, I have to say that the GFS2 filesystem (active/active)
>> doesn't work!
>>
>> If the kernel bug described below is not caused by one of the modifications
>> I HAD to make to keep following the guide, then Fedora 13 has no working
>> GFS2 cluster filesystem at all!
>> The modifications were:
>>
>> 1 - Changed "/dev/drbd/by-res/wwwdata" to "/dev/drbd1" in WebFS.
>> 2 - Did not execute the command "mkfs.gfs2 -p lock_dlm -j 2 -t pcmk:web
>> /dev/drbd1" on the second node, because when drbd is not loaded it says
>> "Could not stat device", and when it is loaded it says "Device is
>> busy/Read-only file system", as I described in a previous post.
>>
>> But no matter what I did, nothing justifies a kernel bug.
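[For what it's worth, the "Device is busy/Read-only file system" error usually just means the DRBD device is in the Secondary role on that node. Since "mkfs.gfs2 -j 2" creates journals for both nodes, the filesystem only needs to be created once; if you do need to run it from the other node, promote DRBD there first. A minimal sketch, assuming the resource name "wwwdata" from the guide:]

```shell
# Hypothetical sketch -- resource name "wwwdata" and device /dev/drbd1
# are taken from the Clusters from Scratch guide; adjust to your setup.
modprobe drbd                 # make sure the module is loaded first
drbdadm up wwwdata            # attach the backing device and connect
drbdadm primary wwwdata       # promote this node; mkfs needs Primary
                              # (initial promotion may need "-- --force"
                              # in a Secondary/Secondary state)
mkfs.gfs2 -p lock_dlm -j 2 -t pcmk:web /dev/drbd1
```

[With "-j 2" the filesystem already carries a journal for each node, so the second node can simply mount the shared device; re-running mkfs there is unnecessary.]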
>
> Could you paste the output of drbd-overview? The drbd device should be in
> the Primary state on the node where you issue the mount.
> And...
> Did you start corosync or openais? The latter is required, instead of
> "plain" corosync, to support GFS2/OCFS2.
>
> Andrew, shouldn't the openais requirement for GFS2 be mentioned in the docs?
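[A quick way to see which stack is actually running -- a hedged sketch using the Fedora 13-era SysV initscripts; service and process names may differ on other releases:]

```shell
# Hypothetical check: is plain corosync or openais (aisexec wrapper) up?
ps -e | grep -E 'corosync|aisexec'   # which stack process is running
service corosync status             # Fedora 13-era initscripts
service openais status

# If plain corosync is running, switch to openais so the AIS services
# that GFS2's control daemons rely on are loaded:
service corosync stop && service openais start
```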

Which ones though?  It's not strictly Pacemaker's job to explain the
requirements of the resources you're running.
I'm pretty sure Clusters from Scratch mentions it though, as does
http://www.clusterlabs.org/wiki/FAQ#What_is_the_Project.27s_Relationship_with_OpenAIS.3F

> GFS2 needs CKPT, and complains about it when started under plain corosync.
> For now I just start openais instead of corosync, but the current openais
> direction is to remove that initscript.
>
> BTW, dlm seems to be really broken on F13 (and F12), at least for me. I
> constantly get kernel panics with those kernels.

Just downgrading the kernel was enough?

> http://lists.linbit.com/pipermail/drbd-user/2010-August/014596.html
>
> Only downgrading to the F11 kernel got it all working for me. I don't have
> the capacity to investigate further, but I can provide more information if
> someone is willing to fix it.
>
> Best,
> Vladislav
>
>>
>> The problem occurs after the sync of the filesystem, once
>> "cat /proc/drbd" reaches 100%.
>>
>> Any hint to make it work?
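[One thing to verify before mounting: DRBD should report Connected and Primary/Primary with both disks UpToDate. A small sketch that parses a status line of the kind /proc/drbd prints -- the sample line here is illustrative; on a real node read /proc/drbd itself:]

```shell
# Sample /proc/drbd status line (illustrative; a live node would use:
#   line=$(grep 'cs:' /proc/drbd) )
line=' 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r----'

# Extract connection state and roles from the cs:/ro: fields
cs=$(echo "$line" | grep -o 'cs:[A-Za-z]*' | cut -d: -f2)
ro=$(echo "$line" | grep -o 'ro:[A-Za-z/]*' | cut -d: -f2)
echo "$cs $ro"    # -> Connected Primary/Primary
```

[Only once both nodes show Primary and UpToDate is it safe to mount the GFS2 filesystem on each of them for active/active use.]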
>>
>>
>> Cluster from scratch guide followed:
>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch02.html
>>
>> Kernel Bug Description from "dmesg":
>>
>> ------------[ cut here ]------------
>> kernel BUG at fs/dlm/lock.c:242!
>> invalid opcode: 0000 [#1] SMP
>> last sysfs file: /sys/kernel/dlm/web/event_done
>> CPU 3
>> Modules linked in: gfs2 drbd lru_cache ipt_CLUSTERIP dlm configfs sunrpc
>> ipv6 cpufreq_ondemand acpi_cpufreq freq_table uinput iTCO_wdt
>> iTCO_vendor_support e1000 i2c_i801 shpchp microcode sky2 i2c_core
>> e752x_edac i6300esb edac_core raid1 [last unloaded: scsi_wait_scan]
>>
>> Pid: 2866, comm: mount.gfs2 Not tainted 2.6.34.6-47.fc13.x86_64 #1
>> SEP7320VP2D2                            /
>> RIP: 0010:[<ffffffffa01244d6>]  [<ffffffffa01244d6>] is_remote+0x73/0x81
>> [dlm]
>> RSP: 0018:ffff8801371a1938  EFLAGS: 00010296
>> RAX: 0000000000000004 RBX: ffff8801380f9900 RCX: 0000000000005192
>> RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
>> RBP: ffff8801371a1948 R08: 00000000ffffffff R09: 0000000000000073
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801380490e8
>> R13: ffff880138b7b000 R14: 0000000000000000 R15: ffff8801380f99e0
>> FS:  00007fddfa6ac700(0000) GS:ffff880002180000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> CR2: 00007fe4193da000 CR3: 0000000138053000 CR4: 00000000000006e0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Process mount.gfs2 (pid: 2866, threadinfo ffff8801371a0000, task
>> ffff880137119770)
>> Stack:
>>  00000000c700a8c0 ffff8801380f9900 ffff8801371a19b8 ffffffffa0129111
>> <0> ffff880138b7b604 c700a8c000000c30 ffff8801371a19e0 ffff88013a218200
>> <0> 0000000000000000 aa00a8c000000246 ffff8801371a19b8 ffff8801380490e8
>> Call Trace:
>>  [<ffffffffa0129111>] _request_lock+0x22e/0x274 [dlm]
>>  [<ffffffffa01291d5>] request_lock+0x7e/0xa7 [dlm]
>>  [<ffffffffa012676f>] ? create_lkb+0x126/0x14e [dlm]
>>  [<ffffffffa0129b89>] dlm_lock+0xf7/0x14d [dlm]
>>  [<ffffffffa01998bd>] ? gdlm_bast+0x0/0x43 [gfs2]
>>  [<ffffffffa019998a>] ? gdlm_ast+0x0/0x116 [gfs2]
>>  [<ffffffffa01998bd>] ? gdlm_bast+0x0/0x43 [gfs2]
>>  [<ffffffffa01998a5>] gdlm_lock+0xef/0x107 [gfs2]
>>  [<ffffffffa019998a>] ? gdlm_ast+0x0/0x116 [gfs2]
>>  [<ffffffffa01998bd>] ? gdlm_bast+0x0/0x43 [gfs2]
>>  [<ffffffffa01816e5>] do_xmote+0xed/0x14f [gfs2]
>>  [<ffffffffa0181853>] run_queue+0x10c/0x14a [gfs2]
>>  [<ffffffffa0182782>] gfs2_glock_nq+0x282/0x2a6 [gfs2]
>>  [<ffffffffa01827f1>] gfs2_glock_nq_num+0x4b/0x73 [gfs2]
>>  [<ffffffffa018c814>] init_locking+0x85/0x162 [gfs2]
>>  [<ffffffffa018deb6>] gfs2_get_sb+0x6e3/0x9ad [gfs2]
>>  [<ffffffffa01827e9>] ? gfs2_glock_nq_num+0x43/0x73 [gfs2]
>>  [<ffffffff811d8b40>] ? selinux_sb_copy_data+0x196/0x1af
>>  [<ffffffff8110fe29>] vfs_kern_mount+0xbd/0x19b
>>  [<ffffffff8110ff6f>] do_kern_mount+0x4d/0xed
>>  [<ffffffff811256c3>] do_mount+0x753/0x7c9
>>  [<ffffffff810f8373>] ? alloc_pages_current+0x95/0x9e
>>  [<ffffffff811257c1>] sys_mount+0x88/0xc2
>>  [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
>> Code: 8b e0 00 00 00 8b 73 3c 44 8b 83 d0 00 00 00 48 c7 c7 38 57 13 a0
>> 31 c0 e8 85 67 32 e1 48 c7 c7 6d 57 13 a0 31 c0 e8 77 67 32 e1 <0f> 0b
>> eb fe 59 0f 95 c0 0f b6 c0 5b c9 c3 55 48 89 e5 53 48 83
>> RIP  [<ffffffffa01244d6>] is_remote+0x73/0x81 [dlm]
>>  RSP <ffff8801371a1938>
>> ---[ end trace 7c9ca33705dbca8d ]---
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>


