[Pacemaker] kernel BUG at fs/dlm/lock.c:242! after sync of GFS2 (2 node - active/active)

Wed Sep 8 02:19:49 UTC 2010

After setting up a 2-node cluster following the Clusters from Scratch guide for Fedora 13, I have to say that the GFS2 filesystem (active/active) doesn't work!

If the kernel bug described below is not caused by one of the modifications I HAD to make to the guide's steps in order to continue, then Fedora 13 has no working GFS2 cluster setup!
The modifications were:

1 - Changed "/dev/drbd/by-res/wwwdata" to "/dev/drbd1" on WebFS.
2 - Did not execute the command "mkfs.gfs2 -p lock_dlm -j 2 -t pcmk:web /dev/drbd1" on the second node, because when drbd is not loaded it says "Could not stat device", and when it is loaded it says "Device is busy/Read only file system", as I described in my previous post.

But no matter what I did, it does not justify a kernel bug.

The problem occurs after the sync of the filesystem completes, once "cat /proc/drbd" reaches 100%.

Any hints on how to make it work?

Clusters from Scratch guide followed:
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch02.html

Kernel Bug Description from "dmesg":

------------[ cut here ]------------
kernel BUG at fs/dlm/lock.c:242!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/kernel/dlm/web/event_done
CPU 3 
Modules linked in: gfs2 drbd lru_cache ipt_CLUSTERIP dlm configfs sunrpc ipv6 cpufreq_ondemand acpi_cpufreq freq_table uinput iTCO_wdt iTCO_vendor_support e1000 i2c_i801 shpchp microcode sky2 i2c_core e752x_edac i6300esb edac_core raid1 [last unloaded: scsi_wait_scan]

Pid: 2866, comm: mount.gfs2 Not tainted 2.6.34.6-47.fc13.x86_64 #1 SEP7320VP2D2                            /        
RIP: 0010:[<ffffffffa01244d6>]  [<ffffffffa01244d6>] is_remote+0x73/0x81 [dlm]
RSP: 0018:ffff8801371a1938  EFLAGS: 00010296
RAX: 0000000000000004 RBX: ffff8801380f9900 RCX: 0000000000005192
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff8801371a1948 R08: 00000000ffffffff R09: 0000000000000073
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801380490e8
R13: ffff880138b7b000 R14: 0000000000000000 R15: ffff8801380f99e0
FS:  00007fddfa6ac700(0000) GS:ffff880002180000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fe4193da000 CR3: 0000000138053000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mount.gfs2 (pid: 2866, threadinfo ffff8801371a0000, task ffff880137119770)
Stack:
 00000000c700a8c0 ffff8801380f9900 ffff8801371a19b8 ffffffffa0129111
<0> ffff880138b7b604 c700a8c000000c30 ffff8801371a19e0 ffff88013a218200
<0> 0000000000000000 aa00a8c000000246 ffff8801371a19b8 ffff8801380490e8
Call Trace:
 [<ffffffffa0129111>] _request_lock+0x22e/0x274 [dlm]
 [<ffffffffa01291d5>] request_lock+0x7e/0xa7 [dlm]
 [<ffffffffa012676f>] ? create_lkb+0x126/0x14e [dlm]
 [<ffffffffa0129b89>] dlm_lock+0xf7/0x14d [dlm]
 [<ffffffffa01998bd>] ? gdlm_bast+0x0/0x43 [gfs2]
 [<ffffffffa019998a>] ? gdlm_ast+0x0/0x116 [gfs2]
 [<ffffffffa01998bd>] ? gdlm_bast+0x0/0x43 [gfs2]
 [<ffffffffa01998a5>] gdlm_lock+0xef/0x107 [gfs2]
 [<ffffffffa019998a>] ? gdlm_ast+0x0/0x116 [gfs2]
 [<ffffffffa01998bd>] ? gdlm_bast+0x0/0x43 [gfs2]
 [<ffffffffa01816e5>] do_xmote+0xed/0x14f [gfs2]
 [<ffffffffa0181853>] run_queue+0x10c/0x14a [gfs2]
 [<ffffffffa0182782>] gfs2_glock_nq+0x282/0x2a6 [gfs2]
 [<ffffffffa01827f1>] gfs2_glock_nq_num+0x4b/0x73 [gfs2]
 [<ffffffffa018c814>] init_locking+0x85/0x162 [gfs2]
 [<ffffffffa018deb6>] gfs2_get_sb+0x6e3/0x9ad [gfs2]
 [<ffffffffa01827e9>] ? gfs2_glock_nq_num+0x43/0x73 [gfs2]
 [<ffffffff811d8b40>] ? selinux_sb_copy_data+0x196/0x1af
 [<ffffffff8110fe29>] vfs_kern_mount+0xbd/0x19b
 [<ffffffff8110ff6f>] do_kern_mount+0x4d/0xed
 [<ffffffff811256c3>] do_mount+0x753/0x7c9
 [<ffffffff810f8373>] ? alloc_pages_current+0x95/0x9e
 [<ffffffff811257c1>] sys_mount+0x88/0xc2
 [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
Code: 8b e0 00 00 00 8b 73 3c 44 8b 83 d0 00 00 00 48 c7 c7 38 57 13 a0 31 c0 e8 85 67 32 e1 48 c7 c7 6d 57 13 a0 31 c0 e8 77 67 32 e1 <0f> 0b eb fe 59 0f 95 c0 0f b6 c0 5b c9 c3 55 48 89 e5 53 48 83 
RIP  [<ffffffffa01244d6>] is_remote+0x73/0x81 [dlm]
 RSP <ffff8801371a1938>
---[ end trace 7c9ca33705dbca8d ]---
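
For anyone triaging a dump like the one above, here is a minimal sketch that pulls the three most useful facts out of the dmesg text: the BUG location, the faulting symbol and module from the RIP line, and the process that triggered it. The function name summarize_oops and the dictionary keys are made up for illustration, and the regular expressions only assume the line layout shown in this report, so other kernel versions may need adjustments.

```python
import re
import sys


def summarize_oops(text):
    """Extract BUG location, faulting symbol/module and process from an oops dump.

    Hypothetical helper: it only understands the line layout seen in this report.
    """
    summary = {}

    # e.g. "kernel BUG at fs/dlm/lock.c:242!"
    bug = re.search(r"kernel BUG at (\S+):(\d+)!", text)
    if bug:
        summary["file"] = bug.group(1)
        summary["line"] = int(bug.group(2))

    # e.g. "RIP: 0010:[<...>]  [<...>] is_remote+0x73/0x81 [dlm]"
    rip = re.search(r"RIP\b.*\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+\s+\[(\w+)\]", text)
    if rip:
        summary["symbol"] = rip.group(1)
        summary["module"] = rip.group(2)

    # e.g. "Pid: 2866, comm: mount.gfs2 Not tainted ..."
    proc = re.search(r"Pid: (\d+), comm: (\S+)", text)
    if proc:
        summary["pid"] = int(proc.group(1))
        summary["comm"] = proc.group(2)

    return summary


if __name__ == "__main__":
    # Usage: dmesg | python summarize_oops.py
    if not sys.stdin.isatty():
        print(summarize_oops(sys.stdin.read()))
```

Fed the trace from this report, it should point at is_remote in the dlm module, reached from mount.gfs2, which matches the call trace (gfs2_get_sb -> init_locking -> ... -> dlm_lock -> _request_lock).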
