[Pacemaker] kernel BUG at fs/dlm/lowcomms.c:861! on Fedora 12
Daniel Qian
daniel at bestningning.com
Sun Jan 3 19:30:20 UTC 2010
It took a lot of work to set up this two-node cluster of Pacemaker +
OpenAIS/Corosync + OCFS2 + DLM + DRBD on Fedora 12. I resolved the issues one
after another until I hit this last hurdle, which I have not been able to
overcome. All the other components are working fine.
[root@ilo150 ~]# crm_mon -1
============
Last updated: Sun Jan 3 12:17:17 2010
Stack: openais
Current DC: ilo143 - partition with quorum
Version: 1.0.5-ee19d8e83c2a5d45988f1cee36d334a631d84fc7
2 Nodes configured, 2 expected votes
5 Resources configured.
============

Online: [ ilo143 ilo150 ]

 Master/Slave Set: drbd_clone0
     Masters: [ ilo143 ilo150 ]
 Clone Set: dlm-clone
     Started: [ ilo143 ilo150 ]
 Clone Set: o2cb-clone
     Started: [ ilo143 ilo150 ]
 Clone Set: ip-clone (unique)
     ClusterIP:0 (ocf::heartbeat:IPaddr2): Started ilo143
     ClusterIP:1 (ocf::heartbeat:IPaddr2): Started ilo143
However, I hit the following problem when I try to mount the OCFS2 file
system with "crm resource start fs0-clone". Here is a snippet from
/var/log/messages:
Jan 2 17:46:13 ilo150 kernel: ------------[ cut here ]------------
Jan 2 17:46:13 ilo150 kernel: kernel BUG at fs/dlm/lowcomms.c:861!
Jan 2 17:46:13 ilo150 kernel: invalid opcode: 0000 [#1] SMP
Jan 2 17:46:13 ilo150 kernel: last sysfs file: /sys/kernel/dlm/5316FDFD93BB4F7E97B296FC513FA149/event_done
Jan 2 17:46:13 ilo150 kernel: CPU 1
Jan 2 17:46:13 ilo150 kernel: Modules linked in: sctp libcrc32c ocfs2 ocfs2_nodemanager ocfs2_stack_user ocfs2_stackglue dlm drbd configfs ipv6 bnx2 ipmi_si serio_raw ipmi_msghandler hpwdt iTCO_wdt iTCO_vendor_support cciss radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Jan 2 17:46:13 ilo150 kernel: Pid: 2918, comm: dlm_send Not tainted 2.6.31.9-174.fc12.x86_64 #1 ProLiant DL360 G6
Jan 2 17:46:13 ilo150 kernel: RIP: 0010:[<ffffffffa01d75c9>] [<ffffffffa01d75c9>] sctp_init_assoc+0x13e/0x2c1 [dlm]
Jan 2 17:46:13 ilo150 kernel: RSP: 0018:ffff8808e9bdbc20 EFLAGS: 00010246
Jan 2 17:46:13 ilo150 kernel: RAX: ffff8808e9920038 RBX: ffff8808e9920000 RCX: 0000000000000000
Jan 2 17:46:13 ilo150 kernel: RDX: 0000000000000000 RSI: 0000000000524852 RDI: ffff8808e9920048
Jan 2 17:46:13 ilo150 kernel: RBP: ffff8808e9bdbe00 R08: 0000000000000000 R09: ffff88091f804200
Jan 2 17:46:13 ilo150 kernel: R10: ffff88091f804200 R11: 0000000000000000 R12: ffff8808e9920038
Jan 2 17:46:13 ilo150 kernel: R13: ffff8808e9920048 R14: ffff8808eed9a000 R15: ffff8808e9bdbe80
Jan 2 17:46:13 ilo150 kernel: FS: 0000000000000000(0000) GS:ffff880028053000(0000) knlGS:0000000000000000
Jan 2 17:46:13 ilo150 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Jan 2 17:46:13 ilo150 kernel: CR2: 00007fc4485c9000 CR3: 0000000001001000 CR4: 00000000000006e0
Jan 2 17:46:13 ilo150 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 2 17:46:13 ilo150 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 2 17:46:13 ilo150 kernel: Process dlm_send (pid: 2918, threadinfo ffff8808e9bda000, task ffff8808e9be0000)
Jan 2 17:46:13 ilo150 kernel: Stack:
Jan 2 17:46:13 ilo150 kernel: 0000000000000000 0000000000000000 ffff8808e9bdbd10 0000000000000010
Jan 2 17:46:13 ilo150 kernel: <0> 0000000000000000 0000000000000000 ffff8808e9bdbd90 0000000000000030
Jan 2 17:46:13 ilo150 kernel: <0> 0000000000000080 0000000000000000 0000000000000000 0000000000000000
Jan 2 17:46:13 ilo150 kernel: Call Trace:
Jan 2 17:46:13 ilo150 kernel: [<ffffffff810106c5>] ? __switch_to+0x18b/0x217
Jan 2 17:46:13 ilo150 kernel: [<ffffffffa01d727c>] ? process_send_sockets+0x0/0x17c [dlm]
Jan 2 17:46:13 ilo150 kernel: [<ffffffffa01d72b0>] process_send_sockets+0x34/0x17c [dlm]
Jan 2 17:46:13 ilo150 kernel: [<ffffffff810b272d>] ? probe_workqueue_execution+0xb1/0xcd
Jan 2 17:46:13 ilo150 kernel: [<ffffffffa01d727c>] ? process_send_sockets+0x0/0x17c [dlm]
Jan 2 17:46:13 ilo150 kernel: [<ffffffff810635a0>] worker_thread+0x18a/0x224
Jan 2 17:46:13 ilo150 kernel: [<ffffffff81067b37>] ? autoremove_wake_function+0x0/0x39
Jan 2 17:46:13 ilo150 kernel: [<ffffffff81063416>] ? worker_thread+0x0/0x224
Jan 2 17:46:13 ilo150 kernel: [<ffffffff810677b5>] kthread+0x91/0x99
Jan 2 17:46:13 ilo150 kernel: [<ffffffff81012daa>] child_rip+0xa/0x20
Jan 2 17:46:13 ilo150 kernel: [<ffffffff81067724>] ? kthread+0x0/0x99
Jan 2 17:46:13 ilo150 kernel: [<ffffffff81012da0>] ? child_rip+0x0/0x20
Jan 2 17:46:13 ilo150 kernel: Code: 60 fe ff ff 80 00 00 00 89 85 38 fe ff ff 48 8d 45 90 48 89 85 50 fe ff ff e8 88 5f 24 e1 4c 8b 63 38 48 8d 43 38 49 39 c4 75 04 <0f> 0b eb fe 4d 63 44 24 1c 41 8b 54 24 18 66 ff 43 48 45 31 ff
Jan 2 17:46:13 ilo150 kernel: RIP [<ffffffffa01d75c9>] sctp_init_assoc+0x13e/0x2c1 [dlm]
Jan 2 17:46:13 ilo150 kernel: RSP <ffff8808e9bdbc20>
Jan 2 17:46:13 ilo150 kernel: ---[ end trace d3844af31bca174b ]---
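The oops is raised in sctp_init_assoc inside the dlm module, on the dlm_send thread, and sctp is in the loaded-module list, so DLM appears to be using its SCTP transport here rather than TCP. The shell sketch below could be used to confirm that on each node. It assumes configfs is mounted at /sys/kernel/config and that the running dlm module exposes a cluster/protocol attribute where 0 means TCP and 1 means SCTP; the function name dlm_transport is hypothetical.

```shell
#!/bin/sh
# dlm_transport [FILE]: print which transport the DLM kernel module is set
# to use. ASSUMPTION: configfs is mounted at /sys/kernel/config and the dlm
# module exposes cluster/protocol, where 0 = TCP and 1 = SCTP.
dlm_transport() {
    f=${1:-/sys/kernel/config/dlm/cluster/protocol}
    if [ ! -r "$f" ]; then
        # Attribute missing: configfs not mounted or dlm not configured yet.
        echo "unknown (cannot read $f)"
        return 1
    fi
    value=$(cat "$f")
    case "$value" in
        0) echo tcp ;;
        1) echo sctp ;;
        *) echo "unknown ($value)" ;;
    esac
}
```

Run dlm_transport with no argument on both nodes while dlm-clone is started; the optional argument only exists so the mapping can be checked against a plain file.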
I am wondering if this is a Fedora-specific bug. I have the full messages
logs from both nodes if anyone is interested. Here is my config:
[root@ilo150 ~]# crm configure show
node ilo143
node ilo150
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="xx.xx.xx.xx" cidr_netmask="32" \
        op monitor interval="30s"
primitive dlm ocf:pacemaker:controld \
        op monitor interval="120s"
primitive drbd_r0 ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="20" role="Master" timeout="20" \
        op monitor interval="30" role="Slave" timeout="20"
primitive fs0 ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/mnt" fstype="ocfs2" \
        meta target-role="Stopped"
primitive o2cb ocf:ocfs2:o2cb \
        op monitor interval="120s"
ms drbd_clone0 drbd_r0 \
        meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone dlm-clone dlm \
        meta interleave="true"
clone fs0-clone fs0
clone ip-clone ClusterIP \
        meta globally-unique="true" clone-max="2" clone-node-max="2"
clone o2cb-clone o2cb \
        meta interleave="true"
colocation fs0-with-o2cb inf: fs0-clone o2cb-clone
colocation fs0_on_drbd inf: fs0-clone drbd_clone0:Master
colocation o2cb-with-dlm inf: o2cb-clone dlm-clone
order fs0-after-drbd inf: drbd_clone0:promote fs0-clone:start
order fs0-after-o2cb inf: o2cb-clone fs0-clone
order o2cb-after-dlm inf: dlm-clone o2cb-clone
property $id="cib-bootstrap-options" \
        dc-version="1.0.5-ee19d8e83c2a5d45988f1cee36d334a631d84fc7" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        last-lrm-refresh="1262472066"
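To pull just the oops text out of the full /var/log/messages on each node (for example, to attach it to a reply), something like the following shell sketch could be used. It assumes the syslog layout shown above, with the "[ cut here ]" and "---[ end trace" markers on their own kernel lines; the function name extract_oops is hypothetical.

```shell
#!/bin/sh
# extract_oops LOGFILE: print each kernel oops from a syslog file, from the
# "[ cut here ]" marker line through the matching "---[ end trace" line.
# ASSUMPTION: syslog writes each kernel message on one line with "kernel:".
extract_oops() {
    sed -n '/kernel: -*\[ cut here ]/,/kernel: ---\[ end trace/p' "$1"
}
```

For example, extract_oops /var/log/messages > "oops-$(hostname).txt" on both ilo143 and ilo150. The sed range address prints every line between the two markers inclusive, so several oopses in one file are all kept.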
Thanks,
Daniel