[Pacemaker] clvmd hangs on node1 if node2 is fenced
Michael Smith
msmith at cbnco.com
Thu Aug 26 22:50:25 UTC 2010
> Xinwei Hu <hxinwei at ...> writes:
>
> > That sounds worrying actually.
> > I think this is logged as bug 585419 on SLES' bugzilla.
> > If you can reproduce this issue, it's worth reopening it, I think.
I've got a pair of fully patched SLES11 SP1 nodes and they're showing
what I guess is the same behaviour: if I hard-poweroff node2, operations
like "vgdisplay -v" hang on node1 for quite some time. Sometimes a
minute, sometimes two, sometimes forever. They get stuck here:
Aug 26 18:31:42 xen-test1 clvmd[8906]: doing PRE command LOCK_VG 'V_vm_store' at 1 (client=0x7f2714000b40)
Aug 26 18:31:42 xen-test1 clvmd[8906]: lock_resource 'V_vm_store', flags=0, mode=3
After a few seconds, corosync and dlm notice the node is gone, but
vgdisplay and friends still hang while trying to lock the VG:
Aug 26 18:31:44 xen-test1 corosync[8476]: [TOTEM ] A processor failed, forming new configuration.
Aug 26 18:31:50 xen-test1 cluster-dlm[8870]: update_cluster: Processing membership 1260
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: dlm_process_node: Skipped active node 219878572: born-on=1256, last-seen=1260, this-event=1260, last-event=1256
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: del_configfs_node: del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/236655788"
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: dlm_process_node: Removed inactive node 236655788: born-on=1252, last-seen=1256, this-event=1260, last-event=1256
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: log_config: dlm:controld conf 1 0 1 memb 219878572 join left 236655788
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: log_config: dlm:ls:clvmd conf 1 0 1 memb 219878572 join left 236655788
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: add_change: clvmd add_change cg 3 remove nodeid 236655788 reason 3
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: add_change: clvmd add_change cg 3 counts member 1 joined 0 remove 1 failed 1
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: stop_kernel: clvmd stop_kernel cg 3
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: do_sysfs: write "0" to "/sys/kernel/dlm/clvmd/control"
Aug 26 18:31:51 xen-test1 kernel: [ 365.267802] dlm: closing connection to node 236655788
Aug 26 18:31:51 xen-test1 clvmd[8906]: confchg callback. 0 joined, 1 left, 1 members
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: fence_node_time: Node 236655788/xen-test2 has not been shot yet
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: check_fencing_done: clvmd check_fencing 23665578 not fenced add 1282861615 fence 0
Aug 26 18:31:51 xen-test1 crmd: [8489]: info: ais_dispatch: Membership 1260: quorum still lost
Aug 26 18:31:51 xen-test1 cluster-dlm: [8870]: info: ais_dispatch: Membership 1260: quorum still lost
...
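
For anyone comparing their own logs against the above, here is a rough
Python sketch that pulls out the three messages that look relevant: the
"has not been shot yet" line, the "not fenced" line, and the "quorum
still lost" line. The regexes are written against the exact wording
quoted in this mail and are only an assumption about what other versions
print. The function name summarize() and the shape of its result are
made up for illustration. It expects one syslog entry per line, as in
the real log file.

```python
import re

# Patterns copied from the message wording quoted in this mail.
# Other cluster-dlm versions may word these differently (assumption).
NOT_SHOT = re.compile(r"fence_node_time: Node (\d+)/(\S+) has not been shot yet")
NOT_FENCED = re.compile(r"check_fencing_done: (\S+) check_fencing (\d+) not fenced")
QUORUM_LOST = re.compile(r"Membership (\d+): quorum still lost")


def summarize(lines):
    """Collect unfenced node names, affected lockspaces, and quorum state.

    Hypothetical helper: 'lines' is any iterable of syslog lines.
    """
    unfenced_nodes = set()
    lockspaces = set()
    quorum_lost = False
    for line in lines:
        m = NOT_SHOT.search(line)
        if m:
            unfenced_nodes.add(m.group(2))   # node name, e.g. xen-test2
        m = NOT_FENCED.search(line)
        if m:
            lockspaces.add(m.group(1))       # lockspace name, e.g. clvmd
        if QUORUM_LOST.search(line):
            quorum_lost = True
    return {
        "unfenced_nodes": sorted(unfenced_nodes),
        "lockspaces_waiting": sorted(lockspaces),
        "quorum_lost": quorum_lost,
    }


# Three lines taken from the log above, unwrapped to one entry per line.
sample = [
    "Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: fence_node_time: "
    "Node 236655788/xen-test2 has not been shot yet",
    "Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: check_fencing_done: "
    "clvmd check_fencing 23665578 not fenced add 1282861615 fence 0",
    "Aug 26 18:31:51 xen-test1 crmd: [8489]: info: ais_dispatch: "
    "Membership 1260: quorum still lost",
]

if __name__ == "__main__":
    print(summarize(sample))
```

To run it against a real log, pass an open file instead of the sample
list, e.g. summarize(open("/var/log/messages")); that path is just the
usual default and may differ on your system.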
cluster-glue-1.0.5-0.5.1
corosync-1.2.1-0.5.1
kernel-xen-2.6.32.13-0.5.1
libcorosync4-1.2.1-0.5.1
lvm2-2.02.39-18.27.1
lvm2-clvm-2.02.39-18.27.1
multipath-tools-0.4.8-40.23.1
Thanks,
Mike