[Pacemaker] node1 fencing itself after node2 being fenced

Mon Feb 24 09:19:19 EST 2014

Just an update on this issue, which has now been resolved.

The problem was in my cluster configuration: dlm and sctp do not play nicely
together. I had to un-configure the redundant rings and set rrp_mode to
"none", after which clvmd works as expected.

Thanks to all for your assistance with this issue.
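
For anyone checking a similar setup, here is a minimal sketch of how the
redundant ring settings could be inspected. It assumes a cman-style
cluster.conf in which rrp_mode is an attribute of the <totem> element and
the second ring is declared with <altname> entries under each clusternode.
The sample XML, the cluster name and the function name are hypothetical
and are not taken from my actual configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical cluster.conf content, for illustration only.
SAMPLE_CLUSTER_CONF = """<?xml version="1.0"?>
<cluster name="testcluster" config_version="1">
  <totem rrp_mode="none"/>
  <clusternodes>
    <clusternode name="test01" nodeid="1"/>
    <clusternode name="test02" nodeid="2"/>
  </clusternodes>
</cluster>
"""


def redundant_ring_settings(xml_text):
    """Return (rrp_mode, altnames) found in a cluster.conf document.

    rrp_mode is None when no <totem> element or attribute is present.
    altnames lists the name of every <altname> element (second ring).
    """
    root = ET.fromstring(xml_text)
    totem = root.find("totem")
    rrp_mode = totem.get("rrp_mode") if totem is not None else None
    altnames = [alt.get("name") for alt in root.iter("altname")]
    return rrp_mode, altnames


if __name__ == "__main__":
    mode, alts = redundant_ring_settings(SAMPLE_CLUSTER_CONF)
    print("rrp_mode:", mode)
    print("altname entries:", alts)
```

If the function reports an rrp_mode other than "none", or any altname
entries, redundant rings are presumably still configured.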

From: Asgaroth [mailto:lists at blueface.com] 
Sent: 10 February 2014 11:46
To: 'Mailing List: Pacemaker'
Subject: RE: node1 fencing itself after node2 being fenced

Hi All,

OK, here is my testing with cman/clvmd enabled at system startup and clvmd
outside of pacemaker control. I still get the clvmd hang/fail situation even
when it runs outside of pacemaker control. I cannot see offhand where the
issue is occurring. Maybe it is related to what Vladislav was saying, where
clvmd hangs if it is not running on a cluster node that has cman running;
however, I have both cman and clvmd enabled to start at boot. Here is a
short synopsis of what appears to be happening:

[1] Everything is fine here, both nodes up and running:

# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    444   2014-02-07 10:25:00  test01
   2   M    440   2014-02-07 10:25:00  test02

# dlm_tool ls
dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000000
change        member 2 joined 1 remove 0 failed 0 seq 1,1
members       1 2

[2] Here I run "echo c > /proc/sysrq-trigger" on node 2 (test02). crm_mon
shows node 2 in an unclean state and fencing kicks in (node 2 is rebooted):

# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    440   2014-02-07 10:27:58  test01
   2   X    444                        test02

# dlm_tool ls
dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000004 kern_stop
change        member 2 joined 1 remove 0 failed 0 seq 2,2
members       1 2
new change    member 1 joined 0 remove 1 failed 1 seq 3,3
new status    wait_messages 0 wait_condition 1 fencing
new members   1
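
When taking several of these snapshots, it can help to check the lockspace
state with a small script instead of by eye. The following is a minimal
sketch that parses text in the "dlm_tool ls" layout shown above and flags a
lockspace as blocked when it carries the kern_stop flag or still has a
pending "new change" entry. The function names and the parsing rules are my
own assumptions, based only on this output, not on any documented format.

```python
# Sample text copied from the "dlm_tool ls" output while node 2 was being fenced.
SAMPLE_DLM_LS = """dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000004 kern_stop
change        member 2 joined 1 remove 0 failed 0 seq 2,2
members       1 2
new change    member 1 joined 0 remove 1 failed 1 seq 3,3
new status    wait_messages 0 wait_condition 1 fencing
new members   1
"""


def parse_dlm_ls(text):
    """Parse "dlm_tool ls"-style text into a list of dicts, one per lockspace."""
    lockspaces = []
    current = None
    for raw in text.splitlines():
        parts = raw.split()
        if not parts or raw.strip() == "dlm lockspaces":
            continue  # skip blank lines and the header
        # Keys such as "new change" span two words; all others are one word.
        if parts[0] == "new" and len(parts) > 1:
            key, value = "new " + parts[1], " ".join(parts[2:])
        else:
            key, value = parts[0], " ".join(parts[1:])
        if key == "name":
            # A "name" line starts a new lockspace record.
            current = {}
            lockspaces.append(current)
        if current is not None:
            current[key] = value
    return lockspaces


def is_blocked(lockspace):
    """True if the lockspace is stopped or has a membership change pending."""
    return "kern_stop" in lockspace.get("flags", "") or "new change" in lockspace


if __name__ == "__main__":
    for ls in parse_dlm_ls(SAMPLE_DLM_LS):
        state = "BLOCKED" if is_blocked(ls) else "ok"
        print(ls["name"], state, ls.get("new status", ""))
```

In practice the text would come from running "dlm_tool ls" on the node and
feeding its output to parse_dlm_ls().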

[3] The output in [2] looks fine so far, to my untrained eye: dlm is in the
kern_stop state while waiting on a successful fence. The node then reboots
and we have the following state:

# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    440   2014-02-07 10:27:58  test01
   2   M    456   2014-02-07 10:35:42  test02

# dlm_tool ls
dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000000
change        member 2 joined 1 remove 0 failed 0 seq 4,4
members       1 2

So dlm and cman appear to be working properly (again, I could be wrong, my
untrained eye and all :) ).

However, if I run any lvm or clvm status commands, they still just hang.
Could this be related to clvmd doing a check when cman is up and running but
clvmd has not started yet (as I understand from Vladislav's previous email)?
Or do I have something fundamentally wrong with my fencing configuration?

Here is a link to the "dlm_tool dump" at the time of the above "dlm_tool ls"
(if it helps)

http://pastebin.com/KV6YZWrN

Again, thanks for all the info thus far.

Thanks
