[Pacemaker] node1 fencing itself after node2 being fenced

Tue Feb 18 22:29:36 EST 2014

On 18 Feb 2014, at 9:12 pm, Asgaroth <lists at blueface.com> wrote:

>> 
>> The 3rd node should (and needs to be) fenced at this point to allow the
>> cluster to continue.
>> Is this not happening?
> 
> The fencing operation appears to complete successfully, here is the
> sequence:
> 
> [1] All 3 nodes running properly
> [2] On node 3 I run "echo c > /proc/sysrq-trigger" which "hangs" node3
> [3] The fence_test03 resource executes a fence operation on node 3 (fires a
> shutdown/startup on the VM)
> [4] dlm shows kern_stop state while node 3 is being fenced
> [5] node 3 reboots, and node 1 & 2 operate as normal (clvmd and gfs2 work
> properly, dlm notified that fence successful (2 members in each lock group))
> [6] While node 3 is booting, cman starts properly then clvmd starts but
> hangs on boot

I would really love to see logs at this point.
Both from pacemaker and the system in general (and clvmd if it produces any).

Based on what you say below, there doesn't seem to be a good reason for the hang (i.e. no reason to be trying to fence anyone).
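The log collection requested above could be scripted roughly as follows. This is a minimal sketch: the helper names `run` and `collect_logs` are hypothetical, and it assumes a RHEL 6-style cman + Pacemaker node that logs to /var/log/messages and has `crm_report` installed. With `DRY_RUN=1` the commands are only printed, not executed.

```shell
#!/bin/bash
# Hypothetical helpers: gather Pacemaker and system logs around the clvmd hang.
# Assumes /var/log/messages is the system log and crm_report is installed.

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# collect_logs START DEST
#   START: a time shortly before the crash was triggered, e.g. "2014-02-18 21:00:00"
#   DEST:  destination name for the crm_report output
collect_logs() {
    local start="$1" dest="$2"
    # Pacemaker's cluster-wide report (logs, CIB, configuration) since START
    run crm_report -f "$start" "$dest"
    # clvmd, dlm and fencing messages from the local system log
    run grep -E 'clvmd|dlm|fence' /var/log/messages
}
```

On node 1, for example, `DRY_RUN=1 collect_logs "2014-02-18 21:00:00" /tmp/clvmd-hang` shows what would run; dropping `DRY_RUN=1` runs it for real.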

> [7] While node 3 is "hung" at the clvmd stage, node 1 & 2 are unable to
> perform lvm operations due to node 3 attempting to join the clvmd "group".
> Dlm shows that node 3 is a member, cman sees node 3 as a cluster member,
> however, pacemaker has not started as clvmd is not successfully started.
> 
> Because pacemaker is not "up" and because I do not have clvmd as a resource
> definition, there is no fence performed if/when clvmd fails.
> 
> Other than the above, fencing appears to be working properly. Are there some
> other fencing tests you may like me to perform to verify that fencing is
> working as expected?
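The test sequence in steps [1]-[7], plus a direct fence that bypasses failure detection, might be sketched as below. The helper names are hypothetical, node names are placeholders, and passwordless ssh from a surviving node to the victim is assumed. `stonith_admin --reboot`, `cman_tool nodes`, `dlm_tool ls` and `crm_mon -1` are the standard Pacemaker/cman tools; with `DRY_RUN=1` the commands are only printed.

```shell
#!/bin/bash
# Hypothetical sketch of the fencing tests from steps [1]-[7], plus a direct fence.

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Test A (step [2]): panic the victim's kernel so the cluster must fence it.
# The ssh session hangs or drops once the kernel crashes; that is expected.
crash_node() {
    run ssh "$1" 'echo c > /proc/sysrq-trigger'
}

# Test B: ask Pacemaker's fencing daemon to reboot the victim directly,
# checking the fence device independently of failure detection.
manual_fence() {
    run stonith_admin --reboot "$1"
}

# Run on a survivor while the victim is fenced and rejoins (steps [4]-[7]).
show_membership() {
    run cman_tool nodes   # cman's membership view
    run dlm_tool ls       # dlm lockspaces, members and any stopped state
    run crm_mon -1        # one-shot Pacemaker status
}
```

From node 1: `crash_node node3`, then repeat `show_membership` until node 3 is back; `manual_fence node3` exercises the fence device on its own. Test A repeats the sequence that led to the clvmd hang in step [6].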
> 
>> 
>> Did you specify on-fail=fence for the clvmd agent?
>> 
> 
> 
> Hmmm, I don't have any clvmd agents defined within pacemaker at the moment
> as I am starting clvmd outside of pacemaker control.

Right. I forgot. Sorry. Carry on :-)
There have been a bunch of discussions going on regarding clvmd in RHEL 7 and they got muddled in my head.

> 
> In my original post I had clvmd and dlm defined as a clone resource under
> pacemaker control. My understanding from the responses to that post was to
> remove those resources from pacemaker control and run clvmd on boot and dlm
> would be managed by cman startup. Are you saying that I should have
> dlm/clvmd defined as pacemaker resources and still have clvmd start on
> bootup?
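The layout described in that paragraph, with clvmd started at boot outside Pacemaker and dlm brought up by cman, might be enabled on a RHEL 6 node roughly like this. It is a minimal sketch that assumes SysV init scripts named `cman`, `clvmd` and `pacemaker`; the helper names are hypothetical, and `DRY_RUN=1` only prints the commands.

```shell
#!/bin/bash
# Hypothetical sketch: enable cman, clvmd and pacemaker at boot (RHEL 6 SysV init).

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

enable_boot_services() {
    local svc
    # Listed in dependency order for readability; the real boot order comes
    # from each init script's own start priority, not from this loop.
    for svc in cman clvmd pacemaker; do
        run chkconfig "$svc" on
    done
}

# Show the runlevels each service is enabled in.
show_boot_services() {
    run chkconfig --list cman
    run chkconfig --list clvmd
    run chkconfig --list pacemaker
}
```

Because clvmd is not a Pacemaker resource in this layout, nothing fences node 3 if clvmd hangs during boot, which matches what is described after step [7].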
> 
> For example, originally I defined dlm/clvmd under pacemaker control as
> follows:
> 
> pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s \
>     on-fail=fence clone interleave=true ordered=true
> pcs resource create clvmd lsb:clvmd op monitor interval=30s on-fail=fence \
>     clone interleave=true ordered=true
> 
> However, right now, the above two resource definitions have been removed
> from pacemaker.
> 
> Thanks for your time (and others too) thus far in assisting me with this
> issue.
> 
> Thanks
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
