[Pacemaker] pacemaker/dlm problems
Vladislav Bogdanov
bubble at hoster-ok.com
Tue Sep 27 08:24:02 UTC 2011
27.09.2011 10:56, Andrew Beekhof wrote:
> On Tue, Sep 27, 2011 at 5:07 PM, Vladislav Bogdanov
> <bubble at hoster-ok.com> wrote:
>> 27.09.2011 08:59, Andrew Beekhof wrote:
>> [snip]
>>>>>>>> I agree with Jiaju
>>>>>>>> (https://lists.linux-foundation.org/pipermail/openais/2011-September/016713.html)
>>>>>>>> that this could be solely a Pacemaker problem, because Pacemaker
>>>>>>>> should probably initiate fencing itself in such a situation, I think.
>>>>>>>>
>>>>>>>> So using pacemaker/dlm with the openais stack is currently risky due
>>>>>>>> to possible hangs of dlm lockspaces.
>>>>>>>
>>>>>>> It shouldn't be, failing to connect to attrd is very unusual.
>>>>>>
>>>>>> By the way, one of the underlying problems, which actually made me
>>>>>> notice all this, is that a Pacemaker cluster does not fence its DC if
>>>>>> it leaves the cluster for a very short time. That is what Jiaju
>>>>>> described in his notes, and I can confirm it.
>>>>>
>>>>> That's highly surprising. Do the logs you sent display this behaviour?
>>>>
>>>> They do. The rest of the cluster begins an election, but then accepts
>>>> the returned DC back (I'm writing this from memory - I looked at the
>>>> logs on Sep 5-6 - so I may be mixing something up).
>>>
>>> Actually, this might be possible - if DC.old came back before DC.new
>>> had a chance to get elected, run the PE and initiate fencing, then
>>> there would be no need to fence.
>>>
>>
>> (the text below is about Pacemaker on top of the openais stack, not cman)
>>
>> Except that the dlm lockspaces are in kern_stop state, so the whole
>> dlm-related part is frozen :( - clvmd in my case, but I expect the same
>> from gfs2 and ocfs2.
>> And fencing requests originated on a CPG NODEDOWN event by dlm_controld
>> (with my patch to dlm_controld and your patch for
>> crm_terminate_member_common()) in the quorate partition are lost: DC.old
>> doesn't accept CIB updates from other nodes, so those fencing requests
>> are discarded.
>
> All the more reason to start using the stonith api directly.
> I was playing around list night with the dlm_controld.pcmk code:
> https://github.com/beekhof/dlm/commit/9f890a36f6844c2a0567aea0a0e29cc47b01b787
Wow, I'll try it!
Btw (off-topic), don't you think it could be interesting to support the
stacks there as dlopen'ed modules? From what I see in that code, it could
be achieved fairly easily. One just needs to define a module API structure,
enumerate the functions of each stack, add module loading to the
dlm_controld core, and change the calls to go through the module functions.
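The shape I have in mind is roughly the following sketch. All names here
(dlm_stack_ops, DLM_STACK_SYM, load_stack) are hypothetical illustrations,
not anything that exists in dlm_controld today: each stack would be built as
a shared object exporting one ops table, and the core would dlopen it.

```c
/* Hypothetical pluggable-stack interface for dlm_controld; these names
 * are my own invention, not the actual dlm_controld API. */
#include <stdio.h>
#include <dlfcn.h>

struct dlm_stack_ops {
    const char *name;
    int (*setup)(void);                 /* connect to the cluster stack */
    int (*request_fencing)(int nodeid); /* ask the stack to fence a node */
    void (*teardown)(void);
};

/* Symbol every stack module (.so) would have to export. */
#define DLM_STACK_SYM "dlm_stack_ops"

/* Load one stack module; returns NULL (with a message) on failure. */
static struct dlm_stack_ops *load_stack(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "load_stack: %s\n", dlerror());
        return NULL;
    }
    struct dlm_stack_ops *ops = dlsym(handle, DLM_STACK_SYM);
    if (ops == NULL) {
        fprintf(stderr, "load_stack: %s\n", dlerror());
        dlclose(handle);
        return NULL;
    }
    return ops;
}
```

The core would then call ops->request_fencing() etc. without caring whether
the module behind it talks to cman, openais, or something else.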
>
>>
>> I think the problem is that membership changes are handled in a
>> non-transactional way (?).
>
> Sounds more like the dlm/etc is being dumb - if the host is back and
> healthy, why would we want to shoot it?
Hmm... no comment from me on that ;)
But anyway, something needs to be done on one side or the other...
>
>> If Pacemaker fully finished processing one membership change - electing a
>> new DC in the quorate partition, and not trying to take over the DC role
>> (or releasing it) in a non-quorate partition while a quorate one exists -
>> that problem could be gone.
>
> Non quorate partitions still have a DC.
> They're just not supposed to do anything (depending on the value of
> no-quorum-policy).
I actually meant "do not try to take over the DC role in a rejoined cluster
(or release that role) if it was running in a non-quorate partition before
the rejoin while a quorate one existed". Sorry for the confusion - not very
natural wording again, but it should be clearer.
Maybe the DC from the non-quorate partition should just get a lower
priority to become DC when the cluster rejoins and a new election happens
(does one happen?).
>
>> I didn't dig into the code that much, so all of the above is just my
>> deduction, which may be completely wrong.
>> And of course the real logic could (should) be much more complicated,
>> with handling of just-rebooted members, etc.
>>
>> (end of openais specific part)
>>
>>>> [snip]
>>>>>>>> Although it took 25 seconds instead of 3 to break the cluster (I
>>>>>>>> understand it is almost impossible to load a host that much, but
>>>>>>>> anyway), I then got a real nightmare: two nodes of a 3-node cluster
>>>>>>>> had cman stopped (and Pacemaker too, because of the lost cman
>>>>>>>> connection) - they called kick_node_from_cluster() on each other,
>>>>>>>> and that succeeded. But fencing didn't happen (I still need to look
>>>>>>>> into why, but this is cman-specific).
>>>>
>>>> Btw, the underlying logic of this part is tricky for me to understand:
>>>> * cman just stops the cman processes on remote nodes, disregarding
>>>> quorum. I hope that could be fixed in corosync, if I understand one of
>>>> the latest threads there correctly.
>>>> * But cman does not fence those nodes, and they still run resources.
>>>> This could be extremely dangerous under some circumstances. And cman
>>>> does not fence them even if it has fence devices configured in
>>>> cluster.conf (I verified that).
>>>>
>>>>>>>> The remaining node had Pacemaker hung: it didn't even notice the
>>>>>>>> change in cluster infrastructure. The down nodes were listed as
>>>>>>>> online, one of them was still DC, and all resources were marked as
>>>>>>>> started on all nodes (the down ones too). No log entries from
>>>>>>>> Pacemaker at all.
>>>>>>>
>>>>>>> Well, I can't see any logs from anyone, so it's hard for me to comment.
>>>>>>
>>>>>> Logs are sent privately.
>>>>>>
>>>>>>>
>>>>
>>>> Vladislav
>>>>
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>>>
>>>
>>
>>
>>
>