[Pacemaker] Build dlm_controld for pacemaker stack (dlm_controld.pcmk)

Mon Nov 5 06:33:59 UTC 2012

05.11.2012 08:40, Andrew Beekhof wrote:
> On Fri, Nov 2, 2012 at 6:22 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>> 02.11.2012 02:05, Andrew Beekhof wrote:
>>> On Thu, Nov 1, 2012 at 5:09 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>>> 01.11.2012 02:47, Andrew Beekhof wrote:
>>>> ...
>>>>>>
>>>>>> One remark about that - it requires that gfs2 communicates with dlm in
>>>>>> kernel space - so gfs_controld is no longer required. I think
>>>>>> Fedora 17 is the first version with that feature. And it is definitely
>>>>>> not available for EL6 (CentOS 6, which I use).
>>>>>>
>>>>>> But I have had preliminary success running GFS2 with corosync2 and pacemaker
>>>>>> 1.1.8 on EL6. dlm4 runs just fine as is (although it lacks some
>>>>>> features on EL6 because of the kernel). And it still includes the
>>>>>> (undocumented) option enable_fscontrol, so user-space communication with fs
>>>>>> control daemons is supported. Even if that feature is removed
>>>>>> upstream, it can easily be restored - just several lines of code.
>>>>>> And I ported gfs_controld from cman to corosync2 (the patch is still very
>>>>>> dirty, made with scissors and needle, just a proof of concept that it
>>>>>> can work at all). Some features are unsupported (e.g. nodir) and will not be
>>>>>> implemented by me.
>>>>>
>>>>> I'm impressed.  What was the motivation though?  You really really
>>>>> don't like CMAN? :-)
>>>>
>>>> Why should I like software which is going to die? ;)
>>>>
>>>> I believe that how things are done currently (third case from your list)
>>>> fully reflects my "perfectionistic" needs. I had many problems with
>>>> cman+pacemaker in the past. The most critical is that pacemaker and
>>>> dlm_controld react differently when a node reappears very soon after
>>>> it was lost (because pacemaker uses totem (?) directly for membership, but
>>>> dlm uses CPG).
>>>
>>> We both get it from the CPG and quorum APIs for option 3.
>>
>> Yes, but not for 1 nor for 2.
> 
> Not quite. We used to ignore it for option 2, but not anymore.
> Option 2 uses CPG for messaging.
> 
>> I saw the described behavior with both of
>> them, but not with 3.
>> That's why I decided to go with 3, which I think is conceptually right.
>>
>>>
>>>> Pacemaker accepts that, but controld freezes lockspaces,
>>>> waiting for fencing. But fencing is never done because nobody handles
>>>> the "node lost" CPG event.
>>>
>>> WTF.  Pacemaker should absolutely do this.  Bug report?
>>
>> Sorry for being unclear.
>> I saw that with both 1 and 2 (where pacemaker did not use CPG), until I
>> "fixed" fencing at the dlm layer for 1. I modified it to request fencing if
>> a "node down" event occurs and then did not see freezes anymore. From what
>> I understand, the "node down" CPG event occurs when corosync forms a
>> transitional membership (at least pacemaker logged lines about that at
>> the same time as the dlm freeze). And if a stable membership occurs
>> (milli-)seconds after the transitional one, pacemaker (as of, probably, 1.1.6)
>> did not fence the re-appeared node. I can understand that - pacemaker can
>> absolutely live with that. But dlm cannot.
> 
> Right. Any sort of membership hiccup is fatal as far as the dlm is concerned.
> But even with options 1 and 2, it should still make a fencing request.

I'm afraid not. At least not with 3.0.17 or 3.1.7. The sources are clear
about that - a CPG node down event does not result in fencing being requested by
dlm_controld. And that was a major problem for me with options 1 and 2.
A one-line patch solved that, though. But I decided that cman is a no-go
for me from now on, because such critical issues as proper fencing should be
tested thoroughly, and if they are not, then I will feel like I am sitting on
a bomb with it.
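
To make the difference concrete, here is a minimal model of the decision,
not the dlm_controld source and not the actual patch. The reason codes are
assumed to mirror corosync's cpg_reason_t values; Member, nodes_to_fence
and the fence_on_nodedown switch are hypothetical names used only for
illustration.

```python
from enum import IntEnum
from typing import Iterable, List, NamedTuple


class Reason(IntEnum):
    # Assumed to mirror corosync's cpg_reason_t; verify against cpg.h.
    JOIN = 1
    LEAVE = 2
    NODEDOWN = 3
    NODEUP = 4
    PROCDOWN = 5


class Member(NamedTuple):
    # Hypothetical stand-in for one entry of the "left" list of a
    # CPG configuration change.
    nodeid: int
    pid: int
    reason: Reason


def nodes_to_fence(left: Iterable[Member], fence_on_nodedown: bool) -> List[int]:
    """Return the node ids to fence for one configuration change.

    fence_on_nodedown=False models the behaviour described above
    (fencing only for "process lost"); True models the patched
    behaviour (fencing for "node lost" as well).
    """
    fatal = {Reason.PROCDOWN}
    if fence_on_nodedown:
        fatal.add(Reason.NODEDOWN)

    result: List[int] = []
    for member in left:
        # A clean LEAVE never triggers fencing in this model.
        if member.reason in fatal and member.nodeid not in result:
            result.append(member.nodeid)
    return result
```

With a "left" list containing one node lost, one process lost and one
clean leave, the unpatched variant picks only the "process lost" node,
while the patched variant also picks the "node lost" one.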

> 
> Without fence_pcmk in cluster.conf that request might have gotten
> lost, but with 1.1.8 I would expect the node to be shot - regardless
> of whether the rest of Pacemaker thought it was ok.
> That's why going direct to stonithd was an important change.

Aha. The last time I tried cman was before fence_pcmk was written (and before
the fencing call that dlm_controld.pcmk uses was modified to go straight to
stonithd). I recall I was polishing option 1 at that time (after throwing
cman away), and the first revision of that move did not work because it used
an async libstonithd call to fence a node. That's why I used direct calls
to stonith in my version of dlm_controld.pcmk. All that resulted in a
fully working stack, and I decided to go with option 3 only after hearing
from you that you do not test pacemaker with corosync1 yourselves anymore.

That was the second major problem with option 1 - before all those changes
there was a possibility for a fencing request to be dropped silently. And
I actually hit that. I do not know if it fully works with the stock 3.0.17
dlm_controld.pcmk (I suspect not, because of issue 1), but with my builds
it is stable.

Anyway, I seem to be happy with option 3 on EL6. It introduces a clean
and straightforward model of the cluster stack and it works perfectly, so I
do not see any reason to return to option 1 or 2.

> 
>> And it is its task to do
>> proper fencing in case it cannot work, not pacemaker's. But that piece
>> was missing there. The same is (probably, I may be damn wrong here) true
>> for cman - I did a quick search for a CPG "node down" handler in its
>> sources but didn't find one. I suspect it was handled by some deprecated
>> daemon (e.g. groupd) in the past, but as of 3.1.7 I did not observe
>> handling for that.
>>
>> As I go with option 3, I should not see that anymore even theoretically.
>>
>> So no bug report for what I won't use anymore :)
>>
>>>
>>>> dlm does start fencing for "process lost", but
>>>> not for "node lost".
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org