[Pacemaker] Build dlm_controld for pacemaker stack (dlm_controld.pcmk)

Andrew Beekhof andrew at beekhof.net
Wed Oct 31 23:53:47 UTC 2012


On Tue, Oct 30, 2012 at 6:15 PM, Vladislav Bogdanov
<bubble at hoster-ok.com> wrote:
> 29.10.2012 19:51, Bernardo Cabezas Serra wrote:
>> Hello,
>>
>> Disclaimer: I also posted this issue to the linux-ha list a couple of
>> days ago. I'm sorry if this is not the correct list, and thanks if you
>> can give me a hint about which cluster stack I should use for ocfs2 now.
>>
>> I'm trying to compile the whole stack for corosync + pacemaker + dlm + ocfs2
>> (with dlm_controld.pcmk), without the cman stack. I'm following the "From
>> source" Pacemaker guide.
>>
>> After some days of trying to compile the correct combination of
>> sources/versions, I have had no success, and I'm not sure whether this
>> is possible at the moment.
>>
>> The first problem was that cluster removed support for dlm_controld with
>> the pacemaker stack. The last version with support was 3.0.17.
>> But this was done some years ago, and as far as I have been able to
>> understand, things are still broken.
>>
>>
>> The most relevant info I found about this issue is in these threads from
>> Andrew Beekhof and Vladislav Bogdanov, which suggest compiling
>> dlm_controld from Cluster after applying some patches. They report that it
>> worked (with some remaining issues):
>>
>> http://oss.clusterlabs.org/pipermail/pacemaker/2009-October/003064.html
>> http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg09959.html
>>
>> But the most recent discussion of this is a year old, and it seems that
>> things are still broken.
>> I haven't been able to compile it (I get lots of errors), so I'm
>> asking whether this is the right way, because it seems that nobody else
>> wants to use this...
>
> I still run that on two clusters with post-1.1.7 pacemaker, but not with
> 1.1.8. Just looked, it is fe859a7 (Apr 11, two weeks after 1.1.7) with
> two dozen cherry-picked patches. I should migrate them to
> corosync2/pacemaker-1.1.8 shortly.
>
> What errors exactly do you see? The Pacemaker APIs used there received some
> changes between 1.1.7 and 1.1.8. I have one more patch which I tried
> with pacemaker master as of Aug 22 (close enough to 1.1.8, but some APIs
> changed again after that point). That version did not work for me with
> corosync14 because of a bug fixed after that, and I decided to move to
> corosync2 right after that failure to be more upstream-compatible. I
> can't say whether it will help you, but you may want to try. Should I post it?
>
> The main issue with the "pcmk" versions of all those daemons is that the fs
> control daemons and dlm_controld need to be able to request fencing.
> Originally this was done by calling a high-level pacemaker API
> (crm_terminate_member_no_mainloop()), and that did not work in some
> circumstances.
> Andrew developed the brand-new stonith-ng subsystem, which is used for
> fencing in the version of dlm_controld you are talking about (with my
> patches on top of Andrew's patches).

crm_terminate_member_no_mainloop() should still work though
(specifically because I knew the old controlds used it).
You just need the (new) compatibility header and the result will be
/very/ reliable - there is no crmd/pengine involvement anymore, you go
straight to the fencing daemon.

> I suspect that ocfs2_controld.pcmk (like gfs_controld.pcmk in 3.0.17)
> still uses the old way. If that is true, then it can't work reliably. I
> tried to port gfs_controld to use stonith-ng with corosync1/openais
> (included in the patch I mentioned above), but I did not test it at all
> (although I just ported it to corosync2/dlm4 and it works in a test
> setup; see my answer to David).
>
> Vladislav
>
>>
>>
>> The cluster page states that the DLM code has now been separated from
>> cluster:
>> https://fedorahosted.org/cluster/wiki/HomePage
>>
>> But this dlm project (which seems to have pcmk support) depends on
>> corosync 2.0, so it can't run with the latest pacemaker (1.1.8). (Can it?)
>> http://git.fedorahosted.org/git/dlm.git
>>
>> Before spending more time on this, I wanted to ask what the right way
>> to do things is.
>> So my questions are:
>>
>> (1) Is it currently possible to get an ocfs2 corosync + pacemaker cluster,
>> without cman, and with dlm_controld on the pcmk stack? (If yes, with which
>> repos/versions?)
>> (2) What is the future roadmap for this? Will a future corosync 2.0
>> cluster have the dlm issues addressed?
>>
>> Also, I have read (also in a post by Andrew) that an OCFS2 cluster could
>> have problems on top of corosync 2.0, as OCFS2 hasn't been ported (GFS2
>> was ported).
>> http://www.gossamer-threads.com/lists/linuxha/pacemaker/78538
>> So:
>> (3) Is GFS2 a better future option in terms of support for linux-ha
>> clusters?
>>
>>
>> More details about pcmk dlm_controld:
>> I found that SUSE has always maintained a cman-free cluster stack,
>> so I tried to find dlm in its packages.
>> Found:
>> http://rpmfind.net//linux/RPM/opensuse/factory/x86_64/libdlm-3.00.01-24.5.x86_64.html
>>
>>
>> But I have also had lots of compilation problems, trying several
>> pacemaker versions, including the SUSE-patched ones. I haven't been able to
>> successfully compile a dlm_controld.
>>
>>
>> Thanks and Regards,
>> Bernardo
>> --
>> APSL
>> *Bernardo Cabezas Serra*
>> *Responsable Sistemas*
>> Ada Byron, edificio NTIC 2ºA
>> 07121 ParcBit
>> Mail: bcabezas at apsl.net
>> Skype: bernat.cabezas
>> Tel: 971439771
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>
>


