[Pacemaker] Dual primary drbd + ocfs2: problems starting o2cb

Vladislav Bogdanov bubble at hoster-ok.com
Mon Aug 19 10:25:35 EDT 2013


16.08.2013 16:04, Elmar Marschke wrote:
> Hi all,
> 
> i'm working on a two node pacemaker cluster with dual primary drbd and
> ocfs2.
> 
> Dual pri drbd and ocfs2 WITHOUT pacemaker work fine (mounting, reading,
> writing, everything...).

ocfs2 uses its own clustering stack by default, so that test does not
involve the pacemaker integration at all.

> 
> When i try to make this work in pacemaker, there seems to be a problem
> to start the o2cb resource.
> 
> My (already simplified) configuration is:
> -----------------------------------------
> node poc1 \
>     attributes standby="off"
> node poc2 \
>     attributes standby="off"
> primitive res_dlm ocf:pacemaker:controld \
>     op monitor interval="120"
> primitive res_drbd ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op stop interval="0" timeout="100" \
>     op start interval="0" timeout="240" \
>     op promote interval="0" timeout="90" \
>     op demote interval="0" timeout="90" \
>     op notify interval="0" timeout="90" \
>     op monitor interval="40" role="Slave" timeout="20" \
>     op monitor interval="20" role="Master" timeout="20"
> primitive res_o2cb ocf:pacemaker:o2cb \
>     op monitor interval="60"
> ms ms_drbd res_drbd \
>     meta notify="true" master-max="2" master-node-max="1" target-role="Started"
> property $id="cib-bootstrap-options" \
>     no-quorum-policy="ignore" \
>     stonith-enabled="false" \
>     dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     last-lrm-refresh="1376574860"

Side note: you need to run both dlm and o2cb as clones, and group them
(either with "group" or with a pair of colocation/order statements), so
that ocfs2_controld is started only when dlm_controld is already running.
You have probably tried that already, but do not forget the last part.
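
A minimal sketch of that layout in crm shell syntax, reusing the res_dlm
and res_o2cb primitives from the configuration quoted above. The names
grp_o2cb and cl_o2cb are made up for illustration, and interleave="true"
is an assumption that is common for such clones, not something the
advice above requires:

```text
# Hypothetical names: grp_o2cb, cl_o2cb.
# A group starts its members in the listed order on the same node,
# so res_dlm (dlm_controld) is running before res_o2cb (ocfs2_controld).
group grp_o2cb res_dlm res_o2cb
# Cloning the group runs one copy of it on every node.
clone cl_o2cb grp_o2cb \
    meta interleave="true"
```

The same ordering can be expressed with two separate clones plus one
colocation and one order constraint between them instead of the group.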

> 
> 
> First error message in corosync.log as far as i can identify it:
> ----------------------------------------------------------------
> lrmd: [5547]: info: RA output: (res_dlm:probe:stderr) dlm_controld.pcmk: no process found
> [ other stuff ]
> lrmd: [5547]: info: RA output: (res_dlm:start:stderr) dlm_controld.pcmk: no process found
> [ other stuff ]
> lrmd: [5547]: info: RA output: (res_o2cb:start:stderr) 2013/08/16_13:25:20 ERROR: ocfs2_controld.pcmk did not come up
> 
> (
> You can find the whole corosync logfile (starting corosync on node 1
> from beginning until after starting of resources) on:
> http://www.marschke.info/corosync_drei.log
> )
> 
> syslog shows:
> -------------
> ocfs2_controld.pcmk[5774]: Unable to connect to CKPT: Object does not exist

How exactly did you start the corosync process: as "corosync" or as
"openais"? The background is that corosync does not load the CKPT service
by default; it is loaded only when corosync is started through the
openais script. You may want to look at that script for details.
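
For reference, a hedged sketch of how the CKPT service is commonly
enabled when corosync 1.x is started directly. It assumes the openais
package (which provides the service) is installed and that corosync
reads extra service definitions from /etc/corosync/service.d/; the file
name ckpt-service is arbitrary. Compare it with what the openais script
on your system does before relying on it:

```text
# /etc/corosync/service.d/ckpt-service  (assumed path, arbitrary file name)
# Asks corosync to load the openais checkpoint (CKPT) service engine.
service {
    name: openais_ckpt
    ver: 0
}
```

The same block can also be placed directly in corosync.conf. Corosync
has to be restarted for the service to be loaded.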

> 
> 
> Output of crm_mon:
> ------------------
> ============
> Stack: openais
> Current DC: poc1 - partition WITHOUT quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
> 
> Online: [ poc1 ]
> OFFLINE: [ poc2 ]
> 
>  Master/Slave Set: ms_drbd [res_drbd]
>      Masters: [ poc1 ]
>      Stopped: [ res_drbd:1 ]
>  res_dlm    (ocf::pacemaker:controld):    Started poc1
> 
> Migration summary:
> * Node poc1:
>    res_o2cb: migration-threshold=1000000 fail-count=1000000
> 
> Failed actions:
>     res_o2cb_start_0 (node=poc1, call=6, rc=1, status=complete): unknown error
> 
> ---------------------------------------------------------------------
> This is the situation after a reboot of node poc1. For simplification i
> left pacemaker / corosync unstarted on the second node, and already
> removed a group and a clone resource where dlm and o2cb already had been
> in (errors were there also).
> 
> Is my configuration of the resource agents correct?
> I checked using "ra meta ...", but as far as i recognized everything is ok.
> 
> Is some piece of software missing?
> dlm-pcmk is installed, ocfs2_controld.pcmk and dlm_controld.pcmk are
> available, i even did additional links in /usr/sbin:
> root@poc1:~# which ocfs2_controld.pcmk
> /usr/sbin/ocfs2_controld.pcmk
> root@poc1:~# which dlm_controld.pcmk
> /usr/sbin/dlm_controld.pcmk
> root@poc1:~#
> 
> I already googled but couldn't find anything useful. Thanks for any hints...:)
> 
> kind regards
> elmar




