[Pacemaker] Dual primary drbd + ocfs2: problems starting o2cb
Jake Smith
jsmith at argotec.com
Fri Aug 16 13:46:15 UTC 2013
> -----Original Message-----
> From: Elmar Marschke [mailto:elmar.marschke at schenker.at]
> Sent: Friday, August 16, 2013 9:05 AM
> To: The Pacemaker cluster resource manager
> Subject: [Pacemaker] Dual primary drbd + ocfs2: problems starting o2cb
>
> Hi all,
>
> I'm working on a two-node Pacemaker cluster with dual-primary DRBD and
> OCFS2.
>
> Dual pri drbd and ocfs2 WITHOUT pacemaker work fine (mounting, reading,
> writing, everything...).
>
> When I try to make this work in Pacemaker, there seems to be a problem
> starting the o2cb resource.
>
> My (already simplified) configuration is:
> -----------------------------------------
> node poc1 \
> attributes standby="off"
> node poc2 \
> attributes standby="off"
> primitive res_dlm ocf:pacemaker:controld \
> op monitor interval="120"
> primitive res_drbd ocf:linbit:drbd \
> params drbd_resource="r0" \
> op stop interval="0" timeout="100" \
> op start interval="0" timeout="240" \
> op promote interval="0" timeout="90" \
> op demote interval="0" timeout="90" \
> op notifiy interval="0" timeout="90" \
> op monitor interval="40" role="Slave" timeout="20" \
> op monitor interval="20" role="Master" timeout="20"
> primitive res_o2cb ocf:pacemaker:o2cb \
> op monitor interval="60"
> ms ms_drbd res_drbd \
> meta notify="true" master-max="2" master-node-max="1" target-role="Started"
> property $id="cib-bootstrap-options" \
> no-quorum-policy="ignore" \
> stonith-enabled="false" \
> dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> last-lrm-refresh="1376574860"
>
Looks like you are missing the ordering, colocation and clone statements
(a group makes for a shorter config, since group = order and colocation
in one statement). The resources *must* start in a particular order,
they must run on the same node, and there must be an instance of each
resource on each node.
More here for DRBD 8.4:
http://www.drbd.org/users-guide/s-ocfs2-pacemaker.html
Or DRBD 8.3:
http://www.drbd.org/users-guide-8.3/s-ocfs2-pacemaker.html
Basically add:
group grp_dlm_o2cb res_dlm res_o2cb
clone cl_dlm_o2cb grp_dlm_o2cb meta interleave="true"
order ord_drbd_then_dlm_o2cb inf: ms_drbd:promote cl_dlm_o2cb:start
colocation col_dlm_o2cb_with_drbdmaster inf: cl_dlm_o2cb ms_drbd:Master
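
To illustrate the "group = order and colocation in one statement" point,
here is a rough sketch of the same dlm/o2cb setup written without a
group. The IDs cl_dlm, cl_o2cb, ord_dlm_then_o2cb and col_o2cb_with_dlm
are made up for this example; treat it as an untested outline in crm
shell syntax rather than a drop-in config:

```
# One clone per primitive instead of a cloned group
clone cl_dlm res_dlm meta interleave="true"
clone cl_o2cb res_o2cb meta interleave="true"
# Start dlm before o2cb (the ordering the group would imply)
order ord_dlm_then_o2cb inf: cl_dlm cl_o2cb
# Run o2cb only where dlm is running (the colocation the group would imply)
colocation col_o2cb_with_dlm inf: cl_o2cb cl_dlm
```

With this variant the DRBD order and colocation constraints would need
to point at cl_dlm instead of at the cloned group.
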
HTH
Jake
> First error message in corosync.log as far as i can identify it:
> ----------------------------------------------------------------
> lrmd: [5547]: info: RA output: (res_dlm:probe:stderr) dlm_controld.pcmk:
> no process found
> [ other stuff ]
> lrmd: [5547]: info: RA output: (res_dlm:start:stderr) dlm_controld.pcmk:
> no process found
> [ other stuff ]
> lrmd: [5547]: info: RA output: (res_o2cb:start:stderr)
> 2013/08/16_13:25:20 ERROR: ocfs2_controld.pcmk did not come up
>
> (
> You can find the whole corosync logfile (from starting corosync on
> node 1 until after the resources were started) at:
> http://www.marschke.info/corosync_drei.log
> )
>
> syslog shows:
> -------------
> ocfs2_controld.pcmk[5774]: Unable to connect to CKPT: Object does not
> exist
>
>
> Output of crm_mon:
> ------------------
> ============
> Stack: openais
> Current DC: poc1 - partition WITHOUT quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
>
> Online: [ poc1 ]
> OFFLINE: [ poc2 ]
>
> Master/Slave Set: ms_drbd [res_drbd]
> Masters: [ poc1 ]
> Stopped: [ res_drbd:1 ]
> res_dlm (ocf::pacemaker:controld): Started poc1
>
> Migration summary:
> * Node poc1:
> res_o2cb: migration-threshold=1000000 fail-count=1000000
>
> Failed actions:
> res_o2cb_start_0 (node=poc1, call=6, rc=1, status=complete):
> unknown error
>
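
A note on the fail-count shown above: 1000000 is Pacemaker's INFINITY,
so res_o2cb will typically not be retried on poc1 until that failure is
cleared, even after the constraints are fixed. The following is only a
sketch, assuming crm_mon output in the format quoted above;
blocked_resources is a hypothetical helper name, not part of any
Pacemaker tool:

```shell
# Read a crm_mon migration summary on stdin and print the IDs of
# resources whose fail-count has reached 1000000 (INFINITY).
# Leading spaces or "> " quote markers in front of the ID are ignored.
blocked_resources() {
    sed -n 's/^[> ]*\([A-Za-z0-9_:-]*\): migration-threshold=[0-9]* fail-count=1000000.*$/\1/p'
}

# Possible usage on a cluster node (assumes crm_mon and the crm shell
# are installed; -1 prints once, -f includes fail counts):
#   crm_mon -1 -f | blocked_resources | while read -r rsc; do
#       crm resource cleanup "$rsc"
#   done
```

Running the cleanup only after the configuration has been corrected
avoids the resource immediately failing and being blocked again.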
> ---------------------------------------------------------------------
> This is the situation after a reboot of node poc1. For simplicity I
> left pacemaker / corosync unstarted on the second node, and had already
> removed a group and a clone resource that dlm and o2cb had been in
> (the errors occurred with them as well).
>
> Is my configuration of the resource agents correct?
> I checked using "ra meta ...", and as far as I can tell everything is
> ok.
>
> Is some piece of software missing?
> dlm-pcmk is installed, ocfs2_controld.pcmk and dlm_controld.pcmk are
> available, and I even created additional links in /usr/sbin:
> root@poc1:~# which ocfs2_controld.pcmk
> /usr/sbin/ocfs2_controld.pcmk
> root@poc1:~# which dlm_controld.pcmk
> /usr/sbin/dlm_controld.pcmk
> root@poc1:~#
>
> I already googled but couldn't find anything useful. Thanks for any
> hints... :)
>
> kind regards
> elmar
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org