[Pacemaker] Dual primary drbd + ocfs2: problems starting o2cb
Elmar Marschke
elmar.marschke at schenker.at
Fri Aug 16 09:04:44 EDT 2013
Hi all,
I'm working on a two-node Pacemaker cluster with dual-primary DRBD and
OCFS2.
Dual-primary DRBD and OCFS2 work fine WITHOUT Pacemaker (mounting,
reading, writing, everything...).
When I try to run the same setup under Pacemaker, the o2cb resource
fails to start.
My (already simplified) configuration is:
-----------------------------------------
node poc1 \
attributes standby="off"
node poc2 \
attributes standby="off"
primitive res_dlm ocf:pacemaker:controld \
op monitor interval="120"
primitive res_drbd ocf:linbit:drbd \
params drbd_resource="r0" \
op stop interval="0" timeout="100" \
op start interval="0" timeout="240" \
op promote interval="0" timeout="90" \
op demote interval="0" timeout="90" \
op notifiy interval="0" timeout="90" \
op monitor interval="40" role="Slave" timeout="20" \
op monitor interval="20" role="Master" timeout="20"
primitive res_o2cb ocf:pacemaker:o2cb \
op monitor interval="60"
ms ms_drbd res_drbd \
meta notify="true" master-max="2" master-node-max="1" target-role="Started"
property $id="cib-bootstrap-options" \
no-quorum-policy="ignore" \
stonith-enabled="false" \
dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
last-lrm-refresh="1376574860"
First error message in corosync.log as far as I can identify it:
----------------------------------------------------------------
lrmd: [5547]: info: RA output: (res_dlm:probe:stderr) dlm_controld.pcmk: no process found
[ other stuff ]
lrmd: [5547]: info: RA output: (res_dlm:start:stderr) dlm_controld.pcmk: no process found
[ other stuff ]
lrmd: [5547]: info: RA output: (res_o2cb:start:stderr) 2013/08/16_13:25:20 ERROR: ocfs2_controld.pcmk did not come up
(The whole corosync logfile, from starting corosync on node 1 until
after the resources were started, is available at:
http://www.marschke.info/corosync_drei.log )
syslog shows:
-------------
ocfs2_controld.pcmk[5774]: Unable to connect to CKPT: Object does not exist
Output of crm_mon:
------------------
============
Stack: openais
Current DC: poc1 - partition WITHOUT quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
4 Resources configured.
============
Online: [ poc1 ]
OFFLINE: [ poc2 ]
Master/Slave Set: ms_drbd [res_drbd]
Masters: [ poc1 ]
Stopped: [ res_drbd:1 ]
res_dlm (ocf::pacemaker:controld): Started poc1
Migration summary:
* Node poc1:
res_o2cb: migration-threshold=1000000 fail-count=1000000
Failed actions:
res_o2cb_start_0 (node=poc1, call=6, rc=1, status=complete):
unknown error
---------------------------------------------------------------------
This is the situation after a reboot of node poc1. To simplify things, I
left pacemaker / corosync stopped on the second node and removed a group
and a clone resource that had contained dlm and o2cb (the errors
occurred with them in place as well).
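For reference, a group/clone layout of that kind usually looks roughly
like the following. This is only a sketch, not the exact configuration
that was removed: the names grp_ocfs2, cl_ocfs2, col_ocfs2_on_drbd and
ord_drbd_before_ocfs2 are hypothetical, and the constraints assume that
dlm and o2cb should only run on nodes where ms_drbd is Master:
-----------------------------------------
group grp_ocfs2 res_dlm res_o2cb
clone cl_ocfs2 grp_ocfs2 \
meta interleave="true"
colocation col_ocfs2_on_drbd inf: cl_ocfs2 ms_drbd:Master
order ord_drbd_before_ocfs2 inf: ms_drbd:promote cl_ocfs2:start
-----------------------------------------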
Is my configuration of the resource agents correct?
I checked using "ra meta ...", and as far as I can tell everything is OK.
Is some piece of software missing?
dlm-pcmk is installed, ocfs2_controld.pcmk and dlm_controld.pcmk are
available, and I even created additional links in /usr/sbin:
root@poc1:~# which ocfs2_controld.pcmk
/usr/sbin/ocfs2_controld.pcmk
root@poc1:~# which dlm_controld.pcmk
/usr/sbin/dlm_controld.pcmk
root@poc1:~#
I already googled but couldn't find anything useful. Thanks for any hints...:)
kind regards
elmar