[Pacemaker] understanding resource restarts through pengine
Oualid Nouri
o.nouri at computer-lan.de
Tue Sep 20 12:10:39 UTC 2011
Hi,
I'm testing pacemaker resource failover in a very simple test environment with two virtual machines.
Three cloned resources (DRBD dual-primary, controld, clvmd).
Fencing with external/ssh; that's it.
I'm having problems understanding why my clvmd resource gets restarted when a failed node comes back online.
When one node is powered off (failure test), the remaining node fences the "failing" node and the clvmd resource stays online.
But when the failed node comes back online, the clvmd resource clone on the node that had remained gets restarted for no visible reason (see logs below).
I guess I'm doing something wrong!
But what?
Anyone who can point me in the right direction?
Thank you!
Sep 20 13:18:41 tnode2 crmd: [3121]: info: do_pe_invoke: Query 228: Requesting the current CIB: S_POLICY_ENGINE
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: unpack_rsc_op: Operation res_drbd_1:1_monitor_0 found resource res_drbd_1:1 active on tnode1
Sep 20 13:18:41 tnode2 crmd: [3121]: info: do_pe_invoke_callback: Invoking the PE: query=228, ref=pe_calc-dc-1316517521-176, seq=1268, quorate=1
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: unpack_rsc_op: Operation res_drbd_1:0_monitor_0 found resource res_drbd_1:0 active on tnode2
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: clone_print: Master/Slave Set: ms_drbd_1 [res_drbd_1]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print: Masters: [ tnode2 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print: Slaves: [ tnode1 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: clone_print: Clone Set: cl_controld_1 [res_controld_dlm]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print: Started: [ tnode2 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print: Stopped: [ res_controld_dlm:1 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: native_print: stonith_external_ssh_1#011(stonith:external/ssh):#011Started tnode1
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: native_print: stonith_external_ssh_2#011(stonith:external/ssh):#011Started tnode2
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: clone_print: Clone Set: cl_clvmd_1 [res_clvmd_clustervg]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print: Started: [ tnode2 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print: Stopped: [ res_clvmd_clustervg:1 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: RecurringOp: Start recurring monitor (60s) for res_controld_dlm:1 on tnode1
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave res_drbd_1:0#011(Master tnode2)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Promote res_drbd_1:1#011(Slave -> Master tnode1)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave res_controld_dlm:0#011(Started tnode2)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Start res_controld_dlm:1#011(tnode1)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave stonith_external_ssh_1#011(Started tnode1)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave stonith_external_ssh_2#011(Started tnode2)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Restart res_clvmd_clustervg:0#011(Started tnode2)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Start res_clvmd_clustervg:1#011(tnode1)
CONFIG
node tnode1 \
attributes standby="off"
node tnode2 \
attributes standby="off"
primitive res_clvmd_clustervg ocf:lvm2:clvmd \
params daemon_timeout="30" \
operations $id="res_clvmd_clustervg-operations" \
op monitor interval="0" timeout="4min" start-delay="5"
primitive res_controld_dlm ocf:pacemaker:controld \
operations $id="res_controld_dlm-operations" \
op monitor interval="60" timeout="60" start-delay="0" \
meta target-role="started"
primitive res_drbd_1 ocf:linbit:drbd \
params drbd_resource="r0" \
operations $id="res_drbd_1-operations" \
op start interval="0" timeout="240" \
op promote interval="0" timeout="90" \
op demote interval="0" timeout="90" \
op stop interval="0" timeout="100" \
op monitor interval="10" timeout="20" start-delay="1min" \
op notify interval="0" timeout="90" \
meta target-role="started" is-managed="true"
primitive stonith_external_ssh_1 stonith:external/ssh \
params hostlist="tnode2" \
operations $id="stonith_external_ssh_1-operations" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="60" \
op monitor interval="60" timeout="60" start-delay="0" \
meta failure-timeout="3"
primitive stonith_external_ssh_2 stonith:external/ssh \
params hostlist="tnode1" \
operations $id="stonith_external_ssh_2-operations" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="60" \
op monitor interval="60" timeout="60" start-delay="0" \
meta target-role="started" failure-timeout="3"
ms ms_drbd_1 res_drbd_1 \
meta master-max="2" clone-max="2" notify="true" ordered="true" interleave="true"
clone cl_clvmd_1 res_clvmd_clustervg \
meta clone-max="2" notify="true"
clone cl_controld_1 res_controld_dlm \
meta clone-max="2" notify="true" ordered="true" interleave="true"
location loc_ms_drbd_1-ping-prefer ms_drbd_1 \
rule $id="loc_ms_drbd_1-ping-prefer-rule" pingd: defined pingd
location loc_stonith_external_ssh_1_tnode2 stonith_external_ssh_1 -inf: tnode2
location loc_stonith_external_ssh_2_tnode1 stonith_external_ssh_2 -inf: tnode1
colocation col_cl_controld_1_cl_clvmd_1 inf: cl_clvmd_1 cl_controld_1
colocation col_ms_drbd_1_cl_controld_1 inf: cl_controld_1 ms_drbd_1:Master
order ord_cl_controld_1_cl_clvmd_1 inf: cl_controld_1 cl_clvmd_1
order ord_ms_drbd_1_cl_controld_1 inf: ms_drbd_1:promote cl_controld_1:start
property $id="cib-bootstrap-options" \
expected-quorum-votes="2" \
stonith-timeout="30" \
dc-version="1.1.5-ecb6baaf7fc091b023d6d4ba7e0fce26d32cf5c8" \
no-quorum-policy="ignore" \
	cluster-infrastructure="openais"
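
One thing I noticed while writing this up: cl_clvmd_1 is the only clone here without interleave="true" (cl_controld_1 and ms_drbd_1 both have it). I don't know whether that is actually the cause, but since a non-interleaved clone reacts to its peer instances starting elsewhere, the change I would try first looks like this (untested guess on my part):

clone cl_clvmd_1 res_clvmd_clustervg \
	meta clone-max="2" notify="true" interleave="true"

If anyone can confirm that interleaving is what keeps the surviving instance from restarting, that would already help.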