[Pacemaker] understanding resource restarts through pengine

Oualid Nouri o.nouri at computer-lan.de
Tue Sep 20 08:10:39 EDT 2011


Hi,
I'm testing Pacemaker resource failover in a very simple test environment with two virtual machines.
There are three cloned resources: DRBD (dual-primary), controld and clvm.
Fencing is done with external/ssh, and that's it.
I'm having trouble understanding why my clvm resource gets restarted when a failed node comes back online.

When one node is powered off (fail test), the remaining node fences the "failing" node and the clvm resource stays online.
But when the failed node comes back online, the clvm clone instance on the previously "remaining" node gets restarted for no visible reason (see the logs below).

I guess I'm doing something wrong!
But what?
Can anyone point me in the right direction?


Thank you!



Sep 20 13:18:41 tnode2 crmd: [3121]: info: do_pe_invoke: Query 228: Requesting the current CIB: S_POLICY_ENGINE
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: unpack_rsc_op: Operation res_drbd_1:1_monitor_0 found resource res_drbd_1:1 active on tnode1
Sep 20 13:18:41 tnode2 crmd: [3121]: info: do_pe_invoke_callback: Invoking the PE: query=228, ref=pe_calc-dc-1316517521-176, seq=1268, quorate=1
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: unpack_rsc_op: Operation res_drbd_1:0_monitor_0 found resource res_drbd_1:0 active on tnode2
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: clone_print:  Master/Slave Set: ms_drbd_1 [res_drbd_1]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print:      Masters: [ tnode2 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print:      Slaves: [ tnode1 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: clone_print:  Clone Set: cl_controld_1 [res_controld_dlm]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print:      Started: [ tnode2 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print:      Stopped: [ res_controld_dlm:1 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: native_print: stonith_external_ssh_1#011(stonith:external/ssh):#011Started tnode1
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: native_print: stonith_external_ssh_2#011(stonith:external/ssh):#011Started tnode2
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: clone_print:  Clone Set: cl_clvmd_1 [res_clvmd_clustervg]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print:      Started: [ tnode2 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print:      Stopped: [ res_clvmd_clustervg:1 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: RecurringOp:  Start recurring monitor (60s) for res_controld_dlm:1 on tnode1
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave   res_drbd_1:0#011(Master tnode2)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Promote res_drbd_1:1#011(Slave -> Master tnode1)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave   res_controld_dlm:0#011(Started tnode2)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Start   res_controld_dlm:1#011(tnode1)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave   stonith_external_ssh_1#011(Started tnode1)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave   stonith_external_ssh_2#011(Started tnode2)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Restart res_clvmd_clustervg:0#011(Started tnode2)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Start   res_clvmd_clustervg:1#011(tnode1)
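
The relevant decisions are the LogActions lines: most resources are "Leave", while res_clvmd_clustervg:0 on tnode2 is scheduled for "Restart" in the same transition that starts res_controld_dlm:1 on tnode1. In longer logs a small filter makes this easier to see. The following is a minimal Python sketch, assuming the syslog format shown above (including the "#011" that syslog writes in place of a tab); summarize_actions and changed_actions are hypothetical helper names, not part of Pacemaker.

```python
import re

# Assumption: pengine lines look like the excerpt above, with the tab before
# the parenthesised state escaped by syslog as "#011".
LOG_ACTION = re.compile(r"LogActions:\s+(\w+)\s+(\S+?)#011\((.*)\)\s*$")

# A few of the LogActions lines from the transition above.
SAMPLE_LOG = """\
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave   res_drbd_1:0#011(Master tnode2)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Promote res_drbd_1:1#011(Slave -> Master tnode1)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Start   res_controld_dlm:1#011(tnode1)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Restart res_clvmd_clustervg:0#011(Started tnode2)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Start   res_clvmd_clustervg:1#011(tnode1)
"""


def summarize_actions(lines):
    """Return an (action, resource, detail) tuple for every LogActions line."""
    actions = []
    for line in lines:
        match = LOG_ACTION.search(line)
        if match:
            actions.append(match.groups())
    return actions


def changed_actions(actions):
    """Drop the 'Leave' entries, keeping only actions that change state."""
    return [a for a in actions if a[0] != "Leave"]


if __name__ == "__main__":
    for action, resource, detail in changed_actions(summarize_actions(SAMPLE_LOG.splitlines())):
        print(f"{action:8} {resource:25} {detail}")
```

Fed a whole transition, this prints only the Promote, Start and Restart entries, which is where an unexpected restart of a clone instance stands out.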

CONFIG

node tnode1 \
        attributes standby="off"
node tnode2 \
        attributes standby="off"
primitive res_clvmd_clustervg ocf:lvm2:clvmd \
        params daemon_timeout="30" \
        operations $id="res_clvmd_clustervg-operations" \
        op monitor interval="0" timeout="4min" start-delay="5"
primitive res_controld_dlm ocf:pacemaker:controld \
        operations $id="res_controld_dlm-operations" \
        op monitor interval="60" timeout="60" start-delay="0" \
        meta target-role="started"
primitive res_drbd_1 ocf:linbit:drbd \
        params drbd_resource="r0" \
        operations $id="res_drbd_1-operations" \
        op start interval="0" timeout="240" \
        op promote interval="0" timeout="90" \
        op demote interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        op monitor interval="10" timeout="20" start-delay="1min" \
        op notify interval="0" timeout="90" \
        meta target-role="started" is-managed="true"
primitive stonith_external_ssh_1 stonith:external/ssh \
        params hostlist="tnode2" \
        operations $id="stonith_external_ssh_1-operations" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="60" timeout="60" start-delay="0" \
        meta failure-timeout="3"
primitive stonith_external_ssh_2 stonith:external/ssh \
        params hostlist="tnode1" \
        operations $id="stonith_external_ssh_2-operations" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="60" timeout="60" start-delay="0" \
        meta target-role="started" failure-timeout="3"
ms ms_drbd_1 res_drbd_1 \
        meta master-max="2" clone-max="2" notify="true" ordered="true" interleave="true"
clone cl_clvmd_1 res_clvmd_clustervg \
        meta clone-max="2" notify="true"
clone cl_controld_1 res_controld_dlm \
        meta clone-max="2" notify="true" ordered="true" interleave="true"
location loc_ms_drbd_1-ping-prefer ms_drbd_1 \
        rule $id="loc_ms_drbd_1-ping-prefer-rule" pingd: defined pingd
location loc_stonith_external_ssh_1_tnode2 stonith_external_ssh_1 -inf: tnode2
location loc_stonith_external_ssh_2_tnode1 stonith_external_ssh_2 -inf: tnode1
colocation col_cl_controld_1_cl_clvmd_1 inf: cl_clvmd_1 cl_controld_1
colocation col_ms_drbd_1_cl_controld_1 inf: cl_controld_1 ms_drbd_1:Master
order ord_cl_controld_1_cl_clvmd_1 inf: cl_controld_1 cl_clvmd_1
order ord_ms_drbd_1_cl_controld_1 inf: ms_drbd_1:promote cl_controld_1:start
property $id="cib-bootstrap-options" \
        expected-quorum-votes="2" \
        stonith-timeout="30" \
        dc-version="1.1.5-ecb6baaf7fc091b023d6d4ba7e0fce26d32cf5c8" \
        no-quorum-policy="ignore" \
        cluster-infrastructure="openais" \
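
The three clone definitions are not configured alike: ms_drbd_1 and cl_controld_1 set interleave="true", but cl_clvmd_1 does not, even though it is ordered after and colocated with cl_controld_1. The Pacemaker documentation describes interleave as changing how ordering constraints between clones behave, so that an instance waits only for the copy of the other clone on the same node rather than for every copy; it may be the attribute worth comparing here. The sketch below is a minimal Python helper for spotting such differences, assuming the crm shell syntax shown above; clone_meta is a hypothetical name, not a crmsh function.

```python
import re

# Assumption: crm shell syntax as in the CONFIG above, where a definition may
# continue onto the next line after a trailing backslash.
CONFIG = """\
ms ms_drbd_1 res_drbd_1 \\
        meta master-max="2" clone-max="2" notify="true" ordered="true" interleave="true"
clone cl_clvmd_1 res_clvmd_clustervg \\
        meta clone-max="2" notify="true"
clone cl_controld_1 res_controld_dlm \\
        meta clone-max="2" notify="true" ordered="true" interleave="true"
"""


def clone_meta(config_text):
    """Map each clone/ms name to a dict of its meta attributes."""
    # Join backslash-continued lines into one logical line each.
    logical_lines = re.sub(r"\\\n\s*", " ", config_text).splitlines()
    result = {}
    for line in logical_lines:
        words = line.split()
        if len(words) < 3 or words[0] not in ("clone", "ms"):
            continue
        meta = {}
        if "meta" in words:
            # Everything after "meta" is a key="value" pair.
            for item in words[words.index("meta") + 1:]:
                key, sep, value = item.partition("=")
                if sep:
                    meta[key] = value.strip('"')
        result[words[1]] = meta
    return result


if __name__ == "__main__":
    for name, meta in clone_meta(CONFIG).items():
        print(f"{name:15} interleave={meta.get('interleave', '(unset)')}")
```

For the clone definitions above this reports interleave=true for ms_drbd_1 and cl_controld_1 and "(unset)" for cl_clvmd_1.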