[Pacemaker] Clones restart on node recovery
Andrew Beekhof
andrew at beekhof.net
Thu Jun 10 07:24:00 UTC 2010
On Wed, Jun 9, 2010 at 6:46 AM, jraditchkov at gmail.com
<jraditchkov at gmail.com> wrote:
> Hi, hopefully someone can help. I have little experience with Pacemaker
> and may be doing something wrong.
>
> I have the following design:
>
> - Two hardware nodes.
> - Some of the services are 100% redundant on both nodes. We use clones
>   for them; they are essential, and at least one instance of each must
>   be running for the system to work.
> - The rest of the services (HB, DP, WEB) must fail over from one node
>   to the other and are configured as individual resources.
>
> The configuration mostly works:
> 1. We can start the cluster; it initializes and the services start - OK
> 2. When we put node1 or node2 into standby, services fail over properly
> and within the clones only the instance on the active node stays
> running - OK
> 3. We experience a problem when we bring the standby node back online.
> For some reason all of the clones restart themselves, rather than only
> the failed resources in the clone (although the order constraints are
> set to advisory).
Can you create a bug for this and include a hb_report of scenario 3 please?
>
> The system does not restart the resources which are not in clones,
> only the Clone sets.
> In the logs we see that resources are shuffled for no apparent reason
> between the host nodes which makes them restart:
> Move resource MYSQL:1 (Started node01 -> node02)
>
> We believe it should only start the resource on the newly online node,
> rather than restarting the clone and moving the resource to the other
> node.
>
> Please help. Is there something we are doing wrong conceptually?
>
> Below is some debug output showing what happens, with our config file
> at the bottom.
>
>
> I. INITIAL STATE - both machines are online (OK)
> ================================================================================
> ============
> Last updated: Wed Jun 9 02:55:36 2010
> Stack: openais
> Current DC: node01 - partition with quorum
> Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
> 2 Nodes configured, 2 expected votes
> 8 Resources configured.
> ============
>
> Online: [ node01 node02 ]
>
> HB (ocf::rs:MyRA): Started node01
> DP (ocf::rs:MyRA): Started node01
> WEB (ocf::rs:MyRA): Started node01
> Clone Set: MYSQL-CLONE
> Started: [ node02 node01 ]
> Clone Set: NDBD-CLONE
> Started: [ node02 node01 ]
> Clone Set: NDB_MGMD-CLONE
> Started: [ node02 node01 ]
> Clone Set: DN-CLONE
> Started: [ node02 node01 ]
> Clone Set: RS-CLONE
> Started: [ node02 node01 ]
>
>
>
>
>
> II. BRING SECOND NODE TO STANDBY - standby node2 (OK)
> ================================================================================
>
> Jun 9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource HB (Started node01)
> Jun 9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource DP (Started node01)
> Jun 9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource WEB (Started node01)
> Jun 9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Stop
> resource MYSQL:0 (node02)
> Jun 9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource MYSQL:1 (Started node01)
> Jun 9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Stop
> resource NDBD:0 (node02)
> Jun 9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource NDBD:1 (Started node01)
> Jun 9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Stop
> resource NDB_MGMD:0 (node02)
> Jun 9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource NDB_MGMD:1 (Started node01)
> Jun 9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Stop
> resource DN:0 (node02)
> Jun 9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource DN:1 (Started node01)
> Jun 9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Stop
> resource RS:0 (node02)
> Jun 9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource RS:1 (Started node01)
>
>
> ============
> Last updated: Wed Jun 9 04:20:28 2010
> Stack: openais
> Current DC: node01 - partition with quorum
> Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
> 2 Nodes configured, 2 expected votes
> 8 Resources configured.
> ============
>
> Node node02: standby
> Online: [ node01 ]
>
> HB (ocf::rs:MyRA): Started node01
> DP (ocf::rs:MyRA): Started node01
> WEB (ocf::rs:MyRA): Started node01
> Clone Set: MYSQL-CLONE
> Started: [ node01 ]
> Stopped: [ MYSQL:0 ]
> Clone Set: NDBD-CLONE
> Started: [ node01 ]
> Stopped: [ NDBD:0 ]
> Clone Set: NDB_MGMD-CLONE
> Started: [ node01 ]
> Stopped: [ NDB_MGMD:0 ]
> Clone Set: DN-CLONE
> Started: [ node01 ]
> Stopped: [ DN:0 ]
> Clone Set: RS-CLONE
> Started: [ node01 ]
> Stopped: [ RS:0 ]
>
>
> III. BRING SECOND NODE BACK ONLINE - online node2
> ================================================================================
> Jun 9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource HB (Started node01)
> Jun 9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource DP (Started node01)
> Jun 9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource WEB (Started node01)
> Jun 9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Start
> MYSQL:0 (node01)
> Jun 9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Move
> resource MYSQL:1 (Started node01 -> node02)
> Jun 9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Start
> NDBD:0 (node01)
> Jun 9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Move
> resource NDBD:1 (Started node01 -> node02)
> Jun 9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Start
> NDB_MGMD:0 (node01)
> Jun 9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Move
> resource NDB_MGMD:1 (Started node01 -> node02)
> Jun 9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Start
> DN:0 (node01)
> Jun 9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Move
> resource DN:1 (Started node01 -> node02)
> Jun 9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Start
> RS:0 (node01)
> Jun 9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Move
> resource RS:1 (Started node01 -> node02)
> Jun 9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource HB (Started node01)
> Jun 9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource DP (Started node01)
> Jun 9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource WEB (Started node01)
> Jun 9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Start
> MYSQL:0 (node02)
> Jun 9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Start
> MYSQL:1 (node01)
> Jun 9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Move
> resource NDBD:0 (Started node01 -> node02)
> Jun 9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Move
> resource NDBD:1 (Started node02 -> node01)
> Jun 9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Move
> resource NDB_MGMD:0 (Started node01 -> node02)
> Jun 9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Move
> resource NDB_MGMD:1 (Started node02 -> node01)
> Jun 9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Start
> DN:0 (node02)
> Jun 9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Start
> DN:1 (node01)
> Jun 9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Stop
> resource RS:0 (node01)
> Jun 9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Recover
> resource RS:1 (Started node02)
> Jun 9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource HB (Started node01)
> Jun 9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource DP (Started node01)
> Jun 9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource WEB (Started node01)
> Jun 9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Start
> MYSQL:0 (node02)
> Jun 9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Start
> MYSQL:1 (node01)
> Jun 9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource NDBD:0 (Started node02)
> Jun 9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource NDBD:1 (Started node01)
> Jun 9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource NDB_MGMD:0 (Started node02)
> Jun 9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource NDB_MGMD:1 (Started node01)
> Jun 9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Start
> DN:0 (node02)
> Jun 9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Start
> DN:1 (node01)
> Jun 9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource RS:0 (Stopped)
> Jun 9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Recover
> resource RS:1 (Started node02)
> Jun 9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource HB (Started node01)
> Jun 9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource DP (Started node01)
> Jun 9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource WEB (Started node01)
> Jun 9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource MYSQL:0 (Started node02)
> Jun 9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource MYSQL:1 (Started node01)
> Jun 9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource NDBD:0 (Started node02)
> Jun 9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource NDBD:1 (Started node01)
> Jun 9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource NDB_MGMD:0 (Started node02)
> Jun 9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource NDB_MGMD:1 (Started node01)
> Jun 9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Start
> DN:0 (node02)
> Jun 9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Start
> DN:1 (node01)
> Jun 9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Start
> RS:0 (node02)
> Jun 9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
> resource RS:1 (Stopped)
>
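A side note for anyone reading the section III log above: grouping the planned actions by type makes the shuffling easier to spot. The following is a minimal sketch, not part of Pacemaker or its tooling. The function names `summarize` and `disruptive` are made up for illustration, and the regular expression assumes the log lines have been unwrapped and stripped of the "> " quoting, as in the sample taken from the 04:23:33 transition.

```python
import re
from collections import defaultdict

# Sample lines copied from the 04:23:33 transition, with the mail
# line-wrapping and "> " quote prefixes removed (an assumption of this sketch).
SAMPLE = """\
Jun 9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Leave resource HB (Started node01)
Jun 9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Start MYSQL:0 (node01)
Jun 9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Move resource MYSQL:1 (Started node01 -> node02)
Jun 9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Start NDBD:0 (node01)
Jun 9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Move resource NDBD:1 (Started node01 -> node02)
"""

# Matches both "Leave resource HB (...)" and "Start MYSQL:0 (...)":
# the word "resource" is present on some lines and absent on others.
ACTION_RE = re.compile(
    r"LogActions:\s+(?P<action>\w+)\s+(?:resource\s+)?(?P<rsc>\S+)\s+\((?P<detail>[^)]*)\)"
)


def summarize(log_text):
    """Group (resource, detail) pairs by the action named in each log line."""
    by_action = defaultdict(list)
    for line in log_text.splitlines():
        match = ACTION_RE.search(line)
        if match:
            by_action[match.group("action")].append(
                (match.group("rsc"), match.group("detail"))
            )
    return dict(by_action)


def disruptive(summary):
    """Return the resources that the log marks as 'Move'."""
    return [rsc for rsc, _detail in summary.get("Move", [])]


if __name__ == "__main__":
    summary = summarize(SAMPLE)
    for action, items in summary.items():
        for rsc, detail in items:
            print(f"{action:8} {rsc:12} {detail}")
    print("moved:", disruptive(summary))
```

Run against the full unwrapped log, the "Move" entries are the ones to look at: they are instances that were already started on node01 and are being relocated to node02, which is the restart described in scenario 3.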
> ============
> Last updated: Wed Jun 9 04:31:27 2010
> Stack: openais
> Current DC: node01 - partition with quorum
> Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
> 2 Nodes configured, 2 expected votes
> 8 Resources configured.
> ============
>
> Online: [ node01 node02 ]
>
> HB (ocf::rs:MyRA): Started node01
> DP (ocf::rs:MyRA): Started node01
> WEB (ocf::rs:MyRA): Started node01
> Clone Set: MYSQL-CLONE
> Started: [ node02 node01 ]
> Clone Set: NDBD-CLONE
> Started: [ node02 node01 ]
> Clone Set: NDB_MGMD-CLONE
> Started: [ node02 node01 ]
> Clone Set: DN-CLONE
> Started: [ node02 node01 ]
> Clone Set: RS-CLONE
> Started: [ node02 ]
> Stopped: [ RS:1 ]
>
> Failed actions:
> RS:0_start_0 (node=node01, call=29, rc=1, status=complete): unknown error
> RS:1_monitor_10000 (node=node02, call=26, rc=-2, status=Timed Out): unknown exec error
>
>
>
>
> CONFIG FILE
> ==========================
> node node01
> node node02
>
> primitive HB ocf:rs:MyRA \
> params veid="105" \
> params proc="hbd" \
> op start interval="0" timeout="120" \
> op stop interval="0" timeout="120" \
> op monitor interval="10" timeout="10" depth="0"
>
> primitive DP ocf:rs:MyRA \
> params veid="106" \
> params proc="rsd" \
> op start interval="0" timeout="120" \
> op stop interval="0" timeout="120" \
> op monitor interval="10" timeout="10" depth="0"
>
> primitive MYSQL ocf:rs:MyRA \
> params veid="108" \
> params proc="mysqld" \
> op start interval="0" timeout="120" \
> op stop interval="0" timeout="120" \
> op monitor interval="10" timeout="10" depth="0"
>
> primitive NDBD ocf:rs:MyRA \
> params veid="102" \
> params proc="ndbd" \
> op start interval="0" timeout="7200" \
> op stop interval="0" timeout="300" \
> op monitor interval="10" timeout="10" depth="0"
>
> primitive NDB_MGMD ocf:rs:MyRA \
> params veid="101" \
> params proc="ndb_mgmd" \
> op start interval="0" timeout="120" \
> op stop interval="0" timeout="120" \
> op monitor interval="10" timeout="10" depth="0"
>
> primitive DN ocf:rs:MyRA \
> params veid="103" \
> params proc="dnd" \
> op start interval="0" timeout="120" \
> op stop interval="0" timeout="120" \
> op monitor interval="10" timeout="10" depth="0"
>
> primitive RS ocf:rs:MyRA \
> params veid="104" \
> params proc="rsd" \
> op start interval="0" timeout="120" \
> op stop interval="0" timeout="120" \
> op monitor interval="10" timeout="10" depth="0"
>
> primitive WEB ocf:rs:MyRA \
> params veid="107" \
> params proc="httpd" \
> op start interval="0" timeout="120" \
> op stop interval="0" timeout="120" \
> op monitor interval="10" timeout="10" depth="0"
>
> clone MYSQL-CLONE MYSQL \
> meta interleave="true"
> clone NDBD-CLONE NDBD \
> meta interleave="true"
> clone NDB_MGMD-CLONE NDB_MGMD \
> meta interleave="true"
> clone DN-CLONE DN \
> meta interleave="true"
> clone RS-CLONE RS \
> meta interleave="true"
>
> location HB_LOC_1 HB 200: node01
> location DP_LOC_1 DP 200: node01
> location WEB_LOC_1 WEB 200: node01
> location HB_LOC_2 HB 100: node02
> location DP_LOC_2 DP 100: node02
> location WEB_LOC_2 WEB 100: node02
>
> order NDB_MGMD-CLONE_before_NDBD-CLONE advisory: NDB_MGMD-CLONE NDBD-CLONE
> order NDBD-CLONE_before_MYSQL-CLONE advisory: NDBD-CLONE MYSQL-CLONE
> order MYSQL_CLONE_before_HB advisory: MYSQL-CLONE HB
> order MYSQL-CLONE_before_DN-CLONE advisory: MYSQL-CLONE DN-CLONE
> order MYSQL-CLONE_before_WEB advisory: MYSQL-CLONE WEB
> order HB_before_DP advisory: HB DP
> order HB_before_RS-CLONE advisory: HB RS-CLONE
>
> property $id="cib-bootstrap-options" \
> dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> no-quorum-policy="ignore" \
> last-lrm-refresh="1273876473"
>
> rsc_defaults $id="rsc-options" \
> resource-stickiness="0"