[Pacemaker] Clones restart on node recovery

jraditchkov at gmail.com
Wed Jun 9 00:46:42 EDT 2010


Hi, hopefully someone can help. I have little experience with Pacemaker
and may be doing something wrong.

I have the following design:

Two hardware nodes.
Some of the services are 100% redundant and run on both nodes. We use
clones for them; they are essential, and at least one instance of each
must be running for the system to work.
The rest of the services (HB, DP, WEB) must fail over from one node to
the other and are configured as individual resources.

The configuration mostly works:
1. We can start the cluster; it initializes and the services start - OK
2. When we put node01 or node02 into standby, the services fail over
properly and each clone keeps running only on the active node - OK
3. The problem appears when we bring the standby node back online.
For some reason all the clone instances restart, rather than only the
instances that were stopped on the standby node (although the order
constraints are set to advisory).

The system does not restart the resources that are not in clones,
only the clone sets.
In the logs we see that clone instances are shuffled between the nodes
for no apparent reason, which makes them restart:
    Move resource MYSQL:1   (Started node01 -> node02)

We believe it should only start the instance on the node that came back
online, rather than restarting the clone and moving the running
instance to the other node.
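To make the shuffling easier to spot, here is a minimal sketch of how the pengine output can be filtered for actions that disturb an already running instance. It assumes syslog lines in the format pasted further down, including the line wrapping added by the mail client; `parse_actions` and `disruptive` are hypothetical helper names, not part of Pacemaker.

```python
import re

# Matches a pengine LogActions entry once wrapped lines are re-joined, e.g.
# "... LogActions: Move resource MYSQL:1   (Started node01 -> node02)"
ACTION_RE = re.compile(
    r"LogActions:\s+(?P<action>\w+)\s+(?:resource\s+)?"
    r"(?P<rsc>[\w:-]+)\s+\((?P<detail>[^)]*)\)"
)


def parse_actions(log_text):
    """Return (timestamp, action, resource, detail) tuples from pengine output."""
    # Each log entry was wrapped after the action word, so any line that
    # does not start a new pengine entry is appended to the previous one.
    joined = []
    for line in log_text.splitlines():
        if "pengine:" in line or not joined:
            joined.append(line.strip())
        else:
            joined[-1] += " " + line.strip()

    actions = []
    for line in joined:
        m = ACTION_RE.search(line)
        if m:
            timestamp = " ".join(line.split()[:3])
            actions.append(
                (timestamp, m["action"], m["rsc"], m["detail"].strip())
            )
    return actions


def disruptive(actions):
    """Keep only the actions that interrupt a running instance.

    Only the two such action words that appear in the logs of this post
    are checked here (Move and Recover).
    """
    return [a for a in actions if a[1] in ("Move", "Recover")]
```

Run over the section III log, `disruptive(parse_actions(text))` lists every clone instance that was moved or recovered, grouped by the timestamp of the transition.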

Please help. Is there something we are doing wrong conceptually?

Below is some debug output showing what happens, with our config file
at the bottom.


I. INITIAL STATE - both machines are online (OK)
================================================================================
============
Last updated: Wed Jun  9 02:55:36 2010
Stack: openais
Current DC: node01 - partition with quorum
Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
2 Nodes configured, 2 expected votes
8 Resources configured.
============

Online: [ node01 node02 ]

HB      (ocf::rs:MyRA):   Started node01
DP      (ocf::rs:MyRA):   Started node01
WEB     (ocf::rs:MyRA):   Started node01
 Clone Set: MYSQL-CLONE
     Started: [ node02 node01 ]
 Clone Set: NDBD-CLONE
     Started: [ node02 node01 ]
 Clone Set: NDB_MGMD-CLONE
     Started: [ node02 node01 ]
 Clone Set: DN-CLONE
     Started: [ node02 node01 ]
 Clone Set: RS-CLONE
     Started: [ node02 node01 ]





II. BRING SECOND NODE TO STANDBY - standby node2 (OK)
================================================================================

Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource HB       (Started node01)
Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource DP       (Started node01)
Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource WEB      (Started node01)
Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Stop
resource MYSQL:0   (node02)
Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource MYSQL:1  (Started node01)
Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Stop
resource NDBD:0    (node02)
Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource NDBD:1   (Started node01)
Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Stop
resource NDB_MGMD:0        (node02)
Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource NDB_MGMD:1       (Started node01)
Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Stop
resource DN:0    (node02)
Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource DN:1   (Started node01)
Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Stop
resource RS:0     (node02)
Jun  9 04:19:03 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource RS:1    (Started node01)


============
Last updated: Wed Jun  9 04:20:28 2010
Stack: openais
Current DC: node01 - partition with quorum
Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
2 Nodes configured, 2 expected votes
8 Resources configured.
============

Node node02: standby
Online: [ node01 ]

HB      (ocf::rs:MyRA):   Started node01
DP      (ocf::rs:MyRA):   Started node01
WEB     (ocf::rs:MyRA):   Started node01
 Clone Set: MYSQL-CLONE
     Started: [ node01 ]
     Stopped: [ MYSQL:0 ]
 Clone Set: NDBD-CLONE
     Started: [ node01 ]
     Stopped: [ NDBD:0 ]
 Clone Set: NDB_MGMD-CLONE
     Started: [ node01 ]
     Stopped: [ NDB_MGMD:0 ]
 Clone Set: DN-CLONE
     Started: [ node01 ]
     Stopped: [ DN:0 ]
 Clone Set: RS-CLONE
     Started: [ node01 ]
     Stopped: [ RS:0 ]


III. BRING SECOND NODE BACK ONLINE - online node2 (PROBLEM)
================================================================================
Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource HB       (Started node01)
Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource DP       (Started node01)
Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource WEB      (Started node01)
Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Start
MYSQL:0   (node01)
Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Move
resource MYSQL:1   (Started node01 -> node02)
Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Start
NDBD:0    (node01)
Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Move
resource NDBD:1    (Started node01 -> node02)
Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Start
NDB_MGMD:0        (node01)
Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Move
resource NDB_MGMD:1        (Started node01 -> node02)
Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Start
DN:0    (node01)
Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Move
resource DN:1    (Started node01 -> node02)
Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Start
RS:0     (node01)
Jun  9 04:23:33 bst-rs-01 pengine: [2261]: notice: LogActions: Move
resource RS:1     (Started node01 -> node02)
Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource HB       (Started node01)
Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource DP       (Started node01)
Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource WEB      (Started node01)
Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Start
MYSQL:0   (node02)
Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Start
MYSQL:1   (node01)
Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Move
resource NDBD:0    (Started node01 -> node02)
Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Move
resource NDBD:1    (Started node02 -> node01)
Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Move
resource NDB_MGMD:0        (Started node01 -> node02)
Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Move
resource NDB_MGMD:1        (Started node02 -> node01)
Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Start
DN:0    (node02)
Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Start
DN:1    (node01)
Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Stop
resource RS:0     (node01)
Jun  9 04:26:05 bst-rs-01 pengine: [2261]: notice: LogActions: Recover
resource RS:1  (Started node02)
Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource HB       (Started node01)
Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource DP       (Started node01)
Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource WEB      (Started node01)
Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Start
MYSQL:0   (node02)
Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Start
MYSQL:1   (node01)
Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource NDBD:0   (Started node02)
Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource NDBD:1   (Started node01)
Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource NDB_MGMD:0       (Started node02)
Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource NDB_MGMD:1       (Started node01)
Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Start
DN:0    (node02)
Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Start
DN:1    (node01)
Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource RS:0    (Stopped)
Jun  9 04:28:17 bst-rs-01 pengine: [2261]: notice: LogActions: Recover
resource RS:1  (Started node02)
Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource HB       (Started node01)
Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource DP       (Started node01)
Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource WEB      (Started node01)
Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource MYSQL:0  (Started node02)
Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource MYSQL:1  (Started node01)
Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource NDBD:0   (Started node02)
Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource NDBD:1   (Started node01)
Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource NDB_MGMD:0       (Started node02)
Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource NDB_MGMD:1       (Started node01)
Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Start
DN:0    (node02)
Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Start
DN:1    (node01)
Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Start
RS:0     (node02)
Jun  9 04:28:26 bst-rs-01 pengine: [2261]: notice: LogActions: Leave
resource RS:1    (Stopped)

============
Last updated: Wed Jun  9 04:31:27 2010
Stack: openais
Current DC: node01 - partition with quorum
Version: 1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7
2 Nodes configured, 2 expected votes
8 Resources configured.
============

Online: [ node01 node02 ]

HB      (ocf::rs:MyRA):   Started node01
DP      (ocf::rs:MyRA):   Started node01
WEB     (ocf::rs:MyRA):   Started node01
 Clone Set: MYSQL-CLONE
     Started: [ node02 node01 ]
 Clone Set: NDBD-CLONE
     Started: [ node02 node01 ]
 Clone Set: NDB_MGMD-CLONE
     Started: [ node02 node01 ]
 Clone Set: DN-CLONE
     Started: [ node02 node01 ]
 Clone Set: RS-CLONE
     Started: [ node02 ]
     Stopped: [ RS:1 ]

Failed actions:
    RS:0_start_0 (node=node01, call=29, rc=1, status=complete): unknown error
    RS:1_monitor_10000 (node=node02, call=26, rc=-2, status=Timed Out): unknown exec error
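
The status output above can also be checked mechanically. The following is a rough sketch, assuming the crm_mon text layout shown in this post ("Clone Set:" followed by indented "Started:" and "Stopped:" lines); `clone_status` and `degraded` are hypothetical helper names.

```python
import re


def clone_status(crm_mon_text):
    """Map each clone set name to {'Started': [...], 'Stopped': [...]}."""
    status = {}
    current = None
    for raw in crm_mon_text.splitlines():
        line = raw.strip()
        m = re.match(r"Clone Set:\s+(\S+)", line)
        if m:
            # A new clone set begins; following lines belong to it.
            current = m.group(1)
            status[current] = {"Started": [], "Stopped": []}
            continue
        m = re.match(r"(Started|Stopped):\s+\[(.*)\]", line)
        if m and current:
            status[current][m.group(1)] = m.group(2).split()
    return status


def degraded(status):
    """Names of clone sets that report at least one stopped instance."""
    return sorted(name for name, s in status.items() if s["Stopped"])
```

For the final status in section III this reports only RS-CLONE as degraded, which matches the two failed actions listed for RS.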




CONFIG FILE
==========================
node node01
node node02

primitive HB ocf:rs:MyRA \
	params veid="105" \
	params proc="hbd" \
	op start interval="0" timeout="120" \
	op stop interval="0" timeout="120" \
	op monitor interval="10" timeout="10" depth="0"

primitive DP ocf:rs:MyRA \
	params veid="106" \
	params proc="rsd" \
	op start interval="0" timeout="120" \
	op stop interval="0" timeout="120" \
	op monitor interval="10" timeout="10" depth="0"

primitive MYSQL ocf:rs:MyRA \
	params veid="108" \
	params proc="mysqld" \
	op start interval="0" timeout="120" \
	op stop interval="0" timeout="120" \
	op monitor interval="10" timeout="10" depth="0"

primitive NDBD ocf:rs:MyRA \
	params veid="102" \
	params proc="ndbd" \
	op start interval="0" timeout="7200" \
	op stop interval="0" timeout="300" \
	op monitor interval="10" timeout="10" depth="0"

primitive NDB_MGMD ocf:rs:MyRA \
	params veid="101" \
	params proc="ndb_mgmd" \
	op start interval="0" timeout="120" \
	op stop interval="0" timeout="120" \
	op monitor interval="10" timeout="10" depth="0"

primitive DN ocf:rs:MyRA \
	params veid="103" \
	params proc="dnd" \
	op start interval="0" timeout="120" \
	op stop interval="0" timeout="120" \
	op monitor interval="10" timeout="10" depth="0"

primitive RS ocf:rs:MyRA \
	params veid="104" \
	params proc="rsd" \
	op start interval="0" timeout="120" \
	op stop interval="0" timeout="120" \
	op monitor interval="10" timeout="10" depth="0"

primitive WEB ocf:rs:MyRA \
	params veid="107" \
	params proc="httpd" \
	op start interval="0" timeout="120" \
	op stop interval="0" timeout="120" \
	op monitor interval="10" timeout="10" depth="0"

clone MYSQL-CLONE MYSQL \
	meta interleave="true"
clone NDBD-CLONE NDBD \
	meta interleave="true"
clone NDB_MGMD-CLONE NDB_MGMD \
	meta interleave="true"
clone DN-CLONE DN \
	meta interleave="true"
clone RS-CLONE RS \
	meta interleave="true"

location HB_LOC_1 HB 200: node01
location DP_LOC_1 DP 200: node01
location WEB_LOC_1 WEB 200: node01
location HB_LOC_2 HB 100: node02
location DP_LOC_2 DP 100: node02
location WEB_LOC_2 WEB 100: node02

order NDB_MGMD-CLONE_before_NDBD-CLONE advisory: NDB_MGMD-CLONE NDBD-CLONE
order NDBD-CLONE_before_MYSQL-CLONE advisory: NDBD-CLONE MYSQL-CLONE
order MYSQL_CLONE_before_HB advisory: MYSQL-CLONE HB
order MYSQL-CLONE_before_DN-CLONE advisory: MYSQL-CLONE DN-CLONE
order MYSQL-CLONE_before_WEB advisory: MYSQL-CLONE WEB
order HB_before_DP advisory: HB DP
order HB_before_RS-CLONE advisory: HB RS-CLONE
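
For reference, the start sequence implied by the order constraints above can be derived with a topological sort. This is only an illustrative sketch: the pairs are copied by hand from the order lines, `start_order` is a hypothetical helper, and it needs Python 3.9 or later for `graphlib`. Because the constraints are advisory, they only take effect when both resources are started or stopped in the same transition, so the result is the intended order rather than a guarantee.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# (first, then) pairs copied from the "order ... advisory:" lines.
ORDER_CONSTRAINTS = [
    ("NDB_MGMD-CLONE", "NDBD-CLONE"),
    ("NDBD-CLONE", "MYSQL-CLONE"),
    ("MYSQL-CLONE", "HB"),
    ("MYSQL-CLONE", "DN-CLONE"),
    ("MYSQL-CLONE", "WEB"),
    ("HB", "DP"),
    ("HB", "RS-CLONE"),
]


def start_order(constraints):
    """Return one start order that respects every (first, then) pair."""
    sorter = TopologicalSorter()
    for first, then in constraints:
        # "then" may only start after "first", so "first" is its predecessor.
        sorter.add(then, first)
    return list(sorter.static_order())


if __name__ == "__main__":
    for position, name in enumerate(start_order(ORDER_CONSTRAINTS), 1):
        print(position, name)
```

The first three entries are always NDB_MGMD-CLONE, NDBD-CLONE and MYSQL-CLONE; the relative order of the remaining resources is only partly fixed by the constraints (HB before DP and RS-CLONE).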

property $id="cib-bootstrap-options" \
	dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
	cluster-infrastructure="openais" \
	expected-quorum-votes="2" \
	stonith-enabled="false" \
	no-quorum-policy="ignore" \
	last-lrm-refresh="1273876473"

rsc_defaults $id="rsc-options" \
	resource-stickiness="0"



