[Pacemaker] Infinite fail-count and migration-threshold after node fail-back
Pavlos Parissis
pavlos.parissis at gmail.com
Fri Nov 12 17:45:11 UTC 2010
On 11 November 2010 16:59, Dan Frincu <dfrincu at streamwide.ro> wrote:
[...snip...]
>
> <constraints>
> <rsc_location id="loc-1" rsc="Webserver" node="sles-1" score="200"/>
> <rsc_location id="loc-2" rsc="Webserver" node="sles-3" score="0"/>
> <rsc_location id="loc-3" rsc="Database" node="sles-2" score="200"/>
> <rsc_location id="loc-4" rsc="Database" node="sles-3" score="0"/>
> </constraints>
> Example 6.1. Example set of opt-in location constraints
>
> Since you have symmetric-cluster=false, you need to add
> location constraints in order to get your resources running.
> Below is my conf and it works as expected: pbx_service_01 starts on
> node-01 and, due to resource-stickiness="1000", never fails back
> once it has failed over to node-03 and node-01 comes back online.
> Note the scores in the location constraints: very low compared to
> 1000 - I could also have set it to inf
>
>
> Yes, but you don't have groups defined in your setup; with groups, the
> scores of all active member resources are added together.
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ch-advanced-resources.html#id2220530
>
> For example:
>
> root at cluster1:~# ptest -sL
> Allocation scores:
> group_color: all allocation score on cluster1: 0
> group_color: all allocation score on cluster2: -1000000
> group_color: virtual_ip_1 allocation score on cluster1: 1000
> group_color: virtual_ip_1 allocation score on cluster2: -1000000
> group_color: virtual_ip_2 allocation score on cluster1: 1000
> group_color: virtual_ip_2 allocation score on cluster2: 0
> group_color: Failover_Alert allocation score on cluster1: 1000
> group_color: Failover_Alert allocation score on cluster2: 0
> group_color: fs_home allocation score on cluster1: 1000
> group_color: fs_home allocation score on cluster2: 0
> group_color: fs_mysql allocation score on cluster1: 1000
> group_color: fs_mysql allocation score on cluster2: 0
> group_color: fs_storage allocation score on cluster1: 1000
> group_color: fs_storage allocation score on cluster2: 0
> group_color: httpd allocation score on cluster1: 1000
> group_color: httpd allocation score on cluster2: 0
> group_color: mysqld allocation score on cluster1: 1000
> group_color: mysqld allocation score on cluster2: 0
> clone_color: ms_drbd_home allocation score on cluster1: 9000
> clone_color: ms_drbd_home allocation score on cluster2: -1000000
> clone_color: drbd_home:0 allocation score on cluster1: 1100
> clone_color: drbd_home:0 allocation score on cluster2: 0
> clone_color: drbd_home:1 allocation score on cluster1: 0
> clone_color: drbd_home:1 allocation score on cluster2: 1100
> native_color: drbd_home:0 allocation score on cluster1: 1100
> native_color: drbd_home:0 allocation score on cluster2: 0
> native_color: drbd_home:1 allocation score on cluster1: -1000000
> native_color: drbd_home:1 allocation score on cluster2: 1100
> drbd_home:0 promotion score on cluster1: 18100
> drbd_home:1 promotion score on cluster2: -1000000
> clone_color: ms_drbd_mysql allocation score on cluster1: 10100
> clone_color: ms_drbd_mysql allocation score on cluster2: -1000000
> clone_color: drbd_mysql:0 allocation score on cluster1: 1100
> clone_color: drbd_mysql:0 allocation score on cluster2: 0
> clone_color: drbd_mysql:1 allocation score on cluster1: 0
> clone_color: drbd_mysql:1 allocation score on cluster2: 1100
> native_color: drbd_mysql:0 allocation score on cluster1: 1100
> native_color: drbd_mysql:0 allocation score on cluster2: 0
> native_color: drbd_mysql:1 allocation score on cluster1: -1000000
> native_color: drbd_mysql:1 allocation score on cluster2: 1100
> drbd_mysql:0 promotion score on cluster1: 20300
> drbd_mysql:1 promotion score on cluster2: -1000000
> clone_color: ms_drbd_storage allocation score on cluster1: 11200
> clone_color: ms_drbd_storage allocation score on cluster2: -1000000
> clone_color: drbd_storage:0 allocation score on cluster1: 1100
> clone_color: drbd_storage:0 allocation score on cluster2: 0
> clone_color: drbd_storage:1 allocation score on cluster1: 0
> clone_color: drbd_storage:1 allocation score on cluster2: 1100
> native_color: drbd_storage:0 allocation score on cluster1: 1100
> native_color: drbd_storage:0 allocation score on cluster2: 0
> native_color: drbd_storage:1 allocation score on cluster1: -1000000
> native_color: drbd_storage:1 allocation score on cluster2: 1100
> drbd_storage:0 promotion score on cluster1: 22500
> drbd_storage:1 promotion score on cluster2: -1000000
> native_color: virtual_ip_1 allocation score on cluster1: 12300
> native_color: virtual_ip_1 allocation score on cluster2: -1000000
> native_color: virtual_ip_2 allocation score on cluster1: 8000
> native_color: virtual_ip_2 allocation score on cluster2: -1000000
> native_color: Failover_Alert allocation score on cluster1: 7000
> native_color: Failover_Alert allocation score on cluster2: -1000000
> native_color: fs_home allocation score on cluster1: 6000
> native_color: fs_home allocation score on cluster2: -1000000
> native_color: fs_mysql allocation score on cluster1: 5000
> native_color: fs_mysql allocation score on cluster2: -1000000
> native_color: fs_storage allocation score on cluster1: 4000
> native_color: fs_storage allocation score on cluster2: -1000000
> native_color: mysqld allocation score on cluster1: 4000
> native_color: mysqld allocation score on cluster2: -1000000
> native_color: httpd allocation score on cluster1: 16000
> native_color: httpd allocation score on cluster2: -1000000
> drbd_home:0 promotion score on cluster1: 1000000
> drbd_home:1 promotion score on cluster2: -1000000
> drbd_mysql:0 promotion score on cluster1: 1000000
> drbd_mysql:1 promotion score on cluster2: -1000000
> drbd_storage:0 promotion score on cluster1: 1000000
> drbd_storage:1 promotion score on cluster2: -1000000
> clone_color: ping_gw_clone allocation score on cluster1: 0
> clone_color: ping_gw_clone allocation score on cluster2: 0
> clone_color: ping_gw:0 allocation score on cluster1: 1000
> clone_color: ping_gw:0 allocation score on cluster2: 0
> clone_color: ping_gw:1 allocation score on cluster1: 0
> clone_color: ping_gw:1 allocation score on cluster2: 1000
> native_color: ping_gw:0 allocation score on cluster1: 1000
> native_color: ping_gw:0 allocation score on cluster2: 0
> native_color: ping_gw:1 allocation score on cluster1: -1000000
> native_color: ping_gw:1 allocation score on cluster2: 1000
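To make the summation concrete, here is a rough sketch of the arithmetic (illustrative Python; the values mirror a resource-stickiness of 1000, a five-member group, and a location-constraint score of 200, as in the configuration further down):

```python
# Sketch of why a running group resists fail-back: every active member
# contributes its resource-stickiness to the group's score on the node
# where it currently runs, and that sum competes with the location
# constraint that prefers the original node.
stickiness = 1000       # rsc_defaults resource-stickiness
active_members = 5      # e.g. ip, fs, pbx, sshd, mailAlert
location_pref = 200     # score of the "PrimaryNode" location constraint

score_current_node = stickiness * active_members  # 5000
score_preferred_node = location_pref              # 200

# The group stays where it is: fail-back would need a location score
# greater than the accumulated stickiness.
assert score_current_node > score_preferred_node
```

This is why a location score of 200 cannot pull a five-member group back once it is running elsewhere.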
I have the same version as you, although I am using Heartbeat, and I
ran your scenario on my systems.
I had pbx_service_01 (which is a group) on node-01 and put that node
into standby with crm node standby node-01. The resource group and the
corresponding drbd ms resource failed over to node-03. When I brought
node-01 back online with crm node online node-01, the pbx_service_01
resource group and the corresponding drbd ms resource did not fail
back.
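The test can be reproduced with roughly these crm shell commands (a sketch; run on any cluster node, with crm_mon included only to watch the moves):

```shell
# Force resources off node-01 (simulated failure):
crm node standby node-01

# One-shot status: confirm pbx_service_01 and ms-drbd_01 moved to node-03:
crm_mon -1

# Bring node-01 back; with resource-stickiness=1000 the group
# should stay on node-03 rather than fail back:
crm node online node-01

# Inspect the allocation scores the policy engine computes:
ptest -sL
```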
Below are my scores from before the failover (the first column is a
line number); you can also find my conf at the bottom.
1 group_color: pbx_service_01 allocation score on node-01: 200
2 group_color: pbx_service_01 allocation score on node-03: 10
3 group_color: ip_01 allocation score on node-01: 1200
4 group_color: ip_01 allocation score on node-03: 10
5 group_color: fs_01 allocation score on node-01: 1000
6 group_color: fs_01 allocation score on node-03: 0
7 group_color: pbx_01 allocation score on node-01: 1000
8 group_color: pbx_01 allocation score on node-03: 0
9 group_color: sshd_01 allocation score on node-01: 1000
10 group_color: sshd_01 allocation score on node-03: 0
11 group_color: mailAlert-01 allocation score on node-01: 1000
12 group_color: mailAlert-01 allocation score on node-03: 0
13 native_color: ip_01 allocation score on node-01: 5200
14 native_color: ip_01 allocation score on node-03: 10
15 clone_color: ms-drbd_01 allocation score on node-01: 4100
16 clone_color: ms-drbd_01 allocation score on node-03: -1000000
17 clone_color: drbd_01:0 allocation score on node-01: 11100
18 clone_color: drbd_01:0 allocation score on node-03: 0
19 clone_color: drbd_01:1 allocation score on node-01: 100
20 clone_color: drbd_01:1 allocation score on node-03: 11000
21 native_color: drbd_01:0 allocation score on node-01: 11100
22 native_color: drbd_01:0 allocation score on node-03: 0
23 native_color: drbd_01:1 allocation score on node-01: -1000000
24 native_color: drbd_01:1 allocation score on node-03: 11000
25 drbd_01:0 promotion score on node-01: 18100
26 drbd_01:1 promotion score on node-03: -1000000
27 native_color: fs_01 allocation score on node-01: 15100
28 native_color: fs_01 allocation score on node-03: -1000000
29 native_color: pbx_01 allocation score on node-01: 3000
30 native_color: pbx_01 allocation score on node-03: -1000000
31 native_color: sshd_01 allocation score on node-01: 2000
32 native_color: sshd_01 allocation score on node-03: -1000000
33 native_color: mailAlert-01 allocation score on node-01: 1000
34 native_color: mailAlert-01 allocation score on node-03: -1000000
35 group_color: pbx_service_02 allocation score on node-02: 200
36 group_color: pbx_service_02 allocation score on node-03: 10
37 group_color: ip_02 allocation score on node-02: 1200
38 group_color: ip_02 allocation score on node-03: 10
39 group_color: fs_02 allocation score on node-02: 1000
40 group_color: fs_02 allocation score on node-03: 0
41 group_color: pbx_02 allocation score on node-02: 1000
42 group_color: pbx_02 allocation score on node-03: 0
43 group_color: sshd_02 allocation score on node-02: 1000
44 group_color: sshd_02 allocation score on node-03: 0
45 group_color: mailAlert-02 allocation score on node-02: 1000
46 group_color: mailAlert-02 allocation score on node-03: 0
47 native_color: ip_02 allocation score on node-02: 5200
48 native_color: ip_02 allocation score on node-03: 10
49 clone_color: ms-drbd_02 allocation score on node-02: 4100
50 clone_color: ms-drbd_02 allocation score on node-03: -1000000
51 clone_color: drbd_02:0 allocation score on node-02: 11100
52 clone_color: drbd_02:0 allocation score on node-03: 0
53 clone_color: drbd_02:1 allocation score on node-02: 100
54 clone_color: drbd_02:1 allocation score on node-03: 11000
55 native_color: drbd_02:0 allocation score on node-02: 11100
56 native_color: drbd_02:0 allocation score on node-03: 0
57 native_color: drbd_02:1 allocation score on node-02: -1000000
58 native_color: drbd_02:1 allocation score on node-03: 11000
59 drbd_02:0 promotion score on node-02: 18100
60 drbd_02:2 promotion score on none: 0
61 drbd_02:1 promotion score on node-03: -1000000
62 native_color: fs_02 allocation score on node-02: 15100
63 native_color: fs_02 allocation score on node-03: -1000000
64 native_color: pbx_02 allocation score on node-02: 3000
65 native_color: pbx_02 allocation score on node-03: -1000000
66 native_color: sshd_02 allocation score on node-02: 2000
67 native_color: sshd_02 allocation score on node-03: -1000000
68 native_color: mailAlert-02 allocation score on node-02: 1000
69 native_color: mailAlert-02 allocation score on node-03: -1000000
70 drbd_01:0 promotion score on node-01: 1000000
71 drbd_01:1 promotion score on node-03: -1000000
72 drbd_02:0 promotion score on node-02: 1000000
73 drbd_02:2 promotion score on none: 0
74 drbd_02:1 promotion score on node-03: -1000000
75 native_color: pdu allocation score on node-03: -1000000
76 native_color: pdu allocation score on node-02: -1000000
77 native_color: pdu allocation score on node-01: -1000000
node $id="059313ce-c6aa-4bd5-a4fb-4b781de6d98f" node-03
node $id="d791b1f5-9522-4c84-a66f-cd3d4e476b38" node-02
node $id="e388e797-21f4-4bbe-a588-93d12964b4d7" node-01 \
attributes standby="off"
primitive drbd_01 ocf:linbit:drbd \
params drbd_resource="drbd_resource_01" \
op monitor interval="30s" \
op start interval="0" timeout="240s" \
op stop interval="0" timeout="120s"
primitive drbd_02 ocf:linbit:drbd \
params drbd_resource="drbd_resource_02" \
op monitor interval="30s" \
op start interval="0" timeout="240s" \
op stop interval="0" timeout="120s"
primitive fs_01 ocf:heartbeat:Filesystem \
params device="/dev/drbd1" directory="/pbx_service_01" fstype="ext3" \
meta migration-threshold="3" failure-timeout="60" is-managed="true" \
op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20" \
op start interval="0" timeout="60s" \
op stop interval="0" timeout="60s"
primitive fs_02 ocf:heartbeat:Filesystem \
params device="/dev/drbd2" directory="/pbx_service_02" fstype="ext3" \
meta migration-threshold="3" failure-timeout="60" \
op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20" \
op start interval="0" timeout="60s" \
op stop interval="0" timeout="60s"
primitive ip_01 ocf:heartbeat:IPaddr2 \
params ip="192.168.78.10" nic="eth3" cidr_netmask="24" broadcast="192.168.78.255" \
meta failure-timeout="120" migration-threshold="3" \
op monitor interval="5s"
primitive ip_02 ocf:heartbeat:IPaddr2 \
meta failure-timeout="120" migration-threshold="3" \
params ip="192.168.78.20" nic="eth3" cidr_netmask="24" broadcast="192.168.78.255" \
op monitor interval="5s"
primitive mailAlert-01 ocf:heartbeat:MailTo \
params email="root" subject="[Zanadoo Cluster event] pbx_service_01" \
op monitor interval="2" timeout="10" \
op start interval="0" timeout="10" \
op stop interval="0" timeout="10"
primitive mailAlert-02 ocf:heartbeat:MailTo \
params email="root" subject="[Zanadoo Cluster event] pbx_service_02" \
op monitor interval="2" timeout="10" \
op start interval="0" timeout="10" \
op stop interval="0" timeout="10"
primitive pbx_01 lsb:znd-pbx_01 \
meta migration-threshold="3" failure-timeout="60" is-managed="true" \
op monitor interval="20s" timeout="20s" \
op start interval="0" timeout="60s" \
op stop interval="0" timeout="60s"
primitive pbx_02 lsb:znd-pbx_02 \
meta migration-threshold="3" failure-timeout="60" \
op monitor interval="20s" timeout="20s" \
op start interval="0" timeout="60s" \
op stop interval="0" timeout="60s"
primitive pdu stonith:external/rackpdu \
params community="empisteftiko" names_oid=".1.3.6.1.4.1.318.1.1.4.4.2.1.4" oid=".1.3.6.1.4.1.318.1.1.4.4.2.1.3" hostlist="node-01,node-02,node-03" pduip="192.168.100.100" stonith-timeout="30" \
op monitor interval="1m" timeout="60s" \
meta target-role="Stopped"
primitive sshd_01 lsb:znd-sshd-pbx_01 \
meta is-managed="true" \
op monitor on-fail="stop" interval="10m" \
op start interval="0" timeout="60s" on-fail="stop" \
op stop interval="0" timeout="60s" on-fail="stop"
primitive sshd_02 lsb:znd-sshd-pbx_02 \
op monitor on-fail="stop" interval="10m" \
op start interval="0" timeout="60s" on-fail="stop" \
op stop interval="0" timeout="60s" on-fail="stop" \
meta target-role="Started"
group pbx_service_01 ip_01 fs_01 pbx_01 sshd_01 mailAlert-01 \
meta target-role="Started"
group pbx_service_02 ip_02 fs_02 pbx_02 sshd_02 mailAlert-02 \
meta target-role="Started"
ms ms-drbd_01 drbd_01 \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started" is-managed="true"
ms ms-drbd_02 drbd_02 \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started" is-managed="true"
location PrimaryNode-drbd_01 ms-drbd_01 100: node-01
location PrimaryNode-drbd_02 ms-drbd_02 100: node-02
location PrimaryNode-pbx_service_01 pbx_service_01 200: node-01
location PrimaryNode-pbx_service_02 pbx_service_02 200: node-02
location SecondaryNode-drbd_01 ms-drbd_01 0: node-03
location SecondaryNode-drbd_02 ms-drbd_02 0: node-03
location SecondaryNode-pbx_service_01 pbx_service_01 10: node-03
location SecondaryNode-pbx_service_02 pbx_service_02 10: node-03
location fencing-on-node-01 pdu 1: node-01
location fencing-on-node-02 pdu 1: node-02
location fencing-on-node-03 pdu 1: node-03
colocation fs_01-on-drbd_01 inf: fs_01 ms-drbd_01:Master
colocation fs_02-on-drbd_02 inf: fs_02 ms-drbd_02:Master
order pbx_service_01-after-drbd_01 inf: ms-drbd_01:promote pbx_service_01:start
order pbx_service_02-after-drbd_02 inf: ms-drbd_02:promote pbx_service_02:start
property $id="cib-bootstrap-options" \
dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
cluster-infrastructure="Heartbeat" \
symmetric-cluster="false" \
stonith-enabled="false" \
last-lrm-refresh="1289304946"
rsc_defaults $id="rsc-options" \
resource-stickiness="1000"