[Pacemaker] Collocating resource with a started clone instance
Sergey Tachenov
stachenov at gmail.com
Sat Jun 23 13:56:54 UTC 2012
Well, I've got to the point where I understand what exactly doesn't
work. Now I only need to understand why, and fix it.
I created a test cluster to reproduce this issue. Here's its current state:
Online: [ node2 node1 ]

 Clone Set: dummies [dummy]
     Started: [ node2 ]
     Stopped: [ dummy:1 ]
 Master/Slave Set: ms_drbd_pgdrive [drbd_pgdrive]
     Masters: [ node1 ]
     Slaves: [ node2 ]
 Resource Group: postgres
     pgdrive_fs (ocf::heartbeat:Filesystem):    Started node1
     DBIP       (ocf::heartbeat:IPaddr2):       Started node1
     postgresql (ocf::heartbeat:pgsql):         Started node1

Failed actions:
    dummy:1_start_0 (node=node1, call=54, rc=1, status=complete): unknown error
Here's the config:
node node1
node node2
primitive DBIP ocf:heartbeat:IPaddr2 \
    params ip="192.168.220.17" cidr_netmask="24" \
    op monitor interval="10"
primitive drbd_pgdrive ocf:linbit:drbd \
    params drbd_resource="pgdrive" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="100" \
    op monitor interval="15"
primitive dummy ocf:heartbeat:anything \
    params binfile="/home/alqualos/dummy/dummy" \
        pidfile="/home/alqualos/dummy/dummy.pid" user="alqualos" \
        logfile="/home/alqualos/dummy/dummy.log" \
    op monitor interval="10"
primitive pgdrive_fs ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/hd2" fstype="ext4" \
    op start interval="0" timeout="60" \
    op stop interval="0" timeout="60"
primitive postgresql ocf:heartbeat:pgsql \
    params pgdata="/hd2/pgsql" \
    op monitor interval="30" timeout="30" depth="0" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120"
group postgres pgdrive_fs DBIP postgresql
ms ms_drbd_pgdrive drbd_pgdrive \
    meta master-max="1" master-node-max="1" clone-max="2" \
        clone-node-max="1" notify="true"
clone dummies dummy \
    meta interleave="true"
colocation postgres_on_drbd inf: postgres ms_drbd_pgdrive:Master
order postgres_after_drbd inf: ms_drbd_pgdrive:promote postgres:start
property $id="cib-bootstrap-options" \
    dc-version="1.1.7-2.fc16-ee0730e13d124c3d58f00016c3376a1de5323cff" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    last-lrm-refresh="1340454087"
rsc_defaults $id="rsc-options" \
    resource-stickiness="10"
Now I'd like to get the whole postgres group (and, consequently, the
DRBD Master as well) moved to node2, because dummy is stopped on node1,
where the group currently runs.
It seems that the "shortest" way is to move the ms_drbd_pgdrive:Master
because everything else depends on it. If I try to add an explicit
location constraint for it, it works:
location drbd_on_node2 ms_drbd_pgdrive rule $role="Master" 500: #uname eq node2
shadow[test] # ptest -L -s
...
drbd_pgdrive:0 promotion score on node2: 10500
drbd_pgdrive:1 promotion score on node1: 10060
I have no idea where the 10000 comes from, but the 60 is obviously derived
from the stickiness, and the 500 from the new rule. As a result, the DRBD
resource gets promoted on node2, and everything else follows. So far, so good.
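My only guess about the 10000 is that it is the master preference the drbd
agent itself sets via crm_master, which ends up as a transient node
attribute. If so, something like this should show it (the attribute name
here is my assumption; depending on the version it may be
master-drbd_pgdrive:0 or just master-drbd_pgdrive):

    # query the transient (reboot lifetime) master score for instance 0 on node2
    crm_attribute -l reboot -N node2 -n master-drbd_pgdrive:0 -G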
Now, if I replace the location constraint with a collocation one, it fails:
colocation drbd_where_dummies 500: ms_drbd_pgdrive:Master dummies
shadow[test] # ptest -L -s
...
drbd_pgdrive:1 promotion score on node1: 10060
drbd_pgdrive:0 promotion score on node2: 10000
So the problem is that the collocation constraint doesn't affect DRBD
promotion for some reason, whether it is applied to the master/slave set
directly or indirectly via a dependent resource such as the IP. Location
constraints work both ways, which is why a ping test gets everything
moved, but a collocation constraint doesn't.
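Just for comparison (I haven't verified whether the score makes a
difference here), the mandatory variant of the same constraint would be:

    colocation drbd_where_dummies inf: ms_drbd_pgdrive:Master dummies

and running ptest -L -s again should show whether the promotion scores
change with it.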
Is there a roundabout way to implement a collocation constraint using a
location constraint, short of writing a custom RA that sets some
attribute (like pingd does) depending on whether the resource is running
or not? Maybe there's already such an attribute that I'm unaware of?
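To illustrate what I mean, here is a rough sketch -- the attribute name
dummy_running is made up, and something (e.g. a wrapper RA calling
crm_attribute on start/stop) would still have to maintain it:

    # hypothetical helper calls the wrapper RA would make:
    #   crm_attribute -l reboot -N `uname -n` -n dummy_running -v 1   (on start)
    #   crm_attribute -l reboot -N `uname -n` -n dummy_running -D     (on stop)
    location drbd_where_dummy_runs ms_drbd_pgdrive \
        rule $role="Master" 500: defined dummy_running and dummy_running eq 1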