[Pacemaker] Avoid one node from being a target for resources migration

Mon Jan 12 18:25:35 CET 2015

----- Original Message -----
> Hello.
> 
> I have a 3-node cluster managed by corosync+pacemaker+crm. Node1 and Node2
> are DRBD master-slave, and they also have a number of other services
> installed (postgresql, nginx, ...). Node3 is just a corosync node (for
> quorum); no DRBD/postgresql/... is installed on it, only corosync+pacemaker.
> 
> But when I add resources to the cluster, some of them are somehow moved to
> node3 and fail from then on. Note that I have a "colocation" directive to
> place these resources on the DRBD master only and a "location" with -inf for
> node3, but this does not help. Why? How do I make pacemaker not run anything
> on node3?
> 
> All the resources are added in a single transaction: "cat config.txt | crm -w
> -f- configure", where config.txt contains the directives and a "commit"
> statement at the end.
> 
> Below are "crm status" (error messages) and "crm configure show" outputs.
> 
> 
> root@node3:~# crm status
> Current DC: node2 (1017525950) - partition with quorum
> 3 Nodes configured
> 6 Resources configured
> Online: [ node1 node2 node3 ]
> Master/Slave Set: ms_drbd [drbd]
> Masters: [ node1 ]
> Slaves: [ node2 ]
> Resource Group: server
> fs (ocf::heartbeat:Filesystem): Started node1
> postgresql (lsb:postgresql): Started node3 FAILED
> bind9 (lsb:bind9): Started node3 FAILED
> nginx (lsb:nginx): Started node3 (unmanaged) FAILED
> Failed actions:
> drbd_monitor_0 (node=node3, call=744, rc=5, status=complete,
> last-rc-change=Mon Jan 12 11:16:43 2015, queued=2ms, exec=0ms): not
> installed
> postgresql_monitor_0 (node=node3, call=753, rc=1, status=complete,
> last-rc-change=Mon Jan 12 11:16:43 2015, queued=8ms, exec=0ms): unknown
> error
> bind9_monitor_0 (node=node3, call=757, rc=1, status=complete,
> last-rc-change=Mon Jan 12 11:16:43 2015, queued=11ms, exec=0ms): unknown
> error
> nginx_stop_0 (node=node3, call=767, rc=5, status=complete, last-rc-change=Mon
> Jan 12 11:16:44 2015, queued=1ms, exec=0ms): not installed

Here's what is going on. Even when you say "never run this resource on node3",
pacemaker still probes for the resource on node3, just to verify that the
resource isn't running there.

The "monitor_0" failures you are seeing indicate that pacemaker could not
verify whether the resources are running on node3, because the packages the
resources depend on are not installed there. Given pacemaker's default
behavior, I'd expect this.

You have two options.

1. Install the resource-related packages on node3 even though you never want
the resources to run there. This allows the resource agents to verify that
the resources are in fact inactive.

2. If you are using the current master branch of pacemaker, there's a new
location constraint option called 'resource-discovery=always|never|exclusive'.
If you add the 'resource-discovery=never' option to the location constraint
that keeps the resources off node3, pacemaker will also skip the 'monitor_0'
probe actions on node3.
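
As a rough sketch of option 2 in crm shell syntax (this assumes a crmsh
version that understands the resource-discovery attribute; the constraint ids
are made up for illustration, and the first line would stand in for the
existing loc_server rule):

```
# keep the server group off node3 and skip the probe there
location loc_server_no_probe server resource-discovery=never -inf: node3
# drbd_monitor_0 also failed on node3, so the DRBD master/slave set
# would likely need the same treatment
location loc_drbd_no_probe ms_drbd resource-discovery=never -inf: node3
```

Keep in mind that with resource-discovery=never pacemaker will not notice if
one of these services is started by hand on node3, so this only makes sense
on a node where they really cannot run.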

-- Vossel

> 
> root@node3:~# crm configure show | cat
> node $id="1017525950" node2
> node $id="13071578" node3
> node $id="1760315215" node1
> primitive drbd ocf:linbit:drbd \
> params drbd_resource="vlv" \
> op start interval="0" timeout="240" \
> op stop interval="0" timeout="120"
> primitive fs ocf:heartbeat:Filesystem \
> params device="/dev/drbd0" directory="/var/lib/vlv.drbd/root"
> options="noatime,nodiratime" fstype="xfs" \
> op start interval="0" timeout="300" \
> op stop interval="0" timeout="300"
> primitive postgresql lsb:postgresql \
> op monitor interval="10" timeout="60" \
> op start interval="0" timeout="60" \
> op stop interval="0" timeout="60"
> primitive bind9 lsb:bind9 \
> op monitor interval="10" timeout="60" \
> op start interval="0" timeout="60" \
> op stop interval="0" timeout="60"
> primitive nginx lsb:nginx \
> op monitor interval="10" timeout="60" \
> op start interval="0" timeout="60" \
> op stop interval="0" timeout="60"
> group server fs postgresql bind9 nginx
> ms ms_drbd drbd meta master-max="1" master-node-max="1" clone-max="2"
> clone-node-max="1" notify="true"
> location loc_server server rule $id="loc_server-rule" -inf: #uname eq node3
> colocation col_server inf: server ms_drbd:Master
> order ord_server inf: ms_drbd:promote server:start
> property $id="cib-bootstrap-options" \
> stonith-enabled="false" \
> last-lrm-refresh="1421079189" \
> maintenance-mode="false"