<div dir="ltr"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">1. install the resource related packages on node3 even though you never want<br>them to run there. This will allow the resource-agents to verify the resource<br>is in fact inactive.</blockquote><div><br></div><div>Thanks, your advice helped: I installed all the services on node3 as well (including DRBD, but without its configs) and stopped+disabled them. Then I added the following line to my configuration:</div><div><br></div><div>location loc_drbd drbd rule -inf: #uname eq node3<br></div><div><br></div><div>So node3 is never a target for DRBD, and this helped: &quot;crm node standby node1&quot; no longer tries to use node3.</div><div><br></div><div>But I have another (related) issue. If a node (e.g. node1) becomes isolated from the other two nodes, how can I force it to shut down its services? I cannot use IPMI-based fencing/STONITH, because there are no reliable connections between the nodes at all (they are in geo-distributed datacenters), so one node cannot issue an IPMI call to shut down another.</div><div><br></div><div>E.g.
initially I have the following:</div><div><br></div><div><b># crm status</b></div><div><div>Online: [ node1 node2 node3 ]</div><div>Master/Slave Set: ms_drbd [drbd]<br></div><div>     Masters: [ node2 ]</div><div>     Slaves: [ node1 ]</div><div>Resource Group: server</div><div>     fs (ocf::heartbeat:Filesystem):    Started node2</div><div>     postgresql (lsb:postgresql):       Started node2</div><div>     bind9      (lsb:bind9):    Started node2</div><div>     nginx      (lsb:nginx):    Started node2</div></div><div><br></div><div>Then I enable a firewall on node2 to cut it off from the other nodes (only SSH and loopback traffic remain allowed):</div><div><br></div><div><div><b>root@node2:~# iptables -A INPUT -p tcp --dport 22 -j ACCEPT</b></div><div><b>root@node2:~# </b><b>iptables -A OUTPUT -p tcp --sport 22 -j ACCEPT</b></div><div><b>root@node2:~# </b><b>iptables -A INPUT -i lo -j ACCEPT</b></div><div><b>root@node2:~# </b><b>iptables -A OUTPUT -o lo -j ACCEPT</b></div><div><b>root@node2:~# </b><b>iptables -P INPUT DROP; iptables -P OUTPUT DROP</b></div></div><div><br></div><div>Then I see that, although node2 clearly knows it&#39;s isolated (it doesn&#39;t see the other two nodes and does not have quorum), it does not stop its services:</div><div><br></div><div><div><b>root@node2:~# crm status</b></div><div>Online: [ node2 ]<br></div><div>OFFLINE: [ node1 node3 ]</div><div>Master/Slave Set: ms_drbd [drbd]<br></div><div>     Masters: [ node2 ]</div><div>     Stopped: [ node1 node3 ]</div><div>Resource Group: server</div><div>     fs<span class="" style="white-space:pre">        </span>(ocf::heartbeat:Filesystem):<span class="" style="white-space:pre">        </span>Started node2</div><div>     postgresql<span class="" style="white-space:pre">        </span>(lsb:postgresql):<span class="" style="white-space:pre">        </span>Started node2</div><div>     bind9<span class="" style="white-space:pre">        </span>(lsb:bind9):<span class="" style="white-space:pre">        </span>Started node2</div><div>     
nginx<span class="" style="white-space:pre">        </span>(lsb:nginx):<span class="" style="white-space:pre">        </span>Started node2</div></div><div><br></div><div>So is there a way to tell pacemaker to shut down a node&#39;s services when that node becomes isolated?</div><div><br></div><div><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jan 12, 2015 at 8:25 PM, David Vossel <span dir="ltr">&lt;<a href="mailto:dvossel@redhat.com" target="_blank">dvossel@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class=""><div class="h5"><br>

<br>

----- Original Message -----<br>

&gt; Hello.<br>

&gt;<br>

&gt; I have a 3-node cluster managed by corosync+pacemaker+crm. Node1 and Node2 are<br>

&gt; DRBD master-slave, also they have a number of other services installed<br>

&gt; (postgresql, nginx, ...). Node3 is just a corosync node (for quorum), no<br>

&gt; DRBD/postgresql/... are installed on it, only corosync+pacemaker.<br>

&gt;<br>

&gt; But when I add resources to the cluster, some of them are somehow moved to<br>

&gt; node3 and then fail. Note that I have a &quot;colocation&quot; directive to<br>

&gt; place these resources on the DRBD master only and a &quot;location&quot; with -inf for<br>

&gt; node3, but this does not help. Why? How can I make pacemaker not run anything<br>

&gt; on node3?<br>

&gt;<br>

&gt; All the resources are added in a single transaction:<br>

&gt; &quot;cat config.txt | crm -w -f- configure&quot;, where config.txt contains<br>

&gt; the directives and a &quot;commit&quot; statement at the end.<br>

&gt;<br>

&gt; Below are &quot;crm status&quot; (error messages) and &quot;crm configure show&quot; outputs.<br>

&gt;<br>

&gt;<br>

&gt; root@node3:~# crm status<br>

&gt; Current DC: node2 (1017525950) - partition with quorum<br>

&gt; 3 Nodes configured<br>

&gt; 6 Resources configured<br>

&gt; Online: [ node1 node2 node3 ]<br>

&gt; Master/Slave Set: ms_drbd [drbd]<br>

&gt; Masters: [ node1 ]<br>

&gt; Slaves: [ node2 ]<br>

&gt; Resource Group: server<br>

&gt; fs (ocf::heartbeat:Filesystem): Started node1<br>

&gt; postgresql (lsb:postgresql): Started node3 FAILED<br>

&gt; bind9 (lsb:bind9): Started node3 FAILED<br>

&gt; nginx (lsb:nginx): Started node3 (unmanaged) FAILED<br>

&gt; Failed actions:<br>

&gt; drbd_monitor_0 (node=node3, call=744, rc=5, status=complete,<br>

&gt; last-rc-change=Mon Jan 12 11:16:43 2015, queued=2ms, exec=0ms): not<br>

&gt; installed<br>

&gt; postgresql_monitor_0 (node=node3, call=753, rc=1, status=complete,<br>

&gt; last-rc-change=Mon Jan 12 11:16:43 2015, queued=8ms, exec=0ms): unknown<br>

&gt; error<br>

&gt; bind9_monitor_0 (node=node3, call=757, rc=1, status=complete,<br>

&gt; last-rc-change=Mon Jan 12 11:16:43 2015, queued=11ms, exec=0ms): unknown<br>

&gt; error<br>

&gt; nginx_stop_0 (node=node3, call=767, rc=5, status=complete, last-rc-change=Mon<br>

&gt; Jan 12 11:16:44 2015, queued=1ms, exec=0ms): not installed<br>

<br>

</div></div>Here&#39;s what is going on. Even when you say &quot;never run this resource on node3&quot;,<br>

pacemaker is still going to probe for the resource on node3, just to verify<br>

that the resource isn&#39;t running there.<br>

<br>

The &quot;monitor_0&quot; failures you are seeing (monitor_0 is the probe operation)<br>

indicate that pacemaker was unable to verify whether the resources are running<br>

on node3, because the related packages for the resources are not installed.<br>

Given pacemaker&#39;s default behavior, I&#39;d expect this.<br>

<br>

You have two options.<br>

<br>

1. install the resource related packages on node3 even though you never want<br>

them to run there. This will allow the resource-agents to verify the resource<br>

is in fact inactive.<br>

<br>

2. If you are using the current master branch of pacemaker, there&#39;s a new<br>

location constraint option called &#39;resource-discovery=always|never|exclusive&#39;.<br>

If you add the &#39;resource-discovery=never&#39; option to your location constraint<br>

that attempts to keep resources from node3, you&#39;ll avoid having pacemaker<br>

perform the &#39;monitor_0&#39; actions on node3 as well.<br>
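<br>
A minimal sketch of option 2 in crm shell syntax, assuming a Pacemaker and crmsh build that already supports resource-discovery (and assuming the attribute also applies to the members of the group). In config.txt, the existing loc_server line could become the first line below; the second line is a matching constraint for the DRBD set, and its id loc_drbd_never is hypothetical:<br>
<pre>location loc_server server resource-discovery=never -inf: node3
location loc_drbd_never ms_drbd resource-discovery=never -inf: node3</pre>
In raw CIB XML the first constraint corresponds to an rsc_location element with rsc=&quot;server&quot;, node=&quot;node3&quot;, score=&quot;-INFINITY&quot; and resource-discovery=&quot;never&quot;. The trade-off: with discovery disabled, pacemaker no longer probes node3 for these resources, so it cannot detect an instance that was started there outside its control.<br>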

<br>

-- Vossel<br>

<div><div class="h5"><br>

&gt;<br>

&gt; root@node3:~# crm configure show | cat<br>

&gt; node $id=&quot;1017525950&quot; node2<br>

&gt; node $id=&quot;13071578&quot; node3<br>

&gt; node $id=&quot;1760315215&quot; node1<br>

&gt; primitive drbd ocf:linbit:drbd \<br>

&gt; params drbd_resource=&quot;vlv&quot; \<br>

&gt; op start interval=&quot;0&quot; timeout=&quot;240&quot; \<br>

&gt; op stop interval=&quot;0&quot; timeout=&quot;120&quot;<br>

&gt; primitive fs ocf:heartbeat:Filesystem \<br>

&gt; params device=&quot;/dev/drbd0&quot; directory=&quot;/var/lib/vlv.drbd/root&quot; \<br>

&gt; options=&quot;noatime,nodiratime&quot; fstype=&quot;xfs&quot; \<br>

&gt; op start interval=&quot;0&quot; timeout=&quot;300&quot; \<br>

&gt; op stop interval=&quot;0&quot; timeout=&quot;300&quot;<br>

&gt; primitive postgresql lsb:postgresql \<br>

&gt; op monitor interval=&quot;10&quot; timeout=&quot;60&quot; \<br>

&gt; op start interval=&quot;0&quot; timeout=&quot;60&quot; \<br>

&gt; op stop interval=&quot;0&quot; timeout=&quot;60&quot;<br>

&gt; primitive bind9 lsb:bind9 \<br>

&gt; op monitor interval=&quot;10&quot; timeout=&quot;60&quot; \<br>

&gt; op start interval=&quot;0&quot; timeout=&quot;60&quot; \<br>

&gt; op stop interval=&quot;0&quot; timeout=&quot;60&quot;<br>

&gt; primitive nginx lsb:nginx \<br>

&gt; op monitor interval=&quot;10&quot; timeout=&quot;60&quot; \<br>

&gt; op start interval=&quot;0&quot; timeout=&quot;60&quot; \<br>

&gt; op stop interval=&quot;0&quot; timeout=&quot;60&quot;<br>

&gt; group server fs postgresql bind9 nginx<br>

&gt; ms ms_drbd drbd meta master-max=&quot;1&quot; master-node-max=&quot;1&quot; clone-max=&quot;2&quot; \<br>

&gt; clone-node-max=&quot;1&quot; notify=&quot;true&quot;<br>

&gt; location loc_server server rule $id=&quot;loc_server-rule&quot; -inf: #uname eq node3<br>

&gt; colocation col_server inf: server ms_drbd:Master<br>

&gt; order ord_server inf: ms_drbd:promote server:start<br>

&gt; property $id=&quot;cib-bootstrap-options&quot; \<br>

&gt; stonith-enabled=&quot;false&quot; \<br>

&gt; last-lrm-refresh=&quot;1421079189&quot; \<br>

&gt; maintenance-mode=&quot;false&quot;<br>

&gt;<br>

</div></div>&gt; _______________________________________________<br>

&gt; Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>

&gt; <a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>

&gt;<br>

&gt; Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>

&gt; Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>

&gt; Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>

&gt;<br>

<br>


</blockquote></div><br></div></div>