Hi Andreas,<div><br></div><div>I did what you advised, but I still have the same issue. I migrated the resource and then removed that "cli-pref .." part, but when the node that went down returns and rejoins the cluster, the resources are still restarted.</div>
<div><br></div><div>Best regards,</div><div>Jose</div><div><br></div><div><div class="gmail_quote"><br></div></div><div><div class="gmail_quote">On Tue, Mar 6, 2012 at 1:29 AM, Andreas Kurz <span dir="ltr">&lt;<a href="mailto:andreas@hastexo.com" target="_blank">andreas@hastexo.com</a>&gt;</span> wrote:</div>
<div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hello,<br>
<div><div><br>
On 03/05/2012 08:58 PM, José Alonso wrote:<br>
> Hi all,<br>
><br>
> I have 2 Debian nodes with Heartbeat and Pacemaker 1.1.6 installed, and<br>
> almost everything is working fine. I have only apache configured for<br>
> testing. When a node goes down the failover is done correctly, but<br>
> there's a problem when a node fails back.<br>
><br>
> For example, let's say that Node1 is running the apache resource. Then<br>
> I reboot Node1, Pacemaker detects that it went down, and apache is<br>
> moved to Node2, where it keeps running fine. But<br>
> when Node1 recovers and joins the cluster again, apache is restarted on<br>
> Node2.<br>
><br>
> Does anyone know why resources are restarted when a node rejoins a cluster?<br>
><br>
> This is my pacemaker configuration:<br>
><br>
> node $id="2ac5f37d-cd54-4932-92dc-418b4fd0e6e6" nodo2 \<br>
> attributes standby="off"<br>
> node $id="938594ef-839a-40bb-aa5e-5715622693b3" nodo1 \<br>
> attributes standby="off"<br>
> primitive apache2 lsb:apache2 \<br>
> meta migration-threshold="1" failure-timeout="2" \<br>
> op monitor interval="5s" resource-stickiness="INFINITY"<br>
> primitive ip1 ocf:heartbeat:IPaddr2 \<br>
> params ip="192.168.1.38" nic="eth0:0"<br>
> primitive ip1arp ocf:heartbeat:SendArp \<br>
> params ip="192.168.1.38" nic="eth0:0"<br>
> group WebServices ip1 ip1arp apache2<br>
> location cli-prefer-WebServices WebServices \<br>
> rule $id="cli-prefer-rule-WebServices" inf: #uname eq nodo2<br>
<br>
</div></div>Remove that migration constraint ("cli-prefer-....") and try again.<br>
Best practice is to remove such a constraint immediately after the<br>
resource migration is completed.<br>
<br>
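A minimal sketch of checking a configuration dump for such a leftover constraint. It assumes the text has the same shape as the crm configuration quoted in this thread, and that manual migrations leave location constraints whose ids start with cli-prefer- or cli-standby- (true for Pacemaker 1.1.x as far as I know; other releases may name them differently). The helper name find_cli_constraints is hypothetical.<br>
<pre>
```python
import re

# Assumption: id prefixes that a manual move or ban leaves behind in Pacemaker 1.1.x.
CLI_PREFIXES = ("cli-prefer-", "cli-standby-")


def find_cli_constraints(config_text):
    """Return the ids of location constraints that look like migration leftovers."""
    leftovers = []
    for line in config_text.splitlines():
        # A crm location constraint starts with: location ID RESOURCE ...
        match = re.match(r"\s*location\s+(\S+)", line)
        if match and match.group(1).startswith(CLI_PREFIXES):
            leftovers.append(match.group(1))
    return leftovers


if __name__ == "__main__":
    # Sample taken from the configuration quoted in this thread.
    sample = """\
group WebServices ip1 ip1arp apache2
location cli-prefer-WebServices WebServices \\
        rule $id="cli-prefer-rule-WebServices" inf: #uname eq nodo2
colocation web_with_ip inf: apache2 ip1
"""
    for constraint_id in find_cli_constraints(sample):
        print("leftover migration constraint:", constraint_id)
```
</pre>
If an id is reported, something like "crm resource unmigrate WebServices" or "crm configure delete cli-prefer-WebServices" should clear it; check the configuration again afterwards.<br>
<br>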
Regards,<br>
Andreas<br>
<br>
--<br>
Need help with Pacemaker?<br>
<a href="http://www.hastexo.com/now" target="_blank">http://www.hastexo.com/now</a><br>
<div><div><br>
<br>
<br>
> colocation ip_with_arp inf: ip1 ip1arp<br>
> colocation web_with_ip inf: apache2 ip1<br>
> order arp_after_ip inf: ip1:start ip1arp:start<br>
> order web_after_ip inf: ip1arp:start apache2:start<br>
> property $id="cib-bootstrap-options" \<br>
> dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \<br>
> cluster-infrastructure="Heartbeat" \<br>
> expected-quorum-votes="2" \<br>
> stonith-enabled="false" \<br>
> no-quorum-policy="ignore"<br>
> rsc_defaults $id="rsc-options" \<br>
> resource-stickiness="INFINITY"<br>
><br>
><br>
> This is what I see on crm_mon:<br>
><br>
> 1-. Node1 and Node2 OK:<br>
><br>
> Online: [ node1 node2 ]<br>
><br>
> Resource Group: WebServices<br>
> ip1 (ocf::heartbeat:IPaddr2): Started node1<br>
> ip1arp (ocf::heartbeat:SendArp): Started node1<br>
> apache2 (lsb:apache2): Started node1<br>
><br>
><br>
> 2-. I reboot Node1, so Pacemaker moves the resources to Node2:<br>
><br>
> Online: [ node2 ]<br>
> OFFLINE: [node1]<br>
><br>
> Resource Group: WebServices<br>
> ip1 (ocf::heartbeat:IPaddr2): Started node2<br>
> ip1arp (ocf::heartbeat:SendArp): Started node2<br>
> apache2 (lsb:apache2): Started node2<br>
><br>
><br>
> 3-. Node1 is online again and joins the cluster; resources are still on Node2:<br>
><br>
> Online: [ node1 node2 ]<br>
><br>
> Resource Group: WebServices<br>
> ip1 (ocf::heartbeat:IPaddr2): Started node2<br>
> ip1arp (ocf::heartbeat:SendArp): Started node2<br>
> apache2 (lsb:apache2): Started node2<br>
><br>
> 4-. But after a few seconds, resources are stopped on Node2 and restarted<br>
> on the same Node2:<br>
><br>
> Online: [ node1 node2 ]<br>
><br>
> Resource Group: WebServices<br>
> ip1 (ocf::heartbeat:IPaddr2): Started node2<br>
> ip1arp (ocf::heartbeat:SendArp): Stopped<br>
> apache2 (lsb:apache2): Stopped<br>
><br>
><br>
> 5-. Resources are restarted and still on Node2:<br>
><br>
> Online: [ node1 node2 ]<br>
><br>
> Resource Group: WebServices<br>
> ip1 (ocf::heartbeat:IPaddr2): Started node2<br>
> ip1arp (ocf::heartbeat:SendArp): Started node2<br>
> apache2 (lsb:apache2): Started node2<br>
><br>
><br>
><br>
> Why were the resources restarted on Node2 if they were running fine?<br>
><br>
><br>
</div></div><br>
<br>
<br>
<br>
<br>_______________________________________________<br>
Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org" target="_blank">Pacemaker@oss.clusterlabs.org</a><br>
<a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>
<br>
Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
<br></blockquote></div><br></div>