[Pacemaker] Pacemaker restart resources when node joins cluster after failback

Tue Mar 6 17:50:47 CET 2012

Hi Andreas,

I did what you advised, but I still have the same issue. I migrated the
resource and then removed the "cli-prefer-..." constraint, but when the
node that went down comes back and rejoins the cluster, the resources are
still restarted.
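
For anyone following along, the check for a leftover migration constraint
can be sketched roughly as below. This is only a sketch under assumptions:
the helper name leftover_cli_constraints is made up for illustration, and it
assumes that constraints created by a migration have ids starting with
"cli-prefer-" or "cli-standby-", as the one in the configuration quoted below
does. The sample text is copied from that configuration, so the script runs
without a cluster.

```shell
#!/bin/sh
# Sketch: list location constraints left behind by a resource migration.
# Assumption: their ids start with "cli-prefer-" or "cli-standby-".

# Hypothetical helper: reads "crm configure show" style text on stdin and
# prints the id of every leftover migration constraint, one per line.
leftover_cli_constraints() {
    awk '$1 == "location" && $2 ~ /^cli-(prefer|standby)-/ { print $2 }'
}

# Excerpt of the configuration quoted in this thread, used as sample input.
sample='group WebServices ip1 ip1arp apache2
location cli-prefer-WebServices WebServices \
        rule $id="cli-prefer-rule-WebServices" inf: #uname eq nodo2
colocation web_with_ip inf: apache2 ip1'

# Print the ids of the leftover constraints found in the sample.
printf '%s\n' "$sample" | leftover_cli_constraints
```

On a live cluster the same helper could be fed from "crm configure show",
and "crm resource unmigrate WebServices" should remove the constraint once
the migration has completed.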

Best regards,
Jose

On Tue, Mar 6, 2012 at 1:29 AM, Andreas Kurz <andreas at hastexo.com> wrote:

> Hello,
>
> On 03/05/2012 08:58 PM, José Alonso wrote:
> > Hi all,
> >
> > I have 2 Debian nodes with heartbeat and pacemaker 1.1.6 installed, and
> > almost everything is working fine. I have only apache configured for
> > testing. When a node goes down, the failover is done correctly, but
> > there's a problem when the node rejoins the cluster.
> >
> > For example, let's say that Node1 is running the apache resource and
> > I reboot Node1. Pacemaker detects that it went down and apache is
> > started on Node2, where it keeps running fine. But
> > when Node1 recovers and joins the cluster again, apache is restarted on
> > Node2.
> >
> > Does anyone know why resources are restarted when a node rejoins the cluster?
> >
> > This is my pacemaker configuration:
> >
> > node $id="2ac5f37d-cd54-4932-92dc-418b4fd0e6e6" nodo2 \
> > attributes standby="off"
> > node $id="938594ef-839a-40bb-aa5e-5715622693b3" nodo1 \
> > attributes standby="off"
> > primitive apache2 lsb:apache2 \
> > meta migration-threshold="1" failure-timeout="2" \
> > op monitor interval="5s" resource-stickiness="INFINITY"
> > primitive ip1 ocf:heartbeat:IPaddr2 \
> > params ip="192.168.1.38" nic="eth0:0"
> > primitive ip1arp ocf:heartbeat:SendArp \
> > params ip="192.168.1.38" nic="eth0:0"
> > group WebServices ip1 ip1arp apache2
> > location cli-prefer-WebServices WebServices \
> > rule $id="cli-prefer-rule-WebServices" inf: #uname eq nodo2
>
> remove that migration constraint ("cli-prefer-....") and try again ...
> best practice is to remove such a constraint immediately after the
> resource migration is completed.
>
> Regards,
> Andreas
>
> --
> Need help with Pacemaker?
> http://www.hastexo.com/now
>
>
>
> > colocation ip_with_arp inf: ip1 ip1arp
> > colocation web_with_ip inf: apache2 ip1
> > order arp_after_ip inf: ip1:start ip1arp:start
> > order web_after_ip inf: ip1arp:start apache2:start
> > property $id="cib-bootstrap-options" \
> > dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
> > cluster-infrastructure="Heartbeat" \
> > expected-quorum-votes="2" \
> > stonith-enabled="false" \
> > no-quorum-policy="ignore"
> > rsc_defaults $id="rsc-options" \
> > resource-stickiness="INFINITY"
> >
> >
> > This is what I see on crm_mon:
> >
> > 1-. Node1 and Node2 OK:
> >
> > Online: [ node1 node2 ]
> >
> > Resource Group: WebServices
> > ip1 (ocf::heartbeat:IPaddr2): Started node1
> > ip1arp (ocf::heartbeat:SendArp): Started node1
> > apache2 (lsb:apache2): Started node1
> >
> >
> > 2-. I reboot Node1, so Pacemaker moves the resources to Node2:
> >
> > Online: [ node2 ]
> > OFFLINE: [node1]
> >
> > Resource Group: WebServices
> > ip1 (ocf::heartbeat:IPaddr2): Started node2
> > ip1arp (ocf::heartbeat:SendArp): Started node2
> > apache2 (lsb:apache2): Started node2
> >
> >
> > 3-. Node1 is online again and joins the cluster; resources are still on Node2:
> >
> > Online: [ node1 node2 ]
> >
> > Resource Group: WebServices
> > ip1 (ocf::heartbeat:IPaddr2): Started node2
> > ip1arp (ocf::heartbeat:SendArp): Started node2
> > apache2 (lsb:apache2): Started node2
> >
> > 4-. But after a few seconds, resources are stopped on Node2 and restarted
> > on the same Node2:
> >
> > Online: [ node1 node2 ]
> >
> > Resource Group: WebServices
> > ip1 (ocf::heartbeat:IPaddr2): Started node2
> > ip1arp (ocf::heartbeat:SendArp): Stopped
> > apache2 (lsb:apache2): Stopped
> >
> >
> > 5-. Resources restarted and still on Node2
> >
> > Online: [ node1 node2 ]
> >
> > Resource Group: WebServices
> > ip1 (ocf::heartbeat:IPaddr2): Started node2
> > ip1arp (ocf::heartbeat:SendArp): Started node2
> > apache2 (lsb:apache2): Started node2
> >
> >
> >
> > Why were the resources restarted on Node2 if they were running fine?
> >
> >
>
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>