[Pacemaker] Pacemaker restart resources when node joins cluster after failback

Fri Mar 16 04:09:08 CET 2012

2012/3/6 José Alonso <jah at transtelco.net>:
> Hi all,
>
> I have 2 Debian nodes with heartbeat and pacemaker 1.1.6 installed, and
> almost everything is working fine. I have only apache configured for
> testing. When a node goes down the failover is done correctly, but there's a
> problem when a node fails back.
>
> For example, let's say that Node1 is running the apache resource. Then I
> reboot Node1, so Pacemaker detects that it went down and apache is moved to
> Node2, where it keeps running fine. But when Node1 recovers and joins the
> cluster again, apache is restarted on Node2.
>
> Does anyone know why resources are restarted when a node rejoins a cluster?

I suspect we think it's running in both places, and you're seeing our
automated recovery (stop it everywhere before choosing a new
location).
Logs?
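
One way this can happen with an lsb: resource (an assumption until the
logs confirm it): when Node1 rejoins, the cluster probes each resource
there by calling the init script's status action. If apache2 was already
started by the boot sequence on Node1, or the script returns 0 while the
service is stopped, the probe reports it as running on Node1 as well.
A minimal sketch for checking what the script reports;
lsb_status_meaning is a hypothetical helper, not part of Pacemaker, and
the init script path is the usual Debian one:

```shell
#!/bin/sh
# Hypothetical helper: translate an LSB init-script "status" exit code
# into its meaning, following the exit codes defined by the LSB spec.
lsb_status_meaning() {
    case "$1" in
        0) echo "running" ;;
        1) echo "dead, pid file exists" ;;
        2) echo "dead, lock file exists" ;;
        3) echo "not running" ;;
        *) echo "unknown" ;;
    esac
}

# Usage on the rejoining node, before the cluster stack starts
# (assumes the Debian init script path):
#   /etc/init.d/apache2 status; lsb_status_meaning $?
```

If this prints "running" on Node1 right after boot, before the cluster
has started anything there, the init system is probably starting apache2
on its own, and the service should be left for the cluster to manage.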

>
> This is my pacemaker configuration:
>
> node $id="2ac5f37d-cd54-4932-92dc-418b4fd0e6e6" nodo2 \
> attributes standby="off"
> node $id="938594ef-839a-40bb-aa5e-5715622693b3" nodo1 \
> attributes standby="off"
> primitive apache2 lsb:apache2 \
> meta migration-threshold="1" failure-timeout="2" \
> op monitor interval="5s" resource-stickiness="INFINITY"
> primitive ip1 ocf:heartbeat:IPaddr2 \
> params ip="192.168.1.38" nic="eth0:0"
> primitive ip1arp ocf:heartbeat:SendArp \
> params ip="192.168.1.38" nic="eth0:0"
> group WebServices ip1 ip1arp apache2
> location cli-prefer-WebServices WebServices \
> rule $id="cli-prefer-rule-WebServices" inf: #uname eq nodo2
> colocation ip_with_arp inf: ip1 ip1arp
> colocation web_with_ip inf: apache2 ip1
> order arp_after_ip inf: ip1:start ip1arp:start
> order web_after_ip inf: ip1arp:start apache2:start
> property $id="cib-bootstrap-options" \
> dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
> cluster-infrastructure="Heartbeat" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="INFINITY"
>
>
> This is what I see on crm_mon:
>
> 1-. Node1 and Node2 OK:
>
> Online: [ node1 node2 ]
>
> Resource Group: WebServices
> ip1 (ocf::heartbeat:IPaddr2): Started node1
> ip1arp (ocf::heartbeat:SendArp): Started node1
> apache2 (lsb:apache2): Started node1
>
>
> 2-. I reboot Node1, so Pacemaker moves resources to Node2:
>
> Online: [ node2 ]
> OFFLINE: [node1]
>
> Resource Group: WebServices
> ip1 (ocf::heartbeat:IPaddr2): Started node2
> ip1arp (ocf::heartbeat:SendArp): Started node2
> apache2 (lsb:apache2): Started node2
>
>
> 3-. Node1 is online again and joins the cluster, resources still on Node2:
>
> Online: [ node1 node2 ]
>
> Resource Group: WebServices
> ip1 (ocf::heartbeat:IPaddr2): Started node2
> ip1arp (ocf::heartbeat:SendArp): Started node2
> apache2 (lsb:apache2): Started node2
>
> 4-. But after a few seconds, resources are stopped on Node2 and restarted
> on the same Node2:
>
> Online: [ node1 node2 ]
>
> Resource Group: WebServices
> ip1 (ocf::heartbeat:IPaddr2): Started node2
> ip1arp (ocf::heartbeat:SendArp): Stopped
> apache2 (lsb:apache2): Stopped
>
>
> 5-. Resources restarted and still on Node2
>
> Online: [ node1 node2 ]
>
> Resource Group: WebServices
> ip1 (ocf::heartbeat:IPaddr2): Started node2
> ip1arp (ocf::heartbeat:SendArp): Started node2
> apache2 (lsb:apache2): Started node2
>
>
>
> Why were resources restarted on Node2 if they were running fine?