[Pacemaker] Failback problem with active/active cluster

Fri Mar 11 05:47:30 EST 2011

On Thu, Mar 10, 2011 at 1:50 PM, Charles KOPROWSKI <cko at audaxis.com> wrote:
> Hello,
>
> I set up a 2-node cluster (active/active) to build an HTTP reverse
> proxy/firewall. There is one VIP shared by both nodes and an Apache instance
> running on each node.
>
> Here is the configuration :
>
> node lpa \
>        attributes standby="off"
> node lpb \
>        attributes standby="off"
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>        params ip="10.1.52.3" cidr_netmask="16" clusterip_hash="sourceip" \
>        op monitor interval="30s"
> primitive HttpProxy ocf:heartbeat:apache \
>        params configfile="/etc/apache2/apache2.conf" \
>        op monitor interval="1min"
> clone HttpProxyClone HttpProxy
> clone ProxyIP ClusterIP \
>        meta globally-unique="true" clone-max="2" clone-node-max="2"
> colocation HttpProxy-with-ClusterIP inf: HttpProxyClone ProxyIP
> order HttpProxyClone-after-ProxyIP inf: ProxyIP HttpProxyClone
> property $id="cib-bootstrap-options" \
>        dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>        cluster-infrastructure="openais" \
>        expected-quorum-votes="2" \
>        stonith-enabled="false" \
>        no-quorum-policy="ignore"
>
>
> Everything works fine at the beginning :
>
>
> Online: [ lpa lpb ]
>
>  Clone Set: ProxyIP (unique)
>     ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started lpa
>     ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started lpb
>  Clone Set: HttpProxyClone
>     Started: [ lpa lpb ]
>
>
> But after simulating an outage of one of the nodes with "crm node standby"
> and a recovery with "crm node online", all resources stay on the same node :
>
>
> Online: [ lpa lpb ]
>
>  Clone Set: ProxyIP (unique)
>     ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started lpa
>     ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started lpa
>  Clone Set: HttpProxyClone
>     Started: [ lpa ]
>     Stopped: [ HttpProxy:1 ]
>
>
> Can you tell me if something is wrong in my configuration ?

Essentially, you have encountered a limitation in the clone allocation
algorithm in 1.0.x.
The recently released 1.1.5 has the behavior you're looking for, but
the patch is far too invasive to consider back-porting to 1.0.
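
To tell which side of that version boundary a cluster is on, the dc-version
property from the configuration can be compared against 1.1.5. The following
is only a minimal sketch: the helper name version_ge is hypothetical, the
dc-version string is copied from the configuration quoted above, and it
assumes GNU sort (for the -V option) is available.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker): succeeds if version $1 >= version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# dc-version as it appears in the posted configuration: "<version>-<changeset hash>".
# On a live cluster this value would come from the dc-version line of
# "crm configure show" instead of being hard-coded.
dc_version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd"

# Strip everything from the first "-" onwards, leaving only the version number.
pcmk_version="${dc_version%%-*}"

if version_ge "$pcmk_version" "1.1.5"; then
    echo "Pacemaker $pcmk_version is 1.1.5 or newer"
else
    echo "Pacemaker $pcmk_version is older than 1.1.5"
fi
```

For the configuration in this thread the check takes the second branch,
since 1.0.8 sorts before 1.1.5.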

>
> crm_verify gives me the following output:
>
> crm_verify[22555]: 2011/03/10_13:49:00 ERROR: clone_rsc_order_lh: Cannot
> interleave clone ProxyIP and HttpProxyClone because they do not support the
> same number of resources per node
> crm_verify[22555]: 2011/03/10_13:49:00 ERROR: clone_rsc_order_lh: Cannot
> interleave clone HttpProxyClone and ProxyIP because they do not support the
> same number of resources per node
>
>
> Many thanks,
>
> Regards,
>
> --
> Charles KOPROWSKI