[Pacemaker] load balancing in a 3-node cluster
Andrew Beekhof
andrew at beekhof.net
Thu Sep 29 07:06:24 UTC 2011
On Wed, Sep 28, 2011 at 8:52 AM, Mark Smith <mark at bumptechnologies.com> wrote:
> Hi all,
>
> Here at Bump we currently have our handset traffic routed through a
> single server. For obvious reasons, we want to expand this to
> multiple nodes for redundancy. The load balancer is doing two tasks:
> TLS termination and then directing traffic to one of our internal
> application servers.
>
> We want to split the single load balancer into an HA cluster. Our
> chosen solution involves creating one public facing VIP for each
> machine, and then floating those VIPs between the load balancer
> machines. Ideally there is one public IP per machine and we use DNS
> round robin to send traffic to the IPs.
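For reference, the DNS side of that is just publishing all the VIPs under one
name, e.g. a zone fragment along these lines (the "lb" hostname is only a
placeholder):

    lb  IN  A  173.192.13.216
    lb  IN  A  173.192.13.217
    lb  IN  A  173.192.13.218

Clients pick one of the records, and the cluster's only job is to keep every
one of those addresses alive on some node.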
>
> We considered having two nodes and floating a single VIP between them,
> the canonical heartbeat setup, but would prefer to avoid that because
> we know we're going to run into the situation where our TLS
> termination takes more CPU than we have available on a single node.
> Balancing across N nodes seems the most obvious way to address that.
>
> We have allocated three (3) nodes to our cluster. I want to run our
> design by this group and tell you our problems and see if anybody has
> some advice.
>
> * no-quorum-policy set to ignore. We would, ideally, like to have our
> cluster continue to operate even if we lose the majority of nodes.
> Even if we're in a CPU limited situation, it would be better to serve
> slowly than to drop 33% or 66% of our traffic on the floor because we
> lost quorum and the floating VIPs weren't migrated to the remaining
> nodes.
>
> * STONITH disabled. Originally I tried to enable this, but with the
> no-quorum-policy set to ignore, it seems to go on killing sprees.
Try no-quorum-policy=freeze instead.
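With freeze, a partition that loses quorum keeps running the resources it
already has (so it keeps serving its VIPs) but won't take over resources from
the other partition. If you're using the crm shell, setting it should just be:

    crm configure property no-quorum-policy="freeze"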
> It
> has fenced healthy nodes for no reason I could determine:
>
> - "node standby lb1"
> * resources properly migrate to lb2, lb3
> * everything looks stable and correct
> - "node online lb1"
> * resources start migrating back to lb1
> * lb2 gets fenced! (why? it was healthy)
Did a stop action fail?
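A failed stop is the usual reason an apparently healthy node gets shot:
Pacemaker can't confirm the resource is actually down, so it fences the node
to be sure. It should show up under "Failed actions" in crm_mon, e.g.
(exact options may vary slightly by version):

    crm_mon -1 -f -o    # one-shot status with fail counts and operation history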
> * resources migrating off of lb2
>
> I have seen it double-fence, too, with lb1 being the only surviving
> node and lb2 and lb3 being unceremoniously rebooted. I'm not sure
> why. STONITH seems to be suboptimal (heh) in this particular setup.
>
> Anyway -- that means our configuration is very, very simple:
>
> node $id="65c71911-737e-4848-b7d7-897d0ede172a" patron
> node $id="b5f2fd18-acf1-4b25-a571-a0827e07188b" oldfashioned
> node $id="ef11cced-0062-411b-93dd-d03c2b8b198c" nattylight
> primitive cluster-monitor ocf:pacemaker:ClusterMon \
> params extra_options="--mail-to blah" htmlfile="blah" \
> meta target-role="Started"
> primitive floating_216 ocf:heartbeat:IPaddr \
> params ip="173.192.13.216" cidr_netmask="255.255.255.252" nic="eth1" \
> op monitor interval="60s" timeout="30s" \
> meta target-role="Started"
> primitive floating_217 ocf:heartbeat:IPaddr \
> params ip="173.192.13.217" cidr_netmask="255.255.255.252" nic="eth1" \
> op monitor interval="60s" timeout="30s" \
> meta target-role="Started"
> primitive floating_218 ocf:heartbeat:IPaddr \
> params ip="173.192.13.218" cidr_netmask="255.255.255.252" nic="eth1" \
> op monitor interval="60s" timeout="30s" \
> meta target-role="Started"
> property $id="cib-bootstrap-options" \
> dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
> cluster-infrastructure="Heartbeat" \
> stonith-enabled="false" \
> no-quorum-policy="ignore" \
> symmetric-cluster="true" \
> last-lrm-refresh="1317079926"
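As posted, nothing in this configuration spreads the three IPs across the
three nodes or discourages them from bouncing straight back when a node
rejoins. Something along these lines (untested, using the node names from
your config) would give each VIP a home node plus some stickiness:

    crm configure rsc_defaults resource-stickiness="100"
    crm configure location prefer-216 floating_216 50: patron
    crm configure location prefer-217 floating_217 50: oldfashioned
    crm configure location prefer-218 floating_218 50: nattylight

With stickiness higher than the location scores, a VIP that has failed over
stays put instead of migrating back, which also avoids exercising the failback
path that triggered the fencing above.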
>
> Am I on the right track with this? Am I missing something obvious?
> Am I misapplying this tool to our problem and should I go in a
> different direction?
>
> In the real world, I would use ECMP (or something like that) between
> the router and my load balancers. However, I'm living in the world of
> managed server hosting (we're not quite big enough to colo) so I don't
> have that option. :-)
>
>
> --
> Mark Smith // Operations Lead
> mark at bumptechnologies.com
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>