[ClusterLabs] Installed Galera, now HAProxy won't start

Wed Mar 16 23:10:08 CET 2016

Sorry, folks, for being a pest here, but I'm finding the learning curve on this clustering stuff to be pretty steep.

I'm following the docs to set up a three-node OpenStack controller cluster. I got Pacemaker running with two resources, the virtual IP and HAProxy, and I could move them to any of the three nodes. Success!

I then moved on to installing Galera.

The MariaDB engine started fine on 2 of the 3 nodes but refused to start on the third. After some digging and poking (and swearing), I found that HAProxy was listening on the virtual IP on the MySQL port (3306), which prevented MariaDB from listening on that port. Makes sense. So I moved HAProxy to another node, started MariaDB on my third node, and now I have a three-node Galera cluster.

But.

Now HAProxy won't start on any node. I imagine it's because MariaDB is already listening on the same IP:Port combination that HAProxy wants. (After all, HAProxy is supposed to proxy that IP:Port, right?) Unfortunately, I don't see anything useful in the HAProxy.log file so I don't really know what's wrong.

So.... thinking this through logically, it seems to me that the OpenStack docs were wrong in telling me to configure the MariaDB server to bind to all available addresses (http://docs.openstack.org/ha-guide/controller-ha-galera-config.html, scroll to "Database Configuration," note that bind-address is 0.0.0.0). If MariaDB binds to every address on port 3306, and that includes the virtual IP, then HAProxy can't bind to the virtual IP on that port and therefore won't start. Right?
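
To make the conflict I'm describing concrete, here is a minimal Python sketch. It is not part of my setup and only illustrates my assumption about what is going on: 127.0.0.1 stands in for the virtual IP (10.0.0.10), a port picked by the OS stands in for 3306, and the function name bind_conflicts is made up for this example. The first socket plays MariaDB with bind-address 0.0.0.0; the second plays HAProxy trying to bind one specific address on the same port.

```python
import errno
import socket


def bind_conflicts(wildcard_host="0.0.0.0", specific_host="127.0.0.1"):
    """Return True if binding specific_host fails with EADDRINUSE
    while a listener on wildcard_host already holds the same port."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Stand-in for MariaDB: listen on every address.
        # Port 0 asks the OS for any free port (stand-in for 3306).
        first.bind((wildcard_host, 0))
        first.listen(1)
        port = first.getsockname()[1]

        # Stand-in for HAProxy: bind one specific address, same port.
        try:
            second.bind((specific_host, port))
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print("conflict:", bind_conflicts())
```

If this reports a conflict, it would support my theory that a wildcard listener on port 3306 blocks a second listener on the virtual IP for the same port, at least when neither side sets any special socket options.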

Am I thinking correctly here, or is something else wrong with my setup? In general, I've found that the OpenStack documents tend to be right, but in this case my understanding of the concepts involved makes me wonder.

In any case, I'm having difficulty getting HAProxy and Galera running on the same nodes. My HAProxy config file is:

global
  chroot  /var/lib/haproxy
  daemon
  group  haproxy
  maxconn  4000
  pidfile  /var/run/haproxy.pid
  user  haproxy

defaults
  log  global
  maxconn  4000
  option  redispatch
  retries  3
  timeout  http-request 10s
  timeout  queue 1m
  timeout  connect 10s
  timeout  client 1m
  timeout  server 1m
  timeout  check 10s

listen galera_cluster
  bind 10.0.0.10:3306
  balance  source
  option  httpchk
  server controller1 10.0.0.11:3306 check port 9200 inter 2000 rise 2 fall 5
  server controller2 10.0.0.12:3306 backup check port 9200 inter 2000 rise 2 fall 5
  server controller3 10.0.0.13:3306 backup check port 9200 inter 2000 rise 2 fall 5

Does the server name under "listen galera_cluster" need to match the hostname of the node? What else could be causing these two daemons to not play nicely together?

Thanks!

-Matthew