[Pacemaker] Circular replication help needed - how to make sure VIP runs on same node with a healthy mysql

Sun Nov 6 13:41:43 EST 2011

Hi Florian,

First of all, thanks for getting back to me. You will find my answers inline.

-----Original Message-----
From: Florian Haas [mailto:florian at hastexo.com] 
Sent: 6 November 2011 15:34
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Circular replication help needed - how to make sure VIP runs on same node with a healthy mysql

On 2011-11-05 18:20, Attila Megyeri wrote:
> Hello,
>
> I am having a hard time configuring a relatively simple MySQL environment.
>
> What I'd like to achieve is:
>
> * One master and one slave, with replication
> * Relatively quick failover if the master node or MySQL fails
>
> I tried the MySQL Cluster approach, but it seemed too slow and had too
> many limitations (foreign keys, triggers, views, etc.).
>
> I decided to go with MySQL replication.

You want a simple 2-node MySQL cluster with failover? Why not go with DRBD then, as everyone else would in that situation?

My reasons:
- I am using virtualization, and DRBD seemed too complex compared to MySQL replication.
- I had some experience with M/S MySQL setups (it was actually available) and I thought that by applying Pacemaker with RAs I could easily get automatic failover.
- I tried MySQL ndbcluster as well, but was not happy with the results, so I came back to MySQL replication.
Then I read the article on clusterlabs and liked the idea of having two circular masters. I thought this would be great, but I would need only one active master, to which I would assign a VIP. Failover would finally be easy, as there is no need to track the binlog positions, etc. So basically I wanted to use M/M with a VIP that is collocated with an ACTIVE MySQL instance. With a clone I cannot do this, so I chose an ms resource...

> Tried to use the mysql RA from clusterlabs 3.9.2 - but no luck,
> replication simply did not work out.

Sorry to say this, but as a co-author of that agent I'll say that that's exactly the kind of feedback we strongly dislike, as it doesn't help us at all in improving the agent or its documentation. So,

- What were you trying to achieve?
- What was your configuration?
- What went wrong?
- What were you unable to fix?

Sorry if my post was too generic. I spent days trying to set it up and I failed, even though I never give up easily.

My system is a virtualized Debian, with Debian's MySQL and Pacemaker packages. I first tried the Pacemaker from the stable branch (1.0.9), then the one from backports (1.1.5).
As the RAs were very obsolete in 1.0.9 I installed them from the 3.9.2 tar.gz.

My intention was to "convert" my nicely working M/M circular replication into an M/S setup, just to make sure that writes always go to the same node.

After applying the ms resource, I expected to see an active master node and a slave node receiving logs from the master. In case of a master failure, the slave would become the master, the VIP would be assigned to the newly promoted master node, and the applications would not notice any difference. Unfortunately this never happened.
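
For illustration only, a master/slave setup of the kind described above could be sketched in crm shell syntax roughly as follows. This is not the configuration that was actually used. The resource names, the IP address, the config path, and the replication credentials are all placeholders, and the sketch assumes the ocf:heartbeat:mysql agent from resource-agents 3.9.2 with its replication_user and replication_passwd parameters.

```
# Hypothetical sketch; every name and value below is a placeholder.
primitive p_mysql ocf:heartbeat:mysql \
    params config="/etc/mysql/my.cnf" \
        replication_user="repl" replication_passwd="CHANGE_ME" \
    op monitor interval="20s" role="Master" timeout="30s" \
    op monitor interval="30s" role="Slave" timeout="30s"

# Master/slave wrapper: two instances in total, at most one promoted to master.
ms ms_mysql p_mysql \
    meta master-max="1" master-node-max="1" \
        clone-max="2" clone-node-max="1" notify="true"

# The virtual IP that clients connect to.
primitive p_vip ocf:heartbeat:IPaddr2 \
    params ip="192.0.2.10" cidr_netmask="24" \
    op monitor interval="10s"

# Keep the VIP on the node where MySQL is master, and start it only after promotion.
colocation col_vip_with_master inf: p_vip ms_mysql:Master
order ord_vip_after_promote inf: ms_mysql:promote p_vip:start
```

The two monitor operations use different intervals on purpose: as far as I know, Pacemaker expects each monitor operation on a resource to have a unique interval, so the Master and Slave roles each get their own.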

I had many issues. I was able to resolve some of them, but then I simply gave up. Some of the issues I had:
- Monitoring was not working for MySQL, and I have no idea why. The VIP was being checked every X seconds, but MySQL was not. Then somehow this started to work.
- Corosync froze many times, and only kill -9 helped.
- So far these issues were not RA related, I know. But when I finally had my MySQL master and slave up and running, the slave was not configured (by the RA) to get the binlogs. I checked the MySQL log, and there was simply no CHANGE MASTER ... command. I saw some STOP / START SLAVE commands and some read-only on/off commands, but the slave never received anything from the master.
- The node attributes in "crm configure" showed invalid binlog entries.

I just did not have any more time to spend on it, and you probably know how difficult it is to troubleshoot such issues. So I finally gave up, installed M/M circular replication and asked here for help :)

I have deleted my previous config, so I cannot really copy it here. I did ask some folks on this list who had similar problems, and the answers I have received so far suggest that they weren't able to resolve their problems either.

For the DRBD based approach (which I would highly recommend), do consider taking a look at http://www.hastexo.com/content/mysql-high-availability-sprint-launch-pacemaker.
We'll be happy to provide you with the virtual images used in this tutorial, so you can set things up yourself in a cleanroom testing environment.

Cheers,
Florian

Cheers,

Attila