[Pacemaker] Question about Pacemaker master/slave and mysql replication

Wed Aug 17 22:36:36 CET 2011

Thanks very much for the link.  The Percona MySQL script does pretty much
exactly what I need with regard to master/slave promotion and demotion.  I
did run into a couple of issues with the virtual IPs and how they should
follow the master and slave(s) around.

The problem I'm running into now has to do with the placement of my writer
and reader vips.  I'm trying to get writer_vip to stay with the Master sql
server and reader_vip_1 to stay with a Slave.  I'll have N slaves and I only
need 1 virtual IP for the slaves.  I don't care which slave pacemaker picks
to hold that vip, it just needs to pick one.  I've tried to set up some
colocation constraints to tie them together, but I get into states where the
Master sql server ends up holding the reader vip.  I've included my cib
output and what crm_mon reports as my current status.  As you can see, the
sql master and writer_vip are not lined up.

Basically, I now have a cib that looks like this (crm configure show):

node $id="0d6be727-1552-4028-ad8a-cf54b2766da0" three \
        attributes IP="172.17.0.130" standby="off"
node $id="7deca2cd-9a64-476c-8ea2-372bca859a4f" four \
        attributes IP="172.17.0.131" standby="off"
node $id="bb15cdbc-8bec-4f64-83bb-8bbd6d4ca1a7" seven \
        attributes IP="172.17.0.134" standby="off"
primitive p_sql ocf:percona:MySQL_replication \
        params reader_vip_prefix="reader_vip_" ms_replication_resource_name="ms_novaSQL" master_log_file="mysql-bin.000038" master_log_pos="106" promoted_coordinates="::" master_host="172.17.0.131" \
        params super_db_user="root" super_db_password="nova" \
        params repl_db_user="novaSlave" repl_db_password="nova" allowed_sbm="10" \
        params state_file="/var/run/heartbeat/novaSQL.state" recover_file="/var/run/heartbeat/novaSQL.recovery" \
        params p_replication_resource_name="p_sql" \
        params heartbeat_table="ocf.heartbeat" \
        op monitor interval="10s" role="Master" \
        op monitor interval="10s" role="Slave"
primitive reader_vip_1 ocf:heartbeat:IPaddr2 \
        params ip="172.17.0.97" nic="eth0" \
        meta target-role="Started"
primitive writer_vip ocf:heartbeat:IPaddr2 \
        params ip="172.17.0.96" nic="eth0" \
        meta target-role="Started"
ms ms_novaSQL p_sql \
        meta master-max="1" master-node-max="1" clone-max="3" clone-node-max="1" target-role="Master" notify="false" globally-unique="false"
colocation reader_vip_coloc_slave inf: ms_novaSQL:Slave reader_vip_1
colocation writer_vip_coloc_master inf: ms_novaSQL:Master writer_vip
order order_writer_vip_after_master inf: ms_novaSQL:promote writer_vip:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-unknown" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1313615481"

Here is the output from crm_mon.  You can see that the Master sql server
has reader_vip_1 and a slave has writer_vip (the two should be reversed).

Node four (7deca2cd-9a64-476c-8ea2-372bca859a4f): online
        p_sql:1 (ocf::percona:MySQL_replication) Master
        reader_vip_1    (ocf::heartbeat:IPaddr2) Started
Node three (0d6be727-1552-4028-ad8a-cf54b2766da0): online
        p_sql:0 (ocf::percona:MySQL_replication) Slave
        writer_vip      (ocf::heartbeat:IPaddr2) Started
Node seven (bb15cdbc-8bec-4f64-83bb-8bbd6d4ca1a7): online
        p_sql:2 (ocf::percona:MySQL_replication) Slave
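
One thing I still need to rule out is the direction of my colocation
constraints.  As I understand the crm shell syntax, the first resource
listed is the one that gets placed relative to the second, so the
constraints above may be asking the Master role to follow the vip rather
than the other way around.  A rough, untested sketch of the reversed form,
using the same resource names as above (the constraint ids are made up for
illustration and the "#" lines are only annotations):

```
# Assumption: in "colocation <id> <score>: A B", A is placed where B runs.
# Constraint ids below are illustrative, not taken from my running cib.
colocation writer_vip_with_master inf: writer_vip ms_novaSQL:Master
colocation reader_vip_with_slave inf: reader_vip_1 ms_novaSQL:Slave
# Unchanged from my current config: start writer_vip only after a promote.
order order_writer_vip_after_master inf: ms_novaSQL:promote writer_vip:start
```

If that reading is right, the vips would follow the sql roles instead of the
roles following the vips, but I have not verified this on my cluster.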

As always, any ideas/suggestions are appreciated.

-Mike.

On Mon, Aug 15, 2011 at 1:04 PM, Viacheslav Biriukov
<v.v.biriukov at gmail.com> wrote:

> Hello.
> Check it out https://code.launchpad.net/percona-prm.
> And presentation:
> http://www.percona.com/files/presentations/percona-live/nyc-2011/PerconaLiveNYC2011-MySQL-High-Availability-with-Pacemaker.pdf
>
> 2011/8/15 Michael Szilagyi <mszilagyi at gmail.com>
>
>> I'm already using the mysql RA file from
>> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/mysql
>> (which also seems to have replication support in it).
>>
>> Basically what seems to be happening is that pacemaker detects that the
>> master has dropped and promotes a slave up to master.  However, it is not
>> properly reconfiguring the slaves with a CHANGE MASTER TO.  I can see some
>> lines in the ocf file that relate to changing master but it isn't setting it
>> up properly.  If I login to the slave and issue a CHANGE MASTER TO ... /
>> START SLAVE then replication will start up normally again.
>>
>> Since I can see that the script does allow the master host to get set when
>> going through set/unset_master I'm hoping it's just something I'm missing
>> and not a limitation of using Pacemaker to manage the sql replication
>> cluster.
>>
>> Hopefully someone can point me at what I am missing.
>>
>> -Mike.
>>
>> On Mon, Aug 15, 2011 at 1:56 AM, Dan Frincu <df.cluster at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> On Sat, Aug 13, 2011 at 2:53 AM, Michael Szilagyi <mszilagyi at gmail.com>
>>> wrote:
>>> > I'm new to Pacemaker and trying to understand exactly what it can and
>>> > can't do.
>>> > I currently have a small mysql master/slave cluster set up that is
>>> > monitored by Heartbeat/Pacemaker.  What I'd like to be able to do (and
>>> > am hoping Pacemaker will do) is to have 1 node designated as Master
>>> > and, in the event of a failure, automatically promote a slave to master
>>> > and realign all of the existing slaves to be slaves of the newly
>>> > promoted master.  Currently what seems to be happening, however, is
>>> > that heartbeat correctly sees that a node goes down and pacemaker
>>> > promotes a slave up to master, but replication is not adjusted so that
>>> > the new master is now feeding everyone else.  It seems like this should
>>> > be possible to do from within Pacemaker but I feel like I'm missing a
>>> > part of the puzzle.  Any suggestions would be appreciated.
>>>
>>> You could try the mysql RA => from
>>> https://github.com/fghaas/resource-agents/blob/master/heartbeat/mysql
>>> Last I heard, it had replication support.
>>>
>>> HTH.
>>>
>>> >
>>> > Here's an output of my crm configure show:
>>> > node $id="7deca2cd-9a64-476c-8ea2-372bca859a4f" four \
>>> >         attributes 172.17.0.130-log-file-p_sql="mysql-bin.000013" 172.17.0.130-log-pos-p_sql="632"
>>> > node $id="9b355ab7-8c81-485c-8dcd-1facedde5d03" three \
>>> >         attributes 172.17.0.131-log-file-p_sql="mysql-bin.000020" 172.17.0.131-log-pos-p_sql="106"
>>> > primitive p_sql ocf:heartbeat:mysql \
>>> >         params config="/etc/mysql/my.cnf" binary="/usr/bin/mysqld_safe" datadir="/var/lib/mysql" \
>>> >         params pid="/var/lib/mysql/novaSQL.pid" socket="/var/run/mysqld/mysqld.sock" \
>>> >         params max_slave_lag="120" \
>>> >         params replication_user="novaSlave" replication_passwd="nova" \
>>> >         params additional_parameters="--skip-external-locking --relay-log=novaSQL-relay-bin --relay-log-index=relay-bin.index --relay-log-info-file=relay-bin.info" \
>>> >         op start interval="0" timeout="120" \
>>> >         op stop interval="0" timeout="120" \
>>> >         op promote interval="0" timeout="120" \
>>> >         op demote interval="0" timeout="120" \
>>> >         op monitor interval="10" role="Master" timeout="30" \
>>> >         op monitor interval="30" role="Slave" timeout="30"
>>> > primitive p_sqlIP ocf:heartbeat:IPaddr2 \
>>> >         params ip="172.17.0.96" \
>>> >         op monitor interval="10s"
>>> > ms ms_sql p_sql \
>>> >         meta target-role="Started" is-managed="true"
>>> > location l_sqlMaster p_sqlIP 10: three
>>> > location l_sqlSlave1 p_sqlIP 5: four
>>> > property $id="cib-bootstrap-options" \
>>> >         dc-version="1.0.9-unknown" \
>>> >         cluster-infrastructure="Heartbeat" \
>>> >         stonith-enabled="false" \
>>> >         no-quorum-policy="ignore" \
>>> >         last-lrm-refresh="1313187103"
>>> >
>>> > Thanks!
>>> > -Mike.
>>> > _______________________________________________
>>> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> >
>>> > Project Home: http://www.clusterlabs.org
>>> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Dan Frincu
>>> CCNA, RHCE
>>>
> --
> Viacheslav Biriukov
> BR
> http://biriukov.com
>
>