[Pacemaker] [SOLVED] RE: Slave does not start after failover: Mysql circular replication and master-slave resources
Andreas Kurz
andreas at hastexo.com
Mon Dec 19 15:19:14 CET 2011
On 12/17/2011 10:51 AM, Attila Megyeri wrote:
> Hi all,
>
> For anyone interested:
> I finally got MySQL replication working. For some strange reason there were no [mysql] log entries at all, in either corosync.log or syslog. After a couple of corosync restarts (?!), [mysql] RA debug/error entries started to show up.
> The issue was that the slave could not apply the binary logs because of duplicate-key errors. I am not sure how this could happen, but the solution was to ignore the duplicate errors on the slaves by adding the following line to my.cnf:
>
> slave-skip-errors = 1062
Even though you use different "auto-increment-offset" values?
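Andreas's question points at the usual safeguard in circular replication: both servers use the same auto-increment-increment and different auto-increment-offset values, so AUTO_INCREMENT keys generated on one node should never collide with keys generated on the other. Error 1062 is MySQL's duplicate-entry error (ER_DUP_ENTRY). The Python sketch below simply enumerates the keys each node would hand out on a fresh table; increment=2 and offsets 1 and 2 are assumed values for illustration, since the thread does not show Attila's settings.

```python
from typing import List, Set

# Simplified model of MySQL AUTO_INCREMENT on a fresh table with
# auto_increment_increment / auto_increment_offset set: the k-th generated
# value is offset + k * increment (k = 0, 1, 2, ...).
# The settings used below are assumptions, not taken from the thread.


def generated_ids(offset: int, increment: int, count: int) -> List[int]:
    """Return the first `count` AUTO_INCREMENT values a node would hand out."""
    if not 1 <= offset <= increment:
        # MySQL ignores the offset when it exceeds the increment;
        # this sketch only models the normal case.
        raise ValueError("model assumes 1 <= offset <= increment")
    return [offset + k * increment for k in range(count)]


def collisions(a: List[int], b: List[int]) -> Set[int]:
    """Keys generated by both nodes, i.e. candidates for error 1062."""
    return set(a) & set(b)


if __name__ == "__main__":
    db1 = generated_ids(offset=1, increment=2, count=5)  # 1, 3, 5, 7, 9
    db2 = generated_ids(offset=2, increment=2, count=5)  # 2, 4, 6, 8, 10
    print(db1, db2, collisions(db1, db2))  # disjoint: no shared keys

    # Misconfiguration: the same offset on both nodes makes every key collide.
    same = generated_ids(offset=1, increment=2, count=5)
    print(collisions(db1, same))
```

With distinct offsets the two sequences are disjoint, so duplicates on AUTO_INCREMENT columns are not expected from this mechanism alone, which is presumably why Andreas asks.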
>
> I hope this helps some of you as well.
>
> P.S. Did anyone else notice missing mysql debug/info/error entries in the corosync log as well?
There is no RA output/log in any of your syslogs? In the absence of a
connected tty and a configured logd, the logger should feed all logs to
syslog. What is your distribution? Any "fancy" syslog configuration?
Regards,
Andreas
--
Need help with Pacemaker?
http://www.hastexo.com/now
>
> Cheers,
> Attila
>
>
> -----Original Message-----
> From: Attila Megyeri [mailto:amegyeri at minerva-soft.com]
> Sent: 16 December 2011 12:39
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Slave does not start after failover: Mysql circular replication and master-slave resources
>
> Hi Andreas,
>
> The slave lag cannot be high, as the slave was restarted within 1-2 minutes and there are no active users on the system yet.
> I did not find anything at all in the logs.
>
> I will double-check whether the RA is the latest version.
>
> Thanks,
>
> Attila
>
>
> -----Original Message-----
> From: Andreas Kurz [mailto:andreas at hastexo.com]
> Sent: 16 December 2011 1:50
> To: pacemaker at oss.clusterlabs.org
> Subject: Re: [Pacemaker] Slave does not start after failover: Mysql circular replication and master-slave resources
>
> Hello Attila,
>
> ... see below ...
>
> On 12/15/2011 02:42 PM, Attila Megyeri wrote:
>> Hi All,
>>
>> Some time ago I exchanged a couple of posts with you here regarding
>> MySQL active-active HA.
>>
>> The best solution I have found so far is MySQL multi-master
>> replication, also referred to as circular replication.
>>
>> Basically, I set up two nodes, both capable of the master role,
>> and changes were immediately propagated to the other node.
>>
>> But I still wanted an M/S approach, with an RW master and an
>> RO slave, mainly because I prefer to have a single master VIP to which
>> my apps can connect.
>>
>> (In the first approach I configured a two-node clone, and the master
>> IP was always bound to one of the nodes.)
>>
>> I applied the following configuration:
>>
>> node db1 \
>>     attributes IP="10.100.1.31" \
>>     attributes standby="off" db2-log-file-db-mysql="mysql-bin.000021" db2-log-pos-db-mysql="40730"
>> node db2 \
>>     attributes IP="10.100.1.32" \
>>     attributes standby="off"
>> primitive db-ip-master ocf:heartbeat:IPaddr2 \
>>     params lvs_support="true" ip="10.100.1.30" cidr_netmask="8" broadcast="10.255.255.255" \
>>     op monitor interval="20s" timeout="20s" \
>>     meta target-role="Started"
>> primitive db-mysql ocf:heartbeat:mysql \
>>     params binary="/usr/bin/mysqld_safe" config="/etc/mysql/my.cnf" \
>>         datadir="/var/lib/mysql" user="mysql" pid="/var/run/mysqld/mysqld.pid" \
>>         socket="/var/run/mysqld/mysqld.sock" test_passwd="XXXXX" \
>>         test_table="replicatest.connectioncheck" test_user="slave_user" \
>>         replication_user="slave_user" replication_passwd="XXXXX" \
>>         additional_parameters="--skip-slave-start" \
>>     op start interval="0" timeout="120s" \
>>     op stop interval="0" timeout="120s" \
>>     op monitor interval="30" timeout="30s" OCF_CHECK_LEVEL="1" \
>>     op promote interval="0" timeout="120" \
>>     op demote interval="0" timeout="120"
>> ms db-ms-mysql db-mysql \
>>     meta notify="true" master-max="1" clone-max="2" target-role="Started"
>> colocation db-ip-with-master inf: db-ip-master db-ms-mysql:Master
>> property $id="cib-bootstrap-options" \
>>     dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
>>     cluster-infrastructure="openais" \
>>     expected-quorum-votes="2" \
>>     stonith-enabled="false" \
>>     no-quorum-policy="ignore"
>> rsc_defaults $id="rsc-options" \
>>     resource-stickiness="0"
>>
>> The setup works under basic conditions:
>>
>> * After the "first" startup, the nodes start up as slaves, and
>>   shortly after, one of them is promoted to master.
>> * Updates to the master are replicated properly to the slave.
>> * The slave accepts updates, which is wrong, but I can live with
>>   this: I will only allow connections to the master VIP.
>> * If I stop the slave for some time and restart it, it catches
>>   up with the master shortly and gets back in sync.
>>
>> I do, however, have a serious issue:
>>
>> * If I stop the current master, the slave is promoted, accepts
>>   RW queries, and the master IP is bound to it. All fine.
>> * BUT when I want to bring the other node back online, it simply
>>   shows: Stopped (not installed)
>>
>> Online: [ db1 db2 ]
>>
>> db-ip-master (ocf::heartbeat:IPaddr2): Started db1
>> Master/Slave Set: db-ms-mysql [db-mysql]
>>     Masters: [ db1 ]
>>     Stopped: [ db-mysql:1 ]
>>
>> Node Attributes:
>> * Node db1:
>>     + IP : 10.100.1.31
>>     + db2-log-file-db-mysql : mysql-bin.000021
>>     + db2-log-pos-db-mysql : 40730
>>     + master-db-mysql:0 : 3601
>> * Node db2:
>>     + IP : 10.100.1.32
>>
>> Failed actions:
>>     db-mysql:0_monitor_30000 (node=db2, call=58, rc=5, status=complete): not installed
>>
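The failed action reads as: resource instance db-mysql:0, operation monitor, interval 30000 ms (the 30 s monitor configured above), on node db2, returning rc=5. In the OCF resource agent API, exit code 5 is OCF_ERR_INSTALLED, which crm_mon prints as "not installed". Pacemaker treats it as a hard error and keeps that instance off the node until the failure is cleared, for example with `crm resource cleanup`. The small Python decoder below is a hypothetical helper written for this thread, not a Pacemaker tool; the regular expression assumes the crm_mon line format shown above.

```python
import re
from typing import Any, Dict

# OCF return codes from the OCF resource agent API, plus the
# master/slave extensions (8, 9) used by Pacemaker.
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

# Operation names this sketch recognises (assumed list; extend as needed).
_OPS = "start|stop|monitor|promote|demote|notify|migrate_to|migrate_from"

# Hypothetical pattern for lines such as
#   db-mysql:0_monitor_30000 (node=db2, call=58, rc=5, status=complete)
FAILED_ACTION = re.compile(
    rf"^(?P<rsc>\S+)_(?P<op>{_OPS})_(?P<interval>\d+)\s+"
    r"\(node=(?P<node>[^,]+), call=(?P<call>-?\d+), rc=(?P<rc>\d+)"
)


def decode(line: str) -> Dict[str, Any]:
    """Split a crm_mon failed-action line into its parts."""
    m = FAILED_ACTION.search(line)
    if m is None:
        raise ValueError(f"unrecognised failed-action line: {line!r}")
    rc = int(m.group("rc"))
    return {
        "resource": m.group("rsc"),
        "operation": m.group("op"),
        "interval_s": int(m.group("interval")) / 1000,  # key uses milliseconds
        "node": m.group("node"),
        "call": int(m.group("call")),
        "rc": rc,
        "rc_name": OCF_RC.get(rc, "unknown"),
    }


if __name__ == "__main__":
    line = ("db-mysql:0_monitor_30000 (node=db2, call=58, rc=5, "
            "status=complete): not installed")
    info = decode(line)
    # db-mysql:0 monitor 30.0 db2 OCF_ERR_INSTALLED
    print(info["resource"], info["operation"], info["interval_s"],
          info["node"], info["rc_name"])
```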
>
> Looking at the RA (latest from git), I'd say the problem is somewhere in the check_slave() function: either the check for replication errors or the check for excessive slave lag. For both errors, though, you should see log entries.
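To make the two checks concrete, here is a simplified, hypothetical model of them in Python. It is not the RA's shell code. It assumes both failure paths return OCF_ERR_INSTALLED, which would be consistent with the rc=5 above, and the 3600 s threshold is an assumed value. The parameter names are meant to mirror the RA's max_slave_lag and evict_outdated_slaves options; treat their exact semantics as assumptions and check the RA version you run.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# OCF exit codes used in this sketch (values from the OCF resource agent API).
OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5


@dataclass
class SlaveStatus:
    """The two SHOW SLAVE STATUS fields this sketch looks at."""
    last_errno: int                       # Last_Errno: 0 means no replication error
    seconds_behind_master: Optional[int]  # Seconds_Behind_Master: None (NULL) if the SQL thread stopped


def check_slave(status: SlaveStatus,
                max_slave_lag: int = 3600,  # assumed threshold, in seconds
                evict_outdated_slaves: bool = False) -> Tuple[int, str]:
    """Hypothetical, simplified model of the RA's slave checks.

    Assumes both failure paths exit with OCF_ERR_INSTALLED; the real
    shell function may behave differently between RA versions.
    """
    # Check 1: replication stopped on an error. With slave-skip-errors = 1062
    # the server itself skips duplicate-key errors, so 1062 would normally
    # not be reported here.
    if status.last_errno != 0:
        return OCF_ERR_INSTALLED, f"replication failed with error {status.last_errno}"

    # Check 2: the slave lags too far behind its master.
    lag = status.seconds_behind_master
    if evict_outdated_slaves and lag is not None and lag > max_slave_lag:
        return OCF_ERR_INSTALLED, f"slave lag {lag}s exceeds max_slave_lag={max_slave_lag}s"

    return OCF_SUCCESS, "slave OK"


if __name__ == "__main__":
    # A slave stuck on a duplicate-key error (1062) fails the first check.
    print(check_slave(SlaveStatus(last_errno=1062, seconds_behind_master=None)))
    # A healthy slave a few seconds behind passes.
    print(check_slave(SlaveStatus(last_errno=0, seconds_behind_master=5)))
```

In this model a duplicate-key error alone is enough to produce rc=5, while lag only matters when eviction is enabled, which fits Attila's observation that lag could not have been high.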
>
> Regards,
> Andreas
>
>
>
>>
>> I checked the logs and could not find a reason why the slave on db2
>> is not started.
>>
>> Any idea, anyone?
>>
>> Thanks,
>> Attila
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>