[Pacemaker] Issue with clusterlab mysql ocf script
Raoul Bhatia [IPAX]
r.bhatia at ipax.at
Wed Oct 19 13:43:46 UTC 2011
We might need Marek on this one, because I did not implement
the master/slave logic and am not using it for multiple slaves.
Marek, can you please comment on this?

Thanks,
Raoul
On 2011-08-29 23:08, Michael Szilagyi wrote:
> Did some more testing and figured I would add that even Slave resources
> briefly rejoin the cluster in the Master role before switching back to
> Slave. Of course, since the mysql RA uses event notification, this still
> has the effect of unsetting all masters whenever a new node joins.
> Since a master is possibly already configured, the pre-promote
> notification event doesn't get fired again and replication remains
> broken. It seems likely that I must be doing something wrong, since this
> would be a pretty normal use case and it completely breaks the mysql
> replication cluster.
>
> Thoughts anyone?
>
>
> On Fri, Aug 26, 2011 at 10:19 AM, Michael Szilagyi <mszilagyi at gmail.com> wrote:
>
>     I'm having a problem with master/slave promotion using the most
>     recent version of the mysql OCF script from the
>     ClusterLabs/resource-agents GitHub repo.
>
>     The script works well failing over to a slave if a master loses
>     connection with the cluster. However, when the master rejoins the
>     cluster the script does some undesirable things. Basically, if
>     the master loses connection (say I pull the network cable), then a
>     new slave is promoted and the old master is just orphaned (which is
>     fine, I don't have STONITH set up yet or anything). If I plug that
>     machine's cable back in, the node rejoins the cluster and
>     initially there are now two masters (the old, orphaned one and the
>     newly promoted one). Pacemaker properly sees this and demotes the
>     old master to a slave.
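>
>     For reference, the roles and the replication state can be checked
>     with something like the following while testing (user and host
>     details are from my setup and will differ elsewhere):
>
>         # Master/Slave roles as Pacemaker currently sees them
>         crm_mon -1
>
>         # replication state on a slave node; Master_Host and the
>         # Slave_IO_Running/Slave_SQL_Running columns show whether the
>         # slave is still pointed at a working master
>         mysql -u root -e 'SHOW SLAVE STATUS\G' | egrep 'Master_Host|Running'
>
>     After the old master rejoins, the slaves show no master connection
>     info here, which is what I try to explain below.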
>
>     After some time debugging the OCF script, I think what is happening
>     is that the script sees the old master join and fires off a post-demote
>     notification event for the returning master, which causes an
>     unset_master command to be executed. This causes all the slaves to
>     remove their master connection info. However, since the other
>     master server has already been promoted and is (to its mind) already
>     replicating to the other slaves in the cluster, a new pre-promote is
>     never fired, which means that the slaves never get a new CHANGE
>     MASTER TO issued, so I wind up with a broken replication setup.
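>
>     To make that concrete, my reading of the notify handling boils down
>     to roughly the following. This is only a simplified sketch of the
>     logic as I understand it, not the actual code from the resource
>     agent; the CRM_meta_notify_* variables are the standard clone
>     notification environment Pacemaker passes to the RA:
>
>         mysql_notify_sketch() {
>             local op
>             op="${OCF_RESKEY_CRM_meta_notify_type}-${OCF_RESKEY_CRM_meta_notify_operation}"
>
>             case "$op" in
>                 pre-promote)
>                     # slaves pick up the binlog file/position of the
>                     # node being promoted and issue CHANGE MASTER TO
>                     # against it
>                     ;;
>                 post-demote)
>                     # slaves run unset_master and drop their master
>                     # connection info -- this is what fires when the
>                     # old, orphaned master rejoins and gets demoted,
>                     # and no pre-promote follows it
>                     ;;
>             esac
>         }
>
>     The sequence I see is the post-demote branch running on its own,
>     with nothing to re-point the slaves afterwards.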
>
>     I'm not sure if I'm missing something in how this is supposed to be
>     working or if this is a limitation of the script. Since it's not all
>     that unlikely for such a scenario to occur, it seems like there must
>     be either a bug or something I've got set up wrong. If anyone has
>     any ideas or suggestions on how the script is supposed to work (or
>     what I may be doing wrong), I'd appreciate hearing them.
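>
>     For completeness, getting replication going again by hand amounts
>     to re-pointing each slave at the surviving master, i.e. issuing the
>     CHANGE MASTER TO that the pre-promote handling would normally have
>     produced. Something along these lines (host, credentials, and log
>     file/position are placeholders pulled from my config below):
>
>         # run on each broken slave; values below are placeholders
>         mysql -u root -e "
>             STOP SLAVE;
>             CHANGE MASTER TO
>                 MASTER_HOST='172.17.0.131',
>                 MASTER_USER='sqlSlave',
>                 MASTER_PASSWORD='slave',
>                 MASTER_LOG_FILE='mysql-bin.000038',
>                 MASTER_LOG_POS=607;
>             START SLAVE;"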
>
> I'll include the output of my crm configure show in case it'll be
> useful:
>
> node $id="a1a3266d-24e2-4d1b-bfd7-de3bac929661" seven \
> attributes 172.17.0.130-log-file-p_mysql="mysql-bin.000005"
> 172.17.0.130-log-pos-p_mysql="865"
> 172.17.0.131-log-file-p_mysql="mysql-bin.000038"
> 172.17.0.131-log-pos-p_mysql="607"
> four-log-file-p_mysql="mysql-bin.000040" four-log-pos-p_mysql="2150"
> node $id="cc0227a2-a7bc-4a0d-ba1b-f6ecb7e7d845" four \
> attributes 172.17.0.130-log-file-p_mysql="mysql-bin.000005"
> 172.17.0.130-log-pos-p_mysql="865"
> three-log-file-p_mysql="mysql-bin.000022" three-log-pos-p_mysql="106"
> node $id="d9d3c6cb-bf60-4468-926f-d9716e56fb0f" three \
> attributes 172.17.0.131-log-file-p_mysql="mysql-bin.000038"
> 172.17.0.131-log-pos-p_mysql="607" three-log-pos-p_mysql="4"
> primitive p_mysql ocf:heartbeat:mysql \
> params binary="/usr/sbin/mysqld" config="/etc/mysql/my.cnf" \
> params pid="/var/lib/mysql/mySQL.pid"
> socket="/var/run/mysqld/mysqld.sock" \
> params replication_user="sqlSlave" replication_passwd="slave" \
> params additional_parameters="--skip-slave-start" \
> op start interval="0" timeout="120" \
> op stop interval="0" timeout="120" \
> op promote interval="0" timeout="120" \
> op demote interval="0" timeout="120" \
> op monitor interval="5" role="Master" timeout="30" \
> op monitor interval="10" role="Slave" timeout="30"
> ms ms_mysql p_mysql \
> meta master-max="1" clone-max="3" target-role="Started"
> is-managed="true" notify="true" \
> meta target-role="Started"
> property $id="cib-bootstrap-options" \
> dc-version="1.0.9-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
> cluster-infrastructure="Heartbeat" \
> stonith-enabled="false" \
> last-lrm-refresh="1314307995"
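>
>     (As an aside, the per-node log-file/log-pos attributes above appear
>     to be the replication positions the RA records. They can be read
>     back one at a time with crm_attribute, e.g. something like the line
>     below; node and attribute names are taken from the config above,
>     and the exact option spelling may differ between Pacemaker
>     versions:
>
>         # value of one recorded replication attribute on node "seven"
>         crm_attribute -t nodes -U seven -n 172.17.0.131-log-file-p_mysql -G
>
>     which makes it easy to compare what the cluster has recorded
>     against SHOW SLAVE STATUS on the actual nodes.)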
>
> Thanks!
>
--
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc. email. r.bhatia at ipax.at
Technischer Leiter
IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
Barawitzkagasse 10/2/2/11 email. office at ipax.at
1190 Wien tel. +43 1 3670030
FN 277995t HG Wien fax. +43 1 3670030 15
____________________________________________________________________