[Pacemaker] Pacemaker cannot start the failed master as a new slave?
Andreas Kurz
andreas at hastexo.com
Mon Jul 9 22:08:47 UTC 2012
On 07/09/2012 06:11 AM, quanta wrote:
> Related thread:
> http://oss.clusterlabs.org/pipermail/pacemaker/2011-December/012499.html
>
> I'm setting up failover for MySQL replication (1 master and 1 slave),
> following this guide:
> https://github.com/jayjanssen/Percona-Pacemaker-Resource-Agents/blob/master/doc/PRM-setup-guide.rst
And are you also using the latest mysql RA from the resource-agents GitHub repository?
>
> Here is the output of `crm configure show`:
>
> node serving-6192 \
>     attributes p_mysql_mysql_master_IP="192.168.6.192"
> node svr184R-638.localdomain \
>     attributes p_mysql_mysql_master_IP="192.168.6.38"
> primitive p_mysql ocf:percona:mysql \
>     params config="/etc/my.cnf" pid="/var/run/mysqld/mysqld.pid" socket="/var/lib/mysql/mysql.sock" replication_user="repl" replication_passwd="x" test_user="test_user" test_passwd="x" \
>     op monitor interval="5s" role="Master" OCF_CHECK_LEVEL="1" \
>     op monitor interval="2s" role="Slave" timeout="30s" OCF_CHECK_LEVEL="1" \
>     op start interval="0" timeout="120s" \
>     op stop interval="0" timeout="120s"
> primitive writer_vip ocf:heartbeat:IPaddr2 \
>     params ip="192.168.6.8" cidr_netmask="32" \
>     op monitor interval="10s" \
>     meta is-managed="true"
> ms ms_MySQL p_mysql \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false" target-role="Master" is-managed="true"
> colocation writer_vip_on_master inf: writer_vip ms_MySQL:Master
> order ms_MySQL_promote_before_vip inf: ms_MySQL:promote writer_vip:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.12-unknown" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     no-quorum-policy="ignore" \
>     stonith-enabled="false" \
>     last-lrm-refresh="1341801689"
> property $id="mysql_replication" \
>     p_mysql_REPL_INFO="192.168.6.192|mysql-bin.000006|338"
>
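For reading the `p_mysql_REPL_INFO` value above, here is a minimal sketch that splits it on `|`. The field meanings (master host, binlog file, binlog position) are my assumption from the value itself, and the function and variable names are made up for illustration; none of this is taken from the RA source.

```shell
#!/bin/sh
# Split a REPL_INFO value of the assumed form "host|binlog_file|position".
# parse_repl_info and the master_* variable names are hypothetical.
parse_repl_info() {
    # Save and restore IFS so the caller's word splitting is unaffected.
    old_ifs=$IFS
    IFS='|'
    # Unquoted on purpose: the shell splits the value on "|".
    set -- $1
    IFS=$old_ifs
    master_host=$1
    master_log_file=$2
    master_log_pos=$3
}

parse_repl_info "192.168.6.192|mysql-bin.000006|338"
echo "host=$master_host file=$master_log_file pos=$master_log_pos"
```

If that reading is right, the value still names serving-6192 as the replication master, which may be worth comparing against the node that currently holds the Master role.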
> `crm_mon`:
>
> Last updated: Mon Jul 9 10:30:01 2012
> Stack: openais
> Current DC: serving-6192 - partition with quorum
> Version: 1.0.12-unknown
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
>
> Online: [ serving-6192 svr184R-638.localdomain ]
>
> Master/Slave Set: ms_MySQL
> Masters: [ serving-6192 ]
> Slaves: [ svr184R-638.localdomain ]
> writer_vip (ocf::heartbeat:IPaddr2): Started serving-6192
>
> To test failover, I introduced a syntax error into `/etc/my.cnf` on
> serving-6192, and it worked fine:
> - svr184R-638.localdomain was promoted to master
> - writer_vip switched to svr184R-638.localdomain
>
> Last updated: Mon Jul 9 10:35:57 2012
> Stack: openais
> Current DC: serving-6192 - partition with quorum
> Version: 1.0.12-unknown
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
>
> Online: [ serving-6192 svr184R-638.localdomain ]
>
> Master/Slave Set: ms_MySQL
> Masters: [ svr184R-638.localdomain ]
> Stopped: [ p_mysql:0 ]
> writer_vip (ocf::heartbeat:IPaddr2): Started svr184R-638.localdomain
>
> Failed actions:
>     p_mysql:0_monitor_5000 (node=serving-6192, call=15, rc=7, status=complete): not running
>     p_mysql:0_demote_0 (node=serving-6192, call=22, rc=7, status=complete): not running
>     p_mysql:0_start_0 (node=serving-6192, call=26, rc=-2, status=Timed Out): unknown exec error
>
> After removing the syntax error from `/etc/my.cnf` on serving-6192 and
> restarting corosync, I expected serving-6192 to be started as a new
> slave, but it was not:
>
> Failed actions:
>     p_mysql:0_start_0 (node=serving-6192, call=4, rc=1, status=complete): unknown error
>
> Here is a snippet of the logs that I suspect is relevant:
>
> Jul 09 10:46:32 serving-6192 lrmd: [7321]: info: rsc:p_mysql:0:4: start
> Jul 09 10:46:32 serving-6192 lrmd: [7321]: info: RA output: (p_mysql:0:start:stderr) Error performing operation: The object/attribute does not exist
>
> Jul 09 10:46:32 serving-6192 crm_attribute: [7420]: info: Invoked: /usr/sbin/crm_attribute -N serving-6192 -l reboot --name readable -v 0
There are not enough logs here, at least for me, to give more hints.
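As a rough sketch of the kind of extract that would help: everything logged during the minute of the failed start, not only the lines that mention the resource. The function name `log_window` and the log path in the usage comment are assumptions; the thread does not say where your cluster log is written.

```shell
#!/bin/sh
# Print every line of a log file that starts with a given timestamp prefix,
# for example "Jul 09 10:46" for one minute of activity.
# log_window is a hypothetical helper name.
log_window() {
    logfile=$1
    prefix=$2
    # index(...) == 1 keeps only lines that begin with the prefix,
    # treating it as a literal string rather than a regular expression.
    awk -v p="$prefix" 'index($0, p) == 1' "$logfile"
}

# Usage (the path is an assumption, adjust it to your logging setup):
# log_window /var/log/cluster/corosync.log "Jul 09 10:46"
```

Matching on the timestamp instead of the resource name also keeps lines such as the `crm_attribute` invocation, which do not mention `p_mysql` at all.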
Regards,
Andreas
--
Need help with Pacemaker?
http://www.hastexo.com/now
>
> The strange thing is that I can start it manually:
>
> export OCF_ROOT=/usr/lib/ocf
> export OCF_RESKEY_config="/etc/my.cnf"
> export OCF_RESKEY_pid="/var/run/mysqld/mysqld.pid"
> export OCF_RESKEY_socket="/var/lib/mysql/mysql.sock"
> export OCF_RESKEY_replication_user="repl"
> export OCF_RESKEY_replication_passwd="x"
> export OCF_RESKEY_max_slave_lag="60"
> export OCF_RESKEY_evict_outdated_slaves="false"
> export OCF_RESKEY_test_user="test_user"
> export OCF_RESKEY_test_passwd="x"
>
> `sh -x /usr/lib/ocf/resource.d/percona/mysql start`: http://fpaste.org/RVGh/
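The manual run above can be wrapped so that each action is invoked with the same parameters and its exit code is printed. This is only a sketch: `run_ra_action` and `RA_PATH` are names made up here, and the parameter values are copied from the exports above. Keep in mind that a run by hand does not get the environment Pacemaker itself supplies to the agent (for example `OCF_RESOURCE_INSTANCE` and the `OCF_RESKEY_CRM_meta_*` variables), so a successful manual start does not necessarily exercise the same code path as a start by the cluster.

```shell
#!/bin/sh
# Run one RA action with the parameters from the exports above and report
# its exit code. RA_PATH and run_ra_action are hypothetical names; the
# default path is the one used in this thread.
RA_PATH=${RA_PATH:-/usr/lib/ocf/resource.d/percona/mysql}

run_ra_action() {
    action=$1
    (
        # Subshell: the exports do not leak into the calling shell.
        export OCF_ROOT=/usr/lib/ocf
        export OCF_RESKEY_config="/etc/my.cnf"
        export OCF_RESKEY_pid="/var/run/mysqld/mysqld.pid"
        export OCF_RESKEY_socket="/var/lib/mysql/mysql.sock"
        export OCF_RESKEY_replication_user="repl"
        export OCF_RESKEY_replication_passwd="x"
        export OCF_RESKEY_max_slave_lag="60"
        export OCF_RESKEY_evict_outdated_slaves="false"
        export OCF_RESKEY_test_user="test_user"
        export OCF_RESKEY_test_passwd="x"
        sh "$RA_PATH" "$action"
    )
    rc=$?
    echo "$action exited with rc=$rc"
    return $rc
}

# Usage:
# run_ra_action start
# run_ra_action monitor
```

Comparing the printed exit code with the `rc=` values in the failed actions list shows whether the agent behaves differently by hand than under the cluster.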
>
> Did I do something wrong?
>