[Pacemaker] Pacemaker cannot start the failed master as a new slave?

Andreas Kurz andreas at hastexo.com
Mon Jul 9 22:08:47 UTC 2012


On 07/09/2012 06:11 AM, quanta wrote:
> Related thread:
> http://oss.clusterlabs.org/pipermail/pacemaker/2011-December/012499.html
> 
> I'm setting up failover for MySQL replication (1 master and 1 slave),
> following this guide:
> https://github.com/jayjanssen/Percona-Pacemaker-Resource-Agents/blob/master/doc/PRM-setup-guide.rst

And are you also using the latest mysql RA from the resource-agents
GitHub repository?

> 
> Here is the output of `crm configure show`:
> 
> node serving-6192 \
>     attributes p_mysql_mysql_master_IP="192.168.6.192"
> node svr184R-638.localdomain \
>     attributes p_mysql_mysql_master_IP="192.168.6.38"
> primitive p_mysql ocf:percona:mysql \
>     params config="/etc/my.cnf" pid="/var/run/mysqld/mysqld.pid" \
>         socket="/var/lib/mysql/mysql.sock" replication_user="repl" \
>         replication_passwd="x" test_user="test_user" test_passwd="x" \
>     op monitor interval="5s" role="Master" OCF_CHECK_LEVEL="1" \
>     op monitor interval="2s" role="Slave" timeout="30s" \
>         OCF_CHECK_LEVEL="1" \
>     op start interval="0" timeout="120s" \
>     op stop interval="0" timeout="120s"
> primitive writer_vip ocf:heartbeat:IPaddr2 \
>     params ip="192.168.6.8" cidr_netmask="32" \
>     op monitor interval="10s" \
>     meta is-managed="true"
> ms ms_MySQL p_mysql \
>     meta master-max="1" master-node-max="1" clone-max="2" \
>         clone-node-max="1" notify="true" globally-unique="false" \
>         target-role="Master" is-managed="true"
> colocation writer_vip_on_master inf: writer_vip ms_MySQL:Master
> order ms_MySQL_promote_before_vip inf: ms_MySQL:promote writer_vip:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.12-unknown" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     no-quorum-policy="ignore" \
>     stonith-enabled="false" \
>     last-lrm-refresh="1341801689"
> property $id="mysql_replication" \
>     p_mysql_REPL_INFO="192.168.6.192|mysql-bin.000006|338"
> 
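
The p_mysql_REPL_INFO value above looks like three pipe-separated
fields. A minimal sketch of splitting it, assuming (this thread does not
confirm it) that the fields are master host, binary log file, and log
position. The variable names are illustrative only:

```shell
#!/bin/sh
# Value copied from the mysql_replication property in the configuration above.
repl_info="192.168.6.192|mysql-bin.000006|338"

# Assumed field order: master host | binlog file | binlog position.
# Split on "|" for this one expansion, then restore IFS.
old_ifs=$IFS
IFS='|'
set -- $repl_info
IFS=$old_ifs

master_host=$1
master_log_file=$2
master_log_pos=$3

echo "host=$master_host file=$master_log_file pos=$master_log_pos"
```
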
> `crm_mon`:
> 
> Last updated: Mon Jul  9 10:30:01 2012
> Stack: openais
> Current DC: serving-6192 - partition with quorum
> Version: 1.0.12-unknown
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
> 
> Online: [ serving-6192 svr184R-638.localdomain ]
> 
>  Master/Slave Set: ms_MySQL
>      Masters: [ serving-6192 ]
>      Slaves: [ svr184R-638.localdomain ]
> writer_vip    (ocf::heartbeat:IPaddr2):    Started serving-6192
> 
> To test failover, I introduced a syntax error into `/etc/my.cnf` on
> serving-6192, and it works fine:
> - svr184R-638.localdomain is promoted to master
> - writer_vip switches to svr184R-638.localdomain
> 
> Last updated: Mon Jul  9 10:35:57 2012
> Stack: openais
> Current DC: serving-6192 - partition with quorum
> Version: 1.0.12-unknown
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
> 
> Online: [ serving-6192 svr184R-638.localdomain ]
> 
>  Master/Slave Set: ms_MySQL
>      Masters: [ svr184R-638.localdomain ]
>      Stopped: [ p_mysql:0 ]
> writer_vip    (ocf::heartbeat:IPaddr2):    Started svr184R-638.localdomain
> 
> Failed actions:
>     p_mysql:0_monitor_5000 (node=serving-6192, call=15, rc=7, status=complete): not running
>     p_mysql:0_demote_0 (node=serving-6192, call=22, rc=7, status=complete): not running
>     p_mysql:0_start_0 (node=serving-6192, call=26, rc=-2, status=Timed Out): unknown exec error
> 
> I then removed the syntax error from `/etc/my.cnf` on serving-6192 and
> restarted corosync. I expected serving-6192 to be started as a new
> slave, but it isn't:
> 
> Failed actions:
>     p_mysql:0_start_0 (node=serving-6192, call=4, rc=1, status=complete): unknown error
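
For reading the rc= values in the failed actions, a small lookup of the
standard OCF return codes may help. This is only a sketch: the function
name ocf_rc_name is made up here, and rc=-2 (shown with status=Timed
Out) is not an OCF return code, so it falls through to the default case.

```shell
#!/bin/sh
# Map an OCF resource agent exit code to its symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        8) echo "OCF_RUNNING_MASTER" ;;
        9) echo "OCF_FAILED_MASTER" ;;
        *) echo "not an OCF return code" ;;
    esac
}

# The three rc values that appear in the failed actions above.
for rc in 7 1 -2; do
    echo "rc=$rc: $(ocf_rc_name "$rc")"
done
```
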
> 
> Here is the snippet of the logs that I find suspicious:
> 
> Jul 09 10:46:32 serving-6192 lrmd: [7321]: info: rsc:p_mysql:0:4: start
> Jul 09 10:46:32 serving-6192 lrmd: [7321]: info: RA output: (p_mysql:0:start:stderr) Error performing operation: The object/attribute does not exist
> 
> Jul 09 10:46:32 serving-6192 crm_attribute: [7420]: info: Invoked: /usr/sbin/crm_attribute -N serving-6192 -l reboot --name readable -v 0

Not enough logs, at least for me, to give more hints.
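
A minimal sketch for pulling every line that mentions the resource, with
a couple of lines of context, out of a log file. The helper name
rsc_log_context and the log path in the example are assumptions; the
actual log location depends on the corosync/syslog setup.

```shell
#!/bin/sh
# Print lines mentioning a resource, plus two lines of context on each side.
# $1 = resource name (e.g. p_mysql), $2 = log file to search.
rsc_log_context() {
    grep -n -B 2 -A 2 -F -- "$1" "$2"
}

# Example call; /var/log/messages is only a guess at the log location.
# rsc_log_context p_mysql /var/log/messages
```
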

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now


> 
> The strange thing is that I can start it manually:
> 
> export OCF_ROOT=/usr/lib/ocf
> export OCF_RESKEY_config="/etc/my.cnf"
> export OCF_RESKEY_pid="/var/run/mysqld/mysqld.pid"
> export OCF_RESKEY_socket="/var/lib/mysql/mysql.sock"
> export OCF_RESKEY_replication_user="repl"
> export OCF_RESKEY_replication_passwd="x"
> export OCF_RESKEY_max_slave_lag="60"
> export OCF_RESKEY_evict_outdated_slaves="false"
> export OCF_RESKEY_test_user="test_user"
> export OCF_RESKEY_test_passwd="x"
> 
> `sh -x /usr/lib/ocf/resource.d/percona/mysql start`: http://fpaste.org/RVGh/
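
A minimal sketch of wrapping the manual test so that each action's exit
code is visible, for example to run monitor right after start. The
helper name run_ra is hypothetical. It uses only the parameters that
also appear in the cluster configuration, leaving out max_slave_lag and
evict_outdated_slaves, which the manual run sets but the configuration
does not. A manual run also lacks the OCF_RESKEY_CRM_meta_* variables
that the cluster passes to the agent, so it may not behave exactly as it
does under Pacemaker.

```shell
#!/bin/sh
# Run one action of a resource agent with the parameters from this thread
# and report its exit code. $1 = path to the agent, $2 = action.
run_ra() {
    (
        # Subshell keeps the exported variables out of the caller's environment.
        export OCF_ROOT=/usr/lib/ocf
        export OCF_RESKEY_config="/etc/my.cnf"
        export OCF_RESKEY_pid="/var/run/mysqld/mysqld.pid"
        export OCF_RESKEY_socket="/var/lib/mysql/mysql.sock"
        export OCF_RESKEY_replication_user="repl"
        export OCF_RESKEY_replication_passwd="x"
        export OCF_RESKEY_test_user="test_user"
        export OCF_RESKEY_test_passwd="x"
        sh "$1" "$2"
    )
    rc=$?
    echo "$2 returned rc=$rc"
    return "$rc"
}

# Example (commented out so that loading the sketch has no side effects):
# run_ra /usr/lib/ocf/resource.d/percona/mysql start
# run_ra /usr/lib/ocf/resource.d/percona/mysql monitor
```
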
> 
> Did I do something wrong?
> 
> 
> 
> 