[Pacemaker] Trouble with "Failed application of an update diff"

Виталий Туровец corebug at corebug.net
Mon Jun 30 13:25:45 CEST 2014


Thank you, Andrew!
You were right, removing that rule helped me.
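
For the record, a ban like that can be dropped from the crm shell; a minimal sketch (either form, assuming crmsh is installed):

    # remove the leftover -inf ban by its constraint id
    crm configure delete cli-standby-MySQL_MasterSlave

    # or, if it was left behind by an earlier "crm resource migrate":
    crm resource unmigrate MySQL_MasterSlave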



2014-06-27 10:08 GMT+04:00 Andrew Beekhof <andrew at beekhof.net>:

>
> On 10 Jun 2014, at 10:44 pm, Виталий Туровец <corebug at corebug.net> wrote:
>
> > Hello there again!
> > Here you are: http://pastebin.com/bUaNQHs1
> > It's also identical on both nodes.
> > Thank you!
> >
> >
> > 2014-06-10 3:20 GMT+03:00 Andrew Beekhof <andrew at beekhof.net>:
> >
> > On 9 Jun 2014, at 11:01 pm, Виталий Туровец <corebug at corebug.net> wrote:
> >
> > > Hello there again, people!
> > >
> > > After upgrading both nodes to these software versions:
> > >
> > > pacemaker.x86_64       1.1.10-14.el6_5.3
> > > pacemaker-cli.x86_64   1.1.10-14.el6_5.3
> > > pacemaker-cluster-libs.x86_64
> > > pacemaker-libs.x86_64  1.1.10-14.el6_5.3
> > > corosync.x86_64        1.4.1-17.el6_5.1 @updates
> > > corosynclib.x86_64     1.4.1-17.el6_5.1 @updates
> > >
> > > I am still facing the same problem: the slave in the MySQL Master/Slave set won't start.
> > > Master actually works correctly.
> > > Output of cibadmin -Q on both nodes is identical.
> > >
> > > And here's the log of what happens when I try to do a "cleanup MySQL_MasterSlave": http://pastebin.com/J90NuyEX.
> > > By now I have the MySQL slave running in manual mode, but that is definitely not what I'm trying to achieve with Pacemaker.
> > > Can anyone help with this?
>
> Um, I see:
>
>  location cli-standby-MySQL_MasterSlave MySQL_MasterSlave \
>          rule $id="cli-standby-rule-MySQL_MasterSlave" -inf: #uname eq wb-db1
>
> which tells Pacemaker that the MySQL_MasterSlave resource isn't allowed on wb-db1.
> That's why only one instance is being started and promoted to master.
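>
> (Constraint ids starting with "cli-" are typically ones the admin tools left
> behind; e.g. "crm resource migrate" / "crm_resource --move" creates exactly
> this kind of -inf rule. A quick way to spot any that are lingering, assuming
> the crm shell is available:
>
>     crm configure show | grep cli-
>
> and then remove the offender by id with "crm configure delete <id>".)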
>
>
> > > Again, my Pacemaker configuration:
> >
> > Can you provide the 'cibadmin -Ql' output instead?
> > We need the status section in order to comment.
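> >
> > For reference, -Q is --query and -l is --local, i.e. the following dumps
> > this node's own copy of the CIB with the status section included:
> >
> >     cibadmin --query --local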
> >
> > >
> > > node wb-db1 \
> > >         attributes standby=off
> > > node wb-db2 \
> > >         attributes standby=off
> > > primitive ClusterIP IPaddr2 \
> > >         params ip=10.0.1.68 cidr_netmask=32 nic=bond0.100 \
> > >         op monitor interval=30s \
> > >         meta target-role=Started
> > > primitive MySQL mysql \
> > >         params binary="/usr/bin/mysqld_safe" enable_creation=1 replication_user=slave_user replication_passwd=here_goes_the_password datadir="/var/lib/mysql/db" socket="/var/run/mysqld/mysqld.sock" config="/etc/my.cnf" reader_attribute=readerOK evict_outdated_slaves=false max_slave_lag=600 \
> > >         op monitor interval=30s \
> > >         op monitor interval=35s role=Master OCF_CHECK_LEVEL=1 \
> > >         op monitor interval=60s role=Slave timeout=60s OCF_CHECK_LEVEL=1 \
> > >         op notify interval=0 timeout=90 \
> > >         op start interval=0 timeout=120 \
> > >         op stop interval=0 timeout=120
> > > primitive MySQL_Reader_VIP IPaddr2 \
> > >         params ip=10.0.1.66 cidr_netmask=32 nic=bond0.100 \
> > >         meta target-role=Started
> > > primitive ping-gateway ocf:pacemaker:ping \
> > >         params host_list=10.0.1.1 multiplier=100 timeout=1 \
> > >         op monitor interval=10s timeout=20s
> > > primitive resMON ocf:pacemaker:ClusterMon \
> > >         op start interval=0 timeout=90s \
> > >         op stop interval=0 timeout=100s \
> > >         op monitor interval=10s timeout=30s \
> > >         params extra_options="--mail-prefix MainDB_Cluster_Notification --mail-from cluster-alarm at gmsu.ua --mail-to cluster-alarm at gmsu.ua --mail-host mx.gmsu.ua"
> > > ms MySQL_MasterSlave MySQL \
> > >         meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true globally-unique=false target-role=Started is-managed=true
> > > clone pingclone ping-gateway \
> > >         meta target-role=Started
> > > location No-MySQL_Reader_VIP MySQL_Reader_VIP \
> > >         rule $id="No-MySQL_Reader_VIP-rule" -inf: readerOK eq 0 or not_defined readerOK
> > > location cli-prefer-ClusterIP ClusterIP \
> > >         rule $id="cli-prefer-rule-ClusterIP" inf: #uname eq wb-db1
> > > location cli-standby-MySQL_MasterSlave MySQL_MasterSlave \
> > >         rule $id="cli-standby-rule-MySQL_MasterSlave" -inf: #uname eq wb-db1
> > > location resourceClusterIPwithping ClusterIP \
> > >         rule $id="resourceClusterIPwithping-rule" -inf: not_defined pingd or pingd lte 0
> > > colocation MySQL_Reader_VIP_dislike_ClusterIP -200: MySQL_Reader_VIP ClusterIP
> > > colocation MysqlMaster-with-ClusterIP inf: MySQL_MasterSlave:Master ClusterIP
> > > order MysqlMaster-after-ClusterIP inf: ClusterIP MySQL_MasterSlave:promote
> > > property cib-bootstrap-options: \
> > >         dc-version=1.1.10-14.el6_5.3-368c726 \
> > >         cluster-infrastructure="classic openais (with plugin)" \
> > >         expected-quorum-votes=2 \
> > >         no-quorum-policy=ignore \
> > >         stonith-enabled=false \
> > >         last-lrm-refresh=1402318675
> > > property mysql_replication: \
> > >         MySQL_REPL_INFO="wb-db2|mysql-bin.000126|107"
> > > rsc_defaults rsc-options: \
> > >         resource-stickiness=200
> > >
> > > Thank you!
> > >
> > >
> > > 2014-06-05 3:17 GMT+03:00 Andrew Beekhof <andrew at beekhof.net>:
> > >
> > > On 30 May 2014, at 6:32 pm, Виталий Туровец <corebug at corebug.net> wrote:
> > >
> > > > Hello there, people!
> > > > I am new to this list, so please excuse me if I'm posting to the wrong place.
> > > >
> > > > I've got a Pacemaker cluster with this configuration: http://pastebin.com/1SbWWh4n.
> > > >
> > > > Output of "crm status":
> > > > ============
> > > > Last updated: Fri May 30 11:22:59 2014
> > > > Last change: Thu May 29 03:22:38 2014 via crmd on wb-db2
> > > > Stack: openais
> > > > Current DC: wb-db2 - partition with quorum
> > > > Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
> > > > 2 Nodes configured, 2 expected votes
> > > > 7 Resources configured.
> > > > ============
> > > >
> > > > Online: [ wb-db2 wb-db1 ]
> > > >
> > > >  ClusterIP      (ocf::heartbeat:IPaddr2):       Started wb-db2
> > > >  MySQL_Reader_VIP       (ocf::heartbeat:IPaddr2):       Started wb-db2
> > > >  resMON (ocf::pacemaker:ClusterMon):    Started wb-db2
> > > >  Master/Slave Set: MySQL_MasterSlave [MySQL]
> > > >      Masters: [ wb-db2 ]
> > > >      Stopped: [ MySQL:1 ]
> > > >  Clone Set: pingclone [ping-gateway]
> > > >      Started: [ wb-db1 wb-db2 ]
> > > >
> > > > There was an unclean shutdown of the cluster, and since then the slave of the MySQL_MasterSlave resource does not come up.
> > > > When I try to do a "cleanup MySQL_MasterSlave" I see the following in the logs:
> > >
> > > Most of those errors are cosmetic and fixed in later versions.
> > >
> > > > Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
> > >
> > > If you can get to RHEL 6.5 you'll have access to 1.1.10, where these are fixed.
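> > >
> > > To double-check which builds are installed on RHEL/CentOS, something
> > > like this should do:
> > >
> > >     rpm -q pacemaker corosync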
> > >
> > > >
> > > > May 29 03:22:22 [4423] wb-db1       crmd:  warning: decode_transition_key:      Bad UUID (crm-resource-4819) in sscanf result (3) for 0:0:crm-resource-4819
> > > > May 29 03:22:22 [4423] wb-db1       crmd:  warning: decode_transition_key:      Bad UUID (crm-resource-4819) in sscanf result (3) for 0:0:crm-resource-4819
> > > > May 29 03:22:22 [4423] wb-db1       crmd:     info: ais_dispatch_message:       Membership 408: quorum retained
> > > > May 29 03:22:22 [4418] wb-db1        cib:     info: set_crm_log_level:  New log level: 3 0
> > > > May 29 03:22:38 [4421] wb-db1      attrd:   notice: attrd_ais_dispatch:         Update relayed from wb-db2
> > > > May 29 03:22:38 [4421] wb-db1      attrd:   notice: attrd_ais_dispatch:         Update relayed from wb-db2
> > > > May 29 03:22:38 [4418] wb-db1        cib:     info: apply_xml_diff:     Digest mis-match: expected 2f5bc3d7f673df3cf37f774211976d69, calculated b8a7adf0e34966242551556aab605286
> > > > May 29 03:22:38 [4418] wb-db1        cib:   notice: cib_process_diff:   Diff 0.243.4 -> 0.243.5 not applied to 0.243.4: Failed application of an update diff
> > > > May 29 03:22:38 [4418] wb-db1        cib:     info: cib_server_process_diff:    Requesting re-sync from peer
> > > > May 29 03:22:38 [4418] wb-db1        cib:   notice: cib_server_process_diff:    Not applying diff 0.243.4 -> 0.243.5 (sync in progress)
> > > > May 29 03:22:38 [4418] wb-db1        cib:     info: cib_replace_notify:         Replaced: -1.-1.-1 -> 0.243.5 from wb-db2
> > > > May 29 03:22:38 [4421] wb-db1      attrd:   notice: attrd_trigger_update:       Sending flush op to all hosts for: pingd (100)
> > > > May 29 03:22:38 [4421] wb-db1      attrd:   notice: attrd_trigger_update:       Sending flush op to all hosts for: probe_complete (true)
> > > > May 29 03:22:38 [4418] wb-db1        cib:     info: set_crm_log_level:  New log level: 3 0
> > > > May 29 03:22:38 [4418] wb-db1        cib:     info: apply_xml_diff:     Digest mis-match: expected 754ed3b1d999e34d93e0835b310fd98a, calculated c322686deb255936ab54e064c696b6b8
> > > > May 29 03:22:38 [4418] wb-db1        cib:   notice: cib_process_diff:   Diff 0.244.5 -> 0.244.6 not applied to 0.244.5: Failed application of an update diff
> > > > May 29 03:22:38 [4418] wb-db1        cib:     info: cib_server_process_diff:    Requesting re-sync from peer
> > > > May 29 03:22:38 [4423] wb-db1       crmd:     info: delete_resource:    Removing resource MySQL:0 for 4996_crm_resource (internal) on wb-db2
> > > > May 29 03:22:38 [4423] wb-db1       crmd:     info: notify_deleted:     Notifying 4996_crm_resource on wb-db2 that MySQL:0 was deleted
> > > > May 29 03:22:38 [4418] wb-db1        cib:   notice: cib_server_process_diff:    Not applying diff 0.244.5 -> 0.244.6 (sync in progress)
> > > > May 29 03:22:38 [4423] wb-db1       crmd:  warning: decode_transition_key:      Bad UUID (crm-resource-4996) in sscanf result (3) for 0:0:crm-resource-4996
> > > > May 29 03:22:38 [4418] wb-db1        cib:   notice: cib_server_process_diff:    Not applying diff 0.244.6 -> 0.244.7 (sync in progress)
> > > > May 29 03:22:38 [4418] wb-db1        cib:   notice: cib_server_process_diff:    Not applying diff 0.244.7 -> 0.244.8 (sync in progress)
> > > > May 29 03:22:38 [4418] wb-db1        cib:     info: cib_replace_notify:         Replaced: -1.-1.-1 -> 0.244.8 from wb-db2
> > > > May 29 03:22:38 [4421] wb-db1      attrd:   notice: attrd_trigger_update:       Sending flush op to all hosts for: pingd (100)
> > > > May 29 03:22:38 [4421] wb-db1      attrd:   notice: attrd_trigger_update:       Sending flush op to all hosts for: probe_complete (true)
> > > > May 29 03:22:38 [4423] wb-db1       crmd:   notice: do_lrm_invoke:      Not creating resource for a delete event: (null)
> > > > May 29 03:22:38 [4423] wb-db1       crmd:     info: notify_deleted:     Notifying 4996_crm_resource on wb-db2 that MySQL:1 was deleted
> > > > May 29 03:22:38 [4423] wb-db1       crmd:  warning: decode_transition_key:      Bad UUID (crm-resource-4996) in sscanf result (3) for 0:0:crm-resource-4996
> > > > May 29 03:22:38 [4423] wb-db1       crmd:  warning: decode_transition_key:      Bad UUID (crm-resource-4996) in sscanf result (3) for 0:0:crm-resource-4996
> > > > May 29 03:22:38 [4418] wb-db1        cib:     info: set_crm_log_level:  New log level: 3 0
> > > > May 29 03:22:38 [4423] wb-db1       crmd:     info: ais_dispatch_message:       Membership 408: quorum retained
> > > >
> > > > Here's the cibadmin -Q output from the node that is alive: http://pastebin.com/aeqfTaCe
> > > > And here's the one from the failed node: http://pastebin.com/ME2U5vjK
> > > > The question is: how do I clean things up so that the master/slave resource MySQL_MasterSlave starts working properly?
> > > >
> > > > Thank you!
> > > >
> > > > --
> > > >
> > > >
> > > >
> > > >
> > > > ~~~
> > > > WBR,
> > > > Vitaliy Turovets
> > > > Lead Operations Engineer
> > > > Global Message Services Ukraine
> > > > +38(093)265-70-55
> > > > VITU-RIPE
> > > >
> > >
> > >
> > >
> > >
> > > --
> > >
> > >
> > >
> > >
> > > ~~~
> > > WBR,
> > > Vitaliy Turovets
> > > Lead Operations Engineer
> > > Global Message Services Ukraine
> > > +38(093)265-70-55
> > > VITU-RIPE
> > >
> >
> >
> >
> >
> > --
> >
> >
> >
> >
> > ~~~
> > WBR,
> > Vitaliy Turovets
> > Lead Operations Engineer
> > Global Message Services Ukraine
> > +38(093)265-70-55
> > VITU-RIPE
> >
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>


-- 




~~~
WBR,
Vitaliy Turovets
Lead Operations Engineer
Global Message Services
+38(093)265-70-55
VITU-RIPE