[Pacemaker] Trouble with "Failed application of an update diff"

Wed Jun 4 20:17:41 EDT 2014

On 30 May 2014, at 6:32 pm, Виталий Туровец <corebug at corebug.net> wrote:

> Hello there, people!
> I am new to this list, so please excuse me if I'm posting to the wrong place.
> 
> I have a Pacemaker cluster with this configuration: http://pastebin.com/1SbWWh4n.
> 
> Output of "crm status":
> ============
> Last updated: Fri May 30 11:22:59 2014
> Last change: Thu May 29 03:22:38 2014 via crmd on wb-db2
> Stack: openais
> Current DC: wb-db2 - partition with quorum
> Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
> 2 Nodes configured, 2 expected votes
> 7 Resources configured.
> ============
> 
> Online: [ wb-db2 wb-db1 ]
> 
>  ClusterIP      (ocf::heartbeat:IPaddr2):       Started wb-db2
>  MySQL_Reader_VIP       (ocf::heartbeat:IPaddr2):       Started wb-db2
>  resMON (ocf::pacemaker:ClusterMon):    Started wb-db2
>  Master/Slave Set: MySQL_MasterSlave [MySQL]
>      Masters: [ wb-db2 ]
>      Stopped: [ MySQL:1 ]
>  Clone Set: pingclone [ping-gateway]
>      Started: [ wb-db1 wb-db2 ]
> 
> After an unclean shutdown of the cluster, the slave of the MySQL_MasterSlave resource does not come up.
> When I run "cleanup MySQL_MasterSlave", I see the following in the logs:

Most of those errors are cosmetic and fixed in later versions.

> Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14

If you can upgrade to RHEL 6.5, you'll have access to 1.1.10, where these are fixed.
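Whether a node is already on 1.1.10 or later can be checked from the "Version:" line that "crm status" prints, as in the output quoted above. This is a minimal Python sketch that assumes that line format. The helper names are hypothetical. It compares only the upstream release number, so it ignores any fixes a distribution may have backported into its own package revision.

```python
import re

# Upstream release that, according to the reply, contains the fixes.
FIXED_IN = (1, 1, 10)

def parse_pacemaker_version(status_line: str) -> tuple[int, ...]:
    """Extract the numeric upstream release from a `crm status` Version line,
    e.g. 'Version: 1.1.7-6.el6-148fcc...' -> (1, 1, 7)."""
    # Stop at the first '-', which begins the package revision / build hash.
    m = re.search(r"Version:\s*(\d+(?:\.\d+)*)", status_line)
    if m is None:
        raise ValueError(f"no version found in: {status_line!r}")
    return tuple(int(part) for part in m.group(1).split("."))

def has_diff_fixes(status_line: str) -> bool:
    # Tuple comparison orders numerically: (1, 1, 7) < (1, 1, 10).
    return parse_pacemaker_version(status_line) >= FIXED_IN

if __name__ == "__main__":
    line = "Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14"
    print(parse_pacemaker_version(line), has_diff_fixes(line))
```

Comparing tuples of integers avoids the usual string-comparison trap, where "1.1.7" would sort after "1.1.10".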

> 
> May 29 03:22:22 [4423] wb-db1       crmd:  warning: decode_transition_key:      Bad UUID (crm-resource-4819) in sscanf result (3) for 0:0:crm-resource-4819 
> May 29 03:22:22 [4423] wb-db1       crmd:  warning: decode_transition_key:      Bad UUID (crm-resource-4819) in sscanf result (3) for 0:0:crm-resource-4819 
> May 29 03:22:22 [4423] wb-db1       crmd:     info: ais_dispatch_message:       Membership 408: quorum retained 
> May 29 03:22:22 [4418] wb-db1        cib:     info: set_crm_log_level:  New log level: 3 0 
> May 29 03:22:38 [4421] wb-db1      attrd:   notice: attrd_ais_dispatch:         Update relayed from wb-db2 
> May 29 03:22:38 [4421] wb-db1      attrd:   notice: attrd_ais_dispatch:         Update relayed from wb-db2 
> May 29 03:22:38 [4418] wb-db1        cib:     info: apply_xml_diff:     Digest mis-match: expected 2f5bc3d7f673df3cf37f774211976d69, calculated b8a7adf0e34966242551556aab605286 
> May 29 03:22:38 [4418] wb-db1        cib:   notice: cib_process_diff:   Diff 0.243.4 -> 0.243.5 not applied to 0.243.4: Failed application of an update diff 
> May 29 03:22:38 [4418] wb-db1        cib:     info: cib_server_process_diff:    Requesting re-sync from peer 
> May 29 03:22:38 [4418] wb-db1        cib:   notice: cib_server_process_diff:    Not applying diff 0.243.4 -> 0.243.5 (sync in progress) 
> May 29 03:22:38 [4418] wb-db1        cib:     info: cib_replace_notify:         Replaced: -1.-1.-1 -> 0.243.5 from wb-db2 
> May 29 03:22:38 [4421] wb-db1      attrd:   notice: attrd_trigger_update:       Sending flush op to all hosts for: pingd (100) 
> May 29 03:22:38 [4421] wb-db1      attrd:   notice: attrd_trigger_update:       Sending flush op to all hosts for: probe_complete (true) 
> May 29 03:22:38 [4418] wb-db1        cib:     info: set_crm_log_level:  New log level: 3 0 
> May 29 03:22:38 [4418] wb-db1        cib:     info: apply_xml_diff:     Digest mis-match: expected 754ed3b1d999e34d93e0835b310fd98a, calculated c322686deb255936ab54e064c696b6b8 
> May 29 03:22:38 [4418] wb-db1        cib:   notice: cib_process_diff:   Diff 0.244.5 -> 0.244.6 not applied to 0.244.5: Failed application of an update diff 
> May 29 03:22:38 [4418] wb-db1        cib:     info: cib_server_process_diff:    Requesting re-sync from peer 
> May 29 03:22:38 [4423] wb-db1       crmd:     info: delete_resource:    Removing resource MySQL:0 for 4996_crm_resource (internal) on wb-db2 
> May 29 03:22:38 [4423] wb-db1       crmd:     info: notify_deleted:     Notifying 4996_crm_resource on wb-db2 that MySQL:0 was deleted 
> May 29 03:22:38 [4418] wb-db1        cib:   notice: cib_server_process_diff:    Not applying diff 0.244.5 -> 0.244.6 (sync in progress) 
> May 29 03:22:38 [4423] wb-db1       crmd:  warning: decode_transition_key:      Bad UUID (crm-resource-4996) in sscanf result (3) for 0:0:crm-resource-4996 
> May 29 03:22:38 [4418] wb-db1        cib:   notice: cib_server_process_diff:    Not applying diff 0.244.6 -> 0.244.7 (sync in progress) 
> May 29 03:22:38 [4418] wb-db1        cib:   notice: cib_server_process_diff:    Not applying diff 0.244.7 -> 0.244.8 (sync in progress) 
> May 29 03:22:38 [4418] wb-db1        cib:     info: cib_replace_notify:         Replaced: -1.-1.-1 -> 0.244.8 from wb-db2 
> May 29 03:22:38 [4421] wb-db1      attrd:   notice: attrd_trigger_update:       Sending flush op to all hosts for: pingd (100) 
> May 29 03:22:38 [4421] wb-db1      attrd:   notice: attrd_trigger_update:       Sending flush op to all hosts for: probe_complete (true) 
> May 29 03:22:38 [4423] wb-db1       crmd:   notice: do_lrm_invoke:      Not creating resource for a delete event: (null) 
> May 29 03:22:38 [4423] wb-db1       crmd:     info: notify_deleted:     Notifying 4996_crm_resource on wb-db2 that MySQL:1 was deleted 
> May 29 03:22:38 [4423] wb-db1       crmd:  warning: decode_transition_key:      Bad UUID (crm-resource-4996) in sscanf result (3) for 0:0:crm-resource-4996 
> May 29 03:22:38 [4423] wb-db1       crmd:  warning: decode_transition_key:      Bad UUID (crm-resource-4996) in sscanf result (3) for 0:0:crm-resource-4996 
> May 29 03:22:38 [4418] wb-db1        cib:     info: set_crm_log_level:  New log level: 3 0 
> May 29 03:22:38 [4423] wb-db1       crmd:     info: ais_dispatch_message:       Membership 408: quorum retained 
> 
> Here's the cibadmin -Q output from node that is alive: http://pastebin.com/aeqfTaCe
> And here's the one from failed node: http://pastebin.com/ME2U5vjK
> The question is: how do I clean up the master/slave resource MySQL_MasterSlave so that it starts working properly?
> 
> Thank you!
> 
> -- 
> 
> ~~~
> WBR,
> Vitaliy Turovets
> Lead Operations Engineer
> Global Message Services Ukraine
> +38(093)265-70-55
> VITU-RIPE
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

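The quoted log helps explain why these messages can be cosmetic. Each "Failed application of an update diff" on wb-db1 is followed by "Requesting re-sync from peer" and then a full "Replaced: ... from wb-db2", so the node appears to receive a complete copy of the CIB instead of the diff. The minimal Python sketch below lists failed diffs that no later full replace caught up with. Its regexes assume the 1.1.7 message wording shown above, and the function names are hypothetical. It says nothing about why the MySQL slave itself fails to start.

```python
import re

# cib notice for an incremental diff that could not be applied locally.
FAILED_RE = re.compile(
    r"Diff (\S+) -> (\S+) not applied to \S+: Failed application of an update diff"
)
# cib info line for a full CIB copy replacing the local one.
REPLACED_RE = re.compile(r"Replaced: \S+ -> (\S+) from (\S+)")

def version_key(v: str) -> tuple[int, ...]:
    # CIB versions are admin_epoch.epoch.num_updates, e.g. "0.243.5".
    return tuple(int(x) for x in v.split("."))

def unrecovered_failures(lines):
    """Return target versions of failed diffs that were not followed by a
    full replace reaching at least that version."""
    pending = []
    for line in lines:
        m = FAILED_RE.search(line)
        if m:
            pending.append(m.group(2))  # version the diff would have produced
            continue
        m = REPLACED_RE.search(line)
        if m:
            reached = version_key(m.group(1))
            # Anything at or below the replaced version is now covered.
            pending = [v for v in pending if version_key(v) > reached]
    return pending
```

Run against the excerpt above, every failed diff (0.243.5, 0.244.6) is followed by a replace to 0.243.5 or 0.244.8, so the result is an empty list.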