[Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?

Andrew Beekhof andrew at beekhof.net
Tue Mar 11 17:40:27 EDT 2014


On 11 Mar 2014, at 6:23 pm, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:

> 07.03.2014 10:30, Vladislav Bogdanov wrote:
>> 07.03.2014 05:43, Andrew Beekhof wrote:
>>> 
>>> On 6 Mar 2014, at 10:39 pm, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>> 
>>>> 18.02.2014 03:49, Andrew Beekhof wrote:
>>>>> 
>>>>> On 31 Jan 2014, at 6:20 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>>>>> 
>>>>>> Hi, all
>>>>>> 
>>>>>> I am measuring the performance of Pacemaker with the following combination:
>>>>>> Pacemaker-1.1.11.rc1
>>>>>> libqb-0.16.0
>>>>>> corosync-2.3.2
>>>>>> 
>>>>>> All nodes are KVM virtual machines.
>>>>>> 
>>>>>> After starting 14 nodes, I forcibly stopped the vm01 node.
>>>>>> "virsh destroy vm01" was used for the stop.
>>>>>> Then, in addition to the forcibly stopped node, other nodes were also separated from the cluster.
>>>>>> 
>>>>>> corosync then logs a large number of "Retransmit List:" messages.
>>>>> 
>>>>> Probably best to poke the corosync guys about this.
>>>>> 
>>>>> However, <= .11 is known to cause significant CPU usage with that many nodes.
>>>>> I can easily imagine this starving corosync of resources and causing breakage.
>>>>> 
>>>>> I would _highly_ recommend retesting with the current git master of pacemaker.
>>>>> I merged the new cib code last week which is faster by _two_ orders of magnitude and uses significantly less CPU.
>>>> 
>>>> Andrew, current git master (ee094a2) almost works, the only issue is
>>>> that crm_diff calculates an incorrect diff digest. If I replace the digest in
>>>> the diff by hand with what cib calculates as "expected", it applies
>>>> correctly. Otherwise it fails with -206.
>>> 
>>> More details?
>> 
>> Hmmm...
>> seems to be crmsh-specific,
>> Cannot reproduce with pure-XML editing.
>> Kristoffer, does 
>> http://hg.savannah.gnu.org/hgweb/crmsh/rev/c42d9361a310 address this?
> 
> The problem seems to be caused by the fact that crmsh does not provide
> the <status> section in either the orig or the new XML passed to crm_diff,
> and digest generation seems to rely on that section, so crm_diff and the
> cib daemon produce different digests.
> 
> Attached are two sets of XML files: one (orig.xml, new.xml, patch.xml)
> is from the full CIB operation (with the status section included),
> the other (orig-edited.xml, new-edited.xml, patch-edited.xml) has that
> section removed, as crmsh does.
> 
> Resulting diffs differ only by digest, and that seems to be the exact issue.

This should help.  As long as crmsh isn't passing -c to crm_diff, then the digest will no longer be present.

  https://github.com/beekhof/pacemaker/commit/c8d443d
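
For anyone following along, here is a rough sketch of why the two digests could disagree. It is not Pacemaker's actual digest code: the MD5-over-serialized-XML digest, the sample document, and the helper names digest() and strip_status() are all assumptions made for illustration. It only shows that a digest taken over the whole document changes once the <status> section is dropped from the input.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified CIB; real CIBs carry far more content.
ORIG = (
    '<cib epoch="1">'
    '<configuration><nodes/></configuration>'
    '<status><node_state id="vm01"/></status>'
    '</cib>'
)

def digest(xml_text: str) -> str:
    """Stand-in digest: MD5 of the re-serialized document (an assumption)."""
    root = ET.fromstring(xml_text)
    return hashlib.md5(ET.tostring(root)).hexdigest()

def strip_status(xml_text: str) -> str:
    """Return the document with any top-level <status> section removed."""
    root = ET.fromstring(xml_text)
    for status in root.findall("status"):
        root.remove(status)
    return ET.tostring(root, encoding="unicode")

full_digest = digest(ORIG)                  # what a daemon holding the full CIB would see
edited_digest = digest(strip_status(ORIG))  # what a tool fed the stripped XML would see

print("full:  ", full_digest)
print("edited:", edited_digest)
print("match: ", full_digest == edited_digest)
```

With inputs like these the two values do not match, which is consistent with the diffs in the attachments differing only by digest.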