[Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?
Andrew Beekhof
andrew at beekhof.net
Tue Mar 11 21:40:27 UTC 2014
On 11 Mar 2014, at 6:23 pm, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
> 07.03.2014 10:30, Vladislav Bogdanov wrote:
>> 07.03.2014 05:43, Andrew Beekhof wrote:
>>>
>>> On 6 Mar 2014, at 10:39 pm, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>>
>>>> 18.02.2014 03:49, Andrew Beekhof wrote:
>>>>>
>>>>> On 31 Jan 2014, at 6:20 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>>>>>
>>>>>> Hi, all
>>>>>>
>>>>>> I am measuring the performance of Pacemaker with the following combination:
>>>>>> Pacemaker-1.1.11.rc1
>>>>>> libqb-0.16.0
>>>>>> corosync-2.3.2
>>>>>>
>>>>>> All nodes are KVM virtual machines.
>>>>>>
>>>>>> After starting 14 nodes, I forcibly stopped the vm01 node from the inside.
>>>>>> "virsh destroy vm01" was used for the stop.
>>>>>> Then, in addition to the forcibly stopped node, other nodes were separated from the cluster.
>>>>>>
>>>>>> Corosync then logged "Retransmit List:" messages in large quantities.
>>>>>
>>>>> Probably best to poke the corosync guys about this.
>>>>>
>>>>> However, Pacemaker <= 1.1.11 is known to cause significant CPU usage with that many nodes.
>>>>> I can easily imagine this starving corosync of resources and causing breakage.
>>>>>
>>>>> I would _highly_ recommend retesting with the current git master of pacemaker.
>>>>> Last week I merged the new cib code, which is faster by _two_ orders of magnitude and uses significantly less CPU.
>>>>
>>>> Andrew, current git master (ee094a2) almost works; the only issue is
>>>> that crm_diff calculates an incorrect diff digest. If I replace the digest in
>>>> the diff by hand with what cib calculates as "expected", it applies
>>>> correctly. Otherwise it fails with -206.
>>>
>>> More details?
>>
>> Hmmm...
>> seems to be crmsh-specific,
>> Cannot reproduce with pure-XML editing.
>> Kristoffer, does
>> http://hg.savannah.gnu.org/hgweb/crmsh/rev/c42d9361a310 address this?
>
> The problem seems to be caused by the fact that crmsh does not provide
> the <status> section in either the orig or the new XML passed to crm_diff, and
> digest generation seems to rely on that section, so crm_diff and the cib daemon
> produce different digests.
>
> Attached are two sets of XML files: one (orig.xml, new.xml, patch.xml)
> relates to the full CIB operation (with the status section included),
> the other (orig-edited.xml, new-edited.xml, patch-edited.xml) has that
> section removed, as crmsh does.
>
> Resulting diffs differ only by digest, and that seems to be the exact issue.
This should help. As long as crmsh isn't passing -c to crm_diff, the digest will no longer be present in the diff.
https://github.com/beekhof/pacemaker/commit/c8d443d
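
To illustrate the mismatch discussed above, here is a minimal Python sketch. It is not Pacemaker's digest code: the sample CIB, the helper names digest() and without_status(), and the use of SHA-256 over ElementTree's serialization are all assumptions made for illustration. It only shows that hashing a CIB with and without its <status> section gives different values, so a digest computed on the stripped XML cannot match one computed on the full CIB.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed CIB used only for this illustration.
FULL_CIB = """<cib epoch="5" num_updates="0" admin_epoch="0">
  <configuration><crm_config/><nodes/><resources/><constraints/></configuration>
  <status><node_state id="1" uname="vm01"/></status>
</cib>"""


def digest(root):
    """Hash the serialized tree (a stand-in, not Pacemaker's algorithm)."""
    return hashlib.sha256(ET.tostring(root)).hexdigest()


def without_status(root):
    """Return a copy of the tree with every top-level <status> removed."""
    copy = ET.fromstring(ET.tostring(root))
    for status in copy.findall("status"):
        copy.remove(status)
    return copy


full = ET.fromstring(FULL_CIB)
edited = without_status(full)

full_digest = digest(full)
edited_digest = digest(edited)

print("full CIB digest:     ", full_digest)
print("status-less digest:  ", edited_digest)
print("digests match:       ", full_digest == edited_digest)
```

The two digests differ, which mirrors the symptom described in the thread: a patch carrying a digest from the status-less input is rejected by a daemon that computes its expected digest over the full CIB.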