[Pacemaker] Why are nodes on which no failure occurred marked "lost"?

Vladislav Bogdanov bubble at hoster-ok.com
Wed Mar 12 09:40:57 EDT 2014


12.03.2014 00:40, Andrew Beekhof wrote:
> 
> On 11 Mar 2014, at 6:23 pm, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
> 
>> 07.03.2014 10:30, Vladislav Bogdanov wrote:
>>> 07.03.2014 05:43, Andrew Beekhof wrote:
>>>>
>>>> On 6 Mar 2014, at 10:39 pm, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>>>
>>>>> 18.02.2014 03:49, Andrew Beekhof wrote:
>>>>>>
>>>>>> On 31 Jan 2014, at 6:20 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>>>>>>
>>>>>>> Hi, all
>>>>>>>
>>>>>>> I am measuring the performance of Pacemaker with the following combination:
>>>>>>> Pacemaker-1.1.11.rc1
>>>>>>> libqb-0.16.0
>>>>>>> corosync-2.3.2
>>>>>>>
>>>>>>> All nodes are KVM virtual machines.
>>>>>>>
>>>>>>> After starting 14 nodes, I forcibly stopped node vm01.
>>>>>>> I used "virsh destroy vm01" to stop it.
>>>>>>> Then, in addition to the forcibly stopped node, other nodes are also separated from the cluster.
>>>>>>>
>>>>>>> corosync then logs a large number of "Retransmit List:" messages.
>>>>>>
>>>>>> Probably best to poke the corosync guys about this.
>>>>>>
>>>>>> However, <= .11 is known to cause significant CPU usage with that many nodes.
>>>>>> I can easily imagine this starving corosync of resources and causing breakage.
>>>>>>
>>>>>> I would _highly_ recommend retesting with the current git master of pacemaker.
>>>>>> I merged the new cib code last week which is faster by _two_ orders of magnitude and uses significantly less CPU.
>>>>>
>>>>> Andrew, current git master (ee094a2) almost works; the only issue is
>>>>> that crm_diff calculates an incorrect diff digest. If I replace the digest
>>>>> in the diff by hand with what the cib calculates as "expected", it applies
>>>>> correctly. Otherwise I get -206.
>>>>
>>>> More details?
>>>
>>> Hmmm...
>>> it seems to be crmsh-specific;
>>> I cannot reproduce it with pure-XML editing.
>>> Kristoffer, does 
>>> http://hg.savannah.gnu.org/hgweb/crmsh/rev/c42d9361a310 address this?
>>
>> The problem seems to be caused by the fact that crmsh does not include
>> the <status> section in either the orig or the new XML it passes to
>> crm_diff, and digest generation seems to rely on that section, so
>> crm_diff and the cib daemon produce different digests.
>>
>> Attached are two sets of XML files: one (orig.xml, new.xml, patch.xml)
>> relates to the full CIB operation (with the status section included), and
>> the other (orig-edited.xml, new-edited.xml, patch-edited.xml) has that
>> section removed, as crmsh does.
>>
>> The resulting diffs differ only in the digest, and that seems to be exactly the issue.
> 
> This should help. As long as crmsh isn't passing -c to crm_diff, the digest will no longer be present.
> 
>   https://github.com/beekhof/pacemaker/commit/c8d443d

Yep, that helped.
Thank you!
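The digest mismatch discussed in this thread can be illustrated with a toy sketch. It assumes, as suggested above, that one side computes the digest over the full CIB (including <status>) while the other side sees a copy with <status> stripped. The digest here is a hypothetical MD5 over a re-serialized XML tree, not Pacemaker's actual algorithm, and the CIB-like document is made up for illustration; the helper names toy_digest and strip_status are likewise hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def toy_digest(cib_xml: str) -> str:
    """Hypothetical digest: MD5 over the re-serialized XML tree.

    NOT Pacemaker's real digest; it only shows that a digest computed over
    a document depends on every section included in that document.
    """
    root = ET.fromstring(cib_xml)
    return hashlib.md5(ET.tostring(root)).hexdigest()


def strip_status(cib_xml: str) -> str:
    """Remove top-level <status> elements, as crmsh was described as doing."""
    root = ET.fromstring(cib_xml)
    for status in root.findall("status"):  # findall returns a list, safe to mutate root
        root.remove(status)
    return ET.tostring(root, encoding="unicode")


# A tiny, made-up CIB-like document (illustration only).
NEW_CIB = (
    "<cib><configuration><resources/></configuration>"
    "<status><node_state id='1'/></status></cib>"
)

# Assumed: the daemon hashes the full document, the client hashes the stripped copy.
daemon_digest = toy_digest(NEW_CIB)
client_digest = toy_digest(strip_status(NEW_CIB))

# A patch carrying client_digest would then fail a check against daemon_digest.
print(daemon_digest == client_digest)  # False
```

Omitting the digest from the diff, as the commit referenced above does when crm_diff is run without -c, removes the mismatch instead of requiring the client to reproduce exactly the document the daemon hashes.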
