[Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?

Vladislav Bogdanov bubble at hoster-ok.com
Tue Mar 11 08:23:48 CET 2014


07.03.2014 10:30, Vladislav Bogdanov wrote:
> 07.03.2014 05:43, Andrew Beekhof wrote:
>>
>> On 6 Mar 2014, at 10:39 pm, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>
>>> 18.02.2014 03:49, Andrew Beekhof wrote:
>>>>
>>>> On 31 Jan 2014, at 6:20 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>>>>
>>>>> Hi, all
>>>>>
>>>>> I measure the performance of Pacemaker in the following combinations.
>>>>> Pacemaker-1.1.11.rc1
>>>>> libqb-0.16.0
>>>>> corosync-2.3.2
>>>>>
>>>>> All nodes are KVM virtual machines.
>>>>>
>>>>> stopped the node of vm01 compulsorily from the inside, after starting 14 nodes.
>>>>> "virsh destroy vm01" was used for the stop.
>>>>> Then, in addition to the compulsorily stopped node, other nodes are separated from a cluster.
>>>>>
>>>>> The log of "Retransmit List:" is then outputted in large quantities from corosync.
>>>>
>>>> Probably best to poke the corosync guys about this.
>>>>
>>>> However, <= .11 is known to cause significant CPU usage with that many nodes.
>>>> I can easily imagine this starving corosync of resources and causing breakage.
>>>>
>>>> I would _highly_ recommend retesting with the current git master of pacemaker.
>>>> I merged the new cib code last week which is faster by _two_ orders of magnitude and uses significantly less CPU.
>>>
>>> Andrew, current git master (ee094a2) almost works, the only issue is
>>> that crm_diff calculates an incorrect diff digest. If I replace the digest
>>> in the diff by hand with what cib calculates as "expected", it applies
>>> correctly. Otherwise it fails with -206.
>>
>> More details?
> 
> Hmmm...
> seems to be crmsh-specific,
> Cannot reproduce with pure-XML editing.
> Kristoffer, does 
> http://hg.savannah.gnu.org/hgweb/crmsh/rev/c42d9361a310 address this?

The problem seems to be caused by the fact that crmsh omits the
<status> section from both the orig and new XMLs it passes to crm_diff.
Digest generation seems to depend on that section, so crm_diff and the
cib daemon produce different digests.

Attached are two sets of XML files. The first (orig.xml, new.xml,
patch.xml) comes from the full CIB operation, with the status section
included. The second (orig-edited.xml, new-edited.xml, patch-edited.xml)
has that section removed, as crmsh does.

The resulting diffs differ only in the digest, and that seems to be the
exact issue.
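
To illustrate why dropping <status> changes the digest, here is a minimal
Python sketch. It is not Pacemaker's actual digest code: the sample CIB,
the digest() and strip_status() helpers, and the use of MD5 over
ElementTree's serialization are all assumptions made for the example. It
only shows that a digest taken over the whole document cannot match once
one side has removed a section.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed CIB used only for illustration.
CIB = """<cib epoch="5" num_updates="0" admin_epoch="0">
  <configuration>
    <nodes><node id="1" uname="vm01"/></nodes>
  </configuration>
  <status>
    <node_state id="1" uname="vm01" crmd="online"/>
  </status>
</cib>"""


def digest(root):
    """Hash the serialized tree (assumption: stands in for the real digest)."""
    return hashlib.md5(ET.tostring(root)).hexdigest()


def strip_status(root):
    """Return a copy of the tree without its <status> children."""
    copy = ET.fromstring(ET.tostring(root))
    for status in copy.findall("status"):
        copy.remove(status)
    return copy


full = ET.fromstring(CIB)      # what the cib daemon would hash
edited = strip_status(full)    # what a client sees after dropping <status>

full_digest = digest(full)
edited_digest = digest(edited)

print("with status:   ", full_digest)
print("without status:", edited_digest)
print("digests match: ", full_digest == edited_digest)
```

The configuration section is identical in both trees, so a diff of the
configuration alone would be the same, yet the whole-document digests
disagree. That is the same pattern as in the attached patch.xml and
patch-edited.xml.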


-------------- next part --------------
A non-text attachment was scrubbed...
Name: new.xml
Type: text/xml
Size: 1399 bytes
Desc: not available
URL: <http://oss.clusterlabs.org/pipermail/pacemaker/attachments/20140311/908e059c/attachment.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: new-edited.xml
Type: text/xml
Size: 867 bytes
Desc: not available
URL: <http://oss.clusterlabs.org/pipermail/pacemaker/attachments/20140311/908e059c/attachment-0001.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: orig.xml
Type: text/xml
Size: 1298 bytes
Desc: not available
URL: <http://oss.clusterlabs.org/pipermail/pacemaker/attachments/20140311/908e059c/attachment-0002.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: orig-edited.xml
Type: text/xml
Size: 766 bytes
Desc: not available
URL: <http://oss.clusterlabs.org/pipermail/pacemaker/attachments/20140311/908e059c/attachment-0003.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.xml
Type: text/xml
Size: 898 bytes
Desc: not available
URL: <http://oss.clusterlabs.org/pipermail/pacemaker/attachments/20140311/908e059c/attachment-0004.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch-edited.xml
Type: text/xml
Size: 898 bytes
Desc: not available
URL: <http://oss.clusterlabs.org/pipermail/pacemaker/attachments/20140311/908e059c/attachment-0005.xml>

