[Pacemaker] Why are nodes on which no failure has occurred marked "lost"?
Andrew Beekhof
andrew at beekhof.net
Tue Mar 11 22:43:28 CET 2014
On 12 Mar 2014, at 8:40 am, Andrew Beekhof <andrew at beekhof.net> wrote:
>
> On 11 Mar 2014, at 6:23 pm, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>
>> 07.03.2014 10:30, Vladislav Bogdanov wrote:
>>> 07.03.2014 05:43, Andrew Beekhof wrote:
>>>>
>>>> On 6 Mar 2014, at 10:39 pm, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>>>
>>>>> 18.02.2014 03:49, Andrew Beekhof wrote:
>>>>>>
>>>>>> On 31 Jan 2014, at 6:20 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>>>>>>
>>>>>>> Hi, all
>>>>>>>
>>>>>>> I am measuring the performance of Pacemaker with the following combination:
>>>>>>> Pacemaker-1.1.11.rc1
>>>>>>> libqb-0.16.0
>>>>>>> corosync-2.3.2
>>>>>>>
>>>>>>> All nodes are KVM virtual machines.
>>>>>>>
>>>>>>> After starting 14 nodes, I forcibly stopped node vm01 from the inside.
>>>>>>> "virsh destroy vm01" was used to stop it.
>>>>>>> Then, in addition to the forcibly stopped node, other nodes were also separated from the cluster.
>>>>>>>
>>>>>>> corosync then logged a large number of "Retransmit List:" messages.
>>>>>>
>>>>>> Probably best to poke the corosync guys about this.
>>>>>>
>>>>>> However, versions <= 1.1.11 are known to cause significant CPU usage with that many nodes.
>>>>>> I can easily imagine this starving corosync of resources and causing breakage.
>>>>>>
>>>>>> I would _highly_ recommend retesting with the current git master of pacemaker.
>>>>>> I merged the new cib code last week; it is faster by _two_ orders of magnitude and uses significantly less CPU.
>>>>>
>>>>> Andrew, current git master (ee094a2) almost works. The only issue is
>>>>> that crm_diff calculates an incorrect diff digest. If I replace the digest
>>>>> in the diff by hand with what cib calculates as "expected", it applies
>>>>> correctly. Otherwise it fails with -206.
>>>>
>>>> More details?
>>>
>>> Hmmm...
>>> it seems to be crmsh-specific;
>>> I cannot reproduce it with pure-XML editing.
>>> Kristoffer, does
>>> http://hg.savannah.gnu.org/hgweb/crmsh/rev/c42d9361a310 address this?
>>
>> The problem seems to be caused by the fact that crmsh does not include
>> the <status> section in either the orig or the new XML it passes to
>> crm_diff. Digest generation seems to rely on that section, so crm_diff
>> and the cib daemon produce different digests.
>>
>> Attached are two sets of XML files. One set (orig.xml, new.xml, patch.xml)
>> relates to the full CIB operation (with the status section included); the
>> other (orig-edited.xml, new-edited.xml, patch-edited.xml) has that
>> section removed, as crmsh does.
>>
>> The resulting diffs differ only in the digest, and that seems to be the exact issue.
>
> This should help. As long as crmsh isn't passing -c to crm_diff, the digest will no longer be present.
>
> https://github.com/beekhof/pacemaker/commit/c8d443d
Github seems to be doing something weird at the moment... here's the raw patch:
commit c8d443d8d1604dde2727cf716951231ed05926e4
Author: Andrew Beekhof <andrew at beekhof.net>
Date: Wed Mar 12 08:38:58 2014 +1100
Fix: crm_diff: Allow the generation of xml patchsets without digests
diff --git a/tools/xml_diff.c b/tools/xml_diff.c
index c8673b9..b98859e 100644
--- a/tools/xml_diff.c
+++ b/tools/xml_diff.c
@@ -199,7 +199,7 @@ main(int argc, char **argv)
xml_calculate_changes(object_1, object_2);
crm_log_xml_debug(object_2, xml_file_2?xml_file_2:"target");
- output = xml_create_patchset(0, object_1, object_2, NULL, FALSE, TRUE);
+ output = xml_create_patchset(0, object_1, object_2, NULL, FALSE, as_cib);
if(as_cib && output) {
int add[] = { 0, 0, 0 };
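To make the failure mode and the fix concrete, here is a minimal C sketch under loudly stated assumptions. It is a toy model, not Pacemaker code: `toy_digest` (32-bit FNV-1a), `struct toy_patchset`, `make_patchset`, `apply_patchset`, `demo` and `TOY_DIFF_FAILED` are all hypothetical names, the "documents" are plain strings rather than XML, and the receiver is assumed to check a digest only when the patch carries one. The sketch shows why a digest computed by crm_diff over a CIB without <status> cannot match the one the cib daemon computes over its full copy (giving -206). It also shows how attaching a digest only when as_cib is set avoids that comparison. In the real patch, the final argument of xml_create_patchset() appears to be what controls whether a digest is added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the error value the cib daemon returned in this thread (-206). */
#define TOY_DIFF_FAILED 206

/* Toy digest (32-bit FNV-1a). Pacemaker uses its own digest over
 * normalized XML; this only stands in for it for illustration. */
static uint32_t toy_digest(const char *text)
{
    uint32_t h = 2166136261u;
    for (const unsigned char *p = (const unsigned char *)text; *p != '\0'; p++) {
        h ^= *p;
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical, simplified patchset: only the optional digest is modeled. */
struct toy_patchset {
    bool has_digest;
    uint32_t digest;
};

/* Sender side (crm_diff). Mirrors the one-line fix: a digest of the target
 * document is attached only when as_cib is true. */
static struct toy_patchset make_patchset(const char *target_doc, bool as_cib)
{
    struct toy_patchset p = { false, 0 };
    if (as_cib) {
        p.has_digest = true;
        p.digest = toy_digest(target_doc);
    }
    return p;
}

/* Receiver side (cib daemon). Assumption: the digest is checked only when
 * present, and is recomputed over the receiver's own full copy. */
static int apply_patchset(const struct toy_patchset *p, const char *result_doc)
{
    if (p->has_digest && toy_digest(result_doc) != p->digest) {
        return -TOY_DIFF_FAILED; /* digests disagree: reject the patch */
    }
    return 0;
}

/* The crmsh case: crm_diff saw the CIB without <status>, the cib daemon
 * holds the full CIB. */
static void demo(void)
{
    const char *full = "<cib><configuration/><status/></cib>";
    const char *stripped = "<cib><configuration/></cib>";
    struct toy_patchset with_digest = make_patchset(stripped, true);
    struct toy_patchset without_digest = make_patchset(stripped, false);
    assert(apply_patchset(&with_digest, full) == -TOY_DIFF_FAILED);
    assert(apply_patchset(&without_digest, full) == 0);
}
```

With as_cib false, the patch built from the stripped document carries no digest and is accepted. With as_cib true, the same patch is rejected with -206, mirroring what crmsh users saw before the fix. Treating 206 as Pacemaker's "diff failed" code is an assumption of this sketch.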
More information about the Pacemaker mailing list