[Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?

Fri Mar 7 02:30:13 EST 2014

07.03.2014 05:43, Andrew Beekhof wrote:
> 
> On 6 Mar 2014, at 10:39 pm, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
> 
>> 18.02.2014 03:49, Andrew Beekhof wrote:
>>>
>>> On 31 Jan 2014, at 6:20 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>>>
>>>> Hi, all
>>>>
>>>> I am measuring the performance of Pacemaker with the following combination:
>>>> Pacemaker-1.1.11.rc1
>>>> libqb-0.16.0
>>>> corosync-2.3.2
>>>>
>>>> All nodes are KVM virtual machines.
>>>>
>>>> After starting 14 nodes, I forcibly stopped node vm01.
>>>> "virsh destroy vm01" was used for the stop.
>>>> Then, in addition to the forcibly stopped node, other nodes were also separated from the cluster.
>>>>
>>>> corosync then logs "Retransmit List:" messages in large quantities.
>>>
>>> Probably best to poke the corosync guys about this.
>>>
>>> However, <= .11 is known to cause significant CPU usage with that many nodes.
>>> I can easily imagine this starving corosync of resources and causing breakage.
>>>
>>> I would _highly_ recommend retesting with the current git master of pacemaker.
>>> I merged the new cib code last week which is faster by _two_ orders of magnitude and uses significantly less CPU.
>>
>> Andrew, current git master (ee094a2) almost works; the only issue is
>> that crm_diff calculates an incorrect diff digest. If I replace the digest
>> in the diff by hand with what cib calculates as "expected", it applies
>> correctly. Otherwise it fails with -206.
> 
> More details?

Hmmm...
It seems to be crmsh-specific;
I cannot reproduce it with pure-XML editing.
Kristoffer, does 
http://hg.savannah.gnu.org/hgweb/crmsh/rev/c42d9361a310 address this?

> 
>>
>>>
>>> I'd be interested to hear your feedback.
>>>
>>>>
>>>> Why are nodes on which no failure has occurred marked as "lost"?
>>>>
>>>> Please advise if there is a problem somewhere in my setup.
>>>>
>>>> I have attached the report from when the problem occurred.
>>>> https://drive.google.com/file/d/0BwMFJItoO-fVMkFWWWlQQldsSFU/edit?usp=sharing
>>>>
>>>> Regards,
>>>> Yusuke
>>>> -- 
>>>> ---------------------------------------- 
>>>> METRO SYSTEMS CO., LTD 
>>>>
>>>> Yusuke Iida 
>>>> Mail: yusk.iida at gmail.com
>>>> ---------------------------------------- 
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org