[Pacemaker] Why are nodes that did not fail being marked as "lost"?

Andrew Beekhof andrew at beekhof.net
Fri Mar 7 03:43:57 CET 2014


On 6 Mar 2014, at 10:39 pm, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:

> 18.02.2014 03:49, Andrew Beekhof wrote:
>> 
>> On 31 Jan 2014, at 6:20 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>> 
>>> Hi, all
>>> 
>>> I am measuring the performance of Pacemaker with the following combination:
>>> Pacemaker-1.1.11.rc1
>>> libqb-0.16.0
>>> corosync-2.3.2
>>> 
>>> All nodes are KVM virtual machines.
>>> 
>>> After starting 14 nodes, I forcibly stopped the vm01 node.
>>> I used "virsh destroy vm01" to stop it.
>>> After that, in addition to the forcibly stopped node, other nodes were also separated from the cluster.
>>> 
>>> corosync then output a large number of "Retransmit List:" log messages.
>> 
>> Probably best to poke the corosync guys about this.
>> 
>> However, pacemaker <= 1.1.11 is known to cause significant CPU usage with that many nodes.
>> I can easily imagine this starving corosync of resources and causing breakage.
>> 
>> I would _highly_ recommend retesting with the current git master of pacemaker.
>> I merged the new cib code last week, which is faster by _two_ orders of magnitude and uses significantly less CPU.
> 
> Andrew, current git master (ee094a2) almost works; the only issue is
> that crm_diff calculates an incorrect diff digest. If I replace the digest
> in the diff by hand with what the cib calculates as "expected", it applies
> correctly. Otherwise it fails with -206.

More details?

> 
>> 
>> I'd be interested to hear your feedback.
>> 
>>> 
>>> Why are nodes that did not fail being marked as "lost"?
>>> 
>>> Please advise if there is a problem somewhere in my setup.
>>> 
>>> I have attached the report from when the problem occurred.
>>> https://drive.google.com/file/d/0BwMFJItoO-fVMkFWWWlQQldsSFU/edit?usp=sharing
>>> 
>>> Regards,
>>> Yusuke
>>> -- 
>>> ---------------------------------------- 
>>> METRO SYSTEMS CO., LTD 
>>> 
>>> Yusuke Iida 
>>> Mail: yusk.iida at gmail.com
>>> ---------------------------------------- 
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org


