[Pacemaker] Why are nodes on which no failure occurred marked as "lost"?

Andrew Beekhof andrew at beekhof.net
Tue Feb 18 04:22:21 EST 2014


On 18 Feb 2014, at 8:18 pm, Andrew Beekhof <andrew at beekhof.net> wrote:

> 
> On 18 Feb 2014, at 7:40 pm, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
> 
>> 18.02.2014 03:49, Andrew Beekhof wrote:
>>> 
>>> On 31 Jan 2014, at 6:20 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>>> 
>>>> Hi, all
>>>> 
>>>> I am measuring the performance of Pacemaker with the following combination:
>>>> Pacemaker-1.1.11.rc1
>>>> libqb-0.16.0
>>>> corosync-2.3.2
>>>> 
>>>> All nodes are KVM virtual machines.
>>>> 
>>>> After starting 14 nodes, I forcibly stopped node vm01 from inside.
>>>> I used "virsh destroy vm01" to stop it.
>>>> Then, in addition to the forcibly stopped node, other nodes were also separated from the cluster.
>>>> 
>>>> Corosync then outputs "Retransmit List:" log messages in large quantities.
>>> 
>>> Probably best to poke the corosync guys about this.
>>> 
>>> However, <= .11 is known to cause significant CPU usage with that many nodes.
>>> I can easily imagine this starving corosync of resources and causing breakage.
>>> 
>>> I would _highly_ recommend retesting with the current git master of pacemaker.
>>> I merged the new cib code last week which is faster by _two_ orders of magnitude and uses significantly less CPU.
>> 
>> Andrew, you mean your cib-performance branch, am I correct?
> 
> Yes
> 
>> 
>> Unfortunately it is not in .11
> 
> Intentionally so :)
> 
>> (sorry if I overlooked it there), and
>> it is not even in ClusterLabs/master yet; it seems to have been merged and then
>> reverted in beekhof/master...
> 
> This has just been brought to my attention :-(
> 
> https://github.com/beekhof/pacemaker/commit/1d98f6fd9eb76bd2498bc6356a3aa6e91a8a70e4#commitcomment-5405620
> 
> Give me a few minutes and I'll correct it.

OK, I've force-pushed a tree without the above screwup.
I'll merge it into ClusterLabs tomorrow.

> 
>> 
>> 
>>> 
>>> I'd be interested to hear your feedback.
>>> 
>>>> 
>>>> Why are nodes on which no failure occurred marked as "lost"?
>>>> 
>>>> Please advise if there is a problem somewhere in my setup.
>>>> 
>>>> I have attached the report from when the problem occurred.
>>>> https://drive.google.com/file/d/0BwMFJItoO-fVMkFWWWlQQldsSFU/edit?usp=sharing
>>>> 
>>>> Regards,
>>>> Yusuke
>>>> -- 
>>>> ---------------------------------------- 
>>>> METRO SYSTEMS CO., LTD 
>>>> 
>>>> Yusuke Iida 
>>>> Mail: yusk.iida at gmail.com
>>>> ---------------------------------------- 
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>> 
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>> 
>> 
