[Pacemaker] Testing a larger cluster

Mon Nov 11 01:08:39 EST 2013

Hi, Andrew

I tested with the following version:
https://github.com/yuusuke/pacemaker/commit/3b90af1b11a4389f8b4a95a20ef12b8c259e73dc

However, the problem has not been solved yet.

I do not think this problem can be solved with batch-limit.
batch-limit only interrupts job execution temporarily: the graph is
resumed immediately by trigger_graph, which is called from
match_graph_event.
Because the CIB's synchronous messages are now being sent
continuously, the IPC messages sent from crmd cannot be processed.

I can think of two ways to deal with this continuous stream of CPG
messages.

One is to have the DC stop sending jobs for a fixed period of time,
so that there is time for the CPG messages to be processed.

Alternatively, I think the CPG message dispatch needs to be given the
same priority, G_PRIORITY_DEFAULT, that gio_poll_dispatch_add() uses.

The report from this test is available here:
https://drive.google.com/file/d/0BwMFJItoO-fVdlIwTVdFOGRkQ0U/edit?usp=sharing

Regards,
Yusuke

2013/11/8 Andrew Beekhof <andrew at beekhof.net>:
>
> On 8 Nov 2013, at 12:10 am, yusuke iida <yusk.iida at gmail.com> wrote:
>
>> Hi, Andrew
>>
>> The code you showed does not seem to work correctly.
>> I wrote a correction.
>> Please check it.
>> https://github.com/yuusuke/pacemaker/commit/3b90af1b11a4389f8b4a95a20ef12b8c259e73dc
>
> Ah, yes that looks better.
> Did it help at all?
>
>>
>> Regards,
>> Yusuke
>>
>> 2013/11/7 Andrew Beekhof <andrew at beekhof.net>:
>>>
>>> On 7 Nov 2013, at 12:43 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>>>
>>>> Hi, Andrew
>>>>
>>>> 2013/11/7 Andrew Beekhof <andrew at beekhof.net>:
>>>>>
>>>>> On 6 Nov 2013, at 4:48 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>>>>>
>>>>>> Hi, Andrew
>>>>>>
>>>>>> I tested with the following version:
>>>>>> https://github.com/ClusterLabs/pacemaker/commit/3492fec7fe58a6fd94071632df27d3fd3fc3ffe3
>>>>>>
>>>>>> I tried load-threshold values of 60%, 40%, and 20%.
>>>>>>
>>>>>> However, the problem was not solved.
>>>>>> Nothing changes; the timeouts still occur.
>>>>>
>>>>> That is extremely surprising.  I will have a look at your logs today.
>>>>> How many cores do these machines have btw?
>>>>
>>>> The machines I use for this test are KVM virtual machines.
>>>> There are four physical servers, and four virtual machines run on
>>>> each server.
>>>> Each physical server has four cores, and each virtual machine is
>>>> assigned its own separate core.
>>>> So each virtual machine currently has one CPU.
>>>> Each virtual machine is assigned 2048 MB of memory.
>>>
>>> I think I understand what's happening...
>>>
>>> The throttling code is designed to keep the cib's CPU usage from reaching 100% (i.e. 1 core completely busy).
>>> In a single core setup, that's already much too late, and with 16 nodes I can easily imagine that even 1 job per machine is going to be too much for an underpowered CPU.
>>>
>>> I'm currently experimenting with:
>>>
>>>   http://paste.fedoraproject.org/52283/37994581
>>>
>>> which may help on both fronts.
>>>
>>> Essentially it is trying to dynamically infer a "good" value for batch-limit when the CIB is using too much CPU.
>>>
>>>
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>
>>
>>
>> --
>> ----------------------------------------
>> METRO SYSTEMS CO., LTD
>>
>> Yusuke Iida
>> Mail: yusk.iida at gmail.com
>> ----------------------------------------