[Pacemaker] Testing a larger cluster

yusuke iida yusk.iida at gmail.com
Mon Nov 11 07:48:10 EST 2013


Hi, Andrew

I checked the log of the DC.

Judging from the following log, the change to batch-limit appears to
have worked.

The initial value is 0 at first; batch-limit is changed to 16 once
the load becomes high.
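To show what I understand these messages to mean, here is a
simplified sketch (my own approximation with made-up names, not the
verbatim throttle.c source) of how the cluster-wide job limit could
be derived: the configured batch-limit is kept unless some peer
reports a high load, in which case it is clamped.

enum throttle_mode { THROTTLE_NONE, THROTTLE_LOW, THROTTLE_MED,
                     THROTTLE_HIGH, THROTTLE_EXTREME };

/* Simplified sketch, not the exact Pacemaker code: derive the
 * cluster-wide job limit from the configured batch-limit and the
 * load mode reported by each peer.                                 */
static int
total_job_limit(int configured_limit, const enum throttle_mode *peers,
                int n_peers)
{
    int limit = configured_limit;   /* 0 is treated as "unlimited" */
    int i;

    for (i = 0; i < n_peers; i++) {
        if (peers[i] >= THROTTLE_HIGH) {
            /* One overloaded peer caps the whole cluster, e.g. at
             * 16, which is logged as "Using batch-limit=16".       */
            if (limit == 0 || limit > 16) {
                limit = 16;
            }
        }
    }
    /* If nothing changed, the log reads "No change to batch-limit=0". */
    return limit;
}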

# grep throttle_get_total_job_limit pacemaker.log
Nov 08 15:26:05 [2473] vm12       crmd: (  throttle.c:629   )   trace:
throttle_get_total_job_limit:    No change to batch-limit=0
Nov 08 15:26:05 [2473] vm12       crmd: (  throttle.c:629   )   trace:
throttle_get_total_job_limit:    No change to batch-limit=0
Nov 08 15:26:27 [2473] vm12       crmd: (  throttle.c:629   )   trace:
throttle_get_total_job_limit:    No change to batch-limit=0
Nov 08 15:26:28 [2473] vm12       crmd: (  throttle.c:629   )   trace:
throttle_get_total_job_limit:    No change to batch-limit=0
Nov 08 15:26:28 [2473] vm12       crmd: (  throttle.c:629   )   trace:
throttle_get_total_job_limit:    No change to batch-limit=0
Nov 08 15:26:29 [2473] vm12       crmd: (  throttle.c:629   )   trace:
throttle_get_total_job_limit:    No change to batch-limit=0
Nov 08 15:26:29 [2473] vm12       crmd: (  throttle.c:629   )   trace:
throttle_get_total_job_limit:    No change to batch-limit=0
Nov 08 15:26:29 [2473] vm12       crmd: (  throttle.c:629   )   trace:
throttle_get_total_job_limit:    No change to batch-limit=0
Nov 08 15:26:30 [2473] vm12       crmd: (  throttle.c:629   )   trace:
throttle_get_total_job_limit:    No change to batch-limit=0
(snip)
Nov 08 15:27:13 [2473] vm12       crmd: (  throttle.c:629   )   trace:
throttle_get_total_job_limit:    No change to batch-limit=0
Nov 08 15:27:13 [2473] vm12       crmd: (  throttle.c:632   )   trace:
throttle_get_total_job_limit:    Using batch-limit=16
Nov 08 15:27:14 [2473] vm12       crmd: (  throttle.c:632   )   trace:
throttle_get_total_job_limit:    Using batch-limit=16
Nov 08 15:27:14 [2473] vm12       crmd: (  throttle.c:632   )   trace:
throttle_get_total_job_limit:    Using batch-limit=16
Nov 08 15:27:14 [2473] vm12       crmd: (  throttle.c:632   )   trace:
throttle_get_total_job_limit:    Using batch-limit=16
Nov 08 15:27:15 [2473] vm12       crmd: (  throttle.c:632   )   trace:
throttle_get_total_job_limit:    Using batch-limit=16


I also checked the execution of the transition graph.
Since the number of pending actions is capped at 16 partway through,
I conclude that batch-limit is taking effect.
Observing this, even while jobs are being throttled by batch-limit,
two or more jobs are still fired every second.
Each completed job returns a result, which causes the CIB to generate
synchronization messages.
A node that keeps receiving these synchronization messages processes
them preferentially and postpones its internal IPC messages.
I think this is what caused the timeouts.
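To illustrate the starvation effect I suspect, here is a minimal GLib
demo (my own sketch, not Pacemaker code): as long as a
higher-priority source keeps reporting more work, a lower-priority
source is never dispatched.

/* Compile with: gcc demo.c $(pkg-config --cflags --libs glib-2.0) */
#include <glib.h>

/* Stands in for the stream of CIB synchronization messages. */
static gboolean busy_source(gpointer data)
{
    g_print("high-priority message processed\n");
    return TRUE;    /* always has more work, so it never goes idle */
}

/* Stands in for crmd's internal IPC message. */
static gboolean starved_source(gpointer data)
{
    g_print("low-priority IPC finally processed\n");
    return FALSE;
}

int main(void)
{
    GMainLoop *loop = g_main_loop_new(NULL, FALSE);

    g_idle_add_full(G_PRIORITY_DEFAULT, busy_source, NULL, NULL);
    g_idle_add_full(G_PRIORITY_LOW, starved_source, NULL, NULL);

    /* starved_source never runs: the default-priority source always
     * wins, just as the postponed IPC message eventually times out. */
    g_main_loop_run(loop);
    return 0;
}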

# grep run_graph pacemaker.log

(snip) ### With a job limit of 2 per node, the number of jobs
executed at once is capped at 32 immediately after the load rises.

Nov 08 15:26:27 [2473] vm12       crmd: (     graph.c:336   )   debug:
run_graph:       Transition 1 (Complete=0, Pending=32, Fired=51,
Skipped=0, Incomplete=3124,
Source=/var/lib/pacemaker/pengine/pe-input-67.bz2): In-progress
Nov 08 15:26:28 [2473] vm12       crmd: (     graph.c:336   )   debug:
run_graph:       Transition 1 (Complete=26, Pending=32, Fired=7,
Skipped=0, Incomplete=3117,
Source=/var/lib/pacemaker/pengine/pe-input-67.bz2): In-progress
Nov 08 15:26:28 [2473] vm12       crmd: (     graph.c:336   )   debug:
run_graph:       Transition 1 (Complete=33, Pending=32, Fired=7,
Skipped=0, Incomplete=3110,
Source=/var/lib/pacemaker/pengine/pe-input-67.bz2): In-progress
Nov 08 15:26:29 [2473] vm12       crmd: (     graph.c:336   )   debug:
run_graph:       Transition 1 (Complete=43, Pending=32, Fired=10,
Skipped=0, Incomplete=3100,
Source=/var/lib/pacemaker/pengine/pe-input-67.bz2): In-progress
Nov 08 15:26:29 [2473] vm12       crmd: (     graph.c:336   )   debug:
run_graph:       Transition 1 (Complete=51, Pending=32, Fired=8,
Skipped=0, Incomplete=3092,
Source=/var/lib/pacemaker/pengine/pe-input-67.bz2): In-progress

(snip) ### After a while, other nodes report High load and
batch-limit is reduced to 16.

Nov 08 15:27:13 [2473] vm12       crmd: (     graph.c:277   )   debug:
run_graph:       Throttling output: batch limit (16) reached
Nov 08 15:27:13 [2473] vm12       crmd: (     graph.c:336   )   debug:
run_graph:       Transition 1 (Complete=583, Pending=29, Fired=0,
Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-67.bz2): In-progress
Nov 08 15:27:14 [2473] vm12       crmd: (     graph.c:277   )   debug:
run_graph:       Throttling output: batch limit (16) reached
Nov 08 15:27:14 [2473] vm12       crmd: (     graph.c:336   )   debug:
run_graph:       Transition 1 (Complete=586, Pending=26, Fired=0,
Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-67.bz2): In-progress
Nov 08 15:27:14 [2473] vm12       crmd: (     graph.c:277   )   debug:
run_graph:       Throttling output: batch limit (16) reached
Nov 08 15:27:14 [2473] vm12       crmd: (     graph.c:336   )   debug:
run_graph:       Transition 1 (Complete=590, Pending=22, Fired=0,
Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-67.bz2): In-progress
Nov 08 15:27:14 [2473] vm12       crmd: (     graph.c:277   )   debug:
run_graph:       Throttling output: batch limit (16) reached
Nov 08 15:27:14 [2473] vm12       crmd: (     graph.c:336   )   debug:
run_graph:       Transition 1 (Complete=592, Pending=20, Fired=0,
Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-67.bz2): In-progress
Nov 08 15:27:15 [2473] vm12       crmd: (     graph.c:277   )   debug:
run_graph:       Throttling output: batch limit (16) reached
Nov 08 15:27:15 [2473] vm12       crmd: (     graph.c:336   )   debug:
run_graph:       Transition 1 (Complete=594, Pending=18, Fired=0,
Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-67.bz2): In-progress
Nov 08 15:27:15 [2473] vm12       crmd: (     graph.c:277   )   debug:
run_graph:       Throttling output: batch limit (16) reached
Nov 08 15:27:15 [2473] vm12       crmd: (     graph.c:336   )   debug:
run_graph:       Transition 1 (Complete=596, Pending=16, Fired=0,
Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-67.bz2): In-progress
Nov 08 15:27:15 [2473] vm12       crmd: (     graph.c:277   )   debug:
run_graph:       Throttling output: batch limit (16) reached
Nov 08 15:27:15 [2473] vm12       crmd: (     graph.c:336   )   debug:
run_graph:       Transition 1 (Complete=598, Pending=16, Fired=2,
Skipped=0, Incomplete=228,
Source=/var/lib/pacemaker/pengine/pe-input-67.bz2): In-progress
Nov 08 15:27:16 [2473] vm12       crmd: (     graph.c:277   )   debug:
run_graph:       Throttling output: batch limit (16) reached
Nov 08 15:27:16 [2473] vm12       crmd: (     graph.c:336   )   debug:
run_graph:       Transition 1 (Complete=600, Pending=16, Fired=2,
Skipped=0, Incomplete=233,
Source=/var/lib/pacemaker/pengine/pe-input-67.bz2): In-progress
Nov 08 15:27:16 [2473] vm12       crmd: (     graph.c:277   )   debug:
run_graph:       Throttling output: batch limit (16) reached
Nov 08 15:27:16 [2473] vm12       crmd: (     graph.c:336   )   debug:
run_graph:       Transition 1 (Complete=603, Pending=16, Fired=3,
Skipped=0, Incomplete=233,
Source=/var/lib/pacemaker/pengine/pe-input-67.bz2): In-progress
Nov 08 15:27:16 [2473] vm12       crmd: (     graph.c:277   )   debug:
run_graph:       Throttling output: batch limit (16) reached
Nov 08 15:27:16 [2473] vm12       crmd: (     graph.c:336   )   debug:
run_graph:       Transition 1 (Complete=605, Pending=16, Fired=2,
Skipped=0, Incomplete=241,
Source=/var/lib/pacemaker/pengine/pe-input-67.bz2): In-progress
Nov 08 15:27:17 [2473] vm12       crmd: (     graph.c:277   )   debug:
run_graph:       Throttling output: batch limit (16) reached
Nov 08 15:27:17 [2473] vm12       crmd: (     graph.c:336   )   debug:
run_graph:       Transition 1 (Complete=609, Pending=16, Fired=4,
Skipped=0, Incomplete=272,
Source=/var/lib/pacemaker/pengine/pe-input-67.bz2): In-progress
Nov 08 15:27:17 [2473] vm12       crmd: (     graph.c:277   )   debug:
run_graph:       Throttling output: batch limit (16) reached
Nov 08 15:27:17 [2473] vm12       crmd: (     graph.c:336   )   debug:
run_graph:       Transition 1 (Complete=611, Pending=16, Fired=2,
Skipped=0, Incomplete=243,
Source=/var/lib/pacemaker/pengine/pe-input-67.bz2): In-progress
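
For reference, the check producing the "Throttling output" lines is
essentially a gate of the following shape (my own simplified sketch,
not the verbatim graph.c source):

#include <stdbool.h>

/* Simplified sketch: stop firing further actions in this pass once
 * the number of in-flight (pending) actions reaches batch-limit.   */
static bool
can_fire_more(int batch_limit, int pending)
{
    if (batch_limit > 0 && pending >= batch_limit) {
        /* Logged as "Throttling output: batch limit (16) reached". */
        return false;
    }
    return true;
}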

Regards,
Yusuke
2013/11/11 Andrew Beekhof <andrew at beekhof.net>:
>
> On 11 Nov 2013, at 5:08 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>
>> Hi, Andrew
>>
>> I tested with the following version:
>> https://github.com/yuusuke/pacemaker/commit/3b90af1b11a4389f8b4a95a20ef12b8c259e73dc
>>
>> However, the problem has not been solved yet.
>>
>> I do not think batch-limit can solve this problem.
>> Job execution is temporarily interrupted by batch-limit.
>> However, the graph is immediately resumed by trigger_graph(),
>> called from match_graph_event().
>
> batch-limit controls how many in-flight jobs can be performed (and therefore how busy the CIB can be).
> If batch-limit=10 and there are still 10 jobs in progress, then calling trigger_graph() over and over does nothing until there are 9 jobs (or less).
> At which point one more can be scheduled.
>
> So if the CIB synchronization messages really are sent "ceaselessly", then there is a bug somewhere.
> Did you confirm that throttle_get_total_job_limit() was returning an appropriate value?
>
>> Since the CIB synchronization messages are sent ceaselessly, the
>> IPC messages sent from crmd cannot be processed.
>>
>> The following approaches could address the problem of these
>> continuously sent CPG messages.
>>
>> To give CPG messages time to be processed, the DC could stop
>> sending jobs for a fixed period of time.
>>
>> Alternatively, the priority of CPG messages could be made the same
>> as the G_PRIORITY_DEFAULT used by gio_poll_dispatch_add().
>>
>> I have attached the report from this test:
>> https://drive.google.com/file/d/0BwMFJItoO-fVdlIwTVdFOGRkQ0U/edit?usp=sharing
>>
>> Regards,
>> Yusuke
>>
>> 2013/11/8 Andrew Beekhof <andrew at beekhof.net>:
>>>
>>> On 8 Nov 2013, at 12:10 am, yusuke iida <yusk.iida at gmail.com> wrote:
>>>
>>>> Hi, Andrew
>>>>
>>>> The code you showed does not seem to work correctly.
>>>> I wrote a correction; please check it:
>>>> https://github.com/yuusuke/pacemaker/commit/3b90af1b11a4389f8b4a95a20ef12b8c259e73dc
>>>
>>> Ah, yes that looks better.
>>> Did it help at all?
>>>
>>>>
>>>> Regards,
>>>> Yusuke
>>>>
>>>> 2013/11/7 Andrew Beekhof <andrew at beekhof.net>:
>>>>>
>>>>> On 7 Nov 2013, at 12:43 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>>>>>
>>>>>> Hi, Andrew
>>>>>>
>>>>>> 2013/11/7 Andrew Beekhof <andrew at beekhof.net>:
>>>>>>>
>>>>>>> On 6 Nov 2013, at 4:48 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi, Andrew
>>>>>>>>
>>>>>>>> I tested with the following version:
>>>>>>>> https://github.com/ClusterLabs/pacemaker/commit/3492fec7fe58a6fd94071632df27d3fd3fc3ffe3
>>>>>>>>
>>>>>>>> load-threshold was checked at 60%, 40%, and 20%.
>>>>>>>>
>>>>>>>> However, the problem was not solved.
>>>>>>>> Whatever the setting, timeouts still occur.
>>>>>>>
>>>>>>> That is extremely surprising.  I will have a look at your logs today.
>>>>>>> How many cores do these machines have btw?
>>>>>>
>>>>>> The machines I am using for the test are KVM virtual machines.
>>>>>> There are four physical servers, and four virtual machines run
>>>>>> on each server.
>>>>>> Each physical server has four cores, and I assign a separate
>>>>>> core to each virtual machine.
>>>>>> Each virtual machine is currently assigned one CPU and 2048 MB
>>>>>> of memory.
>>>>>
>>>>> I think I understand what's happening...
>>>>>
>>>>> The throttling code is designed to keep the cib's CPU usage from reaching 100% (i.e. 1 core completely busy).
>>>>> In a single-core setup, that's already much too late, and with 16 nodes I can easily imagine that even 1 job per machine is going to be too much for an underpowered CPU.
>>>>>
>>>>> I'm currently experimenting with:
>>>>>
>>>>>  http://paste.fedoraproject.org/52283/37994581
>>>>>
>>>>> which may help on both fronts.
>>>>>
>>>>> Essentially it is trying to dynamically infer a "good" value for batch-limit when the CIB is using too much CPU.
>>>>>
>>>
>>
>



-- 
----------------------------------------
METRO SYSTEMS CO., LTD

Yusuke Iida
Mail: yusk.iida at gmail.com
----------------------------------------



