[Pacemaker] The larger cluster is tested.

yusuke iida yusk.iida at gmail.com
Fri Nov 15 08:22:25 EST 2013


Hi, Andrew

Thanks for the various suggestions.

To confirm which batch-limit value is suitable, I tested with batch-limit
fixed at 1, 2, 3, and 4 from the start.

The results in my environment were roughly as follows:
batch-limit=1 and 2: no timeouts.
batch-limit=3: 1 timeout.
batch-limit=4: 5 timeouts.

From the above results, I think the limit given by "limit =
QB_MAX(1, peers / 4)" is still too high.

So I have created a fix that caps batch-limit at a fixed value of 2 when
the extreme throttle state is reached.
https://github.com/yuusuke/pacemaker/commit/efe2d6ebc55be39b8be43de38e7662f039b61dec
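
To make the behaviour of the fix clearer, here is a simplified standalone
sketch (this is only an illustration, not the commit itself; the function
name sketch_total_job_limit and the stripped-down enum are invented here,
while the real fix is in the commit above):

    #include <stdio.h>

    #define QB_MAX(a, b) (((a) > (b)) ? (a) : (b))

    /* Simplified stand-in for the throttle modes used in crmd/throttle.c */
    enum throttle_state_e {
        throttle_none,
        throttle_low,
        throttle_med,
        throttle_high,
        throttle_extreme
    };

    /* Sketch: derive the cluster-wide job limit from the worst throttle
     * mode reported by any peer.  In the extreme state the limit is pinned
     * to 2, instead of being scaled from the peer count, because my tests
     * above start timing out once more than two jobs run in parallel. */
    static int
    sketch_total_job_limit(enum throttle_state_e worst_mode, int peers, int limit)
    {
        switch (worst_mode) {
            case throttle_extreme:
                if (limit == 0 || limit > 2) {
                    limit = 2;      /* fixed cap, independent of peer count */
                }
                break;

            case throttle_high:
                if (limit == 0 || limit > peers / 2) {
                    limit = QB_MAX(1, peers / 2);
                }
                break;

            default:
                break;
        }
        return limit;
    }

    int main(void)
    {
        /* Example: with batch-limit unset (0), an extreme-state cluster is
         * capped at 2 jobs regardless of how many peers it has. */
        printf("limit = %d\n", sketch_total_job_limit(throttle_extreme, 16, 0));
        return 0;
    }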

After running the test several times, it seems to work without problems.

The reports from testing with each fixed batch-limit are below:
batch-limit=1
https://drive.google.com/file/d/0BwMFJItoO-fVNk8wTGlYNjNnSHc/edit?usp=sharing
batch-limit=2
https://drive.google.com/file/d/0BwMFJItoO-fVTnc4bXY2YXF2M2M/edit?usp=sharing
batch-limit=3
https://drive.google.com/file/d/0BwMFJItoO-fVYl9Gbks2VlJMR0k/edit?usp=sharing
batch-limit=4
https://drive.google.com/file/d/0BwMFJItoO-fVZnJIazd5MFQ1aGs/edit?usp=sharing

The report from the run driven by my test code is the following:
https://drive.google.com/file/d/0BwMFJItoO-fVbzB0NjFLeVY3Zmc/edit?usp=sharing

Regards,
Yusuke

2013/11/13 Andrew Beekhof <andrew at beekhof.net>:
> Did you look at the load numbers in the logs?
> The CPUs are being slammed for over 20 minutes.
>
> The automatic tuning can only help so much; you're simply asking the cluster to do more work than it is capable of.
> Giving more priority to cib operations that come via IPC is one option, but as I explained earlier, it comes at the cost of correctness.
>
> Given the huge mismatch between the nodes' capacity and the tasks you're asking them to achieve, your best path forward is probably setting a load-threshold < 40% or a batch-limit <= 8.
> Or we could try a patch like the one below if we think that the defaults are not aggressive enough.
>
> diff --git a/crmd/throttle.c b/crmd/throttle.c
> index d77195a..7636d4a 100644
> --- a/crmd/throttle.c
> +++ b/crmd/throttle.c
> @@ -611,14 +611,14 @@ throttle_get_total_job_limit(int l)
>          switch(r->mode) {
>
>              case throttle_extreme:
> -                if(limit == 0 || limit > peers/2) {
> -                    limit = peers/2;
> +                if(limit == 0 || limit > peers/4) {
> +                    limit = QB_MAX(1, peers/4);
>                  }
>                  break;
>
>              case throttle_high:
> -                if(limit == 0 || limit > peers) {
> -                    limit = peers;
> +                if(limit == 0 || limit > peers/2) {
> +                    limit = QB_MAX(1, peers/2);
>                  }
>                  break;
>              default:
>
> This may also be worthwhile:
>
> diff --git a/crmd/throttle.c b/crmd/throttle.c
> index d77195a..586513a 100644
> --- a/crmd/throttle.c
> +++ b/crmd/throttle.c
> @@ -387,22 +387,36 @@ static bool throttle_io_load(float *load, unsigned int *blocked)
>  }
>
>  static enum throttle_state_e
> -throttle_handle_load(float load, const char *desc)
> +throttle_handle_load(float load, const char *desc, int cores)
>  {
> -    if(load > THROTTLE_FACTOR_HIGH * throttle_load_target) {
> +    float adjusted_load = load;
> +
> +    if(cores <= 0) {
> +        /* No adjusting of the supplied load value */
> +
> +    } else if(cores == 1) {
> +        /* On a single core machine, a load of 1.0 is already too high */
> +        adjusted_load = load * THROTTLE_FACTOR_MEDIUM;
> +
> +    } else {
> +        /* Normalize the load to be per-core */
> +        adjusted_load = load / cores;
> +    }
> +
> +    if(adjusted_load > THROTTLE_FACTOR_HIGH * throttle_load_target) {
>          crm_notice("High %s detected: %f", desc, load);
>          return throttle_high;
>
> -    } else if(load > THROTTLE_FACTOR_MEDIUM * throttle_load_target) {
> +    } else if(adjusted_load > THROTTLE_FACTOR_MEDIUM * throttle_load_target) {
>          crm_info("Moderate %s detected: %f", desc, load);
>          return throttle_med;
>
> -    } else if(load > THROTTLE_FACTOR_LOW * throttle_load_target) {
> +    } else if(adjusted_load > THROTTLE_FACTOR_LOW * throttle_load_target) {
>          crm_debug("Noticable %s detected: %f", desc, load);
>          return throttle_low;
>      }
>
> -    crm_trace("Negligable %s detected: %f", desc, load);
> +    crm_trace("Negligable %s detected: %f", desc, adjusted_load);
>      return throttle_none;
>  }
>
> @@ -464,22 +478,12 @@ throttle_mode(void)
>      }
>
>      if(throttle_load_avg(&load)) {
> -        float simple = load / cores;
> -        mode |= throttle_handle_load(simple, "CPU load");
> +        mode |= throttle_handle_load(load, "CPU load", cores);
>      }
>
>      if(throttle_io_load(&load, &blocked)) {
> -        float blocked_ratio = 0.0;
> -
> -        mode |= throttle_handle_load(load, "IO load");
> -
> -        if(cores) {
> -            blocked_ratio = blocked / cores;
> -        } else {
> -            blocked_ratio = blocked;
> -        }
> -
> -        mode |= throttle_handle_load(blocked_ratio, "blocked IO ratio");
> +        mode |= throttle_handle_load(load, "IO load", 0);
> +        mode |= throttle_handle_load(blocked, "blocked IO ratio", cores);
>      }
>
>      if(mode & throttle_extreme) {
>
>
>
>
> On 12 Nov 2013, at 3:25 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>
>> Hi, Andrew
>>
>> I'm sorry; that report was from a run where two cores were assigned to the virtual machine.
>> https://drive.google.com/file/d/0BwMFJItoO-fVdlIwTVdFOGRkQ0U/edit?usp=sharing
>>
>> Sorry for the confusion.
>>
>> This is the report acquired with one core.
>> https://drive.google.com/file/d/0BwMFJItoO-fVSlo0dE0xMzNORGc/edit?usp=sharing
>>
>> LRMD_MAX_CHILDREN is not defined on any node.
>> load-threshold is still at the default.
>> cib_max_cpu is set to 0.4 by the following code:
>>
>>        if(cores == 1) {
>>            cib_max_cpu = 0.4;
>>        }
>>
>> Since cib_max_cpu is 0.4, the node enters the Extreme state once the CIB load exceeds 60% (1.5 * 0.4):
>> Nov 08 11:08:31 [2390] vm01       crmd: (  throttle.c:441   )  notice:
>> throttle_mode:        Extreme CIB load detected: 0.670000
>>
>> Shortly after that, the DC detects that vm01 is in the Extreme state.
>> Nov 08 11:08:32 [2387] vm13       crmd: (  throttle.c:701   )   debug:
>> throttle_update:     Host vm01 supports a maximum of 2 jobs and
>> throttle mode 1000.  New job limit is 1
>>
>> The following log shows that the dynamic change of batch-limit is also
>> being processed as expected.
>> # grep "throttle_get_total_job_limit" pacemaker.log
>> (snip)
>> Nov 08 11:08:31 [2387] vm13       crmd: (  throttle.c:629   )   trace:
>> throttle_get_total_job_limit:    No change to batch-limit=0
>> Nov 08 11:08:32 [2387] vm13       crmd: (  throttle.c:632   )   trace:
>> throttle_get_total_job_limit:    Using batch-limit=8
>> (snip)
>> Nov 08 11:10:32 [2387] vm13       crmd: (  throttle.c:632   )   trace:
>> throttle_get_total_job_limit:    Using batch-limit=16
>>
>> The above shows that the problem is not solved even when the total number
>> of jobs is restricted by batch-limit.
>> Are there any other ways to reduce the synchronization messages?
>>
>> There are not that many internal IPC messages.
>> Couldn't they be handled, even a little at a time, while the
>> synchronization messages are being processed?
>>
>> Regards,
>> Yusuke
>>
>> 2013/11/12 Andrew Beekhof <andrew at beekhof.net>:
>>>
>>> On 11 Nov 2013, at 11:48 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>>>
>>>> I also checked the execution of the transition graph.
>>>> Since the number of pending actions is limited to 16 partway through,
>>>> I judge that batch-limit is taking effect.
>>>> Even with jobs restricted by batch-limit, I observed that two or more
>>>> jobs are always fired per second.
>>>> Each executed job returns a result, which generates CIB synchronization
>>>> messages.
>>>> A node that keeps receiving synchronization messages processes them
>>>> preferentially and postpones internal IPC messages.
>>>> I think that is what caused the timeouts.
>>>
>>> What load-threshold were you running this with?
>>>
>>> I see this in the logs:
>>> "Host vm10 supports a maximum of 4 jobs and throttle mode 0100.  New job limit is 1"
>>>
>>> Have you set LRMD_MAX_CHILDREN=4 on these nodes?
>>> I wouldn't recommend that for a single core VM.  I'd let the default of 2*cores be used.
>>>
>>>
>>> Also, I'm not seeing "Extreme CIB load detected".  Are these still single core machines?
>>> If so it would suggest that something about:
>>>
>>>        if(cores == 1) {
>>>            cib_max_cpu = 0.4;
>>>        }
>>>        if(throttle_load_target > 0.0 && throttle_load_target < cib_max_cpu) {
>>>            cib_max_cpu = throttle_load_target;
>>>        }
>>>
>>>        if(load > 1.5 * cib_max_cpu) {
>>>            /* Can only happen on machines with a low number of cores */
>>>            crm_notice("Extreme %s detected: %f", desc, load);
>>>            mode |= throttle_extreme;
>>>
>>> is wrong.
>>>
>>> What was load-threshold configured as?
>>>
>>>
>>
>>
>>
>> --
>> ----------------------------------------
>> METRO SYSTEMS CO., LTD
>>
>> Yusuke Iida
>> Mail: yusk.iida at gmail.com
>> ----------------------------------------
>>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
----------------------------------------
METRO SYSTEMS CO., LTD

Yusuke Iida
Mail: yusk.iida at gmail.com
----------------------------------------



