[Pacemaker] The larger cluster is tested.

Mon Nov 11 23:03:20 UTC 2013

On 11 Nov 2013, at 11:48 pm, yusuke iida <yusk.iida at gmail.com> wrote:

> I also checked how the transition graph was executed.
> Since the number of pending actions is capped at 16 from partway
> through, I conclude that batch-limit is taking effect.
> Looking at this, even while jobs are restricted by batch-limit, two or
> more jobs are always fired within one second.
> When these jobs complete, their results generate CIB synchronization
> messages.
> A node that keeps receiving synchronization messages processes them
> first and postpones its internal IPC messages.
> I think this is what caused the timeout.

What load-threshold were you running this with?

I see this in the logs:
"Host vm10 supports a maximum of 4 jobs and throttle mode 0100.  New job limit is 1"

Have you set LRMD_MAX_CHILDREN=4 on these nodes?
I wouldn't recommend that for a single-core VM. I'd let the default of 2*cores be used.

Also, I'm not seeing "Extreme CIB load detected". Are these still single-core machines?
If so, it would suggest that something about:

        if(cores == 1) {
            cib_max_cpu = 0.4;
        }
        if(throttle_load_target > 0.0 && throttle_load_target < cib_max_cpu) {
            cib_max_cpu = throttle_load_target;
        }

        if(load > 1.5 * cib_max_cpu) {
            /* Can only happen on machines with a low number of cores */
            crm_notice("Extreme %s detected: %f", desc, load);
            mode |= throttle_extreme;

is wrong.

What was load-threshold configured as?
