[Pacemaker] The larger cluster is tested.

Andrew Beekhof andrew at beekhof.net
Tue Nov 12 18:58:01 EST 2013


Did you look at the load numbers in the logs?
The CPUs are being slammed for over 20 minutes.

The automatic tuning can only help so much; you're simply asking the cluster to do more work than it is capable of.
Giving higher priority to CIB operations that come via IPC is one option, but as I explained earlier, it comes at the cost of correctness.

Given the huge mismatch between the nodes' capacity and the tasks you're asking them to perform, your best path forward is probably setting a load-threshold < 40% or a batch-limit <= 8.
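
Both of those are cluster options, so they end up as name/value pairs in the crm_config section of the CIB, roughly like this (an illustrative snippet only; the nvpair ids are placeholders):

<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <nvpair id="opt-load-threshold" name="load-threshold" value="40%"/>
    <nvpair id="opt-batch-limit" name="batch-limit" value="8"/>
  </cluster_property_set>
</crm_config>
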
Or we could try a patch like the one below if we think that the defaults are not aggressive enough.

diff --git a/crmd/throttle.c b/crmd/throttle.c
index d77195a..7636d4a 100644
--- a/crmd/throttle.c
+++ b/crmd/throttle.c
@@ -611,14 +611,14 @@ throttle_get_total_job_limit(int l)
         switch(r->mode) {
 
             case throttle_extreme:
-                if(limit == 0 || limit > peers/2) {
-                    limit = peers/2;
+                if(limit == 0 || limit > peers/4) {
+                    limit = QB_MAX(1, peers/4);
                 }
                 break;
 
             case throttle_high:
-                if(limit == 0 || limit > peers) {
-                    limit = peers;
+                if(limit == 0 || limit > peers/2) {
+                    limit = QB_MAX(1, peers/2);
                 }
                 break;
             default:

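To make the effect of that change concrete, here is a standalone sketch of the proposed per-peer cap (not the actual crmd code path; QB_MAX is replaced with a local macro):

#include <stdio.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Cap on the cluster-wide job limit imposed by a single peer,
 * per the patch above: extreme -> peers/4, high -> peers/2, never below 1. */
static int proposed_limit(int peers, int extreme)
{
    return MAX(1, peers / (extreme ? 4 : 2));
}

int main(void)
{
    printf("16 peers, extreme: %d\n", proposed_limit(16, 1)); /* 4, was 8  */
    printf("16 peers, high:    %d\n", proposed_limit(16, 0)); /* 8, was 16 */
    printf(" 2 peers, extreme: %d\n", proposed_limit(2, 1));  /* 1, was 1  */
    return 0;
}

So with 16 peers, one node reporting extreme load would now cap the total at 4 jobs instead of 8, and a node reporting high load would cap it at 8 instead of 16.
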
This may also be worthwhile:

diff --git a/crmd/throttle.c b/crmd/throttle.c
index d77195a..586513a 100644
--- a/crmd/throttle.c
+++ b/crmd/throttle.c
@@ -387,22 +387,36 @@ static bool throttle_io_load(float *load, unsigned int *blocked)
 }
 
 static enum throttle_state_e
-throttle_handle_load(float load, const char *desc)
+throttle_handle_load(float load, const char *desc, int cores)
 {
-    if(load > THROTTLE_FACTOR_HIGH * throttle_load_target) {
+    float adjusted_load = load;
+
+    if(cores <= 0) {
+        /* No adjusting of the supplied load value */
+
+    } else if(cores == 1) {
+        /* On a single core machine, a load of 1.0 is already too high */
+        adjusted_load = load * THROTTLE_FACTOR_MEDIUM;
+
+    } else {
+        /* Normalize the load to be per-core */
+        adjusted_load = load / cores;
+    }
+
+    if(adjusted_load > THROTTLE_FACTOR_HIGH * throttle_load_target) {
         crm_notice("High %s detected: %f", desc, load);
         return throttle_high;
 
-    } else if(load > THROTTLE_FACTOR_MEDIUM * throttle_load_target) {
+    } else if(adjusted_load > THROTTLE_FACTOR_MEDIUM * throttle_load_target) {
         crm_info("Moderate %s detected: %f", desc, load);
         return throttle_med;
 
-    } else if(load > THROTTLE_FACTOR_LOW * throttle_load_target) {
+    } else if(adjusted_load > THROTTLE_FACTOR_LOW * throttle_load_target) {
         crm_debug("Noticable %s detected: %f", desc, load);
         return throttle_low;
     }
 
-    crm_trace("Negligable %s detected: %f", desc, load);
+    crm_trace("Negligable %s detected: %f", desc, adjusted_load);
     return throttle_none;
 }
 
@@ -464,22 +478,12 @@ throttle_mode(void)
     }
 
     if(throttle_load_avg(&load)) {
-        float simple = load / cores;
-        mode |= throttle_handle_load(simple, "CPU load");
+        mode |= throttle_handle_load(load, "CPU load", cores);
     }
 
     if(throttle_io_load(&load, &blocked)) {
-        float blocked_ratio = 0.0;
-
-        mode |= throttle_handle_load(load, "IO load");
-
-        if(cores) {
-            blocked_ratio = blocked / cores;
-        } else {
-            blocked_ratio = blocked;
-        }
-
-        mode |= throttle_handle_load(blocked_ratio, "blocked IO ratio");
+        mode |= throttle_handle_load(load, "IO load", 0);
+        mode |= throttle_handle_load(blocked, "blocked IO ratio", cores);
     }
 
     if(mode & throttle_extreme) {

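To illustrate the normalization in that second patch (again just a sketch; the factor is hard-coded locally as a stand-in for THROTTLE_FACTOR_MEDIUM in throttle.c):

#include <stdio.h>

#define FACTOR_MEDIUM 1.6 /* stand-in for THROTTLE_FACTOR_MEDIUM */

/* Per-core adjustment of a raw load average, mirroring the patch above:
 * unknown core count -> unchanged, one core -> penalized, otherwise per-core. */
static float adjust_load(float load, int cores)
{
    if (cores <= 0) {
        return load;
    } else if (cores == 1) {
        return load * FACTOR_MEDIUM;
    }
    return load / cores;
}

int main(void)
{
    printf("%.2f\n", adjust_load(1.4f, 2)); /* 0.70: load spread over two cores */
    printf("%.2f\n", adjust_load(0.7f, 1)); /* 1.12: single-core penalty        */
    printf("%.2f\n", adjust_load(0.7f, 0)); /* 0.70: core count unknown         */
    return 0;
}

The point is that the same raw load average is judged very differently depending on how many cores the node actually has.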



On 12 Nov 2013, at 3:25 pm, yusuke iida <yusk.iida at gmail.com> wrote:

> Hi, Andrew
> 
> I'm sorry.
> That report was taken when two cores were assigned to the virtual machine.
> https://drive.google.com/file/d/0BwMFJItoO-fVdlIwTVdFOGRkQ0U/edit?usp=sharing
> 
> Sorry for the confusion.
> 
> This is the report acquired with one core.
> https://drive.google.com/file/d/0BwMFJItoO-fVSlo0dE0xMzNORGc/edit?usp=sharing
> 
> LRMD_MAX_CHILDREN is not defined on any node.
> load-threshold is still at its default.
> cib_max_cpu is set to 0.4 by the following code:
> 
>        if(cores == 1) {
>            cib_max_cpu = 0.4;
>        }
> 
> Since the threshold is 1.5 * cib_max_cpu = 0.6, the node goes into the
> Extreme state once the CIB load exceeds 60%:
> Nov 08 11:08:31 [2390] vm01       crmd: (  throttle.c:441   )  notice:
> throttle_mode:        Extreme CIB load detected: 0.670000
> 
> From the throttle mode bits, the DC detects that vm01 is in the Extreme state.
> Nov 08 11:08:32 [2387] vm13       crmd: (  throttle.c:701   )   debug:
> throttle_update:     Host vm01 supports a maximum of 2 jobs and
> throttle mode 1000.  New job limit is 1
> 
> The following log shows that the dynamic adjustment of batch-limit also
> seems to be working correctly.
> # grep "throttle_get_total_job_limit" pacemaker.log
> (snip)
> Nov 08 11:08:31 [2387] vm13       crmd: (  throttle.c:629   )   trace:
> throttle_get_total_job_limit:    No change to batch-limit=0
> Nov 08 11:08:32 [2387] vm13       crmd: (  throttle.c:632   )   trace:
> throttle_get_total_job_limit:    Using batch-limit=8
> (snip)
> Nov 08 11:10:32 [2387] vm13       crmd: (  throttle.c:632   )   trace:
> throttle_get_total_job_limit:    Using batch-limit=16
> 
> The above shows that the problem is not solved even when the total number
> of jobs is restricted by batch-limit.
> Are there any other ways to reduce the synchronous messages?
> 
> There are not that many internal IPC messages.
> Couldn't they be processed, at least a little, while the synchronization
> messages are being handled?
> 
> Regards,
> Yusuke
> 
> 2013/11/12 Andrew Beekhof <andrew at beekhof.net>:
>> 
>> On 11 Nov 2013, at 11:48 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>> 
>>> I also checked the execution of the graph.
>>> Since the number of pending actions is limited to 16 partway through,
>>> I judge that batch-limit is taking effect.
>>> Observing this, even when jobs are restricted by batch-limit, two or
>>> more jobs are always fired within one second.
>>> These executed jobs return results, and those results generate CIB
>>> synchronization messages.
>>> The node that keeps receiving synchronization messages processes them
>>> preferentially and postpones internal IPC messages.
>>> I think that this caused the timeout.
>> 
>> What load-threshold were you running this with?
>> 
>> I see this in the logs:
>> "Host vm10 supports a maximum of 4 jobs and throttle mode 0100.  New job limit is 1"
>> 
>> Have you set LRMD_MAX_CHILDREN=4 on these nodes?
>> I wouldn't recommend that for a single core VM.  I'd let the default of 2*cores be used.
>> 
>> 
>> Also, I'm not seeing "Extreme CIB load detected".  Are these still single core machines?
>> If so it would suggest that something about:
>> 
>>        if(cores == 1) {
>>            cib_max_cpu = 0.4;
>>        }
>>        if(throttle_load_target > 0.0 && throttle_load_target < cib_max_cpu) {
>>            cib_max_cpu = throttle_load_target;
>>        }
>> 
>>        if(load > 1.5 * cib_max_cpu) {
>>            /* Can only happen on machines with a low number of cores */
>>            crm_notice("Extreme %s detected: %f", desc, load);
>>            mode |= throttle_extreme;
>> 
>> is wrong.
>> 
>> What was load-threshold configured as?
>> 
>> 
> 
> 
> 
> -- 
> ----------------------------------------
> METRO SYSTEMS CO., LTD
> 
> Yusuke Iida
> Mail: yusk.iida at gmail.com
> ----------------------------------------
> 


