[Pacemaker] The larger cluster is tested.

Tue Nov 5 01:48:41 UTC 2013

Hi, Andrew

I tested with this commit.
https://github.com/beekhof/pacemaker/commit/145c782e432d8108ca865f994640cf5a62406363

However, the problem has not improved.
It seems that CPG messages are still processed preferentially, because
the CPG source is registered at G_PRIORITY_MEDIUM.

I suggest lowering the priority of CPG instead.
What do you think of this change?
https://github.com/yuusuke/pacemaker/commit/22a14318cc740b3043106609923f47039c3aa407

I could not find a way to lower the priority of CPG messages for the
CIB process only.
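
To illustrate why the priority matters, here is a minimal sketch of a
GLib-style main loop in plain C. It is a simplified model, not Pacemaker
code: the function name, the fixed CPG backlog, and the assumption that
G_PRIORITY_MEDIUM equals G_PRIORITY_HIGH / 2 (-50) are illustrative only.
In each loop iteration, only the ready sources with the best (numerically
lowest) priority are dispatched.

```c
/* Priority values follow GLib's convention: a LOWER number is served first. */
enum {
    PRIO_MEDIUM  = -50, /* assumed value of G_PRIORITY_MEDIUM (G_PRIORITY_HIGH / 2) */
    PRIO_DEFAULT = 0    /* value of GLib's G_PRIORITY_DEFAULT */
};

/*
 * Hypothetical model: one IPC request is pending from the start, and
 * `backlog` CPG messages are already queued (no new arrivals).
 * Returns how many CPG messages are handled before the IPC request.
 * dispatch_one != 0 models CS_DISPATCH_ONE, 0 models CS_DISPATCH_ALL.
 */
int cpg_handled_before_ipc(int cpg_prio, int ipc_prio, int backlog, int dispatch_one)
{
    int handled = 0;

    for (;;) {
        int cpg_ready = backlog > 0;
        int best = ipc_prio;              /* the IPC source is always ready */

        if (cpg_ready && cpg_prio < best) {
            best = cpg_prio;              /* CPG outranks IPC in this iteration */
        }

        /* Sources below the best ready priority are skipped this iteration. */
        if (cpg_ready && cpg_prio == best) {
            int n = dispatch_one ? 1 : backlog;
            backlog -= n;
            handled += n;
        }
        if (ipc_prio == best) {
            return handled;               /* the IPC request finally runs */
        }
    }
}
```

In this model, `cpg_handled_before_ipc(PRIO_MEDIUM, PRIO_DEFAULT, 1000, 1)`
returns 1000: CS_DISPATCH_ONE alone does not help while CPG outranks IPC.
With both sources at the same priority it returns 1, which is the effect
that lowering the CPG priority is meant to achieve.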

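I collected reports from when the error occurred.
Please note the delay in processing an IPC message, shown below: crmd
issues call 32 at 21:53:52, but cib does not process it until 21:55:57,
about two minutes (125 seconds) later.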

Nov 01 21:53:52 [9246] vm01       crmd: (cib_native.c:397   )   trace: cib_native_perform_op_delegate:  Async call, returning 32
(snip)
Nov 01 21:55:57 [9241] vm01        cib: ( callbacks.c:688   )    info: cib_process_request:     Forwarding cib_modify operation for section status to master (origin=local/crmd/32)

The reports are large, so please download them from the following link.
https://drive.google.com/file/d/0BwMFJItoO-fVWDg1Sjc2WXltUjQ/edit?usp=sharing

Regards,
Yusuke

2013/10/31 Andrew Beekhof <andrew at beekhof.net>:
>
> On 29 Oct 2013, at 12:12 am, yusuke iida <yusk.iida at gmail.com> wrote:
>
>> Hi, Andrew
>>
>> I tested using the following commit.
>> https://github.com/beekhof/pacemaker/commit/b6fa1e650f64b1ba73fdb143f41323aa8cb3544e
>>
>> However, operation timeouts still occur.
>>
>> I analyzed the log.
>>
>> I noticed that the IPC message sent from the local crmd to the cib is
>> processed late.
>> Could this be happening because the CIB process gives priority to the
>> CIB synchronization messages that arrive from other nodes?
>>
>>
>> I made the following changes to alter the priority of the messages
>> that the CIB processes.
>> With these changes, the timeout does not occur.
>>
>> diff --git a/lib/cluster/cpg.c b/lib/cluster/cpg.c
>> index 8522cbf..3a67998 100644
>> --- a/lib/cluster/cpg.c
>> +++ b/lib/cluster/cpg.c
>> @@ -212,7 +212,7 @@ pcmk_cpg_dispatch(gpointer user_data)
>>     int rc = 0;
>>     crm_cluster_t *cluster = (crm_cluster_t*) user_data;
>>
>> -    rc = cpg_dispatch(cluster->cpg_handle, CS_DISPATCH_ALL);
>> +    rc = cpg_dispatch(cluster->cpg_handle, CS_DISPATCH_ONE);
>>     if (rc != CS_OK) {
>>         crm_err("Connection to the CPG API failed: %s (%d)", ais_error2text(rc), rc);
>>         cluster->cpg_handle = 0;
>> diff --git a/lib/common/mainloop.c b/lib/common/mainloop.c
>> index 18a67e6..d605288 100644
>> --- a/lib/common/mainloop.c
>> +++ b/lib/common/mainloop.c
>> @@ -482,7 +482,7 @@ gio_poll_dispatch_add(enum qb_loop_priority p, int32_t fd, int32_t evts,
>>     adaptor->p = p;
>>     adaptor->is_used = QB_TRUE;
>>     adaptor->source =
>> -        g_io_add_watch_full(channel, G_PRIORITY_DEFAULT, evts, gio_read_socket, adaptor,
>> +        g_io_add_watch_full(channel, G_PRIORITY_MEDIUM, evts, gio_read_socket, adaptor,
>>                             gio_poll_destroy);
>>
>>     /* Now that mainloop now holds a reference to channel,
>>
>> I do not know whether this fix is correct.
>> Could you comment on it?
>
> The CS_DISPATCH_ONE change looks ok: https://github.com/beekhof/pacemaker/commit/6384053
> Did you try with just that?  I'd like to avoid the mainloop priority change if possible.
>
>>
>> Regards,
>> Yusuke
>>
>> 2013/10/20 Andrew Beekhof <andrew at beekhof.net>:
>>>
>>> On 18/10/2013, at 10:12 PM, yusuke iida <yusk.iida at gmail.com> wrote:
>>>
>>>> Hi, Andrew
>>>>
>>>> I am now testing a configuration with one standby node and 15 active nodes.
>>>> About 10 Dummy resources are started per node.
>>>>
>>>> When all the nodes are started in this configuration, it takes
>>>> about 20 minutes before all the resources start.
>>>>
>>>> Some resources also hit a start timeout.
>>>> At start-up, all the nodes run their probes at once.
>>>> The results are written to the cib and synchronized to all the nodes.
>>>> This processing generates a very high load.
>>>> I think the timeouts are caused by it.
>>>
>>> More than likely, yes.
>>>
>>>>
>>>> I am very interested in whether this problem can be solved by the
>>>> throttle you are currently developing.
>>>
>>> I have been using it, I have found it more effective than batch-limit for bounding CPU usage and avoiding timeouts.
>>> I would be interested to hear your feedback if you have the time to do some testing.
>>>
>>>> When is throttle due to be merged into the repository of ClusterLabs?
>>>
>>> It is queued up behind a compatibility patch that is needed for some changes I made to the pacemaker-remote wire protocol.
>>>
>>>>
>>>> Best Regards,
>>>>
>>>> --
>>>> ----------------------------------------
>>>> METRO SYSTEMS CO., LTD
>>>>
>>>> Yusuke Iida
>>>> Mail: yusk.iida at gmail.com
>>>> ----------------------------------------
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org

-- 
----------------------------------------
METRO SYSTEMS CO., LTD

Yusuke Iida
Mail: yusk.iida at gmail.com
----------------------------------------