[ClusterLabs] Corosync main process was not scheduled for 115935.2266 ms (threshold is 800.0000 ms). Consider token timeout increase.
Jan Friesse
jfriesse at redhat.com
Wed Feb 17 16:47:03 UTC 2016
Kostiantyn Ponomarenko wrote:
> Thank you for the suggestion.
> The OS is Debian 8. All packages were built by me.
> libqb-0.17.2
> corosync-2.3.5
> cluster-glue-1.0.12
> pacemaker-1.1.13
>
> It is really important for me to understand what is happening with the
> cluster under high load.
For Corosync it's really simple. Corosync has to be scheduled by the OS
regularly (more often than its current token timeout) to be able to
detect membership changes and to send/receive messages (cpg). If it's not
scheduled, the membership is not kept up to date, and when it is finally
scheduled again it logs the "process was not scheduled for ... ms"
message (a warning for the user). If corosync was not scheduled for longer
than the token timeout, the "Process pause detected for ..." message is
displayed and a new membership is formed. Other nodes (if scheduled
regularly) see the non-regularly-scheduled node as dead.
> So I would appreciate any help here =)
There is really not much that can be done here; it's best to make sure
corosync is scheduled regularly.
>
>
> Thank you,
> Kostia
>
> On Wed, Feb 17, 2016 at 5:02 PM, Greg Woods <woods at ucar.edu> wrote:
>
>>
>> On Wed, Feb 17, 2016 at 3:30 AM, Kostiantyn Ponomarenko <
>> konstantin.ponomarenko at gmail.com> wrote:
>>
>>> Jan 29 07:00:43 B5-2U-205-LS corosync[2742]: [MAIN ] Corosync main
>>> process was not scheduled for 12483.7363 ms (threshold is 800.0000 ms).
>>> Consider token timeout increase.
>>
>>
>> I was having this problem as well. You don't say which version of corosync
>> you are running or on what OS, but on CentOS 7, there is an available
>> update that looks like it might address this (it has to do with
>> scheduling). We haven't gotten around to actually applying it yet because
>> it will require some down time on production services (we do have a few
>> node-locked VMs in our cluster), and it only happens when the system is
>> under very high load, so I can't say for sure the update will fix the
>> issue, but it might be worth looking into.
>>
>> --Greg

This update sets round-robin realtime scheduling for corosync by
default. The same can be achieved without the update by editing
/etc/sysconfig/corosync and changing the COROSYNC_OPTIONS line to
something like COROSYNC_OPTIONS="-r".

Regards,
Honza
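For anyone trying this workaround, one way to confirm it took effect is to
check the scheduling policy of the running corosync process. The file path
below is the CentOS 7 one mentioned above (Debian-based systems typically
use /etc/default/corosync instead), and the sample output is only
illustrative:

    # /etc/sysconfig/corosync
    COROSYNC_OPTIONS="-r"

    # after restarting corosync, verify round-robin realtime scheduling:
    chrt -p $(pidof corosync)
    #   pid 2742's current scheduling policy: SCHED_RR
    #   pid 2742's current scheduling rt priority: 99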