[ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

Fri Sep 1 05:50:02 EDT 2017

Jan Friesse <jfriesse at redhat.com> writes:

> wferi at niif.hu writes:
>
>> Jan Friesse <jfriesse at redhat.com> writes:
>>
>>> wferi at niif.hu writes:
>>>
>>>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
>>>> (in August; in May, it happened 0-2 times a day only, it's slowly
>>>> ramping up):
>>>>
>>>> vhbl08 corosync[3687]:   [TOTEM ] A processor failed, forming new configuration.
>>>> vhbl03 corosync[3890]:   [TOTEM ] A processor failed, forming new configuration.
>>>> vhbl07 corosync[3805]:   [MAIN  ] Corosync main process was not scheduled for 4317.0054 ms (threshold is 2400.0000 ms). Consider token timeout increase.
>>>
>>> ^^^ This is the main problem you have to solve. It usually means that
>>> the machine is too overloaded. [...]
>>
>> Before I start tracing the scheduler, I'd like to ask something: what
>> wakes up the Corosync main process periodically?  The token making a
>> full circle?  (Please forgive my simplistic understanding of the TOTEM
>> protocol.)  That would explain the recommendation in the log message,
>> but does not fit well with the overload assumption: totally idle nodes
>> could just as easily produce such warnings if there are no other regular
>> wakeup sources.  (I'm looking at timer_function_scheduler_timeout but I
>> know too little of libqb to decide.)
>
> The Corosync main loop is based on epoll, so corosync is woken up
> either by receiving data (network socket or unix socket for services),
> or when there is data to send and the socket is ready for a
> non-blocking write, or after a timeout. This timeout is exactly what
> you call the other wakeup source.
>
> The timeout is used for scheduling periodic tasks inside corosync.
>
> One of the periodic tasks is the scheduler pause detector. It is
> basically scheduled every (token_timeout / 3) msec and it computes the
> diff between the current time and the last time it ran. If the diff is
> larger than (token_timeout * 0.8), it displays the warning.

Thanks, I can work with this.  I'll come back as soon as I find
something (or need further information :).
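
To check that I read the description right, here is a minimal Python
sketch of the pause detector logic as described above.  It is not
Corosync's code: the function name, the return values and the injectable
clock are all my own assumptions, and only the two formulas
(token_timeout / 3 and token_timeout * 0.8) come from the explanation.

```python
import time


def make_pause_detector(token_timeout_ms, clock=time.monotonic):
    """Hypothetical model of the scheduler pause detector (not Corosync code).

    Returns (interval_ms, threshold_ms, check).  The caller is expected to
    invoke check() roughly every interval_ms from its main loop timer.
    """
    interval_ms = token_timeout_ms / 3      # how often the check is scheduled
    threshold_ms = token_timeout_ms * 0.8   # longest gap tolerated silently
    last = clock()                          # time of the previous check

    def check():
        nonlocal last
        now = clock()
        elapsed_ms = (now - last) * 1000.0  # clock() is assumed to return seconds
        last = now
        if elapsed_ms > threshold_ms:
            # Wording modelled on the log line quoted earlier in the thread.
            return ("main process was not scheduled for %.4f ms "
                    "(threshold is %.4f ms)" % (elapsed_ms, threshold_ms))
        return None

    return interval_ms, threshold_ms, check


if __name__ == "__main__":
    interval_ms, threshold_ms, check = make_pause_detector(3000)
    time.sleep(interval_ms / 1000.0)        # one normal timer period
    print(check())                          # no warning expected on an idle box
```

With token_timeout = 3000 this gives a 2400 ms threshold, which matches
the "threshold is 2400.0000 ms" in the vhbl07 log line.  It also shows
why an idle node should stay quiet: the timer itself is the regular
wakeup source, so only a late wakeup produces a large diff.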

>>> As a start you can try what the message says: consider a token
>>> timeout increase. Currently you have 3 seconds; in theory 6 seconds
>>> should be enough.
>>
>> It was probably high time I realized that token timeout is scaled
>> automatically when one has a nodelist.  When you say Corosync should
>> work OK with default settings up to 16 nodes, you assume this scaling is
>> in effect, don't you?  On the other hand, I've got no nodelist in the
>> config, but token = 3000, which is less than the default 1000+4*650 with
>> six nodes, and this will get worse as the cluster grows.
>
> This is described in the corosync.conf man page (token_coefficient).

Yes, that's how I found out.  It also says: "This value is used only
when nodelist section is specified and contains at least 3 nodes."

> The final timeout is computed using totem.token as a base value. So if
> you set totem.token to 3000, it means that the final token timeout
> value is not 3000 but (3000 + 4 * 650).

But I've got no nodelist section, and according to the warning, my token
timeout is indeed 3 seconds, as you promptly deduced.  So the
documentation seems to be correct.
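
Spelled out, the rule as I understand it from the man page is roughly
the following.  This is only a sketch of the arithmetic: the function
and parameter names are made up for illustration, and the defaults
(token = 1000, token_coefficient = 650) are the ones mentioned above.

```python
def effective_token_ms(token_ms=1000, token_coefficient_ms=650, nodelist_size=0):
    """Hypothetical helper: token timeout after token_coefficient scaling.

    nodelist_size is the number of nodes in the nodelist section,
    or 0 when there is no nodelist section at all.
    """
    if nodelist_size >= 3:
        # Scaling applies only with a nodelist of at least 3 nodes.
        return token_ms + (nodelist_size - 2) * token_coefficient_ms
    return token_ms  # no nodelist (or fewer than 3 nodes): used as is


if __name__ == "__main__":
    print(effective_token_ms(nodelist_size=6))           # defaults with a nodelist
    print(effective_token_ms(3000, nodelist_size=0))     # my current config
    print(effective_token_ms(3000, nodelist_size=6))     # token = 3000 plus nodelist
```

So six nodes with the defaults and a nodelist get 1000 + 4*650 = 3600 ms,
my config without a nodelist stays at 3000 ms, and token = 3000 together
with a nodelist would give 3000 + 4*650 = 5600 ms.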
-- 
Thanks,
Feri