[ClusterLabs] Corosync main process was not scheduled for 115935.2266 ms (threshold is 800.0000 ms). Consider token timeout increase.
Adam Spiers
aspiers at suse.com
Wed Feb 24 12:34:42 UTC 2016
Hi all,
Jan Friesse <jfriesse at redhat.com> wrote:
> >>>There is really no help. It's best to make sure corosync is scheduled regularly.
> >I may sound silly, but how can I do it?
>
> It's actually very hard to say. Pauses like 30 sec is really unusual
> and shouldn't happen (specially with RT scheduling). It is usually
> happening on VM where host is overcommitted.
It's funny that you are discussing this just as my team is seeing it
happen fairly regularly within VMs on an overcommitted host. In other
words, I can confirm that Jan's statement above is true.
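For anyone who wants to measure how often and how badly this happens on
their own nodes, here is a minimal sketch in Python (not part of
corosync) that pulls the pause length and the threshold out of the
warning. The regular expression is based only on the message wording in
the subject line of this thread, and the name find_pauses is made up
for illustration.

```python
import re

# Pattern derived from the warning text quoted in this thread; other
# corosync versions may word the message differently (assumption).
PAUSE_RE = re.compile(
    r"Corosync main process was not scheduled for "
    r"(?P<pause>\d+(?:\.\d+)?) ms "
    r"\(threshold is (?P<threshold>\d+(?:\.\d+)?) ms\)"
)


def find_pauses(lines):
    """Return a list of (pause_ms, threshold_ms) tuples, one per warning found."""
    found = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            found.append((float(match.group("pause")),
                          float(match.group("threshold"))))
    return found


# The message text from the subject line of this thread.
sample = [
    "Corosync main process was not scheduled for 115935.2266 ms "
    "(threshold is 800.0000 ms). Consider token timeout increase.",
]

pauses = find_pauses(sample)
for pause_ms, threshold_ms in pauses:
    # Report the pause in seconds and as a multiple of the threshold.
    print(f"pause {pause_ms / 1000:.1f} s, "
          f"{pause_ms / threshold_ms:.1f}x the threshold")
```

In practice you would feed it the lines of your syslog file instead of
the sample list.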
Like Konstiantyn, we have also sometimes seen these pauses pass without
any fencing occurring, e.g.:
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [MAIN ] Corosync main process was not scheduled for 7343.1909 ms (threshold is 4000.0000 ms). Consider token timeout increase.
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [TOTEM ] A processor failed, forming new configuration.
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] CLM CONFIGURATION CHANGE
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] New Configuration:
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] #011r(0) ip(192.168.2.82)
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] #011r(0) ip(192.168.2.84)
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] Members Left:
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] Members Joined:
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 32: memb=2, new=0, lost=0
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [pcmk ] info: pcmk_peer_update: memb: d52-54-77-77-77-01 1084752466
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [pcmk ] info: pcmk_peer_update: memb: d52-54-77-77-77-02 1084752468
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] CLM CONFIGURATION CHANGE
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] New Configuration:
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] #011r(0) ip(192.168.2.82)
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] #011r(0) ip(192.168.2.84)
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] Members Left:
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] Members Joined:
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 32: memb=2, new=0, lost=0
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [pcmk ] info: pcmk_peer_update: MEMB: d52-54-77-77-77-01 1084752466
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [pcmk ] info: pcmk_peer_update: MEMB: d52-54-77-77-77-02 1084752468
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CPG ] chosen downlist: sender r(0) ip(192.168.2.82) ; members(old:2 left:0)
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [MAIN ] Completed service synchronization, ready to provide service.
I don't understand why it claims a processor failed, forming a new
configuration, when the configuration appears no different from
before: no members joined or left. Can anyone explain this?
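
One way to check the "nothing joined or left" reading mechanically is to
look at the counters in the pcmk_peer_update summary lines. The sketch
below is only an illustration: it assumes those lines always look like
the two in the excerpt above, and the function names are hypothetical.

```python
import re

# Pattern derived from the two pcmk_peer_update summary lines in the
# excerpt; the exact format in other versions is an assumption.
EVENT_RE = re.compile(
    r"pcmk_peer_update: (?P<kind>Transitional|Stable) membership event "
    r"on ring (?P<ring>\d+): "
    r"memb=(?P<memb>\d+), new=(?P<new>\d+), lost=(?P<lost>\d+)"
)


def membership_events(lines):
    """Extract the kind, ring id and member counters from each summary line."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            events.append({
                "kind": match.group("kind"),
                "ring": int(match.group("ring")),
                "memb": int(match.group("memb")),
                "new": int(match.group("new")),
                "lost": int(match.group("lost")),
            })
    return events


def membership_unchanged(events):
    """True only if at least one event was found and none gained or lost a member."""
    return bool(events) and all(
        e["new"] == 0 and e["lost"] == 0 for e in events
    )


# The two summary lines from the log excerpt above.
sample = [
    "Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [pcmk ] notice: "
    "pcmk_peer_update: Transitional membership event on ring 32: "
    "memb=2, new=0, lost=0",
    "Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [pcmk ] notice: "
    "pcmk_peer_update: Stable membership event on ring 32: "
    "memb=2, new=0, lost=0",
]

events = membership_events(sample)
print("membership unchanged:", membership_unchanged(events))
```

This only confirms what the counters say; it does not explain why TOTEM
reported a failed processor in the first place.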