[Pacemaker] corosync [TOTEM ] Process pause detected for 577 ms

emmanuel segura emi2fast at gmail.com
Wed Apr 30 13:14:04 EDT 2014


Hello Jan,

I'm using corosync+pacemaker on SLES 11 SP1 and this is a critical system;
I don't think I'll get authorization to upgrade it, but I would like to
know whether there is a known bug about this issue in my current corosync
release.

Thanks
Emmanuel


2014-04-30 17:07 GMT+02:00 Jan Friesse <jfriesse at redhat.com>:

> Emmanuel,
>
> emmanuel segura wrote:
> > Hello Jan,
> >
> > Thanks for the explanation, but I saw this in my log:
> >
> >
> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >
> > corosync [TOTEM ] Process pause detected for 577 ms, flushing membership
> > messages.
> > corosync [TOTEM ] Process pause detected for 538 ms, flushing membership
> > messages.
> > corosync [TOTEM ] A processor failed, forming new configuration.
> > corosync [CLM   ] CLM CONFIGURATION CHANGE
> > corosync [CLM   ] New Configuration:
> > corosync [CLM   ]       r(0) ip(10.xxx.xxx.xxx)
> > corosync [CLM   ] Members Left:
> > corosync [CLM   ]       r(0) ip(10.xxx.xxx.xxx)
> > corosync [CLM   ] Members Joined:
> > corosync [pcmk  ] notice: pcmk_peer_update: Transitional membership event
> > on ring 6904: memb=1, new=0, lost=1
> > corosync [pcmk  ] info: pcmk_peer_update: memb: node01 891257354
> > corosync [pcmk  ] info: pcmk_peer_update: lost: node02 874480
> >
> >
> :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >
> > When this happens, does corosync need to retransmit the token?
> > From what I understood, the token needs to be retransmitted, but in my
> > case a new configuration was formed.
> >
> > This is my corosync version:
> >
> > corosync-1.3.3-0.3.1
> >
>
> 1.3.3 has been unsupported for ages. Please upgrade to the newest 1.4.6 (if
> you are using cman) or 2.3.3 (if you are not using cman). Also please change
> your Pacemaker setup to not use the plugin (upgrading to 2.3.3 will solve it
> automatically, because plugins are no longer supported in corosync 2.x).
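
For reference, a minimal corosync 2.x configuration sketch along the lines
Honza describes, with quorum provided by votequorum rather than the pacemaker
plugin; the node names match the log above, while the cluster name and
transport choice are illustrative placeholders, not values from this cluster:

totem {
        version: 2
        cluster_name: examplecluster
        # unicast UDP transport; placeholder choice
        transport: udpu
}

nodelist {
        node {
                ring0_addr: node01
                nodeid: 1
        }
        node {
                ring0_addr: node02
                nodeid: 2
        }
}

quorum {
        provider: corosync_votequorum
        # two-node cluster, as in the log excerpt above
        two_node: 1
}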
>
> Regards,
>   Honza
>
>
> > Thanks
> >
> >
> > 2014-04-30 9:42 GMT+02:00 Jan Friesse <jfriesse at redhat.com>:
> >
> >> Emmanuel,
> >> there is no need to trigger fencing on "Process pause detected...".
> >>
> >> Also, fencing is not triggered if the membership didn't change. So let's
> >> say the token was lost but during the gather state all nodes replied;
> >> then there is no change of membership and no need to fence.
> >>
> >> I believe your situation was:
> >> - one node is a little overloaded
> >> - token lost
> >> - overload over
> >> - gather state
> >> - every node is alive
> >> -> no fencing
> >>
> >> Regards,
> >>   Honza
> >>
> >> emmanuel segura wrote:
> >>> Hello Jan,
> >>>
> >>> Forget the last mail:
> >>>
> >>> Hello Jan,
> >>>
> >>> I found this problem on two HP blade systems and the strange thing is
> >>> that fencing was not triggered :(, but it is enabled.
> >>>
> >>>
> >>> 2014-04-25 18:36 GMT+02:00 emmanuel segura <emi2fast at gmail.com>:
> >>>
> >>>> Hello Jan,
> >>>>
> >>>> I found this problem on two HP blade systems and the strange thing is
> >>>> that fencing was triggered :(
> >>>>
> >>>>
> >>>> 2014-04-25 9:27 GMT+02:00 Jan Friesse <jfriesse at redhat.com>:
> >>>>
> >>>> Emmanuel,
> >>>>>
> >>>>> emmanuel segura wrote:
> >>>>>
> >>>>>  Hello List,
> >>>>>>
> >>>>>> I have these two lines in my cluster logs; can somebody help me
> >>>>>> understand what they mean?
> >>>>>>
> >>>>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>> ::::::::::::::
> >>>>>>
> >>>>>> corosync [TOTEM ] Process pause detected for 577 ms, flushing membership messages.
> >>>>>> corosync [TOTEM ] Process pause detected for 538 ms, flushing membership messages.
> >>>>>>
> >>>>>
> >>>>> Corosync internally checks the gap between member join messages. If
> >>>>> such a gap is > token/2, it means that corosync was not scheduled to
> >>>>> run by the kernel for too long, and it should discard membership
> >>>>> messages.
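
A minimal C sketch of the check described above, assuming a monotonic
millisecond clock; the names (pause_detected, last_join_ms, token_timeout_ms)
are illustrative placeholders rather than the actual corosync internals:

#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when the gap since the previous membership (join) message
 * exceeds half of the configured token timeout, i.e. the process was
 * probably not scheduled for a while and the queued membership messages
 * are stale and should be flushed.
 */
static bool pause_detected(uint64_t now_ms, uint64_t last_join_ms,
                           uint64_t token_timeout_ms)
{
        return (now_ms - last_join_ms) > (token_timeout_ms / 2);
}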
> >>>>>
> >>>>> The original intent was to detect a paused process. If a pause is
> >>>>> detected, it's better to discard the old membership messages and
> >>>>> initiate a new query than to send an outdated view.
> >>>>>
> >>>>> So there are various reasons why this is triggered, but today it's
> >>>>> usually a VM on an overloaded host machine.
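
For reference, the token timeout that the above check compares against is set
in the totem section of corosync.conf; an illustrative fragment with
placeholder values, not a recommendation for this cluster:

totem {
        version: 2
        # token timeout in milliseconds (placeholder value)
        token: 5000
        # retransmits attempted before the token is declared lost (example)
        token_retransmits_before_loss_const: 10
}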
> >>>>>
> >>>>>
> >>>>>
> >>>>>  corosync [TOTEM ] A processor failed, forming new configuration.
> >>>>>>
> >>>>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>> ::::::::::::::
> >>>>>>
> >>>>>> I know the "corosync [TOTEM ] A processor failed, forming new
> >>>>>> configuration" message appears when the token packet is definitely lost.
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>>
> >>>>> Regards,
> >>>>>   Honza
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> this is my life and I live it for as long as God wills
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >
> >
> >
> >
> >
>
>



-- 
this is my life and I live it for as long as God wills

