[Pacemaker] corosync [TOTEM ] Process pause detected for 577 ms

emmanuel segura emi2fast at gmail.com
Wed Apr 30 09:12:21 EDT 2014


Hello Jan,

Thanks for the explanation, but I saw this in my log:

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

corosync [TOTEM ] Process pause detected for 577 ms, flushing membership messages.
corosync [TOTEM ] Process pause detected for 538 ms, flushing membership messages.
corosync [TOTEM ] A processor failed, forming new configuration.
corosync [CLM   ] CLM CONFIGURATION CHANGE
corosync [CLM   ] New Configuration:
corosync [CLM   ]       r(0) ip(10.xxx.xxx.xxx)
corosync [CLM   ] Members Left:
corosync [CLM   ]       r(0) ip(10.xxx.xxx.xxx)
corosync [CLM   ] Members Joined:
corosync [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 6904: memb=1, new=0, lost=1
corosync [pcmk  ] info: pcmk_peer_update: memb: node01 891257354
corosync [pcmk  ] info: pcmk_peer_update: lost: node02 874480

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
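To pull these events out of a longer log, something like the following rough sketch could be used. The regular expressions are based only on the lines shown above, and the function name `scan_corosync_log` is hypothetical (it is not part of corosync or Pacemaker).

```python
import re

# Patterns derived only from the log excerpt in this mail (assumption:
# other corosync versions may format these messages differently).
PAUSE_RE = re.compile(r"\[TOTEM\s*\] Process pause detected for (\d+) ms")
FAILED_RE = re.compile(r"\[TOTEM\s*\] A processor failed, forming new configuration")
LOST_RE = re.compile(r"pcmk_peer_update: lost: (\S+)")


def scan_corosync_log(lines):
    """Collect pause durations, reconfiguration events and lost nodes."""
    pauses_ms = []
    lost_nodes = []
    reconfigured = False
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            pauses_ms.append(int(match.group(1)))
            continue
        if FAILED_RE.search(line):
            reconfigured = True
            continue
        match = LOST_RE.search(line)
        if match:
            lost_nodes.append(match.group(1))
    return {
        "pauses_ms": pauses_ms,
        "reconfigured": reconfigured,
        "lost_nodes": lost_nodes,
    }
```

Feeding it the excerpt above would report both pauses, the reconfiguration, and node02 as the lost member, which makes it easy to see whether a pause was followed by a real membership change.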

When this happens, does corosync need to retransmit the token?
From what I understood, the token needs to be retransmitted, but in my case a
new configuration was formed.
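As a rough illustration of the check described in the quoted explanation below (a gap between member join messages larger than token/2 counts as a pause), here is a minimal sketch. It is not corosync's actual code: the function name `pause_detected` is hypothetical, and the 1000 ms token timeout is an assumption, since the value from this cluster's corosync.conf is not shown.

```python
def pause_detected(gap_ms, token_timeout_ms=1000):
    """Return True if the gap between join messages exceeds token/2.

    token_timeout_ms=1000 is an assumed value for illustration only;
    the real timeout comes from the cluster configuration.
    """
    return gap_ms > token_timeout_ms / 2


# The two gaps reported in the log, plus a shorter one for contrast.
for gap in (577, 538, 200):
    print(gap, pause_detected(gap))
```

Under that assumed timeout, both logged gaps (577 ms and 538 ms) exceed the threshold. The check only explains the "Process pause detected" lines; whether a new configuration is formed depends on what happens to the token afterwards.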

This is my corosync version:

corosync-1.3.3-0.3.1

Thanks


2014-04-30 9:42 GMT+02:00 Jan Friesse <jfriesse at redhat.com>:

> Emmanuel,
> there is no need to trigger fencing on "Process pause detected...".
>
> Also, fencing is not triggered if membership didn't change. So let's say the
> token was lost, but during the gather state all nodes replied; then there is
> no change of membership and no need to fence.
>
> I believe your situation was:
> - one node is a little overloaded
> - token lost
> - overload over
> - gather state
> - every node is alive
> -> no fencing
>
> Regards,
>   Honza
>
> emmanuel segura wrote:
> > Hello Jan,
> >
> > Forget the last mail:
> >
> > Hello Jan,
> >
> > I found this problem on two HP blade systems, and the strange thing is that
> > fencing was not triggered :(, even though it is enabled
> >
> >
> > 2014-04-25 18:36 GMT+02:00 emmanuel segura <emi2fast at gmail.com>:
> >
> >> Hello Jan,
> >>
> >> I found this problem on two HP blade systems, and the strange thing is that
> >> fencing was triggered :(
> >>
> >>
> >> 2014-04-25 9:27 GMT+02:00 Jan Friesse <jfriesse at redhat.com>:
> >>
> >> Emanuel,
> >>>
> >>> emmanuel segura wrote:
> >>>
> >>>  Hello List,
> >>>>
> >>>> I have these two lines in my cluster logs; can somebody help me
> >>>> understand what they mean?
> >>>>
> >>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >>>>
> >>>> corosync [TOTEM ] Process pause detected for 577 ms, flushing membership messages.
> >>>> corosync [TOTEM ] Process pause detected for 538 ms, flushing membership messages.
> >>>>
> >>>
> >>> Corosync internally checks the gap between member join messages. If
> >>> that gap is > token/2, it means that corosync was not scheduled to run
> >>> by the kernel for too long, and it should discard membership messages.
> >>>
> >>> The original intent was to detect a paused process. If a pause is
> >>> detected, it's better to discard old membership messages and initiate a
> >>> new query than to send an outdated view.
> >>>
> >>> So there are various reasons why this is triggered, but today it's
> >>> usually a VM with an overloaded host machine.
> >>>
> >>>
> >>>
> >>>  corosync [TOTEM ] A processor failed, forming new configuration.
> >>>>
> >>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >>>> ::::::::::::::
> >>>>
> >>>> I know the "corosync [TOTEM ] A processor failed, forming new
> >>>> configuration" message appears when the token packet is definitely lost.
> >>>>
> >>>> Thanks
> >>>>
> >>>>
> >>> Regards,
> >>>   Honza
> >>>
> >>>
> >>>>
> >>>> _______________________________________________
> >>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>>>
> >>>> Project Home: http://www.clusterlabs.org
> >>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>>> Bugs: http://bugs.clusterlabs.org
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> this is my life and I will live it for as long as God wills
> >>
> >
> >
> >
> >
> >
> >
>
>
>



-- 
this is my life and I will live it for as long as God wills