[Pacemaker] corosync [TOTEM ] Process pause detected for 577 ms

emmanuel segura emi2fast at gmail.com
Mon May 5 10:39:58 EDT 2014


Hello Jan,

Thanks very much for your help :). I will try to read the patches you posted.

Emmanuel


2014-05-05 16:14 GMT+02:00 Jan Friesse <jfriesse at redhat.com>:

> Emmanuel,
>
> emmanuel segura wrote:
> > Hello Jan,
> >
> > I'm using corosync+pacemaker on SLES 11 SP1, and this is a critical
> > system,
>
> Oh, ok.
>
> > I don't think I'll get authorization to upgrade the system, but I would
> > like to know if there is any known bug about this issue in my current
> > corosync release.
>
> This is hard to say. The SUSE folks have probably included many patches,
> so it would make sense to contact SUSE support.
>
> After a very quick look at git, the following patches may be related:
> 559d4083ed8355fe83f275e53b9c8f52a91694b2,
> 02c5dffa5bb8579c223006fa1587de9ba7409a3d,
> 64d0e5ace025cc929e42896c5d6beb3ef75b8244,
> 6fae42ba72006941c1fde99616ea30f4f10ebb38,
> c7e686181bcd0e975b09725502bef02c7d0c338a.
>
> But still keep in mind that between the latest 1.3.6 (which I believe is
> more or less what you are using) and the current origin/flatiron there
> are 118 patches...
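>
> (As a side note, one way to list those commits from a corosync git
> checkout, assuming the 1.3.6 release tag is named v1.3.6, would be
> something like:
>
>     git log --oneline v1.3.6..origin/flatiron
>
> which should print roughly those 118 patches, so you can scan them for
> anything membership-related.)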
>
> Regards,
>   Honza
>
> >
> > Thanks
> > Emmanuel
> >
> >
> > 2014-04-30 17:07 GMT+02:00 Jan Friesse <jfriesse at redhat.com>:
> >
> >> Emmanuel,
> >>
> >> emmanuel segura wrote:
> >>> Hello Jan,
> >>>
> >>> Thanks for the explanation, but I saw this in my log:
> >>>
> >>>
> >>
> >>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >>>
> >>> corosync [TOTEM ] Process pause detected for 577 ms, flushing membership messages.
> >>> corosync [TOTEM ] Process pause detected for 538 ms, flushing membership messages.
> >>> corosync [TOTEM ] A processor failed, forming new configuration.
> >>> corosync [CLM   ] CLM CONFIGURATION CHANGE
> >>> corosync [CLM   ] New Configuration:
> >>> corosync [CLM   ]       r(0) ip(10.xxx.xxx.xxx)
> >>> corosync [CLM   ] Members Left:
> >>> corosync [CLM   ]       r(0) ip(10.xxx.xxx.xxx)
> >>> corosync [CLM   ] Members Joined:
> >>> corosync [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 6904: memb=1, new=0, lost=1
> >>> corosync [pcmk  ] info: pcmk_peer_update: memb: node01 891257354
> >>> corosync [pcmk  ] info: pcmk_peer_update: lost: node02 874480
> >>>
> >>>
> >>
> >>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >>>
> >>> When this happens, does corosync need to retransmit the token?
> >>> From what I understood, the token needs to be retransmitted, but in my
> >>> case a new configuration was formed.
> >>>
> >>> This is my corosync version:
> >>>
> >>> corosync-1.3.3-0.3.1
> >>>
> >>
> >> 1.3.3 has been unsupported for ages. Please upgrade to the newest 1.4.6
> >> (if you are using cman) or 2.3.3 (if you are not using cman). Also please
> >> change your pacemaker to not use the plugin (upgrading to 2.3.3 will solve
> >> it automatically, because plugins are no longer supported in corosync 2.x).
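> >>
> >> (As an illustrative sketch only, not a complete configuration: with
> >> corosync 2.x the plugin is gone, pacemaker runs as its own daemon, and
> >> quorum comes from votequorum, roughly like this in corosync.conf:
> >>
> >>     quorum {
> >>             provider: corosync_votequorum
> >>             two_node: 1
> >>     }
> >>
> >> where two_node: 1 applies only to a two-node cluster like yours.)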
> >>
> >> Regards,
> >>   Honza
> >>
> >>
> >>> Thanks
> >>>
> >>>
> >>> 2014-04-30 9:42 GMT+02:00 Jan Friesse <jfriesse at redhat.com>:
> >>>
> >>>> Emmanuel,
> >>>> there is no need to trigger fencing on "Process pause detected...".
> >>>>
> >>>> Also, fencing is not triggered if the membership didn't change. So
> >>>> let's say the token was lost but during the gather state all nodes
> >>>> replied; then there is no change of membership and no need to fence.
> >>>>
> >>>> I believe your situation was:
> >>>> - one node was a little overloaded
> >>>> - token lost
> >>>> - overload over
> >>>> - gather state
> >>>> - every node is alive
> >>>> -> no fencing
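> >>>>
> >>>> If such transient pauses are expected on those machines, one knob worth
> >>>> a look is the token timeout in corosync.conf. A rough sketch (the values
> >>>> are only an example, not a recommendation):
> >>>>
> >>>>     totem {
> >>>>             version: 2
> >>>>             token: 10000
> >>>>             token_retransmits_before_loss_const: 10
> >>>>     }
> >>>>
> >>>> A larger token timeout makes a short scheduling pause less likely to
> >>>> cause a token loss, at the cost of slower failure detection.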
> >>>>
> >>>> Regards,
> >>>>   Honza
> >>>>
> >>>> emmanuel segura wrote:
> >>>>> Hello Jan,
> >>>>>
> >>>>> Please disregard my last mail:
> >>>>>
> >>>>> Hello Jan,
> >>>>>
> >>>>> I found this problem on two HP blade systems, and the strange thing
> >>>>> is that fencing was not triggered :(, even though it's enabled.
> >>>>>
> >>>>>
> >>>>> 2014-04-25 18:36 GMT+02:00 emmanuel segura <emi2fast at gmail.com>:
> >>>>>
> >>>>>> Hello Jan,
> >>>>>>
> >>>>>> I found this problem on two HP blade systems, and the strange thing
> >>>>>> is that fencing was triggered :(
> >>>>>>
> >>>>>>
> >>>>>> 2014-04-25 9:27 GMT+02:00 Jan Friesse <jfriesse at redhat.com>:
> >>>>>>
> >>>>>>> Emmanuel,
> >>>>>>>
> >>>>>>> emmanuel segura wrote:
> >>>>>>>
> >>>>>>>> Hello List,
> >>>>>>>>
> >>>>>>>> I have these two lines in my cluster logs; can somebody help me
> >>>>>>>> understand what they mean?
> >>>>>>>>
> >>>>>>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>>>>
> >>>>>>>> corosync [TOTEM ] Process pause detected for 577 ms, flushing membership messages.
> >>>>>>>> corosync [TOTEM ] Process pause detected for 538 ms, flushing membership messages.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Corosync internally checks the gap between member join messages. If
> >>>>>>> such a gap is > token/2, it means that corosync was not scheduled to
> >>>>>>> run by the kernel for too long, and it should discard membership
> >>>>>>> messages.
> >>>>>>>
> >>>>>>> The original intent was to detect a paused process. If a pause is
> >>>>>>> detected, it's better to discard old membership messages and initiate
> >>>>>>> a new query than to send an outdated view.
> >>>>>>>
> >>>>>>> So there are various reasons why this is triggered, but today it's
> >>>>>>> usually a VM on an overloaded host machine.
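> >>>>>>>
> >>>>>>> As a simplified illustration of that check (this is not the actual
> >>>>>>> corosync source, just a sketch of the logic):
> >>>>>>>
> >>>>>>>     #include <stdint.h>
> >>>>>>>     #include <stdio.h>
> >>>>>>>     #include <time.h>
> >>>>>>>
> >>>>>>>     /* Monotonic timestamp in milliseconds, immune to wall-clock
> >>>>>>>      * jumps. */
> >>>>>>>     static uint64_t now_ms(void)
> >>>>>>>     {
> >>>>>>>         struct timespec ts;
> >>>>>>>         clock_gettime(CLOCK_MONOTONIC, &ts);
> >>>>>>>         return (uint64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
> >>>>>>>     }
> >>>>>>>
> >>>>>>>     /* If the gap since we last ran exceeds half the token timeout,
> >>>>>>>      * assume the process was paused and report it; the caller would
> >>>>>>>      * then flush (discard) its pending membership messages. */
> >>>>>>>     static int pause_detected(uint64_t *last_ms, uint64_t token_ms)
> >>>>>>>     {
> >>>>>>>         uint64_t now = now_ms();
> >>>>>>>         uint64_t gap = now - *last_ms;
> >>>>>>>
> >>>>>>>         *last_ms = now;
> >>>>>>>         if (gap > token_ms / 2) {
> >>>>>>>             printf("Process pause detected for %llu ms, "
> >>>>>>>                    "flushing membership messages.\n",
> >>>>>>>                    (unsigned long long)gap);
> >>>>>>>             return 1;
> >>>>>>>         }
> >>>>>>>         return 0;
> >>>>>>>     }
> >>>>>>>
> >>>>>>>     int main(void)
> >>>>>>>     {
> >>>>>>>         uint64_t last = now_ms();
> >>>>>>>         /* The real daemon would call this periodically from its
> >>>>>>>          * main loop; token_ms comes from totem { token: ... }. */
> >>>>>>>         return pause_detected(&last, 5000);
> >>>>>>>     }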
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> corosync [TOTEM ] A processor failed, forming new configuration.
> >>>>>>>>
> >>>>>>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>>>>
> >>>>>>>> I know the "corosync [TOTEM ] A processor failed, forming new
> >>>>>>>> configuration" message appears when the token packet is definitely
> >>>>>>>> lost.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>>
> >>>>>>> Regards,
> >>>>>>>   Honza
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> this is my life and I live it for as long as God wills
> >>>>>>
> >>>>
> >>>
> >>
> >
>



-- 
this is my life and I live it for as long as God wills