[Pacemaker] corosync [TOTEM ] Process pause detected for 577 ms

Jan Friesse jfriesse at redhat.com
Mon May 5 10:14:30 EDT 2014


Emmanuel,

emmanuel segura wrote:
> Hello Jan,
> 
> I'm using corosync+pacemaker on SLES 11 SP1 and this is a critical system,

Oh, ok.

> I don't think I'll get authorization to upgrade the system, but I would
> like to know whether there is a known bug for this issue in my current
> corosync release.

This is hard to say. SUSE probably included many patches, so it
would make sense to contact SUSE support.

After a very quick look at git, the following patches may be related:
559d4083ed8355fe83f275e53b9c8f52a91694b2,
02c5dffa5bb8579c223006fa1587de9ba7409a3d,
64d0e5ace025cc929e42896c5d6beb3ef75b8244,
6fae42ba72006941c1fde99616ea30f4f10ebb38,
c7e686181bcd0e975b09725502bef02c7d0c338a.

But keep in mind that there are 118 patches between the latest 1.3.6 (which
I believe is more or less what you are using) and the current origin/flatiron...

Regards,
  Honza

> 
> Thanks
> Emmanuel
> 
> 
> 2014-04-30 17:07 GMT+02:00 Jan Friesse <jfriesse at redhat.com>:
> 
>> Emmanuel,
>>
>> emmanuel segura wrote:
>>> Hello Jan,
>>>
>>> Thanks for the explanation, but I saw this in my log.
>>>
>>>
>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
>>>
>>> corosync [TOTEM ] Process pause detected for 577 ms, flushing membership messages.
>>> corosync [TOTEM ] Process pause detected for 538 ms, flushing membership messages.
>>> corosync [TOTEM ] A processor failed, forming new configuration.
>>> corosync [CLM   ] CLM CONFIGURATION CHANGE
>>> corosync [CLM   ] New Configuration:
>>> corosync [CLM   ]       r(0) ip(10.xxx.xxx.xxx)
>>> corosync [CLM   ] Members Left:
>>> corosync [CLM   ]       r(0) ip(10.xxx.xxx.xxx)
>>> corosync [CLM   ] Members Joined:
>>> corosync [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 6904: memb=1, new=0, lost=1
>>> corosync [pcmk  ] info: pcmk_peer_update: memb: node01 891257354
>>> corosync [pcmk  ] info: pcmk_peer_update: lost: node02 874480
>>>
>>>
>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
>>>
>>> When this happens, does corosync need to retransmit the token?
>>> From what I understood, the token needs to be retransmitted, but in my
>>> case a new configuration was formed.
>>>
>>> This is my corosync version:
>>>
>>> corosync-1.3.3-0.3.1
>>>
>>
>> 1.3.3 has been unsupported for ages. Please upgrade to the newest 1.4.6 (if
>> you are using cman) or 2.3.3 (if you are not using cman). Also, please change
>> your pacemaker so that it does not use the plugin (upgrading to 2.3.3 will
>> solve this automatically, because plugins are no longer supported in
>> corosync 2.x).
>>
>> Regards,
>>   Honza
>>
>>
>>> Thanks
>>>
>>>
>>> 2014-04-30 9:42 GMT+02:00 Jan Friesse <jfriesse at redhat.com>:
>>>
>>>> Emmanuel,
>>>> there is no need to trigger fencing on "Process pause detected...".
>>>>
>>>> Also, fencing is not triggered if membership didn't change. So let's say the
>>>> token was lost but all nodes replied during the gather state; then there is
>>>> no change of membership and no need to fence.
>>>>
>>>> I believe your situation was:
>>>> - one node is a little overloaded
>>>> - token lost
>>>> - overload over
>>>> - gather state
>>>> - every node is alive
>>>> -> no fencing
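
The rule described above (fence only when membership actually changed) can be sketched in Python. This is a toy model under stated assumptions, not Pacemaker's or corosync's actual logic: the function name `nodes_to_fence` and the node names are hypothetical, and real fencing decisions also depend on quorum and cluster configuration.

```python
from typing import Iterable, Set


def nodes_to_fence(before: Iterable[str], after_gather: Iterable[str]) -> Set[str]:
    """Return members that were present before the token loss but did not
    reply during the gather state. Hypothetical helper for illustration only."""
    return set(before) - set(after_gather)


# Token lost, but every node replied during gather: membership is unchanged.
print(sorted(nodes_to_fence({"node01", "node02"}, {"node01", "node02"})))  # []

# One node did not reply: membership changed, so it is a fencing candidate.
print(sorted(nodes_to_fence({"node01", "node02"}, {"node01"})))  # ['node02']
```

The first call corresponds to the "every node is alive" sequence in the list above; the second corresponds to a log where a member appears under "Members Left".
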
>>>>
>>>> Regards,
>>>>   Honza
>>>>
>>>> emmanuel segura wrote:
>>>>> Hello Jan,
>>>>>
>>>>> Forget the last mail:
>>>>>
>>>>> Hello Jan,
>>>>>
>>>>> I found this problem on two HP blade systems, and the strange thing is
>>>>> that fencing was not triggered :(, even though it is enabled.
>>>>>
>>>>>
>>>>> 2014-04-25 18:36 GMT+02:00 emmanuel segura <emi2fast at gmail.com>:
>>>>>
>>>>>> Hello Jan,
>>>>>>
>>>>>> I found this problem on two HP blade systems, and the strange thing is
>>>>>> that fencing was triggered :(
>>>>>>
>>>>>>
>>>>>> 2014-04-25 9:27 GMT+02:00 Jan Friesse <jfriesse at redhat.com>:
>>>>>>
>>>>>>> Emmanuel,
>>>>>>>
>>>>>>> emmanuel segura wrote:
>>>>>>>
>>>>>>>> Hello List,
>>>>>>>>
>>>>>>>> I have these lines in my cluster logs. Can somebody help me understand
>>>>>>>> what they mean?
>>>>>>>>
>>>>>>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
>>>>>>>>
>>>>>>>> corosync [TOTEM ] Process pause detected for 577 ms, flushing membership messages.
>>>>>>>> corosync [TOTEM ] Process pause detected for 538 ms, flushing membership messages.
>>>>>>>>
>>>>>>>
>>>>>>> Corosync internally checks the gap between member join messages. If that
>>>>>>> gap is > token/2, it means that corosync was not scheduled to run by the
>>>>>>> kernel for too long, and it should discard membership messages.
>>>>>>>
>>>>>>> The original intent was to detect a paused process. If a pause is
>>>>>>> detected, it's better to discard old membership messages and initiate
>>>>>>> a new query than to send an outdated view.
>>>>>>>
>>>>>>> So there are various reasons why this is triggered, but today it's
>>>>>>> usually a VM on an overloaded host machine.
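
The check described above (a gap between join messages larger than token/2 counts as a pause) can be sketched in Python. This is a minimal illustration, not corosync's actual code: the class `PauseDetector`, its method name, and the token timeout of 1000 ms are assumptions, since the thread does not state the configured token value.

```python
from typing import Optional


class PauseDetector:
    """Toy model of the gap check; hypothetical, not corosync's implementation."""

    def __init__(self, token_timeout_ms: int) -> None:
        self.token_timeout_ms = token_timeout_ms
        self.last_join_ms: Optional[int] = None

    def on_join_message(self, now_ms: int) -> Optional[int]:
        """Record a join message; return the gap in ms if it counts as a pause."""
        previous = self.last_join_ms
        self.last_join_ms = now_ms
        if previous is None:
            return None
        gap = now_ms - previous
        # A gap above token/2 is treated as "the process was paused":
        # the caller would discard queued membership messages.
        if gap > self.token_timeout_ms / 2:
            return gap
        return None


detector = PauseDetector(token_timeout_ms=1000)  # assumed token timeout
detector.on_join_message(0)
print(detector.on_join_message(577))  # 577: gap exceeds 500 ms
print(detector.on_join_message(700))  # None: gap of 123 ms is below the threshold
```

With the assumed 1000 ms token, both values from the log (577 ms and 538 ms) would exceed the 500 ms threshold.
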
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> corosync [TOTEM ] A processor failed, forming new configuration.
>>>>>>>>
>>>>>>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
>>>>>>>>
>>>>>>>> I know that the "corosync [TOTEM ] A processor failed, forming new
>>>>>>>> configuration" message appears when the token packet is definitely lost.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>> Regards,
>>>>>>>   Honza
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>>
>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> this is my life and I live it for as long as God wills
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
> 
> 
> 
> 
> 