[Pacemaker] corosync [TOTEM ] Process pause detected for 577 ms

Jan Friesse jfriesse at redhat.com
Wed Apr 30 11:07:21 EDT 2014


Emmanuel,

emmanuel segura wrote:
> Hello Jan,
> 
> Thanks for the explanation, but I saw this in my log:
> 
> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> 
> corosync [TOTEM ] Process pause detected for 577 ms, flushing membership messages.
> corosync [TOTEM ] Process pause detected for 538 ms, flushing membership messages.
> corosync [TOTEM ] A processor failed, forming new configuration.
> corosync [CLM   ] CLM CONFIGURATION CHANGE
> corosync [CLM   ] New Configuration:
> corosync [CLM   ]       r(0) ip(10.xxx.xxx.xxx)
> corosync [CLM   ] Members Left:
> corosync [CLM   ]       r(0) ip(10.xxx.xxx.xxx)
> corosync [CLM   ] Members Joined:
> corosync [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 6904: memb=1, new=0, lost=1
> corosync [pcmk  ] info: pcmk_peer_update: memb: node01 891257354
> corosync [pcmk  ] info: pcmk_peer_update: lost: node02 874480
> 
> :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> 
> When this happens, does corosync need to retransmit the token?
> From what I understood, the token needs to be retransmitted, but in my case
> a new configuration was formed.
> 
> This is my corosync version:
> 
> corosync-1.3.3-0.3.1
> 

1.3.3 has been unsupported for ages. Please upgrade to the newest 1.4.6 (if
you are using cman) or 2.3.3 (if you are not using cman). Also, please change
your Pacemaker setup to not use the plugin (upgrading to 2.3.3 will solve this
automatically, because plugins are no longer supported in corosync 2.x).

Regards,
  Honza


> Thanks
> 
> 
> 2014-04-30 9:42 GMT+02:00 Jan Friesse <jfriesse at redhat.com>:
> 
>> Emmanuel,
>> there is no need to trigger fencing on "Process pause detected...".
>>
>> Also, fencing is not triggered if the membership didn't change. So let's
>> say the token was lost, but during the gather state all nodes replied; then
>> there is no change in membership and no need to fence.
>>
>> I believe your situation was:
>> - one node was a little overloaded
>> - the token was lost
>> - the overload ended
>> - gather state
>> - every node was alive
>> -> no fencing
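The sequence above can be modeled as a membership diff: only nodes that drop out of the new membership are candidates for fencing. This is a hedged sketch, not Pacemaker or corosync code; `after_token_loss` and `nodes_to_fence` are hypothetical names, and real fencing decisions also depend on cluster configuration that is not modeled here.

```python
# Hypothetical sketch of the reasoning above, NOT Pacemaker/corosync source.
# After a token loss the nodes go through a gather state; only nodes that do
# not reply drop out of the membership, and only those are fencing candidates.


def nodes_to_fence(old_members, new_members):
    """Return the nodes that left the membership (candidates for fencing)."""
    return set(old_members) - set(new_members)


def after_token_loss(old_members, replied_in_gather):
    """Model: token lost -> gather state -> new membership.

    replied_in_gather: nodes that answered during the gather state.
    Returns (new_members, lost_nodes).
    """
    new_members = set(old_members) & set(replied_in_gather)
    lost = nodes_to_fence(old_members, new_members)
    # An empty 'lost' set means the membership did not change: no fencing.
    return new_members, lost


if __name__ == "__main__":
    ring = {"node01", "node02"}
    # Overload ended in time: every node replied during gather.
    _, lost = after_token_loss(ring, {"node01", "node02"})
    print(sorted(lost))   # [] -> membership unchanged, no fencing
    # node02 never replied (compare "lost: node02" in the log).
    _, lost = after_token_loss(ring, {"node01"})
    print(sorted(lost))   # ['node02'] -> membership changed
```

The diff-based shape mirrors the `memb=1, new=0, lost=1` counters in the transitional membership log line: a non-empty "lost" set is what distinguishes a real configuration change from a recovered token loss.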
>>
>> Regards,
>>   Honza
>>
>> emmanuel segura wrote:
>>> Hello Jan,
>>>
>>> Forget the last mail:
>>>
>>> Hello Jan,
>>>
>>> I found this problem on two HP blade systems, and the strange thing is that
>>> fencing was not triggered :( even though it's enabled.
>>>
>>>
>>> 2014-04-25 18:36 GMT+02:00 emmanuel segura <emi2fast at gmail.com>:
>>>
>>>> Hello Jan,
>>>>
>>>> I found this problem on two HP blade systems, and the strange thing is that
>>>> fencing was triggered :(
>>>>
>>>>
>>>> 2014-04-25 9:27 GMT+02:00 Jan Friesse <jfriesse at redhat.com>:
>>>>
>>>> Emmanuel,
>>>>>
>>>>> emmanuel segura wrote:
>>>>>
>>>>>  Hello List,
>>>>>>
>>>>>> I have these two lines in my cluster logs; can somebody help me
>>>>>> understand what they mean?
>>>>>>
>>>>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
>>>>>>
>>>>>> corosync [TOTEM ] Process pause detected for 577 ms, flushing membership messages.
>>>>>> corosync [TOTEM ] Process pause detected for 538 ms, flushing membership messages.
>>>>>>
>>>>>
>>>>> Corosync internally checks the gap between member join messages. If
>>>>> that gap is greater than token/2, it means that corosync was not
>>>>> scheduled to run by the kernel for too long, and it should discard the
>>>>> membership messages.
>>>>>
>>>>> The original intent was to detect a paused process. If a pause is
>>>>> detected, it's better to discard old membership messages and initiate a
>>>>> new query than to send an outdated view.
>>>>>
>>>>> So there are various reasons why this is triggered, but today it's
>>>>> usually a VM on an overloaded host machine.
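The check described above can be illustrated with a small model. This is a hedged sketch, not corosync's implementation: `check_pause` and `receive_join` are hypothetical names, timestamps are plain millisecond integers, and the 1000 ms token timeout in the usage lines is an illustrative assumption. With that value the threshold is 500 ms, so both the 577 ms and 538 ms pauses from the log would trigger a flush.

```python
# Hypothetical model of the pause check described above, NOT corosync source.
# If the gap between two join messages exceeds token/2, the process is assumed
# to have been paused, and the queued (possibly outdated) messages are dropped.


def check_pause(prev_ms, now_ms, token_ms):
    """Return the log line if the gap exceeds token/2, otherwise None."""
    gap = now_ms - prev_ms
    if gap > token_ms / 2:
        return ("Process pause detected for %d ms, "
                "flushing membership messages." % gap)
    return None


def receive_join(queue, prev_ms, now_ms, token_ms, msg):
    """Queue a join message, discarding older queued ones after a pause."""
    log = check_pause(prev_ms, now_ms, token_ms)
    if log is not None:
        # Drop the stale view so a fresh membership query can follow.
        queue.clear()
    queue.append(msg)
    return log


if __name__ == "__main__":
    token_ms = 1000  # illustrative token timeout -> 500 ms threshold
    print(check_pause(0, 577, token_ms))  # pause detected, flush
    print(check_pause(0, 400, token_ms))  # None: within the threshold
```

Tying the threshold to the token timeout means a pause is flagged well before it could be mistaken for a full token loss, which is the point of discarding the outdated view early.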
>>>>>
>>>>>
>>>>>
>>>>>> corosync [TOTEM ] A processor failed, forming new configuration.
>>>>>>
>>>>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
>>>>>>
>>>>>> I know the "corosync [TOTEM ] A processor failed, forming new
>>>>>> configuration" message appears when the token packet is definitely lost.
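The condition behind that message can be sketched as a plain timeout check. This is a hedged, hypothetical model: `token_lost` and `on_timer` are illustrative names, and it only captures the documented meaning of the `token` option in corosync.conf(5), namely the time in milliseconds after which token loss is declared when no token has been received. Corosync also retransmits the token before declaring it lost (see `token_retransmits_before_loss_const` in the same man page), which the sketch leaves out.

```python
# Hypothetical sketch of token-loss detection, NOT corosync source code.
# Per corosync.conf(5), 'token' is the time (ms) after which token loss is
# declared if no token has been received. Token retransmission before that
# point is deliberately not modeled here.

TOKEN_LOST_LOG = "A processor failed, forming new configuration."


def token_lost(last_token_ms, now_ms, token_ms):
    """True once no token has arrived for longer than the token timeout."""
    return now_ms - last_token_ms > token_ms


def on_timer(last_token_ms, now_ms, token_ms):
    """Called periodically (assumed); returns the log line on token loss."""
    if token_lost(last_token_ms, now_ms, token_ms):
        # The ring leaves the operational state and gathers a new membership;
        # whether membership actually changes depends on which nodes reply.
        return TOKEN_LOST_LOG
    return None


if __name__ == "__main__":
    token_ms = 1000  # illustrative value, not a recommended setting
    print(on_timer(0, 999, token_ms))   # None: still within the timeout
    print(on_timer(0, 1001, token_ms))  # token lost -> new configuration
```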
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>> Regards,
>>>>>   Honza
>>>>>
>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>
>>>>>> Project Home: http://www.clusterlabs.org
>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> this is my life and I live it as long as God wills
>>>>
>>>
>>
> 
