[Pacemaker] corosync [TOTEM ] Process pause detected for 577 ms

Jan Friesse jfriesse at redhat.com
Wed Apr 30 03:42:34 EDT 2014


Emmanuel,
there is no need to trigger fencing on "Process pause detected...".

Also, fencing is not triggered if membership didn't change. So let's say the
token was lost, but during the gather state all nodes replied; then there is
no change of membership and no need to fence.

I believe your situation was:
- one node is a little overloaded
- token lost
- overload over
- gather state
- every node is alive
-> no fencing
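
As a rough illustration of that reasoning (a minimal sketch only, not
Pacemaker's or corosync's actual code; the function name `nodes_to_fence` and
the node names are hypothetical), the decision comes down to comparing the
membership before the token loss with the membership after the gather state:

```python
def nodes_to_fence(members_before, members_after):
    """Return the nodes that dropped out of the membership.

    Hypothetical helper: real fencing decisions in Pacemaker also depend on
    quorum and stonith configuration, which this sketch ignores.
    """
    return set(members_before) - set(members_after)


# Token lost, but every node answered during the gather state:
before = {"node1", "node2"}
after = {"node1", "node2"}
print(nodes_to_fence(before, after))  # empty set, so nothing to fence
```

If one node had not replied during the gather state, it would show up in the
returned set, and that is the case where fencing becomes relevant.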

Regards,
  Honza

emmanuel segura wrote:
> Hello Jan,
> 
> Forget the last mail:
> 
> Hello Jan,
> 
> I found this problem on two HP blade systems, and the strange thing is that
> fencing was not triggered :(, even though it's enabled.
> 
> 
> 2014-04-25 18:36 GMT+02:00 emmanuel segura <emi2fast at gmail.com>:
> 
>> Hello Jan,
>>
>> I found this problem on two HP blade systems, and the strange thing is that
>> fencing was triggered :(
>>
>>
>> 2014-04-25 9:27 GMT+02:00 Jan Friesse <jfriesse at redhat.com>:
>>
>>> Emmanuel,
>>>
>>> emmanuel segura wrote:
>>>
>>>> Hello List,
>>>>
>>>> I have these lines in my cluster logs; can somebody help me understand what
>>>> they mean?
>>>>
>>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
>>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
>>>> ::::::::::::::
>>>>
>>>> corosync [TOTEM ] Process pause detected for 577 ms, flushing membership
>>>> messages.
>>>> corosync [TOTEM ] Process pause detected for 538 ms, flushing membership
>>>> messages.
>>>>
>>>
>>> Corosync internally checks the gap between member join messages. If such a
>>> gap is > token/2, it means that corosync was not scheduled to run by the
>>> kernel for too long, and it should discard membership messages.
>>>
>>> The original intent was to detect a paused process. If a pause is detected,
>>> it's better to discard old membership messages and initiate a new query than
>>> to send an outdated view.
>>>
>>> So there are various reasons why this is triggered, but today it's
>>> usually a VM on an overloaded host machine.
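
A minimal sketch of the token/2 check quoted above, assuming a token timeout of
1000 ms (an illustrative value; check the token setting in your own
corosync.conf) and using a hypothetical function name rather than anything
taken from the corosync source:

```python
def pause_detected(gap_ms, token_timeout_ms):
    """True if the gap is longer than half the token timeout."""
    return gap_ms > token_timeout_ms / 2


TOKEN_TIMEOUT_MS = 1000  # assumed value, not taken from the thread

# 577 and 538 are the gaps from the log lines; 400 is a made-up comparison.
for gap in (577, 538, 400):
    print(gap, pause_detected(gap, TOKEN_TIMEOUT_MS))
```

With that assumed timeout, the 577 ms and 538 ms gaps from the log exceed the
500 ms threshold, while a 400 ms gap would not.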
>>>
>>>
>>>
>>>> corosync [TOTEM ] A processor failed, forming new configuration.
>>>>
>>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
>>>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
>>>> ::::::::::::::
>>>>
>>>> I know the "corosync [TOTEM ] A processor failed, forming new
>>>> configuration" message appears when the totem token is definitely lost.
>>>>
>>>> Thanks
>>>>
>>>>
>>> Regards,
>>>   Honza
>>>
>>>
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>>
>>>>
>>>
>>
>>
>>
>> --
>> this is my life and I will live it as long as God wills
>>
> 
> 
> 
> 
> 
> 




