[Pacemaker] Pacemaker/corosync freeze

Jan Friesse jfriesse at redhat.com
Thu Mar 13 05:03:28 EDT 2014


...

>>>>
>>>> Also can you please try to set debug: on in corosync.conf and paste
>>>> full corosync.log then?
>>>
>>> I set debug to on, and did a few restarts but could not reproduce the issue
>>> yet - will post the logs as soon as I manage to reproduce.
>>>
>>
>> Perfect.
>>
>> Another option you can try to set is netmtu (1200 is usually safe).
> 
> Finally I was able to reproduce the issue.
> I restarted node ctsip2 at 21:10:14, and CPU went to 100% immediately (not when the node was up again).
> 
> The corosync log with debug on is available at: http://pastebin.com/kTpDqqtm
> 
> 
> To be honest, I had to wait much longer for this reproduction than before, even though there was no change in the corosync configuration - just potentially some system updates. But anyway, the issue is unfortunately still there.
> Previously, when this issue occurred, CPU was at 100% on all nodes - this time only on ctmgr, which was the DC...
> 
> I hope you can find some useful details in the log.
> 

Attila,
what seems interesting is this message:

Configuration ERRORs found during PE processing.  Please run "crm_verify
-L" to identify issues.

I'm unsure how much of a problem this is, but I'm really not a pacemaker
expert.

Anyway, I have a theory about what may be happening, and it looks related
to IPC (and probably not related to the network). But to make sure we do
not try to fix an already fixed bug, can you please build:
- New libqb (0.17.0). There are plenty of fixes in IPC
- Corosync 2.3.3 (also plenty of IPC fixes)
- And maybe also a newer pacemaker

I know you were not very happy about using hand-compiled sources, but
please give them at least a try.
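
In case it helps when retesting, the two settings mentioned earlier in
this thread (debug: on and netmtu) would sit in corosync.conf roughly as
follows. This is only a minimal sketch: it assumes you already have
totem and logging sections, and everything else in them stays as you
have it now.

```
totem {
    # ... your existing totem settings stay unchanged ...
    # Lower MTU as suggested earlier (1200 is usually safe)
    netmtu: 1200
}

logging {
    # ... your existing logging settings stay unchanged ...
    # Enable debug output in the corosync log
    debug: on
}
```

Corosync needs to be restarted on each node after changing the file.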

Thanks,
  Honza

> Thanks,
> Attila
> 
> 
> 
>>
>> Regards,
>>   Honza
>>
>>>
>>> There are also a few things that might or might not be related:
>>>
>>> 1) Whenever I want to edit the configuration with "crm configure edit",

...



