[Pacemaker] Pacemaker/corosync freeze

Attila Megyeri amegyeri at minerva-soft.com
Thu Mar 13 09:26:48 EDT 2014


> -----Original Message-----
> From: Attila Megyeri [mailto:amegyeri at minerva-soft.com]
> Sent: Thursday, March 13, 2014 1:45 PM
> To: The Pacemaker cluster resource manager; Andrew Beekhof
> Subject: Re: [Pacemaker] Pacemaker/corosync freeze
> 
> Hello,
> 
> > -----Original Message-----
> > From: Jan Friesse [mailto:jfriesse at redhat.com]
> > Sent: Thursday, March 13, 2014 10:03 AM
> > To: The Pacemaker cluster resource manager
> > Subject: Re: [Pacemaker] Pacemaker/corosync freeze
> >
> > ...
> >
> > >>>>
> > >>>> Also can you please try to set debug: on in corosync.conf and
> > >>>> paste full corosync.log then?
> > >>>
> > >>> I set debug to on, and did a few restarts but could not reproduce
> > >>> the issue yet - will post the logs as soon as I manage to reproduce.
> > >>>
> > >>
> > >> Perfect.
> > >>
> > >> Another option you can try to set is netmtu (1200 is usually safe).
> > >
> > > Finally I was able to reproduce the issue.
> > > I restarted node ctsip2 at 21:10:14, and CPU went to 100% immediately
> > > (not when the node was up again).
> > >
> > > The corosync log with debug on is available at:
> > > http://pastebin.com/kTpDqqtm
> > >
> > >
> > > To be honest, I had to wait much longer for this reproduction than
> > > before, even though there was no change in the corosync configuration -
> > > just potentially some system updates. But anyway, the issue is
> > > unfortunately still there.
> > > Previously, when this issue occurred, CPU was at 100% on all nodes -
> > > this time only on ctmgr, which was the DC...
> > >
> > > I hope you can find some useful details in the log.
> > >
> >
> > Attila,
> > what seems to be interesting is
> >
> > Configuration ERRORs found during PE processing.  Please run
> > "crm_verify -L" to identify issues.
> >
> > I'm unsure how much of a problem this is, but I'm really not a Pacemaker
> > expert.
> 
> Perhaps Andrew could comment on that. Any idea?
> 
> 
> >
> > Anyway, I have a theory about what may be happening, and it looks
> > related to IPC (and probably not to the network). But to make sure we
> > do not try to fix an already fixed bug, can you please build:
> > - New libqb (0.17.0). There are plenty of fixes in IPC
> > - Corosync 2.3.3 (it already has plenty of IPC fixes)
> > - And maybe also a newer pacemaker
> >
> 
> I already use Corosync 2.3.3, built from source, and libqb-dev 0.16 from
> Ubuntu package.
> I am currently building libqb 0.17.0, will update you on the results.
> 
> In the meantime we had another freeze, which did not seem to be related to
> any restarts, but brought all corosync processes to 100% CPU.
> Please check out the corosync.log, perhaps it is a different cause:
> http://pastebin.com/WMwzv0Rr
> 
> 
> In the meantime I will install the new libqb and send logs if we have further
> issues.
> 
> Thank you very much for your help!
> 
> Regards,
> Attila
> 

One more question:

If I install libqb 0.17.0 from source, do I need to rebuild corosync as well, or will a corosync that was built against libqb 0.16.0 be fine?

BTW, in the meantime I installed the new libqb on 3 of the 7 hosts, so I can see if it makes a difference. If I see crashes on the outdated ones, but not on the new ones, we are fine. :)
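
For comparing the hosts, one way to confirm which libqb a running corosync has actually loaded is to look at /proc/<pid>/maps. A minimal sketch, assuming Linux; the function name libqb_paths and the script name in the usage comment are hypothetical, not part of any corosync or libqb tooling:

```python
import re
import sys

# Matches a mapped shared object whose file name starts with "libqb.so".
_LIBQB_RE = re.compile(r"(/\S*/libqb\.so[^\s/]*)")


def libqb_paths(maps_text):
    """Return the sorted, de-duplicated libqb paths found in /proc/<pid>/maps text."""
    found = set()
    for line in maps_text.splitlines():
        match = _LIBQB_RE.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical usage: python3 libqb_check.py <pid of corosync>
    if len(sys.argv) == 2:
        with open("/proc/%s/maps" % sys.argv[1]) as handle:
            for path in libqb_paths(handle.read()):
                print(path)
```

Note that a process started before the library was replaced keeps the old file mapped until it is restarted, so this only reflects the new libqb after corosync has been restarted on that host.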

Thanks,

Attila

> 
> 
> > I know you were not very happy using hand-compiled sources, but please
> > give them at least a try.
> >
> > Thanks,
> >   Honza
> >
> > > Thanks,
> > > Attila
> > >
> > >
> > >
> > >>
> > >> Regards,
> > >>   Honza
> > >>
> > >>>
> > >>> There are also a few things that might or might not be related:
> > >>>
> > >>> 1) Whenever I want to edit the configuration with "crm configure
> > >>> edit",
> >
> > ...
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org Getting started:
> > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> 