[Pacemaker] Pacemaker/corosync freeze
Attila Megyeri
amegyeri at minerva-soft.com
Thu Mar 13 14:26:48 CET 2014
> -----Original Message-----
> From: Attila Megyeri [mailto:amegyeri at minerva-soft.com]
> Sent: Thursday, March 13, 2014 1:45 PM
> To: The Pacemaker cluster resource manager; Andrew Beekhof
> Subject: Re: [Pacemaker] Pacemaker/corosync freeze
>
> Hello,
>
> > -----Original Message-----
> > From: Jan Friesse [mailto:jfriesse at redhat.com]
> > Sent: Thursday, March 13, 2014 10:03 AM
> > To: The Pacemaker cluster resource manager
> > Subject: Re: [Pacemaker] Pacemaker/corosync freeze
> >
> > ...
> >
> > >>>>
> > >>>> Also can you please try to set debug: on in corosync.conf and
> > >>>> paste full corosync.log then?
> > >>>
> > >>> I set debug to on, and did a few restarts but could not reproduce
> > >>> the issue yet - will post the logs as soon as I manage to
> > >>> reproduce.
> > >>>
> > >>
> > >> Perfect.
> > >>
> > >> Another option you can try to set is netmtu (1200 is usually safe).
> > >
> > > Finally I was able to reproduce the issue.
> > > I restarted node ctsip2 at 21:10:14, and CPU went 100% immediately
> > > (not when node was up again).
> > >
> > > The corosync log with debug on is available at:
> > > http://pastebin.com/kTpDqqtm
> > >
> > >
> > > To be honest, I had to wait much longer for this reproduction than
> > > before, even though there was no change in the corosync
> > > configuration - just potentially some system updates. But anyway,
> > > the issue is unfortunately still there.
> > > Previously, when this issue came, cpu was at 100% on all nodes -
> > > this time only on ctmgr, which was the DC...
> > >
> > > I hope you can find some useful details in the log.
> > >
> >
> > Attila,
> > what seems to be interesting is
> >
> > Configuration ERRORs found during PE processing. Please run
> > "crm_verify -L" to identify issues.
> >
> > I'm unsure how much of a problem this is, but I'm really not a
> > pacemaker expert.
>
> Perhaps Andrew could comment on that. Any idea?
>
>
> >
> > Anyway, I have a theory about what may be happening, and it looks
> > related to IPC (and probably not to the network). But to make sure we
> > are not trying to fix an already fixed bug, can you please build:
> > - New libqb (0.17.0). There are plenty of fixes in IPC
> > - Corosync 2.3.3 (already plenty IPC fixes)
> > - And maybe also newer pacemaker
> >
>
> I already use Corosync 2.3.3, built from source, and libqb-dev 0.16 from
> Ubuntu package.
> I am currently building libqb 0.17.0, will update you on the results.
>
> In the meantime we had another freeze, which did not seem to be related to
> any restarts, but brought all corosync processes to 100%.
> Please check out the corosync.log, perhaps it is a different cause:
> http://pastebin.com/WMwzv0Rr
>
>
> In the meantime I will install the new libqb and send logs if we have further
> issues.
>
> Thank you very much for your help!
>
> Regards,
> Attila
>
One more question:
If I install libqb 0.17.0 from source, do I need to rebuild corosync as well, or will a corosync that was built against libqb 0.16.0 be fine?
BTW, in the meantime I installed the new libqb on 3 of the 7 hosts, so I can see if it makes a difference. If I see crashes on the outdated ones, but not on the new ones, we are fine. :)
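For comparing the upgraded and non-upgraded hosts, a minimal sketch of how one might check which libqb shared library a corosync binary resolves at load time. It assumes a Linux host with `ldd` available and corosync on the PATH; `libqb_lines` is only an illustrative helper name, not part of corosync or libqb. It shows which library file is picked up, not whether the two versions are compatible.

```shell
#!/bin/sh
# libqb_lines: hypothetical helper that reads ldd-style output on stdin
# and prints only the lines that mention the libqb shared object.
libqb_lines() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep 'libqb\.so' || true
}

# Only run the check if a corosync binary is found on PATH (assumption:
# on some systems it lives in /usr/sbin and may need root's PATH).
bin=$(command -v corosync 2>/dev/null || true)
if [ -n "$bin" ]; then
    ldd "$bin" | libqb_lines
fi
```

Running this on each host and comparing the resolved paths should show whether a source-installed libqb (often under /usr/local/lib) or the packaged one is being loaded.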
Thanks,
Attila
>
>
> > I know you were not very happy using hand-compiled sources, but please
> > give them at least a try.
> >
> > Thanks,
> > Honza
> >
> > > Thanks,
> > > Attila
> > >
> > >
> > >
> > >>
> > >> Regards,
> > >> Honza
> > >>
> > >>>
> > >>> There are also a few things that might or might not be related:
> > >>>
> > >>> 1) Whenever I want to edit the configuration with "crm configure
> > >>> edit",
> >
> > ...
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org Getting started:
> > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
>