[Pacemaker] crmd segfault on opensuse 11.1

Stratos Zolotas strzol at gmail.com
Tue Oct 13 02:11:05 EDT 2009


On Mon, Oct 12, 2009 at 8:40 PM, Andrew Beekhof <andrew at beekhof.net> wrote:

> The crmd process looks to have stalled.
> Can you re-run with debug turned on in openais.conf?
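The debug switch Andrew mentions sits in the logging section of openais.conf. The sketch below assumes the openais 0.80 layout used in this thread: /etc/ais/openais.conf containing a "logging { ... debug: off ... }" block, where setting "debug: on" enables debug output. It checks which value is currently set before openais is restarted. openais_debug_setting is a hypothetical helper name, and the parser is deliberately rough: it only reads top-level "debug:" lines inside the logging block.

```bash
#!/bin/bash
# Hypothetical helper: print the value of "debug:" from the logging
# section of an openais 0.80-style config (default path is an assumption).
openais_debug_setting() {
    local conf="${1:-/etc/ais/openais.conf}"
    awk '
        # On the line that opens the logging block, start tracking depth.
        /^[ \t]*logging[ \t]*[{]/ { depth = 1; next }
        depth > 0 {
            # Only top-level keys of the logging block count.
            if (depth == 1 && $0 ~ /^[ \t]*debug[ \t]*:/) {
                val = $0
                sub(/^[^:]*:[ \t]*/, "", val)   # drop the key and colon
                sub(/[ \t\r]*$/, "", val)       # drop trailing blanks
                print val
                exit
            }
            # Track nested sub-blocks so the section ends at its own closing brace.
            depth += gsub(/[{]/, "{") - gsub(/[}]/, "}")
        }
    ' "$conf"
}

# Usage: prints "off" or "on" (nothing if the key is absent).
#   openais_debug_setting /etc/ais/openais.conf
```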
>
> On Mon, Oct 12, 2009 at 6:09 PM, Stratos Zolotas <strzol at gmail.com> wrote:
> >
> >
> > On Mon, Oct 12, 2009 at 5:57 PM, Dejan Muhamedagic <dejanmm at fastmail.fm>
> > wrote:
> >>
> >> Hi,
> >>
> >> On Mon, Oct 12, 2009 at 03:32:15PM +0300, Stratos Zolotas wrote:
> >> > On Mon, Oct 12, 2009 at 3:10 PM, Dejan Muhamedagic
> >> > <dejanmm at fastmail.fm> wrote:
> >> >
> >> > > On Mon, Oct 12, 2009 at 02:57:29PM +0300, Stratos Zolotas wrote:
> >> > > > On Mon, Oct 12, 2009 at 2:51 PM, Dejan Muhamedagic
> >> > > > <dejanmm at fastmail.fm> wrote:
> >> > > >
> >> > > > > Hi,
> >> > > > >
> >> > > > > On Mon, Oct 12, 2009 at 02:42:25PM +0300, Stratos Zolotas wrote:
> >> > > > > > Hello to the list!!!
> >> > > > > >
> >> > > > > > This is my first question to the list and my first attempt to
> >> > > > > > build a two node cluster on opensuse 11.1 with pacemaker 1.0.5
> >> > > > > > and openais 0.80.5, so please forgive my lack of knowledge.
> >> > > > > >
> >> > > > > > I'm trying to build an Active/Passive scenario, but I get the
> >> > > > > > following on both nodes:
> >> > > > > >
> >> > > > > > Oct 12 14:05:57 alpha kernel: crmd[30704]: segfault at 18 ip
> >> > > > > > 00007f7770526eee sp 00007fffc7379810 error 4 in
> >> > > > > > libplumb.so.2.0.0[7f777050a000+30000]
> >> > > > >
> >> > > > > It'd be excellent to see the backtrace, provided that there are
> >> > > > > core files. Please enable core file generation if there are none.
> >> > > > > If you don't know about backtraces, just use hb_report to capture
> >> > > > > it.
> >> > > > >
> >> > > > > > As a result, I'm getting the following:
> >> > > > >
> >> > > > > That's not a consequence of the previous problem.
> >> > > > >
> >> > > > > > alpha:/etc/ais # crm_mon --one-shot -V
> >> > > > > > crm_mon[30911]: 2009/10/12_14:39:00 ERROR: unpack_resources: No STONITH resources have been defined
> >> > > > > > crm_mon[30911]: 2009/10/12_14:39:00 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
> >> > > > > > crm_mon[30911]: 2009/10/12_14:39:00 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
> >> > > > >
> >> > > > > Thanks,
> >> > > > >
> >> > > > > Dejan
> >> > > > >
> >> > > > > >
> >> > > > > > ============
> >> > > > > > Last updated: Mon Oct 12 14:39:00 2009
> >> > > > > > Current DC: NONE
> >> > > > > > 0 Nodes configured, unknown expected votes
> >> > > > > > 0 Resources configured.
> >> > > > > > ============
> >> > > > > >
> >> > > > > > The errors concern the configuration (I have searched for them),
> >> > > > > > which I am unable to change at the moment because "crm configure"
> >> > > > > > cannot connect to the cluster.
> >> > > > > >
> >> > > > > > Both nodes are running opensuse 11.1 x86_64 with the latest
> >> > > > > > updates and the versions mentioned above.
> >> > > > > >
> >> > > > > > Any help is appreciated, and again, please forgive my lack of
> >> > > > > > knowledge.
> >> > > > > >
> >> > > > > > Thank you in advance.
> >> > > > > >
> >> > > > > > Stratos.
> >> > > > >
> >> > > > > > _______________________________________________
> >> > > > > > Pacemaker mailing list
> >> > > > > > Pacemaker at oss.clusterlabs.org
> >> > > > > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > >
> >> > > >
> >> > > > Thank you for the immediate response. I know about the errors (I
> >> > > > have to disable STONITH in the config), but I cannot configure
> >> > > > anything with crm. After a commit I get something like "node did
> >> > > > not respond".
> >> > > >
> >> > > > The problem is that there are no nodes, as you can see after the
> >> > > > errors.
> >> > > >
> >> > > > I want to help eliminate the problem, but I'm not a programmer. So,
> >> > > > if you can, please guide me so that I can run hb_report and provide
> >> > > > the necessary logs. When do I have to run hb_report, and with what
> >> > > > parameters?
> >> > >
> >> > > First check if you have core dumps:
> >> > >
> >> > > # ls -lR /var/lib/heartbeat/cores
> >> > >
> >> > > Then run
> >> > >
> >> > > # hb_report -f <time> -A -n "<nodes>" /tmp/problem-1
> >> > >
> >> > > Replace <time> with the time you started the cluster at (say
> >> > > 13:00) and <nodes> with a space-separated list of nodes.
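Dejan's first step lists the cores directory recursively. As a hedged alternative, assuming the same /var/lib/heartbeat/cores layout seen later in this thread (one subdirectory per user, such as hacluster, nobody and root), find can print only actual core files, so an empty result really does mean that no dumps were written. list_cores is a hypothetical helper name, and matching on names starting with "core" is an assumption about how the dumps are named.

```bash
#!/bin/bash
# Hypothetical helper: list core files below the heartbeat cores directory
# (default path taken from this thread). Empty per-user subdirectories
# produce no output, so no output means no core dumps were found.
list_cores() {
    local dir="${1:-/var/lib/heartbeat/cores}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # -type f skips the hacluster/nobody/root directories themselves;
    # assume dumps are named "core" or "core.<pid>".
    find "$dir" -type f -name 'core*' | sort
}

# Usage:
#   list_cores                       # checks /var/lib/heartbeat/cores
#   list_cores /some/other/cores/dir
```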
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Dejan
> >> > >
> >> > > > Again, please forgive my lack of knowledge (it is my first time
> >> > > > working with clusters).
> >> > > >
> >> > > > Thanks again.
> >> > > >
> >> > > > Stratos.
> >> > >
> >> > >
> >> > >
> >> > >
> >> >
> >> > I don't think there are any core dumps. The three folders returned
> >> > by the command are empty.
> >> >
> >> > alpha:~ # ls -IR /var/lib/heartbeat/cores/
> >> > hacluster  nobody  root
> >> > alpha:~ #
> >> >
> >> > hb_report -f 15:27 -A -n "alpha bravo" -u root /root/problem-3
> >> >
> >> > returns
> >>
> >> The magic is:
> >>
> >> # ulimit -c unlimited
> >>
> >> You should put it somewhere so that it is run on boot. For now,
> >> just run it before /etc/init.d/openais start.
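Dejan's advice can be wrapped so that the raised limit applies to whatever is started next. This is a sketch, not part of openais: run_with_cores is a hypothetical helper that runs ulimit -c unlimited in a subshell and then starts the given command, so daemons started from it inherit the limit. If the hard limit cannot be raised (for example when not running as root), it falls back to the highest soft limit allowed and prints a warning.

```bash
#!/bin/bash
# Hypothetical helper: raise the core-file size limit, then run a command.
# The body runs in a subshell, so the caller's own limits are untouched;
# the command (and any daemons it spawns) inherits the raised limit.
run_with_cores() (
    # "ulimit -c unlimited" sets both the soft and the hard limit; raising
    # the hard limit needs privileges, so fall back if it is refused.
    if ! ulimit -c unlimited 2>/dev/null; then
        ulimit -S -c "$(ulimit -H -c)"
        echo "warning: core size capped at hard limit: $(ulimit -H -c)" >&2
    fi
    "$@"
)

# Usage, as root, instead of calling the init script directly:
#   run_with_cores /etc/init.d/openais start
```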
> >>
> >> > Password:
> >> > alpha: WARN: could not find the log file on alpha
> >> > Password: /etc/ha.d/shellfuncs: line 211: maketempdir: command not found
> >> > alpha: WARN: sorry, can't create temoary file for find_files
> >> > /etc/ha.d/shellfuncs: line 211: maketempdir: command not found
> >> > alpha: WARN: sorry, can't create temoary file for find_files
> >> > /etc/ha.d/shellfuncs: line 211: maketempdir: command not found
> >> > /etc/ha.d/shellfuncs: line 211: maketempdir: command not found
> >> > alpha: ERROR: cannot create temporary files
> >>
> >> This looks funny. Can you please show the package versions? And
> >> where did the packages come from?
> >>
> >> Thanks,
> >>
> >> Dejan
> >>
> >> > I have attached the generated folder as a zip file, but from a quick
> >> > look I don't think it contains anything useful. Maybe it's better if
> >> > you guide me on how to produce core dump files.
> >> >
> >> > I have also tried without the -u option.
> >> >
> >> > Thanks
> >> >
> >> > Stratos
> >> >
> >> >
> >> >
> >> > --
> >> > Kernel IT Solutions Ltd
> >> > http://www.kernelit.gr
> >> >
> >> > Cyclades Wireless Network
> >> > http://www.cywn.gr
> >>
> >>
> >>
> >>
> >
> > After reinstalling all the packages, it has been running for about half an
> > hour without a segfault.
> >
> > crm_mon still reports:
> > ============
> > Last updated: Mon Oct 12 19:02:43 2009
> > Current DC: NONE
> > 0 Nodes configured, unknown expected votes
> > 0 Resources configured.
> > ============
> >
> > and when I try to "commit" a configuration (through crm configure), I get
> > "Remote node did not respond".
> >
> > What do I have to do to make the nodes appear (at least until a segfault
> > occurs and we have a core dump)?
> >
> > I'm attaching my /var/log/messages from the first node after the last run
> > of openais.
> >
> >
> >
> >
> >
> >
> >
>
>

After restarting in debug mode, I got a segfault.

I'm attaching a core file found in /var/lib/heartbeat/cores/hacluster.

Hope it helps.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: core.3091.zip
Type: application/zip
Size: 182718 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20091013/cf250af3/attachment-0003.zip>

