[Pacemaker] Occasional error running ocf scripts
Bernd Schubert
bs_lists at aakef.fastmail.fm
Fri Aug 13 10:31:27 UTC 2010
> > 99% of the time, the resource will stop correctly, it is just on a few
> > occasions that I see an error like this.
> >
> > Is this a known problem, or can I generate extra logging to try help
> > debug?
>
> Never heard of it. That sounds quite serious. Yes, extra logging
> would be helpful. How often did that happen? Which releases do
> you run?
I reported it to this list, without any reply. I then also filed a bug report:
http://developerbugs.linux-foundation.org/show_bug.cgi?id=2458
Also without a reply so far.
I looked into the lrmd code and it seems to know only what is passed to it as
XML, so this is unlikely to be a cluster-glue issue. It would be much easier to
debug if lrmd knew about all resources and their required parameters. It could
then fail immediately without calling the RA. But that is a design problem.
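To illustrate the kind of up-front check I mean, here is a rough sketch in Python. It is not lrmd code and lrmd does not do this today. It assumes the RA's meta-data XML is available and marks mandatory parameters with required="1"; the sample metadata and the function name missing_required are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down RA meta-data; real agents print a much longer
# document when called with the "meta-data" action.
SAMPLE_METADATA = """
<resource-agent name="example">
  <parameters>
    <parameter name="device" required="1"/>
    <parameter name="directory" required="1"/>
    <parameter name="options" required="0"/>
  </parameters>
</resource-agent>
"""


def missing_required(metadata_xml, params):
    """Return the required parameter names that are absent or empty in params."""
    root = ET.fromstring(metadata_xml)
    # Collect names of parameters flagged as required in the meta-data.
    required = [
        p.get("name")
        for p in root.iter("parameter")
        if p.get("required") == "1"
    ]
    # A parameter counts as missing if it is not set or set to an empty string.
    return [name for name in required if not params.get(name)]


if __name__ == "__main__":
    # Simulates an operation that arrives with only part of its parameters.
    incoming = {"device": "/dev/sda1"}
    missing = missing_required(SAMPLE_METADATA, incoming)
    if missing:
        print("refusing to call RA, missing parameters: " + ", ".join(missing))
```

With a check like this, an operation arriving with incomplete parameters could be rejected with a clear message instead of surfacing as an odd RA failure.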
IMHO, the issue was introduced in pacemaker between 1.0.7 and 1.0.9, but I do
not have the time to track it down further. For now we simply continue to use
1.0.7 (as I reported to the list before, 1.0.8 randomly fails to start
resources; as we typically have more than 30, 60 or even 120 resources, we then
run into random issues all the time...).
Cheers,
Bernd
--
Bernd Schubert
DataDirect Networks