[Pacemaker] Occasional error running ocf scripts
Dejan Muhamedagic
dejanmm at fastmail.fm
Fri Aug 13 11:35:13 UTC 2010
Hi,
On Fri, Aug 13, 2010 at 10:29:43AM +0000, Chris Picton wrote:
> On Fri, 13 Aug 2010 12:06:27 +0200, Dejan Muhamedagic wrote:
>
> > Hi,
> >
> > On Fri, Aug 13, 2010 at 11:20:38AM +0200, Chris Picton wrote:
> >> Hi all
> >>
> >> I have seen the following behaviour on a few occasions in the past few
> >> months. It seems as if the resource script gets called, but without the
> >> correct OCF_RESOURCE parameters.
> >>
> >> Aug 13 10:58:08 chris-test-01 Filesystem[24682]: [24688]: ERROR: Please set OCF_RESKEY_device to the device to be managed
> >> Aug 13 10:58:08
> >>
> >> 99% of the time, the resource will stop correctly, it is just on a few
> >> occasions that I see an error like this.
> >>
> >> Is this a known problem, or can I generate extra logging to help debug
> >> it?
> >
> > Never heard of it. That sounds quite serious. Yes, extra logging would
> > be helpful. How often did that happen? Which releases do you run?
> >
>
> I have probably seen it more than 10 times (on different resources,
> versions and servers) over the past year.
>
> It has happened on versions 2.1.4, 3.0.0 and 3.0.3, but it happened more
> often on 2.1.4 (we had a server which would often get stonithed when
> stopping a resource for exactly this reason)
>
> I am currently testing a new CIB for my sql servers and it came up again,
> so I thought I would mail through my results.
>
> I will update to the latest rpm package from clusterlabs (I currently am
> running pacemaker-1.0.9.1-1.el5 and heartbeat-3.0.3-2.el5 on my test),
> and see if I can trigger it again with a higher debug level.
1.0.9.1 is the latest stable pacemaker release. Just make sure
that you're also running the latest cluster-glue (1.0.6).
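
As for the extra logging: one possible approach, sketched here and not
taken from any shipped agent, is to record which OCF_* variables a script
actually received whenever a required parameter is missing. The helper
names below (ocf_snapshot, missing_reskeys, report) are hypothetical; the
only assumption about the environment is that resource parameters arrive
as OCF_RESKEY_<name> variables, as the error message above suggests.

```python
OCF_PREFIX = "OCF_"
RESKEY_PREFIX = "OCF_RESKEY_"


def ocf_snapshot(env):
    """Return the OCF_* variables found in env as (name, value) pairs, sorted by name."""
    return sorted((k, v) for k, v in env.items() if k.startswith(OCF_PREFIX))


def missing_reskeys(env, required):
    """Return the required parameter names whose OCF_RESKEY_<name> is unset or empty."""
    return [name for name in required if not env.get(RESKEY_PREFIX + name)]


def report(env, required):
    """Build log lines for a failed check; return an empty list if nothing is missing."""
    missing = missing_reskeys(env, required)
    if not missing:
        return []
    lines = ["missing parameters: " + ", ".join(missing)]
    # Dump everything OCF-related so the log shows what was actually passed in.
    lines += ["%s=%s" % pair for pair in ocf_snapshot(env)]
    return lines
```

Calling report(os.environ, ["device"]) at the start of an action and
writing the returned lines to the log would show whether the whole OCF
environment was absent or only some of the parameters.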
Thanks,
Dejan