[Pacemaker] Occasional error running ocf scripts

Fri Aug 13 11:35:13 UTC 2010

Hi,

On Fri, Aug 13, 2010 at 10:29:43AM +0000, Chris Picton wrote:
> On Fri, 13 Aug 2010 12:06:27 +0200, Dejan Muhamedagic wrote:
> 
> > Hi,
> > 
> > On Fri, Aug 13, 2010 at 11:20:38AM +0200, Chris Picton wrote:
> >> Hi all
> >> 
> >> I have seen the following behaviour on a few occasions in the past few
> >> months.  It seems as if the resource script gets called, but without the
> >> correct OCF_RESKEY_* parameters.
> >> 
> >> Aug 13 10:58:08 chris-test-01 Filesystem[24682]: [24688]: ERROR: Please set OCF_RESKEY_device to the device to be managed
> >> Aug 13 10:58:08
> >> 
> >> 
> >> 99% of the time, the resource will stop correctly, it is just on a few
> >> occasions that I see an error like this.
> >> 
> >> Is this a known problem, or can I generate extra logging to help
> >> debug it?
> > 
> > Never heard of it. That sounds quite serious. Yes, extra logging would
> > be helpful. How often did that happen? Which releases do you run?
> > 
> 
> I have probably seen it more than 10 times (on different resources,
> versions and servers) over the past year.
> 
> It has happened on versions 2.1.4, 3.0.0 and 3.0.3, but it happened more 
> often on 2.1.4 (we had a server which would often get stonithed when 
> stopping a resource for exactly this reason)
> 
> I am currently testing a new CIB for my sql servers and it came up again, 
> so I thought I would mail through my results.
> 
> I will update to the latest rpm package from clusterlabs (I currently am 
> running pacemaker-1.0.9.1-1.el5 and heartbeat-3.0.3-2.el5 on my test), 
> and see if I can trigger it again with a higher debug level.

1.0.9.1 is the latest stable pacemaker release. Just make sure
that you're also running the latest cluster-glue (1.0.6).
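
For the extra logging, one option is to record which OCF_RESKEY_*
variables are actually present in the environment each time the agent
is invoked. What follows is only a minimal sketch in Python, not part
of pacemaker or cluster-glue: the function names are hypothetical, and
of the parameter names only "device" comes from the log above
("directory" and "fstype" are assumed to be the other required
Filesystem parameters).

```python
import os
import sys

PREFIX = "OCF_RESKEY_"


def ocf_params(environ):
    """Return the OCF_RESKEY_* variables in environ, with the prefix stripped."""
    return {k[len(PREFIX):]: v for k, v in environ.items() if k.startswith(PREFIX)}


def missing_params(environ, required):
    """Return the required parameter names that are unset or empty."""
    params = ocf_params(environ)
    return [name for name in required if not params.get(name)]


def report(environ, required, action):
    """Build a one-line summary suitable for appending to a debug log."""
    present = sorted(ocf_params(environ))
    missing = missing_params(environ, required)
    return "action=%s present=%s missing=%s" % (
        action,
        ",".join(present) or "-",
        ",".join(missing) or "-",
    )


if __name__ == "__main__":
    # Hypothetical invocation: the action (start, stop, monitor) is argv[1].
    action = sys.argv[1] if len(sys.argv) > 1 else "unknown"
    # Assumed required parameters; only "device" is confirmed by the log.
    required = ["device", "directory", "fstype"]
    sys.stderr.write(report(os.environ, required, action) + "\n")
```

If something like this is run from a wrapper around the agent, a line
with a non-empty "missing" field at the time of the failed stop would
show whether the parameters were already absent from the environment
when the script was called.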

Thanks,

Dejan