[Pacemaker] timed out / exec error

James Harper james.harper at bendigoit.com.au
Thu Dec 20 06:43:20 EST 2012


> Hi,
> 
> On Tue, Dec 18, 2012 at 10:58:18AM +0000, James Harper wrote:
> > For the following failure:
> >
> > Failed actions:
> >     p_lvm_iscsi:0_monitor_10000 (node=bitvs6, call=57, rc=-2,
> > status=Timed Out): unknown exec error
> >
> > Is this the ra itself returning a "Timed Out" error, or is it the
> > cluster software determining that the ra is taking too long and so
> > killing it and declaring it failed? stonith kicks in
> 
> The latter.
> 
> > shortly after this happens so tracking it down is a bit of a pain.
> 
> Is it expected? Normally, a monitor failing should cause a resource restart. If
> a resource fails to stop, it may be a resource agent bug.
> 
> > It happens any time the system gets loaded (eg when making a config
> > change)
> 
> What kind of change?
> 
> > and I can't seem to put my finger on what is causing it.
> 
> Which resource is that? Which version of resource agents do you run?
> 

Any CIB change drives the system load up for 10-20 seconds, and then things start timing out, even though I have set the timeouts well in excess of the time it takes for Pacemaker to mark the resource as timed out.
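
To check my understanding of "the latter": the sketch below is a generic illustration, not Pacemaker's actual code, of a supervisor that gives a monitor command a fixed time budget, kills it when the budget runs out, and reports the timeout itself. The name run_monitor and the two stand-in commands are made up for the example; only the "Timed Out" label is taken from the failure above.

```python
import subprocess
import sys


def run_monitor(cmd, timeout_s):
    """Run a monitor-like command under a time budget.

    Returns ("complete", exit_code) if the command finishes in time,
    or ("Timed Out", None) if the supervisor had to kill it.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
        return ("complete", proc.returncode)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so the
        # timeout is decided here by the caller, not by the command.
        return ("Timed Out", None)


# Hypothetical stand-ins for a resource agent's monitor action.
FAST_CMD = [sys.executable, "-c", "raise SystemExit(0)"]
SLOW_CMD = [sys.executable, "-c", "import time; time.sleep(5)"]

if __name__ == "__main__":
    print(run_monitor(FAST_CMD, timeout_s=5))
    print(run_monitor(SLOW_CMD, timeout_s=0.5))
```

If that model is right, the agent never gets to return its own result in the timed-out case, so the thing to chase is why the monitor runs slowly under load rather than what it returns.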

All packages are from Debian wheezy.

James



