[Pacemaker] A couple of queries regarding the behaviour of ocf:heartbeat:ManageVE

Mon Mar 11 17:52:53 EDT 2013

On 11/03/13 15:12, Dejan Muhamedagic wrote:
>> A more flexible solution might be to make the timeout configurable, but
>> in the absence of this, then I think upping the stop action timeout
>> seems like the right thing to do.
>>
> The value you found is just advice to the user. You can
> define a timeout for any operation on a per-resource basis (e.g.
> op stop timeout=4m)

Sorry - I didn't make myself clear - what I meant is that the vzctl
program (which ocf:heartbeat:ManageVE uses extensively) should be
modified so that its internal timeout (after which it forcibly stops the
VM in question - currently hard-coded to 120s) is configurable.
ocf:heartbeat:ManageVE could then support setting this timeout (and
complain loudly if it's been set too close to, or greater than, the
resource stop operation timeout value).
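
As a rough sketch of the check I have in mind (Python only for brevity;
the resource agent itself is a shell script, and the function name and
the 30s safety margin are made up for illustration, not anything that
exists in ManageVE or vzctl today):

```python
# vzctl's internal stop timeout, hard-coded to 120s as described above.
VZCTL_STOP_TIMEOUT_S = 120


def check_stop_timeout(op_timeout_s, vzctl_timeout_s=VZCTL_STOP_TIMEOUT_S,
                       margin_s=30):
    """Compare the cluster's stop operation timeout with vzctl's timeout.

    Returns None when the stop operation leaves vzctl enough time to
    force the container down, otherwise a message to log.
    margin_s is an assumed safety margin, not a value taken from vzctl.
    """
    if vzctl_timeout_s >= op_timeout_s:
        # The cluster would give up before vzctl forcibly stops the VE.
        return ("error: stop timeout %ds is not greater than vzctl "
                "timeout %ds" % (op_timeout_s, vzctl_timeout_s))
    if op_timeout_s - vzctl_timeout_s < margin_s:
        # Technically longer, but with little room for the forced stop.
        return ("warning: stop timeout %ds is within %ds of vzctl "
                "timeout %ds" % (op_timeout_s, margin_s, vzctl_timeout_s))
    return None


# "op stop timeout=4m" is 240s, comfortably above vzctl's 120s.
print(check_stop_timeout(240))
print(check_stop_timeout(120))
```

In the agent this would be a few lines of shell run at validation time,
fed with whatever stop timeout the cluster passes in (I believe that
arrives in milliseconds as OCF_RESKEY_CRM_meta_timeout, but I'd need to
check).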

As it is, I think the ocf:heartbeat:ManageVE stop timeout should always
be greater than the underlying vzctl timeout, otherwise the virtual
machine (VE / container / zone, whatever you want to call it) stop
operation will be unreliable, and the cluster node runs the risk of
going splat.

I realise that the ocf:heartbeat:ManageVE stop operation timeout is
advice to the user, but it seems like pretty bad advice at the moment!

> The start operation should anyway wait until the resource is
> completely started, so it should do 'start --wait'.
>

OK, good - that's what I thought. I'll submit a patch for the resource
script for the time being, and look at doing the modification to vzctl
separately, I think...

Tim.

-- 
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.  
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53  http://seoss.co.uk/ +44-(0)1273-808309