[Pacemaker] [PATCH] change timeouts, startup behaviour ocf:heartbeat:ManageVE (OpenVZ VE cluster resource)
Dejan Muhamedagic
dejanmm at fastmail.fm
Wed Mar 13 16:18:38 UTC 2013
On Tue, Mar 12, 2013 at 12:58:44PM +0000, Tim Small wrote:
> The attached patch changes the behaviour of the OpenVZ virtual machine
> cluster resource agent, so that:
>
> 1. The default resource stop timeout is greater than the hardcoded
Just for the record: where exactly is this hardcoded? Is it
also documented?
> timeout in "vzctl stop" (after which vzctl forcibly stops the
> virtual machine), since failure to stop a resource can lead to the
> cluster node being evicted from the cluster entirely, which is
> generally a BAD thing.
Agreed.
> 2. The start operation now waits for resource startup to complete i.e.
> for the VE to "boot up" (so that the cluster manager can detect VEs
> which are hanging on startup, and also throttle simultaneous startups,
> so as not to overburden the node in question). Since the start
> operation now does a lot more, the default start operation timeout has
> been increased.
I'm not sure we can introduce this just like that. It
significantly changes the agent's behaviour.
BTW, how does vzctl know when the VE is started?
> 3. Backs off the default timeouts and intervals for various operations
> to less aggressive values.
Please make patches that are self-contained and can each be
described succinctly. If the description above matches the
code changes, then there should be three patches instead of
one.
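To make the timeout reasoning behind points 1 and 3 concrete, here is a small bash sketch. The helper names `stop_timeout_ok` and `worst_case_detection` are hypothetical, and so is `vzctl_grace`: the actual interval after which `vzctl stop` forces the stop is not stated in this thread, so it is passed in as a parameter rather than assumed. The detection figure is only a rough upper bound (one monitor interval plus one monitor timeout).

```shell
#!/bin/bash
# Hypothetical helpers illustrating the timeout arithmetic; not part of ManageVE.

# stop_timeout_ok OP_TIMEOUT VZCTL_GRACE MARGIN
# Succeeds if the cluster stop timeout leaves at least MARGIN seconds beyond
# the (assumed, caller-supplied) interval after which "vzctl stop" forcibly
# stops the VE, so the forced stop can finish before the cluster gives up.
stop_timeout_ok() {
    local op_timeout=$1 vzctl_grace=$2 margin=$3
    (( op_timeout >= vzctl_grace + margin ))
}

# worst_case_detection INTERVAL OP_TIMEOUT
# Rough upper bound on how long a failed VE can go unnoticed: the VE dies
# just after a successful monitor, and the next monitor (started INTERVAL
# seconds later) hangs until its own timeout expires.
worst_case_detection() {
    local interval=$1 op_timeout=$2
    echo $(( interval + op_timeout ))
}
```

With the old defaults (`interval="10"`, `timeout="10"`) `worst_case_detection` gives 20 seconds; with the patched ones (`interval="60"`, `timeout="20"`) it gives 80, which is the price paid for polling less often.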
Please continue the discussion on linux-ha-dev; that's where RA
development discussions take place.
Cheers,
Dejan
>
> Cheers,
>
> Tim.
>
>
> n.b. There is a bug in the Debian 6.0 (Squeeze) OpenVZ kernel such that
> "vzctl start <VEID> --wait" hangs. The bug doesn't impact the
> OpenVZ.org kernels (and hence won't impact Debian 7.0 Wheezy either).
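Because `vzctl start --wait` can block indefinitely on an affected kernel, one option is to bound the wait inside the agent, so that it fails with a clear message instead of simply being killed when the cluster's start timeout expires. The bash sketch below assumes GNU coreutils `timeout(1)`, which exits with status 124 when the limit is reached. `start_ve_bounded` and its `limit` argument are hypothetical; accepting exit status 32 just mirrors the existing `if [[ $retcode != 0 && $retcode != 32 ]]` check in the agent.

```shell
#!/bin/bash
# Sketch only: a bounded variant of the start step, not the actual ManageVE code.
# OCF return codes are defined here so the sketch is self-contained; the real
# agent gets them from the OCF shell function library.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
VZCTL=${VZCTL:-vzctl}

# start_ve_bounded VEID LIMIT
# Runs "vzctl start VEID --wait" but gives up after LIMIT seconds.
start_ve_bounded() {
    local veid=$1 limit=$2 retcode
    timeout "$limit" "$VZCTL" start "$veid" --wait >/dev/null 2>&1
    retcode=$?
    if [[ $retcode == 124 ]]; then
        # timeout(1) killed vzctl: the VE did not finish starting in time
        echo "ERROR: VE $veid did not finish starting within ${limit}s" >&2
        return $OCF_ERR_GENERIC
    fi
    # 0 = started; 32 is also accepted, as in the existing agent
    if [[ $retcode != 0 && $retcode != 32 ]]; then
        echo "ERROR: vzctl start $veid exited with status $retcode" >&2
        return $OCF_ERR_GENERIC
    fi
    return $OCF_SUCCESS
}
```

The limit would need to stay below the `start` operation timeout (240 s in the patch), e.g. `start_ve_bounded 101 200` for an example VE 101, so that the agent reports the failure itself.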
>
> --
> South East Open Source Solutions Limited
> Registered in England and Wales with company number 06134732.
> Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
> VAT number: 900 6633 53 http://seoss.co.uk/ +44-(0)1273-808309
>
> --- ManageVE.old 2010-10-22 05:54:50.000000000 +0000
> +++ ManageVE 2013-03-12 11:39:47.895102380 +0000
> @@ -26,12 +26,15 @@
> #
> #
> # Created 07. Sep 2006
> -# Updated 18. Sep 2006
> +# Updated 12. Mar 2013
> #
> -# rev. 1.00.3
> +# rev. 1.00.4
> #
> # Changelog
> #
> +# 12/Mar/13 1.00.4 Wait for VE startup to finish, lengthen default start timeout.
> +# Default stop timeout to longer than the vzctl stop 'polite'
> +# interval.
> # 12/Sep/06 1.00.3 more cleanup
> # 12/Sep/06 1.00.2 fixed some logic in start_ve
> # general cleanup all over the place
> @@ -67,7 +70,7 @@
> <?xml version="1.0"?>
> <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
> <resource-agent name="ManageVE">
> - <version>1.00.3</version>
> + <version>1.00.4</version>
>
> <longdesc lang="en">
> This OCF complaint resource agent manages OpenVZ VEs and thus requires
> @@ -87,12 +90,12 @@
> </parameters>
>
> <actions>
> - <action name="start" timeout="75" />
> - <action name="stop" timeout="75" />
> - <action name="status" depth="0" timeout="10" interval="10" />
> - <action name="monitor" depth="0" timeout="10" interval="10" />
> - <action name="validate-all" timeout="5" />
> - <action name="meta-data" timeout="5" />
> + <action name="start" timeout="240" />
> + <action name="stop" timeout="150" />
> + <action name="status" depth="0" timeout="20" interval="60" />
> + <action name="monitor" depth="0" timeout="20" interval="60" />
> + <action name="validate-all" timeout="10" />
> + <action name="meta-data" timeout="10" />
> </actions>
> </resource-agent>
> END
> @@ -127,7 +130,7 @@
> return $retcode
> fi
>
> - $VZCTL start $VEID >& /dev/null
> + $VZCTL start $VEID --wait >& /dev/null
> retcode=$?
>
> if [[ $retcode != 0 && $retcode != 32 ]]; then
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org