[Pacemaker] Trouble with ordering
Serge Dubrouski
sergeyfd at gmail.com
Mon Oct 3 01:47:26 UTC 2011
On Sun, Oct 2, 2011 at 12:31 AM, Gerald Vogt <vogt at spamcop.net> wrote:
> On 02.10.11 03:18, Serge Dubrouski wrote:
> > 1. You expect rndc and host to be in $PATH. At the same time the path to
> > named can be configured. I think consequently, the same should apply to
> > rndc and host as they are bind utils.
> >
> > On our CentOS servers we run the latest version of bind, compiled from
> > source and installed in a custom path which is added in /etc/profile.
> > For some reason /etc/profile doesn't seem to apply to the ocf scripts
> > thus the script doesn't find rndc or host unless I extend PATH manually
> > at the beginning of the script.
> >
> >
> > We had some discussion around this and finally decided to leave it up
> > to the sysadmin to make sure that both tools are available in PATH. One
> > can always create a couple of symlinks to cover it.
>
> But isn't it inconsistent that you can set the named path as a parameter
> but not rndc or host? named, rndc, and host all come out of a bind
> installation and they all run on the same host...
>
> > 2. In the stop function you call "rndc stop" to stop the daemon.
> > However, if the daemon hangs, rndc will hang. Thus pacemaker runs into a
> > timeout and kills the ocf script, leading to a failed stop.
> >
> >
> > You didn't read the code carefully again. Yes it does exactly what you
> > want or at least it's supposed to:
> >
> > if ! $RNDC stop >/dev/null; then
>
> The problem is your script never gets beyond this line. rndc tries to
> contact named which is hanging. I don't know what time out rndc has
> exactly but at least on our CentOS installation it doesn't time out
> within 60s.
>
> 60s is currently the timeout we have set in the "primitive" declaration.
> Thus after 60s pacemaker assumes your script is hanging and kills your
> script with TERM.
>
> As I wrote before: you should be able to test this easily by sending a
> STOP signal to the named process. At least in this situation I see that
> the "rndc stop" doesn't return before those 60s.
>
Indeed, you are right. Thanks for catching that. Attached is a patch that
fixes this issue. It also makes the rndc and host commands configurable.
Please take a look at the patch, and if it's all right I'll ask the
Pacemaker team to push it into git.
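
To illustrate the general idea of not letting a hung "rndc stop" block the
stop action, here is a rough sketch. It is not the attached patch: the
helper name run_with_timeout and the 124 return code are assumptions made
up for this example, and the 10 second limit in the usage comment is
arbitrary.

```shell
# Hypothetical helper (not part of the named resource agent):
# run a command in the background and give up after $1 seconds.
# Returns the command's own exit status, or 124 if it had to be killed.
run_with_timeout() {
    limit=$1
    shift
    "$@" >/dev/null 2>&1 &
    cmd_pid=$!
    waited=0
    # kill -0 only checks whether the process still exists
    while kill -0 "$cmd_pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            # The command is still running past the limit: kill and reap it
            kill -KILL "$cmd_pid" 2>/dev/null
            wait "$cmd_pid" 2>/dev/null || true
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # The command finished on its own: hand back its exit status
    wait "$cmd_pid"
}

# Possible use in a stop function (assumed, shown as a comment only):
#   if ! run_with_timeout 10 $RNDC stop; then
#       kill `cat ${OCF_RESKEY_named_pidfile}`
#   fi
```

With something like this the script regains control after the limit even
when named is stopped with a STOP signal, so it can still fall through to
the kill and kill -9 steps before pacemaker's own action timeout expires.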
Thanks again.
>
> > kill `cat ${OCF_RESKEY_named_pidfile}`
> > fi
> >
> > if [ -n "$OCF_RESKEY_CRM_meta_timeout" ]; then
> > # Allow 2/3 of the action timeout for the orderly shutdown
> > # (The origin unit is ms, hence the conversion)
> > timeout=$((OCF_RESKEY_CRM_meta_timeout/1500))
> > else
> > timeout=20
> > fi
> >
> > while named_status ; do
> > if [ $timeout -ge ${OCF_RESKEY_named_stop_timeout} ]; then
> > break
> > else
> > sleep 1
> > timeout=$((timeout++))
> > fi
> > done
> >
> > *#If still up*
> > * if named_status 2>&1; then*
> > * ocf_log err "named is still up! Killing";*
> > * kill -9 `cat ${OCF_RESKEY_named_pidfile}`*
> > * fi*
> >
> >
> > I think the ocf script should have its own timeout and abort the rndc
> > call if it takes too long and then try to kill the server.
> >
> >
> > See above.
> >
> >
> >
> > To test send a STOP signal to named and wait...
>
> Gerald
>
--
Serge Dubrouski.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: named.patch
Type: text/x-patch
Size: 4231 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20111002/1cc66069/attachment-0004.bin>