[Pacemaker] Trouble with ordering

Serge Dubrouski sergeyfd at gmail.com
Mon Oct 3 01:47:26 UTC 2011


On Sun, Oct 2, 2011 at 12:31 AM, Gerald Vogt <vogt at spamcop.net> wrote:

> On 02.10.11 03:18, Serge Dubrouski wrote:
> >     1. You expect rndc and host to be in $PATH. At the same time the path
> >     to named can be configured. I think, consequently, the same should
> >     apply to rndc and host as they are bind utils.
> >
> >     On our CentOS servers we run the latest version of bind, compiled
> >     from source and installed in a custom path which is added in
> >     /etc/profile. For some reason /etc/profile doesn't seem to apply to
> >     the ocf scripts thus the script doesn't find rndc or host unless I
> >     extend PATH manually at the beginning of the script.
> >
> >
> > We had some discussion around this and finally decided to leave it up
> > to the sysadmin to make sure that both tools are available in PATH. One
> > can always create a couple of symlinks to cover it.
>
> But isn't it inconsistent that you can set the named path as a parameter
> but not rndc or host? named, rndc, and host all come out of a bind
> installation and they all run on the same host...
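Making the bind utilities configurable in the same way as named can be sketched with the common OCF idiom of assigning a default only when a parameter is unset. This is a minimal sketch, assuming hypothetical parameter names `OCF_RESKEY_rndc` and `OCF_RESKEY_host` that fall back to a PATH lookup, plus a hypothetical `check_binaries` helper; the attached patch may name and validate them differently.

```shell
#!/bin/sh
# Hypothetical parameters: fall back to a PATH lookup when the cluster
# configuration does not set an explicit path.
: "${OCF_RESKEY_rndc=rndc}"
: "${OCF_RESKEY_host=host}"

# Hypothetical helper: fail if any configured bind utility cannot be run.
check_binaries() {
    local bin missing=0
    for bin in "$OCF_RESKEY_rndc" "$OCF_RESKEY_host"; do
        case "$bin" in
            */*) [ -x "$bin" ] && continue ;;                     # explicit path
            *)   command -v "$bin" >/dev/null 2>&1 && continue ;; # PATH lookup
        esac
        echo "bind utility not found: $bin" >&2
        missing=1
    done
    return "$missing"
}
```

With something like this, a source-built installation could set an explicit path such as `rndc="/opt/bind/sbin/rndc"` in the primitive's params (the path is only illustrative) instead of relying on PATH changes in /etc/profile, which, as noted above, do not reach the agent.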
>
> >     2. In the stop function you call "rndc stop" to stop the daemon.
> >     However, if the daemon hangs, rndc will hang. Thus pacemaker runs
> >     into a timeout and kills the ocf script, leading to a failed stop.
> >
> >
> > You didn't read the code carefully again. Yes, it does exactly what you
> > want, or at least it's supposed to:
> >
> >     if ! $RNDC stop >/dev/null; then
>
> The problem is your script never gets beyond this line. rndc tries to
> contact named, which is hanging. I don't know exactly what timeout rndc
> has, but at least on our CentOS installation it doesn't time out
> within 60s.
>
> 60s is currently the timeout we have set in the "primitive" declaration.
> Thus after 60s pacemaker assumes your script is hanging and kills your
> script with TERM.
>
> As I wrote before: you should be able to test this easily by sending a
> STOP signal to the named process. At least in this situation I see that
> the "rndc stop" doesn't return before those 60s.
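The STOP-signal test can be reproduced without BIND by freezing a `sleep` process as a stand-in for a hung named. The sketch also shows why the escalation matters: while a process is stopped, a SIGTERM stays queued and is not acted on, so the plain `kill` fallback in the quoted stop code cannot remove a STOPped named, while `kill -9` can. The helper `is_stopped` is hypothetical, and the `ps -o stat=` check assumes a procps-style `ps`.

```shell
#!/bin/sh
# Stand-in for a hung named: a long sleep that is frozen with SIGSTOP.
sleep 300 &
pid=$!
kill -STOP "$pid"
sleep 1   # give the kernel a moment to deliver SIGSTOP

# Hypothetical helper: true if the process is in the stopped ("T") state.
# Assumes a ps that supports "-o stat=" (e.g. Linux procps).
is_stopped() {
    case "$(ps -o stat= -p "$1")" in
        *T*) return 0 ;;
        *)   return 1 ;;
    esac
}

is_stopped "$pid" && echo "process $pid is stopped"

# SIGTERM is only queued while the process is stopped, so it survives.
kill -TERM "$pid"
sleep 1
if kill -0 "$pid" 2>/dev/null; then after_term=alive; else after_term=gone; fi
echo "after SIGTERM: $after_term"

# SIGKILL takes effect even on a stopped process; wait reaps the zombie.
kill -KILL "$pid"
wait "$pid" 2>/dev/null || true
if kill -0 "$pid" 2>/dev/null; then after_kill=alive; else after_kill=gone; fi
echo "after SIGKILL: $after_kill"
```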
>

Indeed, you are right. Thanks for catching that. Attached is the patch that
fixes this issue. It also makes the rndc and host commands configurable.
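One way to keep a hanging `rndc stop` from consuming the whole stop action is to give it its own time limit with the coreutils `timeout` utility and fall back to a signal, along the lines of the suggestion in the quoted message. This is a sketch, not the contents of the attached patch: the helper name `bounded_stop`, its argument order, and the reliance on GNU coreutils `timeout` (exit status 124 when the limit expires) are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: give a stop command (e.g. "rndc stop") at most
# $1 seconds. If it fails or times out, send SIGTERM to PID $2 and leave
# any further escalation (kill -9) to the caller's polling loop.
# Assumes GNU coreutils timeout, which exits with 124 when time runs out.
bounded_stop() {
    local limit=$1 pid=$2 rc=0
    shift 2
    timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "stop command timed out after ${limit}s" >&2
    fi
    if [ "$rc" -ne 0 ]; then
        kill -TERM "$pid" 2>/dev/null || true
    fi
    return "$rc"
}
```

In the agent this might be called as `bounded_stop 10 "$(cat "$OCF_RESKEY_named_pidfile")" rndc stop`, where the 10-second limit is illustrative and should stay well below the action timeout so the polling and kill -9 steps still have time to run.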

Please take a look at the patch, and if it's all right I'll ask the Pacemaker
team to push it into git.

Thanks again.


>
> >         kill `cat ${OCF_RESKEY_named_pidfile}`
> >     fi
> >
> >     if [ -n "$OCF_RESKEY_CRM_meta_timeout" ]; then
> >       # Allow 2/3 of the action timeout for the orderly shutdown
> >       # (The origin unit is ms, hence the conversion)
> >       timeout=$((OCF_RESKEY_CRM_meta_timeout/1500))
> >     else
> >       timeout=20
> >     fi
> >
> >     while named_status ; do
> >         if [ $timeout -ge ${OCF_RESKEY_named_stop_timeout} ]; then
> >             break
> >         else
> >             sleep 1
> >             timeout=$((timeout + 1))
> >         fi
> >     done
> >
> >     # If still up
> >     if named_status 2>&1; then
> >         ocf_log err "named is still up! Killing";
> >         kill -9 `cat ${OCF_RESKEY_named_pidfile}`
> >     fi
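In the quoted stop code, one variable, `timeout`, serves both as the computed budget and as the elapsed-time counter compared against `named_stop_timeout`. The polling step can be sketched with a separate counter while keeping the same budget of 2/3 of the action timeout (for a 60 s action timeout, 60000 / 1500 = 40 s). `wait_for_exit` is a hypothetical helper; the status command passed to it stands in for the agent's `named_status`.

```shell
#!/bin/sh
# Hypothetical helper: run the status command "$@" once per second until
# it fails (service gone) or the budget is used up.
# $1 = action timeout in ms (OCF_RESKEY_CRM_meta_timeout), may be empty.
wait_for_exit() {
    local meta_timeout=$1
    shift
    local budget elapsed=0
    if [ -n "$meta_timeout" ]; then
        # 2/3 of the action timeout, converted from ms to s: ms * 2/3 / 1000
        budget=$((meta_timeout / 1500))
    else
        budget=20
    fi
    while "$@"; do
        [ "$elapsed" -ge "$budget" ] && return 1   # still running: give up
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0   # status command reports the service has stopped
}
```

A caller could then escalate only when the budget runs out, for example `wait_for_exit "$OCF_RESKEY_CRM_meta_timeout" named_status || kill -9 "$(cat "$OCF_RESKEY_named_pidfile")"`.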
> >
> >
> >     I think the ocf script should have its own timeout and abort the rndc
> >     call if it takes too long and then try to kill the server.
> >
> >
> > See above.
> >
> >
> >
> >     To test, send a STOP signal to named and wait...
>
> Gerald
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>



-- 
Serge Dubrouski.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: named.patch
Type: text/x-patch
Size: 4231 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20111002/1cc66069/attachment-0004.bin>

