[ClusterLabs] Re: CRM managing ADSL connection; failure not handled
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Tue Aug 25 08:34:59 UTC 2015
Why not start by writing a real OCF resource agent (RA)?
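A minimal sketch of what such an RA could look like, assuming the rp-pppoe `pppoe-start`/`pppoe-stop` commands used in the init script quoted below. The agent name `pppoe`, its `interface` parameter and all timeouts are hypothetical. The two points that matter: `monitor` reports a dropped link as OCF_NOT_RUNNING (7), so Pacemaker recovers it by restarting, and `stop` succeeds whenever the interface ends up gone, regardless of what `pppoe-stop` itself returns.

```bash
#!/bin/bash
# Sketch of an OCF resource agent for a PPPoE link. Hypothetical: the agent
# name "pppoe", the "interface" parameter and all timeouts. Assumes rp-pppoe's
# /sbin/pppoe-start and /sbin/pppoe-stop, as in the original init script.
# Real agents normally source ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs rather
# than defining the OCF return codes by hand as done here.

OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3
OCF_ERR_INSTALLED=5
OCF_NOT_RUNNING=7

pppoe_iface() {
    # Pacemaker passes instance attributes as OCF_RESKEY_<name>.
    echo "${OCF_RESKEY_interface:-ppp0}"
}

meta_data() {
    cat <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="pppoe" version="0.1">
  <version>1.0</version>
  <longdesc lang="en">Starts and stops a PPPoE link via pppoe-start/pppoe-stop.</longdesc>
  <shortdesc lang="en">PPPoE link (sketch)</shortdesc>
  <parameters>
    <parameter name="interface" unique="0" required="0">
      <longdesc lang="en">Interface that exists while the link is up.</longdesc>
      <shortdesc lang="en">PPP interface</shortdesc>
      <content type="string" default="ppp0"/>
    </parameter>
  </parameters>
  <actions>
    <action name="start" timeout="90s"/>
    <action name="stop" timeout="60s"/>
    <action name="monitor" timeout="30s" interval="60s"/>
    <action name="meta-data" timeout="5s"/>
    <action name="validate-all" timeout="5s"/>
  </actions>
</resource-agent>
EOF
}

pppoe_monitor() {
    # Same test as the original status action: the PPP interface exists only
    # while pppd holds a session. "Not running" must be 7, not a generic error.
    [ -e "/sys/class/net/$(pppoe_iface)" ] && return $OCF_SUCCESS
    return $OCF_NOT_RUNNING
}

pppoe_validate() {
    [ -x /sbin/pppoe-start ] && [ -x /sbin/pppoe-stop ] || return $OCF_ERR_INSTALLED
    return $OCF_SUCCESS
}

pppoe_start() {
    pppoe_monitor && return $OCF_SUCCESS      # already up: nothing to do
    /sbin/pppoe-start || return $OCF_ERR_GENERIC
    # Wait for the interface; Pacemaker's start timeout bounds this loop.
    until pppoe_monitor; do sleep 1; done
    return $OCF_SUCCESS
}

pppoe_stop() {
    # Always run pppoe-stop so a background pppoe-connect retry loop is killed
    # even when ppp0 is already gone. Its exit code is ignored: stopping an
    # already-stopped link must succeed, or Pacemaker sees a failed stop.
    /sbin/pppoe-stop >/dev/null 2>&1
    # Success means the interface is gone (bounded by the stop timeout).
    while pppoe_monitor; do sleep 1; done
    return $OCF_SUCCESS
}

# Dispatch only when executed with an action, so the functions can also be
# sourced (for example by a test harness).
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ $# -gt 0 ]; then
    case "$1" in
        start)        pppoe_validate || exit $?; pppoe_start ;;
        stop)         pppoe_stop ;;
        monitor)      pppoe_monitor ;;
        meta-data)    meta_data ;;
        validate-all) pppoe_validate ;;
        *)            echo "Usage: $0 {start|stop|monitor|meta-data|validate-all}" >&2
                      exit $OCF_ERR_UNIMPLEMENTED ;;
    esac
    exit $?
fi
```

Installed as, say, /usr/lib/ocf/resource.d/local/pppoe, it could be referenced as `ocf:local:pppoe` instead of `lsb:hb-adsl-helper` and exercised outside the cluster with `ocf-tester` before going live.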
>>> Tom Yates <madhatter at teaparty.net> wrote on 24.08.2015 at 11:35 in message
<alpine.LFD.2.20.1508240951170.22953 at risby.home.teaparty.net>:
> I've got a failover firewall pair where the external interface is ADSL;
> that is, PPPoE. I've defined the service thus:
>
> primitive ExternalIP lsb:hb-adsl-helper \
> op monitor interval="60s"
>
> and in addition written a noddy script /etc/init.d/hb-adsl-helper, thus:
>
> #!/bin/bash
> RETVAL=0
> start() {
>     /sbin/pppoe-start
> }
> stop() {
>     /sbin/pppoe-stop
> }
> case "$1" in
>     start)
>         start
>         ;;
>     stop)
>         stop
>         ;;
>     status)
>         /sbin/ifconfig ppp0 >& /dev/null && exit 0
>         exit 1
>         ;;
>     *)
>         echo $"Usage: $0 {start|stop|status}"
>         exit 3
> esac
> exit $?
>
> The problem is that sometimes the ADSL connection falls over, as they do,
> e.g.:
>
> Aug 20 11:42:10 positron pppd[2469]: LCP terminated by peer
> Aug 20 11:42:10 positron pppd[2469]: Connect time 8619.4 minutes.
> Aug 20 11:42:10 positron pppd[2469]: Sent 1342528799 bytes, received 164420300 bytes.
> Aug 20 11:42:13 positron pppd[2469]: Connection terminated.
> Aug 20 11:42:13 positron pppd[2469]: Modem hangup
> Aug 20 11:42:13 positron pppoe[2470]: read (asyncReadFromPPP): Session 1735: Input/output error
> Aug 20 11:42:13 positron pppoe[2470]: Sent PADT
> Aug 20 11:42:13 positron pppd[2469]: Exit.
> Aug 20 11:42:13 positron pppoe-connect: PPPoE connection lost; attempting re-connection.
>
> CRMd then logs a bunch of stuff, followed by
>
> Aug 20 11:42:18 positron lrmd: [1760]: info: rsc:ExternalIP:8: stop
> Aug 20 11:42:18 positron lrmd: [28357]: WARN: For LSB init script, no additional parameters are needed.
> [...]
> Aug 20 11:42:18 positron pppoe-stop: Killing pppd
> Aug 20 11:42:18 positron pppoe-stop: Killing pppoe-connect
> Aug 20 11:42:18 positron lrmd: [1760]: WARN: Managed ExternalIP:stop process 28357 exited with return code 1.
>
>
> At this point, the PPPoE connection is down, and stays down. CRMd doesn't
> fail the group which contains both internal and external interfaces over
> to the other node, nor does it try to restart the service. I'm fairly
> sure this is because I've done something boneheaded, but I can't get my
> bone head around what it might be.
>
> Any light anyone can shed is much appreciated.
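One likely reason is visible in the log above: the stop action exited with return code 1. The LSB init-script conventions require `stop` to return 0 when the service is already stopped and `status` to return 3 when it is not running, and Pacemaker relies on both. Any other stop result counts as a failed stop; if STONITH is disabled, Pacemaker's default reaction to a failed stop is to block the resource, so it is neither restarted nor moved to the other node, which fits the behaviour described. A minimal sketch of the init script with LSB-conformant exit codes, assuming the same rp-pppoe commands; the `IFACE` override and the 10-second stop wait are hypothetical choices, not part of the original.

```bash
#!/bin/bash
# Sketch of /etc/init.d/hb-adsl-helper with LSB exit codes (not the original
# script). Assumes rp-pppoe's /sbin/pppoe-start and /sbin/pppoe-stop.
# IFACE is a hypothetical override; the original hard-codes ppp0.
IFACE=${IFACE:-ppp0}

is_up() {
    # Same test as the original status action, without needing ifconfig.
    [ -e "/sys/class/net/$IFACE" ]
}

do_start() {
    is_up && return 0           # LSB: starting a running service succeeds
    /sbin/pppoe-start
}

do_stop() {
    # Run pppoe-stop even if ppp0 is gone, to kill any pppoe-connect retry
    # loop; ignore its exit code, since LSB requires stop on a stopped
    # service to return 0.
    /sbin/pppoe-stop >/dev/null 2>&1
    local i
    for i in 1 2 3 4 5 6 7 8 9 10; do
        is_up || return 0
        sleep 1
    done
    return 1                    # interface still present: a real failure
}

do_status() {
    is_up && return 0
    return 3                    # LSB: 3 = program is not running
}

# Dispatch only when executed with an action, so the functions can be sourced.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ $# -gt 0 ]; then
    case "$1" in
        start)  do_start ;;
        stop)   do_stop ;;
        status) do_status ;;
        *)      echo "Usage: $0 {start|stop|status}" >&2
                exit 2 ;;       # LSB: invalid or excess argument(s)
    esac
    exit $?
fi
```

Once the script behaves, the recorded stop failure still has to be cleared, for example with `crm resource cleanup ExternalIP`, before Pacemaker will manage the resource again.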
>
>
> --
>
> Tom Yates - http://www.teaparty.net
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org