[Pacemaker] Known problem with IPaddr(2)
Lars Ellenberg
lars.ellenberg at linbit.com
Tue Apr 13 18:28:09 UTC 2010
On Tue, Apr 13, 2010 at 12:10:18PM +0200, Dejan Muhamedagic wrote:
> Hi,
>
> On Mon, Apr 12, 2010 at 05:26:19PM +0200, Markus M. wrote:
> > Markus M. wrote:
> > >is there a known problem with IPaddr(2) when defining many (in my
> > >case: 11) ip resources which are started/stopped concurrently?
>
> Don't remember any problems.
>
> > Well... some further investigation revealed that it seems to be a
> > problem with the way the ip addresses are assigned.
> >
> > When looking at the output of "ip addr", the first ip address added
> > to the interface gets the scope "global", all further aliases get
> > the scope "global secondary".
> >
> > If the first ip address is afterwards removed before the secondaries
> > (because the scripts run concurrently), ALL secondaries are
> > removed at the same time by the "ip" command, leading to an error
> > for all subsequent attempts to remove the other ip addresses because
> > they are already gone.
> >
> > I am not sure how "ip" decides on the "secondary" scope, maybe
> > because the other ip addresses are in the same subnet as the first
> > one.
>
> That sounds bad. Instances should be independent of each other.
> Can you please open a bugzilla and attach a hb_report?
Oh, that is perfectly expected the way he describes it.
The assumption has always been that there is at least one
"normal", not managed by crm, address on the interface,
so no one will have noticed before.
I suggest the following patch,
basically doing one retry.
For the described scenario,
the second try will find the IP already "nonexistent",
and exit $OCF_SUCCESS.
diff -r e39d40853f09 heartbeat/IPaddr2
--- a/heartbeat/IPaddr2 Tue Apr 13 19:23:05 2010 +0200
+++ b/heartbeat/IPaddr2 Tue Apr 13 20:27:06 2010 +0200
@@ -684,12 +684,12 @@
if [ $ip_status = "no" ]; then
: Requested interface not in use
- exit $OCF_SUCCESS
+ return $OCF_SUCCESS
fi
if [ -n "$IP_CIP" ] && [ $ip_status != "partial2" ]; then
if [ $ip_status = "partial" ]; then
- exit $OCF_SUCCESS
+ return $OCF_SUCCESS
fi
echo "-$IP_INC_NO" >$IP_CIP_FILE
if [ "x$(cat $IP_CIP_FILE)" = "x" ]; then
@@ -713,7 +713,7 @@
if [ "$ip_del_if" = "yes" ]; then
delete_interface $BASEIP $NIC $NETMASK
if [ $? -ne 0 ]; then
- exit $OCF_ERR_GENERIC
+ return $OCF_ERR_GENERIC
fi
if [ "$LVS_SUPPORT" = 1 ]; then
@@ -721,7 +721,7 @@
fi
fi
- exit $OCF_SUCCESS
+ return $OCF_SUCCESS
}
ip_monitor() {
@@ -828,7 +828,12 @@
case $__OCF_ACTION in
start) ip_start
;;
-stop) ip_stop
+stop)
+ # do one retry
+ ip_stop || ip_stop
+ # neither explicit exit nor explicit $? needed.
+ # but for good measure and readability:
+ exit $?
;;
status) ip_status=`ip_served`
if [ $ip_status = "ok" ]; then
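
To illustrate why the patch turns every "exit" inside ip_stop into "return": a
function that calls exit terminates the whole script, so the right-hand side of
"ip_stop || ip_stop" would never run. The following is only a minimal sketch of
that retry pattern, not the agent itself. fake_ip_stop, addr_state and attempts
are hypothetical stand-ins, and the "first call fails, address is gone
afterwards" behaviour merely simulates the race described above.

```shell
#!/bin/sh
# Sketch of the "one retry" pattern from the patch.
# All names below are illustrative assumptions, not part of IPaddr2.

OCF_SUCCESS=0
OCF_ERR_GENERIC=1

addr_state=present   # simulated: is the address still on the interface?
attempts=0           # how often the stop function was called

fake_ip_stop() {
    attempts=$((attempts + 1))
    if [ "$addr_state" = "gone" ]; then
        # Nothing left to remove: a stop of an absent address is a success.
        return $OCF_SUCCESS
    fi
    # Simulate the race: the address disappears while we try to delete it,
    # so this attempt reports a failure.
    addr_state=gone
    # "return", not "exit": exit would end the script here and the
    # retry after "||" could never happen.
    return $OCF_ERR_GENERIC
}

# do one retry
fake_ip_stop || fake_ip_stop
rc=$?

echo "attempts=$attempts rc=$rc"
```

In the real agent the case branch would finish with "exit $rc" (or "exit $?" as
in the patch); the sketch only prints the values so the effect of the retry is
visible.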
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.