[Pacemaker] Known problem with IPaddr(2)

Lars Ellenberg lars.ellenberg at linbit.com
Tue Apr 13 15:16:31 EDT 2010


On Tue, Apr 13, 2010 at 08:28:09PM +0200, Lars Ellenberg wrote:
> On Tue, Apr 13, 2010 at 12:10:18PM +0200, Dejan Muhamedagic wrote:
> > Hi,
> > 
> > On Mon, Apr 12, 2010 at 05:26:19PM +0200, Markus M. wrote:
> > > Markus M. wrote:
> > > >is there a known problem with IPaddr(2) when defining many (in my
> > > >case: 11) ip resources which are started/stopped concurrently?
> > 
> > Don't remember any problems.
> > 
> > > Well... some further investigation revealed that it seems to be a
> > > problem with the way the IP addresses are assigned.
> > > 
> > > When looking at the output of "ip addr", the first IP address added
> > > to the interface gets the scope "global"; all further aliases get
> > > the scope "global secondary".
> > > 
> > > If the first IP address is then removed before the secondaries
> > > (because the scripts run concurrently), ALL secondaries are
> > > removed at the same time by the "ip" command, so every subsequent
> > > attempt to remove the other IP addresses fails, because they are
> > > already gone.
> > > 
> > > I am not sure how "ip" decides on the "secondary" scope; maybe
> > > because the other IP addresses are in the same subnet as the first
> > > one.
> > 
> > That sounds bad. Instances should be independent of each other.
> > Can you please open a bugzilla and attach an hb_report?
> 
> Oh, that is perfectly expected the way he describes it.
> The assumption has always been that there is at least one
> "normal", not managed by crm, address on the interface,
> so no one will have noticed before.
> 
> I suggest the following patch,
> basically doing one retry.
> 
> For the described scenario,
> the second try will find the IP already "nonexistent",
> and exit $OCF_SUCCESS.

Though that obviously won't make instances independent.

The typical way to achieve that is to make them all "secondary" IPs,
which implies that for successful use of independent IPaddr2 resources
on the same device, you need at least one "system" IP (as opposed to
"managed by cluster") on that device.

The first IP assigned in a subnet gets "primary" status; further
addresses in the same subnet become "secondary". By default, if you
delete a "primary" IP, the kernel also deletes all of its secondary
IP addresses.
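To check which addresses on an interface the kernel currently treats as primary (for example, to confirm that the "system" IP holds primary status before cluster-managed IPs are added), the output of `ip -4 addr show dev <device>` can be inspected. A minimal POSIX shell sketch, assuming the usual `ip` output format in which secondary addresses carry the word "secondary" on their `inet` line; the function name `classify_addrs` and the device name `eth0` are hypothetical:

```shell
# Hypothetical helper: read "ip -4 addr show" output on stdin and print
# each IPv4 address/prefix followed by "primary" or "secondary".
classify_addrs() {
    awk '$1 == "inet" {
        role = "primary"
        # Secondary addresses are flagged with the word "secondary".
        for (i = 3; i <= NF; i++)
            if ($i == "secondary") role = "secondary"
        print $2, role
    }'
}

# Usage ("eth0" is a placeholder device name):
#   ip -4 addr show dev eth0 | classify_addrs
```

Non-`inet` lines such as `valid_lft ...` are ignored, so the helper can be fed the full command output directly.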

If using a "system" IP is not an option, here is the alternative:
"Recent" kernels (a quick check revealed that this setting has been
around since at least 2.6.12) can do "alias promotion", which can be
enabled using
	sysctl -w net.ipv4.conf.all.promote_secondaries=1
(or per device, by replacing "all" with the device name)
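A small sketch of applying the setting either to all devices or to a single one; the helper names `promote_key` and `enable_promotion` and the device name `eth0` are placeholders, and `sysctl -w` needs root:

```shell
# Hypothetical helpers for toggling alias promotion.
# Build the sysctl key for a device; with no argument, use "all".
promote_key() {
    echo "net.ipv4.conf.${1:-all}.promote_secondaries"
}

# Enable promotion so that deleting a primary address promotes a
# secondary instead of flushing all of them (requires root).
enable_promotion() {
    sysctl -w "$(promote_key "$1")=1"
}

# enable_promotion          # all devices
# enable_promotion eth0     # a single device (placeholder name)
```

A `sysctl -w` change is lost on reboot; to keep it, the same `net.ipv4.conf.all.promote_secondaries = 1` line can be added to `/etc/sysctl.conf` (or a file under `/etc/sysctl.d/` on distributions that use it).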

In both cases the previously posted "retry on ip_stop" patch is
unnecessary. But it won't do any harm, either. Most likely ;-)
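The retry idea from the earlier patch (if removing the address fails, check again and report success when it is already gone) can be sketched roughly as follows. This is not the actual IPaddr2 patch, which is not reproduced in this message; `ip_is_present` and `ip_del_with_retry` are hypothetical names. The values 0 and 1 are the standard OCF codes for `OCF_SUCCESS` and `OCF_ERR_GENERIC`, defined locally here only to keep the example self-contained:

```shell
# Hypothetical sketch, not the actual IPaddr2 code.
# Real agents get these from the OCF shell function library.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Succeeds if address/prefix $1 is currently configured on device $2.
ip_is_present() {
    ip -4 addr show dev "$2" |
        awk -v a="$1" '$1 == "inet" && $2 == a { found = 1 }
                       END { if (found) exit 0; exit 1 }'
}

# Remove address $1 from device $2, allowing one retry. If the address
# has already disappeared (e.g. flushed together with its primary),
# report success instead of an error.
ip_del_with_retry() {
    for attempt in 1 2; do
        ip_is_present "$1" "$2" || return $OCF_SUCCESS
        ip addr del "$1" dev "$2" && return $OCF_SUCCESS
    done
    # Both attempts failed: only an address that is still there is an error.
    ip_is_present "$1" "$2" && return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}
```

Checking for presence before each attempt is what turns the race described above into a harmless no-op: the address that vanished with its primary is simply reported as stopped.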

Glad that helped ;-)

Could somebody please add this to the man page or the agent meta data...

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.