[Pacemaker] CLUSTERIP/iptables interaction

Mon Dec 14 18:01:48 UTC 2009

Hi,

On Mon, Dec 14, 2009 at 04:01:24PM +0100, Michael Schwartzkopff wrote:
> On Monday, 14 December 2009 15:53:25, Tim Serong wrote:
> > On 12/15/2009 at 01:03 AM, Michael Schwartzkopff <misch at multinet.de> wrote:
> > > On Monday, 14 December 2009 13:00:34, Chris Picton wrote:
> > > > Hi all
> > > >
> > > > I am doing some tests with CLUSTERIP and pacemaker/heartbeat on CentOS
> > > > 5.4, using the clusterlabs repo.
> > > >
> > > > My resource looks like:
> > > > primitive CLUSTERIP_21 ocf:heartbeat:IPaddr2 \
> > > > 	op monitor interval="10" timeout="20" start-delay="0" \
> > > > 	params ip="10.202.4.21" nic="eth0" cidr_netmask="24" clusterip_hash="sourceip-sourceport" \
> > > > 	meta resource-stickiness="0"
> > > > clone clone_CLUSTERIP_21 CLUSTERIP_21 \
> > > > 	meta clone-max="2" globally-unique="true" clone-node-max="2"
> > > >
> > > >
> > > > This starts up fine and adds an iptables rule correctly. However, if I
> > > > restart the iptables service, the CLUSTERIP rule gets removed.
> > >
> > > Of course. The rule is inserted and managed by the cluster dynamically.
> > >
> > > > I then get the below errors in my log file.  These continue without
> > > > stopping.
> > > >
> > > > It seems that the IPaddr2 script is not currently capable of recreating
> > > > the iptables rule if it gets inadvertently removed.
> > > >
> > > > Is this known behaviour?  If not, where does my error lie?
> > >
> > > I did not verify this behaviour but it sounds reasonable. Perhaps
> > > recreating the iptables rule after the loss could/should be part of
> > > the monitoring of the IPaddr2 script.
> > >
> > >
> > > Looking through your logs, it seems the monitor detects the problem but
> > > cannot recreate the correct rule. Perhaps an error in the RA.
> >
> > The monitor op shouldn't make any changes.  If the rule has gone away,
> > the monitor op should return failure to indicate the resource is broken,
> > which will result in Pacemaker telling the failed resource to stop, and
> > start again.  Actually, from the logs it looks like a restart was
> > attempted, and the stop op reported success, but the subsequent start
> > failed for some reason.
> >
> > Regards,
> >
> > Tim
> 
> Exactly. So the RA seems to have a problem handling this error scenario
> correctly.

OK. Does anybody know how it should work and where the problem is? It
seems like it can't find some proc file.
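
For discussion, here is a rough sketch of the read-only monitor behaviour
Tim describes: check, change nothing, and report "not running" so that
Pacemaker does the stop/start. This is not the actual IPaddr2 code. The
function name and the two overridable variables are made up for
illustration, and I am assuming the proc file in question is the
per-address entry under /proc/net/ipt_CLUSTERIP.

```shell
#!/bin/sh
# Sketch only, NOT the real IPaddr2 resource agent.
# Standard OCF exit codes for a monitor action.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical hooks (not part of any RA) so the check can be tried
# without root or a loaded CLUSTERIP module.
: "${IPTABLES_LIST_CMD:=iptables -n -L INPUT}"
# Assumption: this is the "proc file" the start op fails to find.
: "${CLUSTERIP_PROC_DIR:=/proc/net/ipt_CLUSTERIP}"

clusterip_monitor() {
    ip="$1"

    # Check 1: a CLUSTERIP rule mentioning this address is still in INPUT.
    # -w avoids matching 10.202.4.21 inside 10.202.4.210.
    if ! $IPTABLES_LIST_CMD 2>/dev/null | grep CLUSTERIP | grep -qwF -- "$ip"; then
        return $OCF_NOT_RUNNING
    fi

    # Check 2: the kernel's per-address control file still exists.
    if [ ! -e "$CLUSTERIP_PROC_DIR/$ip" ]; then
        return $OCF_NOT_RUNNING
    fi

    # Nothing was modified; recovery is left to Pacemaker's stop/start.
    return $OCF_SUCCESS
}
```

Calling clusterip_monitor 10.202.4.21 after "service iptables restart"
should then return 7 rather than trying to repair the rule itself. Whether
the subsequent start can recreate the rule and the proc entry is a separate
question, and that is where the logs suggest the RA goes wrong.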

Thanks,

Dejan