[Pacemaker] Known problem with IPaddr(2)

Mon Apr 12 14:06:35 UTC 2010

Hello,

Is there a known problem with IPaddr(2) when defining many (in my case: 
11) IP resources which are started/stopped concurrently?

In my case (CentOS 5, latest pacemaker) the resources start up fine, but 
when shutting down pacemaker (also during a cluster switch), sometimes 
one or more of the IP resources end up as failed/unmanaged after the 
stop action. This causes the whole cluster to "hang".

The problem is not tied to a specific IP resource; a different one 
fails each time.

When I try to debug the problem (adding "set -x" and redirecting stderr 
in the resource script), the problem seems to go away. So it might be 
a race condition, but at first glance the script seems to be fine.

I tried both resource agents, IPaddr and IPaddr2; the problem occurs 
with both (more often with IPaddr, but sometimes with IPaddr2 as well).

Any ideas? Of course I could make the start of the IP resources ordered, 
but if possible I would like to avoid this.
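
For reference, a minimal sketch of what such an ordered setup might look 
like in crm shell syntax. The resource names (ip1..ip3, ip_group) and 
the addresses are placeholders, not my real configuration, and only 
three of the eleven resources are shown. A group starts its members one 
after another and stops them in reverse order, so the stop actions 
would no longer run concurrently:

```
# Hypothetical names and addresses, for illustration only
primitive ip1 ocf:heartbeat:IPaddr2 \
    params ip="192.0.2.11" cidr_netmask="24" \
    op monitor interval="30s"
primitive ip2 ocf:heartbeat:IPaddr2 \
    params ip="192.0.2.12" cidr_netmask="24" \
    op monitor interval="30s"
primitive ip3 ocf:heartbeat:IPaddr2 \
    params ip="192.0.2.13" cidr_netmask="24" \
    op monitor interval="30s"
# Members start in the listed order and stop in reverse
group ip_group ip1 ip2 ip3
```

A group also keeps all members on the same node and makes each member 
depend on the one before it, which is part of why I would rather not 
go this way.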

With kind regards
Markus