[Pacemaker] Very strange behavior on asymmetric cluster

Sat Mar 19 18:14:10 EDT 2011

19.03.2011 19:10, Dan Frincu:
>
>         Even if that is set, we need to verify that the resources are,
>         indeed,
>         NOT running where they shouldn't be; remember, it is our job
>         to ensure
>         that the configured policy is enforced. So, we probe them
>         everywhere to
>         ensure they are indeed not around, and stop them if we find them.
>
>
>     Again, WHY do you need to verify things which cannot happen by
>     setup? If some resource cannot, REALLY CANNOT exist on a node, and
>     administrator can confirm this, why rely on network, cluster
>     stack, resource agents, electricity in power outlet, etc. to
>     verify that 2+2 is still 4?
>
>
> Don't want to step on any toes or anything, mainly because me stepping 
> on somebody's toes without the person wearing a pair of steel-toe cap 
> boots would leave them toeless, but I've been hearing the ranting go 
> on and on and just felt like maybe something's missing from the 
> picture, specifically, an example for why checking for resources on 
> passive nodes is a good thing, which I haven't seen thus far.
...
> Ok, so far it sounds perfect, but what happens if on the 
> secondary/passive node, someone starts the service, by user error, by 
> upgrading the software and thus activating its automatic startup at 
> the given runlevel and restarting the secondary node (common practice 
> when performing upgrades in a cluster environment), etc. If Pacemaker 
> were not to check all the nodes for the service being active or not => 
> epic fail. Its state-based model, where it maintains a state of the 
> resources and performs the necessary actions to bring the cluster to 
> that state is what saves us from the "epic fail" moment.

You are certainly right. Resources must be monitored on standby nodes to 
prevent such a scenario. You can break your setup in many other ways, 
however. And Pacemaker (1.0.10, at least) does not run a recurring 
monitor on a passive node, so you may start your service by hand, and it 
will go unnoticed for quite some time.

What I am talking about is monitoring (probing) of a resource on a node 
where this resource cannot exist. For example, say you have five nodes 
in your cluster and a DRBD resource which can, by its nature, work on 
no more than two nodes. The other three nodes will still be 
occasionally probed for that resource. If that probe fails, the 
resource will be restarted everywhere. If that node cannot be fenced, 
the resource will be dead.
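
For illustration, the five-node case could look roughly like this in crm 
shell syntax. This is only a sketch, not a configuration taken from this 
thread: the node names node1..node5, the DRBD resource name "r0", the 
resource and constraint IDs, and the monitor intervals are all made up.

```
# Hypothetical five-node asymmetric (opt-in) cluster: node1..node5.
# Resources may run only where a location constraint allows them.
property symmetric-cluster="false"

# DRBD primitive; "r0" is an assumed DRBD resource name.
primitive p_drbd_r0 ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="15s" role="Master" \
        op monitor interval="30s" role="Slave"

# Master/slave set limited to two instances, one of them master.
ms ms_drbd_r0 p_drbd_r0 \
        meta master-max="1" master-node-max="1" \
        clone-max="2" clone-node-max="1" notify="true"

# Only node1 and node2 are made eligible to host DRBD.
location loc_drbd_node1 ms_drbd_r0 100: node1
location loc_drbd_node2 ms_drbd_r0 100: node2
```

Here node3, node4 and node5 have no constraint for ms_drbd_r0 and so can 
never run it, yet, per the behavior described above, they are still 
probed for it.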

There is still at least one case where such a failure may happen even if 
the RA is perfect: a misbehaving or heavily overloaded node may cause an 
RA timeout. Bugs or configuration errors may cause it too, of course.

A resource should not depend on unrelated things, such as nodes which 
have no connection to the resource. The resource will then be more stable.

> I'm trying to be impartial here, although I may be biased by my 
> experience to rule in favor of Pacemaker, but here's a thought, it's a 
> free world, we all have the freedom of speech, which I'm also 
> exercising at the moment, want something done, do it yourself, patches 
> are being accepted, don't have the time, ask people for their help, in 
> a polite manner, wait for them to reply, kindly ask them again (and 
> prayers are heard, Steven Dake released >> 
> http://www.mail-archive.com/openais@lists.linux-foundation.org/msg06072.html 
> << a patch for automatic redundant ring recovery, thank you Steven), 
> want something done fast, pay some developers to do it for you, say 
> the folks over at www.linbit.com <http://www.linbit.com> wouldn't mind 
> some sponsorship (and I'm not affiliated with them in any way, believe 
> it or not, I'm actually doing this without external incentives, from 
> the kindness of my heart so to speak).

My goal for now is to make the problem clear to the team. It is doubtful 
that such a patch would be accepted without that, given the current 
reaction. Moreover, it is not clear how best to fix the problem.

This cluster stack is brilliant. It's a pity to see it fail to keep a 
resource running when it would be relatively simple to avoid the 
unneeded downtime.

Thank you for participating.

P.S. There is a crude workaround: op monitor interval="0" timeout="10" 
on_fail="nothing". Obviously, it has its own deficiencies.

--
Pavel Levshin
