[Pacemaker] New action for resource running in multiple nodes

Mon Aug 12 13:27:33 EDT 2013

Hi Andreas,

The problem is that the network is out of my control. All the nodes are
virtual machines running on VMware ESX hosts.
We have two different networks, one for the service and the other for the
cluster.
One idea is to create a second ring on the service network, but the networks
are virtualized, so the problem may persist.

And of course, we don't have stonith. It is the same problem: I have no
control over the VMware hosts, and it seems that they would have to pay extra
to use the API needed by the stonith plugin.

Meanwhile, I try to find

These two problems will probably be fixed in a couple of months, but
meanwhile I have to try to keep the cluster up :)

Thanks
Adrián

On Mon, Aug 12, 2013 at 6:57 PM, Andreas Mock <andreas.mock at web.de> wrote:

> Hi Adrián,
>
> IMHO the effort would focus on the wrong issue.
> Make your network for clustering reliable. It is THE building block
> of a cluster besides the nodes.
> - Additional network cards
> - Different vendor
> - Bonding
> - Different path through switches
>
> On a two-node-cluster without the necessary option to
> increase the number of nodes I almost always take a crosscable
> for one of the interconnects.
>
> Best regards
> Andreas Mock
>
> P.S. The story sounds to me that you also don't have stonith
> enabled. Another building block IMHO.
>
> *From:* Adrián López Tejedor [mailto:adrianlzt at gmail.com]
> *Sent:* Monday, 12 August 2013 16:26
> *To:* pacemaker at oss.clusterlabs.org
> *Subject:* [Pacemaker] New action for resource running in multiple nodes
>
> Hi!
>
> In our environment we use corosync/pacemaker, and recently we have been
> having some problems with the network used to maintain the cluster. These
> short interruptions cause the passive node (we have a two-node
> active-passive configuration with Apache Tomcat) to think it is alone and
> start another instance of Tomcat.
>
> A few seconds later, the cluster reconnects, and the resource is found
> active on both nodes. The default behaviour (as seen in
> http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-resource-options.html)
> is to stop both and start one of them.
>
> For us, this means that the service goes down every time a short
> interruption in the network occurs.
>
> Maybe a new option for "multiple-active" like "stop_old" and/or "stop_new"
> could be useful, stopping only the newest instance of the resource.
>
> Thanks!
> Adrián
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>
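
To make the "multiple-active" discussion in the quoted message concrete, here
is a minimal sketch in Python that models the recovery plan for each policy.
It is only an illustration, not Pacemaker code: "block", "stop_only" and
"stop_start" are the documented values, while "stop_new" and "stop_old" are
the options proposed in this thread and do not exist in Pacemaker 1.0. The
names Instance, started_at and recovery_plan are hypothetical, and the sketch
assumes the cluster knows when each instance was started.

```python
from dataclasses import dataclass
from typing import List, Tuple

Action = Tuple[str, str]  # (operation, target)


@dataclass(frozen=True)
class Instance:
    node: str          # node where the resource was found active
    started_at: float  # hypothetical start timestamp, in seconds


def recovery_plan(policy: str, instances: List[Instance], preferred: str) -> List[Action]:
    """Return the actions a policy would take when a resource is multiply active."""
    if len(instances) < 2:
        return []  # active on at most one node: nothing to recover

    stop_all = [("stop", inst.node) for inst in instances]
    if policy == "block":
        # Documented value: mark the resource as unmanaged and leave it alone.
        return [("unmanage", "resource")]
    if policy == "stop_only":
        # Documented value: stop every instance and leave the resource stopped.
        return stop_all
    if policy == "stop_start":
        # Documented default: stop every instance, then start one.
        return stop_all + [("start", preferred)]
    if policy in ("stop_new", "stop_old"):
        # Proposed in this thread (assumption): keep one instance running.
        # "stop_new" keeps the oldest instance, "stop_old" keeps the newest.
        pick = min if policy == "stop_new" else max
        keep = pick(instances, key=lambda inst: inst.started_at)
        return [("stop", inst.node) for inst in instances if inst is not keep]
    raise ValueError(f"unknown policy: {policy}")
```

With the active node started long before the passive one, "stop_start" plans
two stops and a start (so the service goes down), while the proposed
"stop_new" would plan a single stop on the passive node and leave the
original instance untouched.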