[Pacemaker] Make 2 nodes failover to each other
Michael Schwartzkopff
misch at clusterbau.com
Wed Sep 1 10:30:50 UTC 2010
On Wednesday, 01.09.2010 at 16:18 +0800, Alister Wong wrote:
> Hi, Michael. Thanks for your reply.
>
> Actually, I want the resource to move whenever an error occurs.
> Can a node always pass the resource to the other node automatically, even
> if that node encountered an error before?
> For example:
>
> At the beginning all my resources are located on NodeA; then NodeA
> encounters an error and fails over to NodeB. crm_mon then shows a Failed
> Action on NodeA. When NodeB encounters an error, it should pass the
> resource back to NodeA.
>
> I am not sure what configuration achieves this.
>
> Or is there a command I can run to make NodeA ready to receive the
> resource again?
> Currently, I find that the resource cannot move back to a node that
> failed before.
>
> Thank you.
>
> Alister
> -----Original Message-----
> From: Michael Schwartzkopff [mailto:misch at clusterbau.com]
> Sent: Tuesday, August 31, 2010 9:53 PM
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Make 2 nodes failover to each other
>
> On Tuesday, 31.08.2010 at 21:24 +0800, Alister Wong wrote:
> > Hi,
> >
> >
> >
> > I am new to Linux clustering, and I have a question about a 2-node
> > cluster.
> >
> > I want to build a cluster running Apache Tomcat in which the nodes fail
> > over to each other whenever an error is detected (e.g. pinging the
> > gateway fails).
> >
> > However, with my current settings, once node A encounters an error it
> > fails over to node B. Then, if B fails, the resource cannot fail back
> > to A.
> >
> > Can anyone help me make the resource fail over in either direction
> > whenever an error occurs?
> >
> > Do I have to do something to make a failed node ready for use again? If
> > so, can anyone tell me how?
> >
> >
> >
> > Below is my configuration:
> >
> > [root at nmc01-a ~]# crm configure show
> > node nmc01-a
> > node nmc01-b
> > primitive ClusterIP ocf:heartbeat:IPaddr2 \
> >         params ip="10.214.65.5" cidr_netmask="24" \
> >         op monitor interval="30s"
> > primitive Tomcat ocf:heartbeat:tomcat \
> >         operations $id="Tomcat-operations" \
> >         op monitor interval="30" timeout="30" \
> >         op start interval="0" timeout="70" \
> >         op stop interval="0" timeout="120" \
> >         params catalina_home="/opt/apache-tomcat-6.0.26"
> >                java_home="/usr/java/jdk1.6.0_21" tomcat_user="nmc" \
> >         meta target-role="Started"
> > primitive pingd ocf:pacemaker:pingd \
> >         params host_list="10.214.65.254" multiplier="100" \
> >         op monitor interval="60s" timeout="50s" on_fail="restart" \
> >         op start interval="0" timeout="90" \
> >         op stop interval="0" timeout="100"
> > group nmc_web ClusterIP Tomcat
> > clone pingdclone pingd \
> >         meta globally-unique="false"
> > location nmc_web_connected_node nmc_web \
> >         rule $id="nmc_web_connected_node-rule" -inf: pingd lte 0
> > property $id="cib-bootstrap-options" \
> >         dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
> >         cluster-infrastructure="openais" \
> >         expected-quorum-votes="2" \
> >         stonith-enabled="false" \
> >         no-quorum-policy="ignore" \
> >         last-lrm-refresh="1283167810"
> > rsc_defaults $id="rsc-options" \
> >         resource-stickiness="100"
> >
> >
> >
> > By the way, regarding the pingd example on clusterlabs.org:
> >
> > What does "not_defined pingd" mean in the rule setting below?
> >
> > crm(pingd)configure# location my_web_cluster_on_connected_node \
> >         my_web_cluster \
> >         rule -inf: not_defined pingd or pingd lte 0
>
>
> You have a resource stickiness defined, so after the first node becomes
> available again the resource stays where it is running and does not fail
> back.
>
> >
> > When I included "not_defined pingd" in my cluster configuration, then
> > if one of the nodes had not started up, pingd on that node would not be
> > started, and my other resources (the virtual IP and Tomcat) could not
> > start.
>
> Perhaps a syntax error because you forgot the "or"?
>
> The "not_defined" clause also checks whether the pingd attribute is
> defined at all on a node, so it prevents resources from running on a node
> where the ping resource is not running.
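>
> As a sketch, assuming the resource and rule names from your
> configuration (nmc_web, nmc_web_connected_node), the combined constraint
> would look like this; treat it as an illustration, not a tested
> configuration:
>
>     location nmc_web_connected_node nmc_web \
>             rule $id="nmc_web_connected_node-rule" \
>             -inf: not_defined pingd or pingd lte 0
>
> The "or" joins the two conditions, so the -INFINITY score applies both
> when the attribute is missing and when its value is 0.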
>
> Michael.
>
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
First of all:
http://www.caliburn.nl/topposting.html
You can achieve this behaviour by setting migration-threshold to 1. A
failure on node A then stops the resource on node A and starts it on node
B. You would have to clear the fail count on node A manually. The
resource stickiness makes the resource stay on node B until that node is
no longer capable of running the resource.
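
A sketch of the commands involved, assuming the resource and node names
from the configuration above (nmc_web, nmc01-a); the exact syntax may
differ slightly between crm shell versions:

    # Move the resource away after the first failure
    crm configure rsc_defaults migration-threshold=1

    # After repairing node A, clear its fail count so it becomes
    # eligible to run the resource again
    crm resource cleanup nmc_web nmc01-a

With migration-threshold=1 and resource-stickiness=100, the group moves
to the surviving node on the first failure and stays there; the cleanup
resets the fail count so the repaired node can take the resource back
after the next failure.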