[Pacemaker] Make 2 nodes failover to each other

Tue Aug 31 13:52:41 UTC 2010

On Tuesday, 31.08.2010, at 21:24 +0800, Alister Wong wrote:
> Hi,
>
> I am new to Linux clustering and have a question about a 2-node cluster.
>
> I want to build a cluster running Jakarta Tomcat in which the nodes fail
> over to each other when an error is detected (e.g. the gateway fails to
> respond to ping).
>
> However, with my current settings, once node A encounters an error, the
> resource fails over to node B. If B then fails, it can't fail back to A.
>
> Can anyone help me make the resource fail over back and forth whenever it
> encounters an error?
>
> Do I have to do something to make a failed node ready for use again? If
> so, can anyone tell me how?
> 
> Below is my configuration:
> 
> [root@nmc01-a ~]# crm configure show
> node nmc01-a
> node nmc01-b
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>         params ip="10.214.65.5" cidr_netmask="24" \
>         op monitor interval="30s"
> primitive Tomcat ocf:heartbeat:tomcat \
>         operations $id="Tomcat-operations" \
>         op monitor interval="30" timeout="30" \
>         op start interval="0" timeout="70" \
>         op stop interval="0" timeout="120" \
>         params catalina_home="/opt/apache-tomcat-6.0.26" java_home="/usr/java/jdk1.6.0_21" tomcat_user="nmc" \
>         meta target-role="Started"
> primitive pingd ocf:pacemaker:pingd \
>         params host_list="10.214.65.254" multiplier="100" \
>         op monitor interval="60s" timeout="50s" on_fail="restart" \
>         op start interval="0" timeout="90" \
>         op stop interval="0" timeout="100"
> group nmc_web ClusterIP Tomcat
> clone pingdclone pingd \
>         meta globally-unique="false"
> location nmc_web_connected_node nmc_web \
>         rule $id="nmc_web_connected_node-rule" -inf: pingd lte 0
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         last-lrm-refresh="1283167810"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="100"
> 
>
> By the way, regarding the pingd example on clusterlabs.org:
>
> What does "not_defined pingd" mean in the rule below?
>
> crm(pingd)configure# location my_web_cluster_on_connected_node my_web_cluster \
>  rule -inf: not_defined pingd or pingd lte 0

You have resource-stickiness defined. So once the first node becomes
available again, the resource stays where it is currently running and does
not fail back.
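
As a rough sketch only, assuming you actually want the group to return to
nmc01-a whenever that node is usable: with the default stickiness set to 0,
any positive location preference for a node wins. The constraint name
nmc_web_prefers_a and the score 50 are made up for illustration and are not
part of your configuration.

```
rsc_defaults $id="rsc-options" \
        resource-stickiness="0"
location nmc_web_prefers_a nmc_web 50: nmc01-a
```

Whether automatic failback is desirable depends on your setup, since every
move means a restart of Tomcat.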

> 
> When I included "not_defined pingd" in my cluster configuration and one of
> the nodes had not started up, pingd was not started on that node, and
> this prevented my other resources (the virtual IP and Tomcat) from
> starting.

Perhaps a syntax error because you forgot the "or"?

The "not_defined pingd" part checks whether the pingd attribute is defined
at all on a node. So it prevents resources from running on a node where the
ping resource is not running.
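
For reference, your constraint with both checks joined by "or" would look
roughly like this. It is only a sketch that combines the rule from the
example you quoted with the names from your own configuration:

```
location nmc_web_connected_node nmc_web \
        rule $id="nmc_web_connected_node-rule" -inf: not_defined pingd or pingd lte 0
```

With this rule a node gets -inf for nmc_web both when its pingd value is 0
or less and when it has no pingd attribute at all.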

Michael.