[Pacemaker] No failover when the pingd host_list is unreachable
Andrew Beekhof
andrew at beekhof.net
Mon Sep 13 13:34:47 UTC 2010
On Wed, Sep 8, 2010 at 1:07 PM, Alister Wong
<alister.wong at wisespotgroup.com> wrote:
> Hi all,
>
> I've run into a problem getting my resources to fail over when the
> active node fails to ping the default gateway. Below is my
> configuration:
>
> node nmc01-a
> node nmc01-b
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>         params ip="10.214.65.5" cidr_netmask="24" \
>         op monitor interval="30s" \
>         meta migration-threshold="1" failure-timeout="90"
> primitive Tomcat ocf:heartbeat:tomcat \
>         operations $id="Tomcat-operations" \
>         op monitor interval="30" timeout="30" \
>         op start interval="0" timeout="70" \
>         op stop interval="0" timeout="120" \
>         params catalina_home="/opt/apache-tomcat-6.0.26" java_home="/usr/java/jdk1.6.0_21" tomcat_user="nmc" \
>         meta target-role="Started" migration-threshold="1" failure-timeout="90"
> primitive pingd ocf:pacemaker:pingd \
>         params host_list="default_gw" multiplier="100" \
>         op monitor interval="60s" timeout="50s" \
>         op start interval="0" timeout="90" \
>         op stop interval="0" timeout="100"
> group nmc_web ClusterIP Tomcat
> clone pingdclone pingd \
>         meta globally-unique="false" migration-threshold="1" failure-timeout="90"
> location nmc_web_connected_node nmc_web \
>         rule $id="nmc_web_connected_node-rule" -inf: pingd lte 0
> colocation tomcat-with-ip inf: Tomcat ClusterIP
> order tomcat-after-ip inf: ClusterIP Tomcat
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         last-lrm-refresh="1283328391"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="100"
>
> To test the configuration, I changed the default_gw entry in my hosts
> file to an invalid IP. However, my resources did not fail over. I
> checked /var/log/cluster/corosync.log and it shows that default_gw is
> unreachable.
>
> Can anyone tell me what's wrong with my configuration?
Not sure. Can you post the result of cibadmin -Ql when the cluster is
in the state you describe?
We need to see the status section.
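
For reference, a sketch of capturing what is being asked for, assuming
the stock 1.0.x command-line tools (-Q is cibadmin's query form,
-l/--local reads the local node's copy of the CIB):

    # Dump the full live CIB, status section included:
    cibadmin -Q --local > cib-status.xml

    # The transient attributes in the status section should show the
    # pingd score each node is currently advertising:
    grep pingd cib-status.xml

If your build of crm_mon supports it, crm_mon -1 -A prints the same
node attributes in a one-shot status view.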
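
Separately, one detail worth weighing against the configuration quoted
above: a rule of the form "pingd lte 0" can only match once the pingd
attribute is actually defined on a node. The variant shown in the
Pacemaker documentation also catches nodes where the attribute has
never been set; a sketch of that form, reusing the names from the
posted configuration:

    location nmc_web_connected_node nmc_web \
            rule $id="nmc_web_connected_node-rule" -inf: not_defined pingd or pingd lte 0

Whether that explains the missing failover here should be visible in
the status section: if the attribute is still 100, or absent, on the
active node after the gateway change, the constraint has nothing to
act on.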