[Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

Thu May 16 02:00:48 EDT 2013

Hello,

How is your cluster network configured? Are you using a private network
for cluster communication and a separate public one for the services?

2013/5/15 Andrew Widdersheim <awiddersheim at hotmail.com>

> Sorry to bring up an old issue, but I am having the exact same problem as the
> original poster. A simultaneous disconnect on my two-node cluster causes
> the resources to start transitioning to the other node, but mid-flight
> the transition is aborted and the resources are started again on
> the original node once the cluster realizes that connectivity is the same on
> both nodes.
>
> I have tried various dampen settings without any luck. It seems that
> the nodes report the outage at slightly different times, which results in a
> partial transition of resources. I would have expected dampen to make the
> cluster wait until it knows the connectivity of all nodes before taking
> action.
>
> Ideally, connectivity status would be shared between all cluster nodes and
> the cluster wouldn't start the transition if another node is having a
> connectivity issue as well. My configuration is below. Let me know if there
> is something I can change to fix this, or if this behavior is expected.
>
> primitive p_drbd ocf:linbit:drbd \
>         params drbd_resource="r1" \
>         op monitor interval="30s" role="Slave" \
>         op monitor interval="10s" role="Master"
> primitive p_fs ocf:heartbeat:Filesystem \
>         params device="/dev/drbd/by-res/r1" directory="/drbd/r1" fstype="ext4" options="noatime" \
>         op start interval="0" timeout="60s" \
>         op stop interval="0" timeout="180s" \
>         op monitor interval="30s" timeout="40s"
> primitive p_mysql ocf:heartbeat:mysql \
>         params binary="/usr/libexec/mysqld" config="/drbd/r1/mysql/my.cnf" datadir="/drbd/r1/mysql" \
>         op start interval="0" timeout="120s" \
>         op stop interval="0" timeout="120s" \
>         op monitor interval="30s" \
>         meta target-role="Started"
> primitive p_ping ocf:pacemaker:ping \
>         params host_list="192.168.5.1" dampen="30s" multiplier="1000" debug="true" \
>         op start interval="0" timeout="60s" \
>         op stop interval="0" timeout="60s" \
>         op monitor interval="5s" timeout="10s"
> group g_mysql_group p_fs p_mysql \
>         meta target-role="Started"
> ms ms_drbd p_drbd \
>         meta notify="true" master-max="1" clone-max="2" target-role="Started"
> clone cl_ping p_ping
> location l_connected g_mysql \
>         rule $id="l_connected-rule" pingd: defined pingd
> colocation c_mysql_on_drbd inf: g_mysql ms_drbd:Master
> order o_drbd_before_mysql inf: ms_drbd:promote g_mysql:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.6-1.el6-8b6c6b9b6dc2627713f870850d20163fad4cc2a2" \
>         cluster-infrastructure="Heartbeat" \
>         no-quorum-policy="ignore" \
>         stonith-enabled="false" \
>         cluster-recheck-interval="5m" \
>         last-lrm-refresh="1368632470"
> rsc_defaults $id="rsc-options" \
>         migration-threshold="5" \
>         resource-stickiness="200"
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
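
To make the timing problem described in the quoted message easier to reason about, here is a rough sketch of the score arithmetic, not Pacemaker's actual placement algorithm. It assumes the rule "pingd: defined pingd" gives each node a location score equal to its pingd attribute (1000 with one reachable host and multiplier="1000", 0 otherwise), and that resource-stickiness="200" is added once for the node currently running the resource. The node names node1 and node2 are hypothetical, and the DRBD colocation, master scores and per-member group stickiness are ignored.

```python
def preferred_node(pingd, active_node, stickiness=200):
    """Return (winner, scores) for a simplified two-node placement model.

    pingd       -- dict mapping node name to its current pingd attribute value
    active_node -- node assumed to be running the resource right now
    stickiness  -- resource-stickiness from rsc_defaults in the quoted config
    """
    scores = {}
    for node, value in pingd.items():
        # Assumption: the location rule contributes the attribute value itself.
        score = value
        # Assumption: stickiness is counted once, for the active node only.
        if node == active_node:
            score += stickiness
        scores[node] = score
    winner = max(scores, key=scores.get)
    return winner, scores


# Hypothetical timeline: node1 reports the outage slightly before node2.
timeline = [
    ("both nodes reach the ping target", {"node1": 1000, "node2": 1000}),
    ("only node1 has reported the outage", {"node1": 0, "node2": 1000}),
    ("both nodes have reported the outage", {"node1": 0, "node2": 0}),
]

for label, pingd in timeline:
    winner, scores = preferred_node(pingd, active_node="node1")
    print(f"{label}: scores={scores} -> preferred node: {winner}")
```

In this simplified model the preferred node flips to node2 during the window in which only node1 has reported the outage, then flips back to node1 once both attributes are 0, which would match the aborted transition described above if the real scores behave similarly.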

-- 
this is my life and I will live it as long as God wills