[Pacemaker] Weird resource-stickiness behavior

Xiaomin Zhang zhangxiaomin at gmail.com
Thu Jun 13 13:58:48 EDT 2013


Hi, Andrew:
With the configuration I pasted in this thread, the resource fails over to the
online node when the other node is put into "Standby" with the "crm node
standby" command. When I bring the "Standby" node back to "Online", the
resource keeps running on the original "Online" node, and nothing else happens
to the resources.
However, if I cut the network connection of the other node, "crm status" shows
that node as "OFFLINE", and the affected resource also fails over to another
online node correctly. The issue is that when I re-connect the network so the
"OFFLINE" node becomes "Online" again, all of the resources are first stopped;
then some are restarted on the original online node and others move back to the
newly "Online" node. This behavior does not seem to be related to the
resource-stickiness configuration.
I'm just curious whether this is the default behavior, and if not, how I can
keep Pacemaker from stopping all running resources and relocating them.
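
For reference, this is roughly how I applied and re-checked the stickiness
(a minimal sketch using the crm shell and crm_simulate; option spellings may
differ between versions, so treat it as illustrative rather than exact):

# set the default stickiness so resources stay put unless their node fails
crm configure rsc_defaults resource-stickiness="INFINITY"
crm configure show

# show the placement scores the policy engine computes against the live CIB
crm_simulate -sL | grep sst

# re-test by cycling one node through standby and back
crm node standby node2
crm node online node2
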
Thanks.


On Thu, Jun 13, 2013 at 1:50 PM, Andrew Beekhof <andrew at beekhof.net> wrote:

>
> On 13/06/2013, at 2:43 PM, Xiaomin Zhang <zhangxiaomin at gmail.com> wrote:
>
> > Andrew Beekhof <andrew at ...> writes:
> >
> >>
> >> Try increasing your stickiness as it is being exceeded by the location constraints.
> >> For the biggest stick, try 'infinity' which means - never move unless the node dies.
> >>
> >
> > Thanks, Andrew. I applied infinity resource-stickiness. However, the sst
> > resource still gets moved to the node that comes back online after the failure.
> > And I found something in the log:
> >
> > Jun 13 11:46:29 node3 pengine[27813]:  warning: unpack_rsc_op: Processing failed op monitor for ip_ssn on node2: not running (7)
> > Jun 13 11:46:29 node3 pengine[27813]:    error: native_create_actions: Resource fs_ssn (ocf::Filesystem) is active on 2 nodes attempting recovery
> > Jun 13 11:46:29 node3 pengine[27813]:  warning: native_create_actions: See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
> >
> > Is this log showing that pacemaker tries to restart all the resources when
> > the failed node comes back?
>
> No, that's a log showing the services were already running there when
> pacemaker started.
>
> >
> >
> >>> Thanks.
> >>>
> >>> Below is my configuration:
> >>> ------------------CONFIG START--------------------------------------
> >>> node node3 \
> >>>     attributes standby="on"
> >>> node node1
> >>> node node2
> >>> primitive drbd_ssn ocf:linbit:drbd \
> >>>     params drbd_resource="r0" \
> >>>     op monitor interval="15s"
> >>> primitive fs_ssn ocf:heartbeat:Filesystem \
> >>>     op monitor interval="15s" \
> >>>     params device="/dev/drbd0" directory="/drbd" fstype="ext3" \
> >>>     meta target-role="Started"
> >>> primitive ip_ssn ocf:heartbeat:IPaddr2 \
> >>>     params ip="192.168.241.1" cidr_netmask="32" \
> >>>     op monitor interval="15s" \
> >>>     meta target-role="Started"
> >>> primitive ip_sst ocf:heartbeat:IPaddr2 \
> >>>     params ip="192.168.241.2" cidr_netmask="32" \
> >>>     op monitor interval="15s" \
> >>>     meta target-role="Started"
> >>> primitive sst lsb:sst \
> >>>     op monitor interval="15s" \
> >>>     meta target-role="stopped"
> >>> primitive ssn lsb:ssn \
> >>>     op monitor interval="15s" \
> >>>     meta target-role="stopped"
> >>> ms ms_drbd_ssn drbd_ssn \
> >>>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" \
> >>>     notify="true" target-role="Started"
> >>> location sst_ip_prefer ip_sst 50: node1
> >>> location drbd_ssn_prefer ms_drbd_ssn 50: node1
> >>> colocation fs_ssn_coloc inf: ip_ssn fs_ssn
> >>> colocation fs_on_drbd_coloc inf: fs_ssn ms_drbd_ssn:Master
> >>> colocation sst_ip_coloc inf: sst ip_sst
> >>> colocation ssn_ip_coloc inf: ssn ip_ssn
> >>> order ssn_after_drbd inf: ms_drbd_ssn:promote fs_ssn:start
> >>> order ip_after_fs inf: fs_ssn:start ip_ssn:start
> >>> order sst_after_ip inf: ip_sst:start sst:start
> >>> order sst_after_ssn inf: ssn:start sst:start
> >>> order ssn_after_ip inf: ip_ssn:start ssn:start
> >>> property $id="cib-bootstrap-options" \
> >>>     dc-version="1.1.8-7.el6-394e906" \
> >>>     cluster-infrastructure="classic openais (with plugin)" \
> >>>     expected-quorum-votes="3" \
> >>>     stonith-enabled="false"
> >>> rsc_defaults $id="rsc-options" \
> >>>     resource-stickiness="100"
> >>>
> >>> -------------------CONFIG END----------------------------------------
> >>>
> > Best Regards.
> > Xiaomin
> >
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>