[Pacemaker] Weird resource-stickiness behavior

Fri Jun 14 05:52:14 UTC 2013

Hi, Andrew:
If I cut the network connection of the running node with:
service network stop
"crm status" shows the node as "OFFLINE", and the affected resources
fail over to another online node correctly.
The issue is that when I reconnect the network with:
service network start
to bring the "OFFLINE" node back "Online", all the resources are first
stopped; then some resources are restarted on the node that stayed online,
and others go back to the newly "Online" node.
This behavior does not seem to be related to the resource-stickiness
configuration. I'm just curious whether it's the default behavior.
If I instead reboot the OFFLINE node, the resources are not stopped when
it comes online again.
Is it expected that "service network start" triggers Pacemaker to
reassign resources?
Thanks.
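
As a rough illustration of the stickiness advice quoted further down in this
thread (a location preference can outweigh resource-stickiness unless the
stickiness is raised), here is a toy Python model. It is a sketch, not
Pacemaker's actual placement algorithm: it ignores colocation, ordering, and
failure counts. The helper names add_scores and choose_node and the sample
numbers are made up for illustration. The only Pacemaker behavior assumed is
that INFINITY equals 1,000,000 and that -INFINITY wins when combined with
INFINITY.

```python
INFINITY = 1_000_000  # Pacemaker's numeric value for "INFINITY"


def add_scores(a, b):
    """Add two scores, saturating at +/-INFINITY (-INFINITY always wins)."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


def choose_node(location_scores, current_node, stickiness):
    """Pick the node with the highest total score (hypothetical toy model).

    location_scores: dict of node name -> location preference score
    current_node:    node the resource is running on right now
    stickiness:      bonus added to the current node's score
    """
    totals = {}
    for node, score in location_scores.items():
        bonus = stickiness if node == current_node else 0
        totals[node] = add_scores(score, bonus)
    # On a tie, prefer staying on the current node.
    best = max(totals, key=lambda n: (totals[n], n == current_node))
    return best, totals
```

With a hypothetical preference of 150 for node1 and the resource running on
node2, choose_node({"node1": 150, "node2": 0}, "node2", 100) picks node1,
because 150 beats the stickiness bonus of 100. Passing INFINITY as the
stickiness keeps the resource on node2.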

On Fri, Jun 14, 2013 at 10:06 AM, Andrew Beekhof <andrew at beekhof.net> wrote:

>
> On 13/06/2013, at 5:15 PM, Xiaomin Zhang <zhangxiaomin at gmail.com> wrote:
>
> > Thanks Andrew.
> > Yes, the fs_ssn service (ocf:heartbeat:Filesystem) is still running when the
> > machine loses its network. I configured it as a primitive:
> > primitive fs_ssn ocf:heartbeat:Filesystem \
> >      op monitor interval="15s" \
> >      params device="/dev/drbd0" directory="/drbd" fstype="ext3" \
> >      meta target-role="Started"
> > As this resource can only be started on one node, I assumed it
> > should be stopped automatically when Pacemaker detects that it is no
> > longer part of an HA cluster.
> > Is this an incorrect assumption?
>
> No. But I'd need to see logs from all the nodes (please use attachments)
> to be able to comment further.
>
> > Thanks.
> >
> >
> >
> > On Thu, Jun 13, 2013 at 1:50 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
> >
> > On 13/06/2013, at 2:43 PM, Xiaomin Zhang <zhangxiaomin at gmail.com> wrote:
> >
> > > Andrew Beekhof <andrew at ...> writes:
> > >
> > >>
> > >> Try increasing your stickiness as it is being exceeded by the location
> > >> constraints.
> > >> For the biggest stick, try 'infinity' which means - never move unless the
> > >> node dies.
> > >>
> > >
> > > Thanks, Andrew. I applied infinity resource stickiness. However, the sst
> > > resource is still switched to the node that comes back online after the
> > > failure. I also found something in the log:
> > >
> > > Jun 13 11:46:29 node3 pengine[27813]:  warning: unpack_rsc_op: Processing failed op monitor for ip_ssn on node2: not running (7)
> > > Jun 13 11:46:29 node3 pengine[27813]:    error: native_create_actions: Resource fs_ssn (ocf::Filesystem) is active on 2 nodes attempting recovery
> > > Jun 13 11:46:29 node3 pengine[27813]:  warning: native_create_actions: See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
> > >
> > > Does this log show that Pacemaker tries to restart all the resources when
> > > the failed node comes back?
> >
> > No, that's a log showing the services were already running there when
> > pacemaker started.
> >
> > >
> > >
> > >>> Thanks.
> > >>>
> > >>> Below is my configure:
> > >>> ------------------CONFIG START--------------------------------------
> > >>> node node3 \
> > >>>     attributes standby="on"
> > >>> node node1
> > >>> node node2
> > >>> primitive drbd_ssn ocf:linbit:drbd \
> > >>>     params drbd_resource="r0" \
> > >>>     op monitor interval="15s"
> > >>> primitive fs_ssn ocf:heartbeat:Filesystem \
> > >>>     op monitor interval="15s" \
> > >>>     params device="/dev/drbd0" directory="/drbd" fstype="ext3" \
> > >>>     meta target-role="Started"
> > >>> primitive ip_ssn ocf:heartbeat:IPaddr2 \
> > >>>     params ip="192.168.241.1" cidr_netmask="32" \
> > >>>     op monitor interval="15s" \
> > >>>     meta target-role="Started"
> > >>> primitive ip_sst ocf:heartbeat:IPaddr2 \
> > >>>     params ip="192.168.241.2" cidr_netmask="32" \
> > >>>     op monitor interval="15s" \
> > >>>     meta target-role="Started"
> > >>> primitive sst lsb:sst \
> > >>>     op monitor interval="15s" \
> > >>>     meta target-role="stopped"
> > >>> primitive ssn lsb:ssn \
> > >>>     op monitor interval="15s" \
> > >>>     meta target-role="stopped"
> > >>> ms ms_drbd_ssn drbd_ssn \
> > >>>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> > >>> location sst_ip_prefer ip_sst 50: node1
> > >>> location drbd_ssn_prefer ms_drbd_ssn 50: node1
> > >>> colocation fs_ssn_coloc inf: ip_ssn fs_ssn
> > >>> colocation fs_on_drbd_coloc inf: fs_ssn ms_drbd_ssn:Master
> > >>> colocation sst_ip_coloc inf: sst ip_sst
> > >>> colocation ssn_ip_coloc inf: ssn ip_ssn
> > >>> order ssn_after_drbd inf: ms_drbd_ssn:promote fs_ssn:start
> > >>> order ip_after_fs inf: fs_ssn:start ip_ssn:start
> > >>> order sst_after_ip inf: ip_sst:start sst:start
> > >>> order sst_after_ssn inf: ssn:start sst:start
> > >>> order ssn_after_ip inf: ip_ssn:start ssn:start
> > >>> property $id="cib-bootstrap-options" \
> > >>>     dc-version="1.1.8-7.el6-394e906" \
> > >>>     cluster-infrastructure="classic openais (with plugin)" \
> > >>>     expected-quorum-votes="3" \
> > >>>     stonith-enabled="false"
> > >>> rsc_defaults $id="rsc-options" \
> > >>>     resource-stickiness="100"
> > >>>
> > >>> -------------------CONFIG END----------------------------------------
> > >>>
> > > Best Regards.
> > > Xiaomin
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >
> > > Project Home: http://www.clusterlabs.org
> > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > Bugs: http://bugs.clusterlabs.org
> >
> >
>
>