[Pacemaker] Weird resource-stickiness behavior

Andrew Beekhof andrew at beekhof.net
Mon Jun 17 23:44:33 EDT 2013


On 14/06/2013, at 3:52 PM, Xiaomin Zhang <zhangxiaomin at gmail.com> wrote:

> Hi, Andrew:
> If I cut down the network connection on the running node with:
> service network stop
> "crm status" shows the node as "OFFLINE", and the affected resources also fail over to another online node correctly. The issue is that when I reconnect the network with:
> service network start
> to bring the "OFFLINE" node back "Online" again, all the resources are first stopped, then some are restarted on the original online node while others move back to the newly "Online" node. This behavior does not seem to be related to the resource-stickiness configuration.
> I'm just curious whether this is the default behavior.

It is, when you've disabled fencing and the services are still running on the "OFFLINE" node.
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/ch13.html#_what_is_stonith
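
Because fencing is disabled, the cluster has no way to know that the services actually stopped on the unreachable node; when that node rejoins, the resources are detected as active in two places and a stop/restart cycle follows. For reference, a minimal sketch of turning fencing on from the crm shell; the stonith agent and all parameter values here are hypothetical and depend on your hardware:

    property stonith-enabled="true"
    primitive st-node1 stonith:fence_ipmilan \
        params ipaddr="192.168.240.11" login="admin" passwd="secret" \
            pcmk_host_list="node1" \
        op monitor interval="60s"

With a working stonith device the cluster can fence the lost node instead of waiting for it to come back in an unknown state.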

> But if I reboot the OFFLINE node instead, the resources are not stopped when it comes back online.
> Is it expected that "service network start" triggers Pacemaker to reassign resources?
> Thanks.
> 
> 
> 
> On Fri, Jun 14, 2013 at 10:06 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
> 
> On 13/06/2013, at 5:15 PM, Xiaomin Zhang <zhangxiaomin at gmail.com> wrote:
> 
> > Thanks Andrew.
> > Yes, the fs_ssn resource (ocf:heartbeat:Filesystem) is still running when the machine loses its network. I configured it as a primitive:
> > primitive fs_ssn ocf:heartbeat:Filesystem \
> >      op monitor interval="15s" \
> >      params device="/dev/drbd0" directory="/drbd" fstype="ext3" \
> >      meta target-role="Started"
> > Since I assume this resource can only be started on one node, I expected it to be stopped automatically once Pacemaker detects the node is no longer part of the HA cluster.
> > Is that assumption incorrect?
> 
> No. But I'd need to see logs from all the nodes (please use attachments) to be able to comment further.
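
Worth noting for the DRBD piece specifically: DRBD can enforce this itself via its fence-peer handlers, which place a temporary constraint in the CIB when replication is interrupted. A sketch of the relevant drbd.conf fragment, assuming the stock handler scripts shipped with DRBD (section placement of the fencing option varies between DRBD versions):

    resource r0 {
        disk {
            fencing resource-only;
        }
        handlers {
            fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
    }

This keeps the disconnected side from being promoted while the constraint is in place.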
> 
> > Thanks.
> >
> >
> >
> > On Thu, Jun 13, 2013 at 1:50 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
> >
> > On 13/06/2013, at 2:43 PM, Xiaomin Zhang <zhangxiaomin at gmail.com> wrote:
> >
> > > Andrew Beekhof <andrew at ...> writes:
> > >
> > >>
> > >> Try increasing your stickiness, as it is being exceeded by the
> > >> location constraints.
> > >> For the biggest stick, try 'infinity', which means: never move
> > >> unless the node dies.
> > >>
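
For reference, a sketch of how that looks in crm syntax, either cluster-wide or on a single resource (sst is taken from the configuration quoted below):

    rsc_defaults resource-stickiness="INFINITY"

or per resource:

    primitive sst lsb:sst \
        op monitor interval="15s" \
        meta resource-stickiness="INFINITY"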
> > >
> > > Thanks, Andrew. I applied infinite resource stickiness. However, the sst
> > > resource is still switched to the node that comes back online after the failure.
> > > And I found something in the log:
> > >
> > > Jun 13 11:46:29 node3 pengine[27813]:  warning: unpack_rsc_op: Processing failed op monitor for ip_ssn on node2: not running (7)
> > > Jun 13 11:46:29 node3 pengine[27813]:    error: native_create_actions: Resource fs_ssn (ocf::Filesystem) is active on 2 nodes attempting recovery
> > > Jun 13 11:46:29 node3 pengine[27813]:  warning: native_create_actions: See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
> > >
> > > Is this log showing that Pacemaker tries to restart all the resources when
> > > the failed node comes back?
> >
> > No, that's a log showing the services were already running there when Pacemaker started.
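
What happens next is governed by the multiple-active meta attribute: the default, stop_start, stops the resource everywhere and then starts it where it belongs, which matches the stop/restart cycle described above. A sketch reusing fs_ssn from the configuration below (block and stop_only are the other accepted values):

    primitive fs_ssn ocf:heartbeat:Filesystem \
        op monitor interval="15s" \
        params device="/dev/drbd0" directory="/drbd" fstype="ext3" \
        meta target-role="Started" multiple-active="stop_start"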
> >
> > >
> > >
> > >>> Thanks.
> > >>>
> > >>> Below is my configure:
> > >>> ------------------CONFIG START--------------------------------------
> > >>> node node3 \
> > >>>     attributes standby="on"
> > >>> node node1
> > >>> node node2
> > >>> primitive drbd_ssn ocf:linbit:drbd \
> > >>>     params drbd_resource="r0" \
> > >>>     op monitor interval="15s"
> > >>> primitive fs_ssn ocf:heartbeat:Filesystem \
> > >>>     op monitor interval="15s" \
> > >>>     params device="/dev/drbd0" directory="/drbd" fstype="ext3" \
> > >>>     meta target-role="Started"
> > >>> primitive ip_ssn ocf:heartbeat:IPaddr2 \
> > >>>     params ip="192.168.241.1" cidr_netmask="32" \
> > >>>     op monitor interval="15s" \
> > >>>     meta target-role="Started"
> > >>> primitive ip_sst ocf:heartbeat:IPaddr2 \
> > >>>     params ip="192.168.241.2" cidr_netmask="32" \
> > >>>     op monitor interval="15s" \
> > >>>     meta target-role="Started"
> > >>> primitive sst lsb:sst \
> > >>>     op monitor interval="15s" \
> > >>>     meta target-role="stopped"
> > >>> primitive ssn lsb:ssn \
> > >>>     op monitor interval="15s" \
> > >>>     meta target-role="stopped"
> > >>> ms ms_drbd_ssn drbd_ssn \
> > >>>     meta master-max="1" master-node-max="1" clone-max="2" \
> > >>>         clone-node-max="1" notify="true" target-role="Started"
> > >>> location sst_ip_prefer ip_sst 50: node1
> > >>> location drbd_ssn_prefer ms_drbd_ssn 50: node1
> > >>> colocation fs_ssn_coloc inf: ip_ssn fs_ssn
> > >>> colocation fs_on_drbd_coloc inf: fs_ssn ms_drbd_ssn:Master
> > >>> colocation sst_ip_coloc inf: sst ip_sst
> > >>> colocation ssn_ip_coloc inf: ssn ip_ssn
> > >>> order ssn_after_drbd inf: ms_drbd_ssn:promote fs_ssn:start
> > >>> order ip_after_fs inf: fs_ssn:start ip_ssn:start
> > >>> order sst_after_ip inf: ip_sst:start sst:start
> > >>> order sst_after_ssn inf: ssn:start sst:start
> > >>> order ssn_after_ip inf: ip_ssn:start ssn:start
> > >>> property $id="cib-bootstrap-options" \
> > >>>     dc-version="1.1.8-7.el6-394e906" \
> > >>>     cluster-infrastructure="classic openais (with plugin)" \
> > >>>     expected-quorum-votes="3" \
> > >>>     stonith-enabled="false"
> > >>> rsc_defaults $id="rsc-options" \
> > >>>     resource-stickiness="100"
> > >>>
> > >>> -------------------CONFIG END----------------------------------------
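
To see the concrete scores behind placement decisions like this, the policy engine can be queried directly; run against the live CIB, crm_simulate prints the allocation score of every resource on every node (a shell sketch):

    crm_simulate -sL
    crm_simulate -sL 2>/dev/null | grep sst    # scores for one resource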
> > >>>
> > > Best Regards.
> > > Xiaomin
> > >
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org




