[Pacemaker] Using "avoids" location constraint
Andrew Morgan
andrewjamesmorgan at gmail.com
Wed Jul 10 13:32:04 UTC 2013
First of all, setting the 3rd host to standby (this was done before any of
the resources were created) didn't stop Pacemaker from attempting to start
the resources there (which fails, as MySQL isn't installed on that
server)....
[root@drbd1 billy]# pcs status
Last updated: Wed Jul 10 13:56:20 2013
Last change: Wed Jul 10 13:55:16 2013 via cibadmin on drbd1.localdomain
Stack: cman
Current DC: drbd1.localdomain - partition with quorum
Version: 1.1.8-7.el6-394e906
3 Nodes configured, unknown expected votes
5 Resources configured.
Node drbd3.localdomain: standby
Online: [ drbd1.localdomain drbd2.localdomain ]
Full list of resources:
Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
    Masters: [ drbd1.localdomain ]
    Slaves: [ drbd2.localdomain ]
Resource Group: g_mysql
    p_fs_mysql (ocf::heartbeat:Filesystem): Started drbd1.localdomain
    p_ip_mysql (ocf::heartbeat:IPaddr2): Started drbd1.localdomain
    p_mysql (ocf::heartbeat:mysql): Started drbd1.localdomain
Failed actions:
    p_mysql_monitor_0 (node=drbd3.localdomain, call=18, rc=5, status=complete): not installed
...
Is that a bug?
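For reference, the node was put into standby with something along the lines of the following (a sketch rather than the exact invocation, assuming the stock pcs syntax for this version):

pcs cluster standby drbd3.localdomain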
It does at least let me "pcs resource move" my resources and they switch
between drbd1 and drbd2.
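For example, something along these lines (a sketch using my resource and node names; the exact invocation may have differed slightly):

pcs resource move g_mysql drbd2.localdomain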
Next, while the resources are running on drbd1, I "ifdown" its network connection.
What I'd hope would happen in that scenario is that the cluster would recognise
that there's still a quorum (drbd2 + drbd3) and the resources would be migrated
to drbd2; instead the resources are stopped...
[root@drbd2 billy]# pcs status
Last updated: Wed Jul 10 14:03:03 2013
Last change: Wed Jul 10 13:59:19 2013 via crm_resource on drbd1.localdomain
Stack: cman
Current DC: drbd2.localdomain - partition with quorum
Version: 1.1.8-7.el6-394e906
3 Nodes configured, unknown expected votes
5 Resources configured.
Node drbd3.localdomain: standby
Online: [ drbd2.localdomain ]
OFFLINE: [ drbd1.localdomain ]
Full list of resources:
Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
    Masters: [ drbd2.localdomain ]
    Stopped: [ p_drbd_mysql:1 ]
Resource Group: g_mysql
    p_fs_mysql (ocf::heartbeat:Filesystem): Stopped
    p_ip_mysql (ocf::heartbeat:IPaddr2): Stopped
    p_mysql (ocf::heartbeat:mysql): Stopped
Failed actions:
    p_mysql_monitor_0 (node=drbd3.localdomain, call=18, rc=5, status=complete): not installed
...
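For reference, the network drop on drbd1 was a plain ifdown of its cluster interface, along the lines of the following (the interface name here is just an example):

ifdown eth0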
When I look at the log files, I see that there's an attempt to fence drbd1
even though I have <nvpair id="cib-bootstrap-options-stonith-enabled"
name="stonith-enabled" value="false"/> in the CIB. Why would the cluster
still be attempting to STONITH?
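For what it's worth, that property was set through pcs with something like the following (a sketch, assuming the property subcommands behave as documented for this pcs version), and the nvpair above is what that leaves in the crm_config section of the CIB:

pcs property set stonith-enabled=false
pcs property list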
The CIB and the log files from the time I dropped the network connection
can be found at http://clusterdb.com/upload/pacemaker_logs.zip
Thanks for the help, Andrew.
On 10 July 2013 12:02, Andrew Beekhof <andrew at beekhof.net> wrote:
>
> On 09/07/2013, at 3:59 PM, Andrew Morgan <andrewjamesmorgan at gmail.com>
> wrote:
>
> >
> >
> >
> > On 9 July 2013 04:11, Andrew Beekhof <andrew at beekhof.net> wrote:
> >
> > On 08/07/2013, at 11:35 PM, Andrew Morgan <andrewjamesmorgan at gmail.com>
> wrote:
> >
> > > Thanks Florian.
> > >
> > > The problem I have is that I'd like to define a HA configuration that
> isn't dependent on a specific set of fencing hardware (or any fencing
> hardware at all for that matter) and as the stack has the quorum capability
> included I'm hoping that this is an option.
> > >
> > > I've not been able to find any quorum commands within pcs; the closest
> I've found is setting a node to "standby" but when I do that, it appears to
> have lost its quorum vote
> >
> > This is not the case.
> >
> > My test was to have 3 nodes, node 3 defined as being on standby. My
> resources were running on node 2. I then dropped the network connection on
> node 2 hoping that node 1 and node 3 would maintain a quorum and that the
> resources would start on node 1 - instead the resources were stopped.
>
> I'd like to see logs of that. Because I'm having a really hard time
> believing it.
>
> >
> > I have quorum enabled but on pcs status it says that the number of votes
> required is unknown - is there something else that I need to configure?
>
> Something sounds very wrong with your cluster.
>
> >
> >
> >
> > > - this seems at odds with the help text....
> > >
> > > standby <node>
> > > Put specified node into standby mode (the node specified will
> no longer be able to host resources)
> > >
> > > Regards, Andrew.
> > >
> > >
> > > On 8 July 2013 10:23, Florian Crouzat <gentoo at floriancrouzat.net>
> wrote:
> > > On 08/07/2013 09:49, Andrew Morgan wrote:
> > >
> > > I'm attempting to implement a 3 node cluster where only 2 nodes are
> > > there to actually run the services and the 3rd is there to form a
> quorum
> > > (so that the cluster stays up when one of the 2 'workload' nodes
> fails).
> > >
> > > To this end, I added a location avoids constraint so that the services
> > > (including drbd) don't get placed on the 3rd node (drbd3)...
> > >
> > > pcs constraint location ms_drbd avoids drbd3.localdomain
> > >
> > > the problem is that this constraint doesn't appear to be enforced and I
> > > see failed actions where Pacemaker has attempted to start the services
> > > on drbd3. In most cases I can just ignore the error but if I attempt to
> > > migrate the services using "pcs move" then it causes a fatal startup
> > > loop for drbd. If I migrate by adding an extra location constraint
> > > preferring the other workload node then I can migrate ok.
> > >
> > > I'm using Oracle Linux 6.4; drbd83-utils 8.3.11; corosync 1.4.1; cman
> > > 3.0.12.1; Pacemaker 1.1.8 & pcs 1.1.8
> > >
> > >
> > > I'm no quorum-node expert but I believe your initial design isn't
> optimal.
> > > You could probably even run with only two nodes (real nodes) and
> no-quorum-policy=ignore + fencing (for data integrity) [1]
> > > This is what most (all?) people with two-node clusters do.
> > >
> > > But if you really believe you need to be quorate, then I think you
> need to define your third node as a quorum node in corosync/cman (I'm not
> sure how on EL6.4 with CMAN, and I cannot find a valid link). IIRC, with
> such a definition you won't need the location constraints.
> > >
> > >
> > > [1]
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_perform_a_failover.html#_quorum_and_two_node_clusters
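A minimal sketch of the two-node approach described above (fencing plus no-quorum-policy=ignore), using the standard Pacemaker property names; stonith-enabled=true also needs a working fence device configured for each node:

pcs property set stonith-enabled=true
pcs property set no-quorum-policy=ignore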
> > >
> > >
> > >
> > > --
> > > Cheers,
> > > Florian Crouzat
> > >