[Pacemaker] Issues with constraints - working for start/stop, being ignored on "failures"
Dejan Muhamedagic
dejanmm at fastmail.fm
Mon May 31 11:50:16 UTC 2010
Hi,
On Sun, May 30, 2010 at 09:47:37PM -0600, Tim Serong wrote:
> On 5/31/2010 at 12:57 PM, Cnut Jansen <work at cnutjansen.eu> wrote:
> > Hi,
> >
> > I'm not sure whether this is really a bug (maybe already widely
> > known and even already fixed in more recent versions) or simply
> > misconfiguration and a lack of knowledge and experience on my part
> > (since I'm still quite new to HA computing), but I have issues with
> > the order constraints I defined in Pacemaker. I can't get rid of these
> > issues and can only partially work around them, and such workarounds
> > don't really seem "as intended/designed" to me...
> >
> > Upon starting / switching-to-online and stopping / switching-to-standby
> > the nodes / cluster, all constraint chains work as they should, and
> > they do so even when I directly stop the troublesome fundamental
> > resources, the DRBD and DLM resources, which are the bases of my
> > constraint chains. The problem shows up on failures. When, e.g., a
> > failure occurs in the DRBD resource for MySQL's DataDir, the cluster
> > should first stop the MySQL resource group (MySQL + IP address), then
> > stop the MySQL mount resource, then demote and finally stop the DRBD
> > resource. But when I test the cluster's behaviour upon such a failure
> > via "crm_resource -F -r drbdMysql:0 -H nde28", the cluster first tries
> > to demote the DRBD resource, then also already tries to stop it, and
> > only then stops the MySQL IP, the MySQL mount and finally MySQL.
> > Because demote and stop fail for the DRBD resource, the result of such
> > a test isn't hard to guess: the DRBD resource is left in "started
> > (unmanaged) failed", and the rest of the involved resources are stopped.
> >
> > I'm running Pacemaker 1.0.6, delivered with and running on SLES 11 with
> > HAE, both kept up-to-date with official update repositories (due to
> > company's directives).
> > In a few days SLES 11 SP1 shall be released, where I also hope for a
> > more recent version of Pacemaker, DRBD (still have to run 8.2.7) and
> > other HA-cluster-related stuff.
> >
> > I also already posted about these issues in Novell's support forum,
> > with many more details:
> > http://forums.novell.com/novell-product-support-forums/suse-linux-enterprise-server-sles/sles-configure-administer/411152-constraint-issues-upon-failure-drbd-resource-suse-linux-enterprise-hae-11-a.html
> >
> > So I'm wondering:
> > 1) When constraint chains are defined, aren't they also implicitly
> > defined in exactly the inverse order for stopping resources?
>
> Yes, but see below for a note on scores.
>
> > 2) After my testing for workarounds: in the case of the "failing"
> > fundamental resources, why do order constraints on the MS resources'
> > stop action (seem to) have an effect, but neither those on the MS
> > resources' demote action nor those on the (primitives'/?) clones' stop
> > action? Or is that just because the MS resources' stop action is only
> > the second command anyway, and therefore happens to follow my
> > additional constraint?!
>
> I'm not sure about that.
>
> > Current constraints:
> > colocation TEST_colocO2cb inf: cloneO2cb cloneDlm
> > colocation colocGrpMysql inf: grpMysql cloneMountMysql
> > colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
> > colocation colocMountMysql_o2cb inf: cloneMountMysql cloneO2cb
> > colocation colocMountOpencms_drbd inf: cloneMountOpencms msDrbdOpencms:Master
> > colocation colocMountOpencms_o2cb inf: cloneMountOpencms cloneO2cb
> > colocation colocTomcat inf: cloneTomcat cloneMountOpencms:Started
> > order TEST_orderO2cb 0: cloneDlm cloneO2cb
> > order orderGrpMysql 0: cloneMountMysql:start grpMysql
> > order orderMountMysql_drbd 0: msDrbdMysql:promote cloneMountMysql:start
> > order orderMountMysql_o2cb 0: cloneO2cb cloneMountMysql
> > order orderMountOpencms_drbd 0: msDrbdOpencms:promote cloneMountOpencms:start
> > order orderMountOpencms_o2cb 0: cloneO2cb cloneMountOpencms
> > order orderTomcat 0: cloneMountOpencms:start cloneTomcat
>
> Try specifying "inf" for those ordering scores rather than zero.
> Ordering constraints with score="0" are considered optional and only
> have an effect when both resources are starting or stopping. You
> should also be able to leave out the ":start" specifiers as this is
> implicit.

Actually, not when it is the second resource and the first resource
has a different action specified: the second action (or state)
defaults to the first action (or state).
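
As an illustration only, here is an untested sketch of how two of the
constraints quoted above might look with "inf" scores. The resource
and constraint names are taken from the original post; the point is
just where ":start" can be dropped and where it cannot:

    # Both sides default to "start", so no specifier is needed.
    order orderGrpMysql inf: cloneMountMysql grpMysql
    # The first action is "promote". Without ":start" the second
    # action would default to "promote" as well, so keep it explicit.
    order orderMountMysql_drbd inf: msDrbdMysql:promote cloneMountMysql:start
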
> > Constraints added to "work around" at least the DRBD-resources left in
> > state "started (unmanaged) failed":
> > order GNAH_orderDrbdMysql_stop 0: cloneMountMysql:stop msDrbdMysql:stop
> > order GNAH_orderDrbdOpencms_stop 0: cloneMountOpencms:stop msDrbdOpencms:stop
> > (Also tried similar constraints for msDrbd*:demote and cloneDlm:stop,
> > but neither seemed to have an effect)
>
> Those shouldn't be necessary (I never tried putting ordering
> constraints on stop ops before...)
Right. They are implied by the other constraints.
Thanks,
Dejan
> Regards,
>
> Tim
>
>
> --
> Tim Serong <tserong at novell.com>
> Senior Clustering Engineer, OPS Engineering, Novell Inc.
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf