[Pacemaker] Issues with constraints - working for start/stop, being ignored on "failures"
Tim Serong
tserong at novell.com
Mon May 31 03:47:37 UTC 2010
On 5/31/2010 at 12:57 PM, Cnut Jansen <work at cnutjansen.eu> wrote:
> Hi,
>
> I'm not sure whether this is really some kind of bug (maybe already
> widely known and even already fixed in more recent versions) or simply
> misconfiguration and a lack of knowledge and experience on my part
> (I'm still quite new to HA-computing), but I have issues with the
> order constraints I defined in Pacemaker. I can't get rid of these
> issues and can only partially "work around" them. But such workarounds
> don't really seem "as intended/designed" to me...
>
> The problem is that all constraint chains work as they should upon
> starting / switching-to-online and stopping / switching-to-standby the
> nodes / cluster, and even upon directly stopping the troublesome
> fundamental resources (the DRBD- and DLM-resources, which are the
> bases of my constraint chains), but not when one of those resources
> fails. When, e.g., a failure occurs in the DRBD-resource for MySQL's
> DataDir, the cluster should first stop the MySQL-resource-group
> (MySQL + IP-address), then stop the MySQL-mount-resource, then demote
> and finally stop the DRBD-resource.
> But when I test the cluster's behaviour upon such a failure via
> "crm_resource -F -r drbdMysql:0 -H nde28", the cluster first tries to
> demote the DRBD-resource, then right away to stop it, and only then
> stops the MySQL-IP, the MySQL-mount and finally MySQL.
> The result of such a test isn't hard to guess, since both the demote
> and the stop of the DRBD-resource fail: the DRBD-resource is left in
> "started (unmanaged) failed", and the rest of the involved resources
> are stopped.
>
> I'm running Pacemaker 1.0.6, delivered with and running on SLES 11 with
> HAE, both kept up-to-date with official update repositories (due to
> company's directives).
> In a few days SLES 11 SP1 shall be released, where I also hope for a
> more recent version of Pacemaker, DRBD (still have to run 8.2.7) and
> other HA-cluster-related stuff.
>
> I also already posted about these issues in Novell's support forum,
> with a lot more detail:
> http://forums.novell.com/novell-product-support-forums/suse-linux-enterprise-server-sles/sles-configure-administer/411152-constraint-issues-upon-failure-drbd-resource-suse-linux-enterprise-hae-11-a.html
>
> So I'm wondering:
> 1) When constraint chains are defined, aren't they also implicitly
> defined in exactly the inverse order for stopping resources?
Yes, but see below for a note on scores.
> 2) After testing workarounds: why, in the case of the "failing"
> fundamental resources, do order constraints on the MS-resources'
> stop-action (seem to) have an effect, but neither those on the
> MS-resources' demote-action nor those on the (primitives'/?)clones'
> stop-action? Or is that only because the MS-resources' stop-action is
> the second command anyway, and therefore just happens to follow my
> additional constraint?!
I'm not sure about that.
> Current constraints:
> colocation TEST_colocO2cb inf: cloneO2cb cloneDlm
> colocation colocGrpMysql inf: grpMysql cloneMountMysql
> colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
> colocation colocMountMysql_o2cb inf: cloneMountMysql cloneO2cb
> colocation colocMountOpencms_drbd inf: cloneMountOpencms msDrbdOpencms:Master
> colocation colocMountOpencms_o2cb inf: cloneMountOpencms cloneO2cb
> colocation colocTomcat inf: cloneTomcat cloneMountOpencms:Started
> order TEST_orderO2cb 0: cloneDlm cloneO2cb
> order orderGrpMysql 0: cloneMountMysql:start grpMysql
> order orderMountMysql_drbd 0: msDrbdMysql:promote cloneMountMysql:start
> order orderMountMysql_o2cb 0: cloneO2cb cloneMountMysql
> order orderMountOpencms_drbd 0: msDrbdOpencms:promote cloneMountOpencms:start
> order orderMountOpencms_o2cb 0: cloneO2cb cloneMountOpencms
> order orderTomcat 0: cloneMountOpencms:start cloneTomcat
Try specifying "inf" for those ordering scores rather than zero.
Ordering constraints with score="0" are considered optional and only
have an effect when both resources are starting or stopping. You
should also be able to leave out the ":start" specifiers as this is
implicit.
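
As a rough sketch of that suggestion (untested; it assumes the resource
names quoted above and the crm shell syntax shipped with Pacemaker 1.0),
the ordering constraints might end up looking like the block below. The
":start" is deliberately kept on the two lines whose first action is
"promote", on the assumption that an action left off the second resource
would otherwise be taken from the first one rather than defaulting to
start.

```
# Sketch only: same constraint ids as quoted above, score changed from 0 to inf.
# ":start" kept after ":promote" (assumption: an omitted second action
# would follow the first action, i.e. promote).
order TEST_orderO2cb inf: cloneDlm cloneO2cb
order orderGrpMysql inf: cloneMountMysql grpMysql
order orderMountMysql_drbd inf: msDrbdMysql:promote cloneMountMysql:start
order orderMountMysql_o2cb inf: cloneO2cb cloneMountMysql
order orderMountOpencms_drbd inf: msDrbdOpencms:promote cloneMountOpencms:start
order orderMountOpencms_o2cb inf: cloneO2cb cloneMountOpencms
order orderTomcat inf: cloneMountOpencms cloneTomcat
```

With mandatory ("inf") ordering, the implied reverse order should then
also apply when only one of the two resources has to be stopped, for
example for recovery after a failure.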
> Constraints added to "work around" at least the DRBD-resources left in
> state "started (unmanaged) failed":
> order GNAH_orderDrbdMysql_stop 0: cloneMountMysql:stop msDrbdMysql:stop
> order GNAH_orderDrbdOpencms_stop 0: cloneMountOpencms:stop msDrbdOpencms:stop
> (Also tried similar constraints for msDrbd*:demote and cloneDlm:stop,
> but neither seemed to have an effect)
Those shouldn't be necessary (I never tried putting ordering
constraints on stop ops before...)
Regards,
Tim
--
Tim Serong <tserong at novell.com>
Senior Clustering Engineer, OPS Engineering, Novell Inc.