[Pacemaker] Issues with constraints - working for start/stop, being ignored on "failures"

Cnut Jansen work at cnutjansen.eu
Sun May 30 22:57:06 EDT 2010


Hi,

I'm not sure whether this is really some kind of bug (maybe already 
widely known and even already fixed in more recent versions) or simply 
misconfiguration and lack of knowledge and experience on my part 
(since I'm still quite new to HA-computing), but I have issues with 
the order constraints I defined in Pacemaker. I can't get rid of these 
issues and can only partially "work around" them. But such workarounds 
don't really seem "as intended/designed" to me...

The problem is that my constraint chains are only honored in some 
situations. Upon starting / switching-to-online and stopping / 
switching-to-standby the nodes / cluster, all constraint chains work 
as they should, and they even do so upon directly stopping the 
troublesome fundamental resources, the DRBD- and DLM-resources, which 
are the bases of my constraint chains. Therefore, when e.g. a failure 
occurs in the DRBD-resource for MySQL's DataDir, I would expect the 
cluster to first stop the MySQL-resource-group (MySQL + IP-address), 
then stop the MySQL-mount-resource, then demote and finally stop the 
DRBD-resource. But when I test the cluster's behaviour upon such a 
failure via "crm_resource -F -r drbdMysql:0 -H nde28", the cluster 
first tries to demote the DRBD-resource, then already tries to stop 
it, and only then stops the MySQL-IP, the MySQL-mount and finally MySQL.
The result of such a test is easy to guess, since both the demote and 
the stop of the DRBD-resource fail: the DRBD-resource is left in 
"started (unmanaged) failed", and the rest of the involved resources 
are stopped.

I'm running Pacemaker 1.0.6, delivered with and running on SLES 11 with 
HAE, both kept up-to-date with official update repositories (due to 
company's directives).
SLES 11 SP1 is due to be released in a few days, and I hope it will 
bring more recent versions of Pacemaker, DRBD (I still have to run 
8.2.7) and other HA-cluster-related stuff.

I have also already posted about these issues in Novell's support 
forum, with many more details:
http://forums.novell.com/novell-product-support-forums/suse-linux-enterprise-server-sles/sles-configure-administer/411152-constraint-issues-upon-failure-drbd-resource-suse-linux-enterprise-hae-11-a.html

So I'm wondering:
1) Aren't constraint chains, once defined, also implicitly defined in 
exactly the inverse order for stopping resources?
2) After my testing for workarounds: Why do (or seem to do), in the 
case of the "failing" fundamental resources, order constraints for 
MS-resources' stop-action have an effect, but neither those for 
MS-resources' demote-action nor those for (primitives'/?)clones' 
stop-action? Or is that just because the MS-resources' stop-action is 
only the second command anyway, and therefore merely happens to 
follow my additional constraint?!
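
To make question 1 concrete: as far as the Pacemaker documentation 
describes it, order constraints are symmetrical by default (the stop 
order is the reverse of the start order), and a score of 0 marks an 
ordering as advisory rather than mandatory. A minimal sketch of an 
explicitly symmetrical, mandatory variant of one of the constraints 
from the list of current constraints follows. The id 
TEST_orderMountMysql_drbd_inf is hypothetical, it is an assumption that 
the crm shell shipped with this version accepts "symmetrical=", and it 
is untested whether this changes the behaviour upon failures:

```
# Hypothetical variant of orderMountMysql_drbd: score "inf" instead of "0",
# with the (documented default) symmetry spelled out explicitly.
# Implied inverse: stop cloneMountMysql before demoting msDrbdMysql.
order TEST_orderMountMysql_drbd_inf inf: msDrbdMysql:promote cloneMountMysql:start symmetrical=true
```

If the inverse ordering were applied upon a failure too, the mount 
would be expected to stop before the demote of the DRBD-resource is 
attempted.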


Current constraints:
colocation TEST_colocO2cb inf: cloneO2cb cloneDlm
colocation colocGrpMysql inf: grpMysql cloneMountMysql
colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
colocation colocMountMysql_o2cb inf: cloneMountMysql cloneO2cb
colocation colocMountOpencms_drbd inf: cloneMountOpencms msDrbdOpencms:Master
colocation colocMountOpencms_o2cb inf: cloneMountOpencms cloneO2cb
colocation colocTomcat inf: cloneTomcat cloneMountOpencms:Started
order TEST_orderO2cb 0: cloneDlm cloneO2cb
order orderGrpMysql 0: cloneMountMysql:start grpMysql
order orderMountMysql_drbd 0: msDrbdMysql:promote cloneMountMysql:start
order orderMountMysql_o2cb 0: cloneO2cb cloneMountMysql
order orderMountOpencms_drbd 0: msDrbdOpencms:promote cloneMountOpencms:start
order orderMountOpencms_o2cb 0: cloneO2cb cloneMountOpencms
order orderTomcat 0: cloneMountOpencms:start cloneTomcat

Constraints added to "work around" at least the DRBD-resources being 
left in state "started (unmanaged) failed":
order GNAH_orderDrbdMysql_stop 0: cloneMountMysql:stop msDrbdMysql:stop
order GNAH_orderDrbdOpencms_stop 0: cloneMountOpencms:stop msDrbdOpencms:stop
(I also tried similar constraints for msDrbd*:demote and cloneDlm:stop, 
but neither seemed to have an effect)




