[Pacemaker] Issues with constraints - working for start/stop, being ignored on "failures"

Mon May 31 03:47:37 UTC 2010

On 5/31/2010 at 12:57 PM, Cnut Jansen <work at cnutjansen.eu> wrote: 
> Hi, 
>  
> I'm not sure whether this is a real bug (maybe already widely
> known and even already fixed in more recent versions) or simply
> misconfiguration and a lack of knowledge and experience on my part
> (I'm still quite new to HA computing), but I have issues with the
> order constraints I defined in Pacemaker. I can't get rid of these
> issues and can only partially work around them, and such workarounds
> don't really seem "as intended/designed" to me...
>  
> The problem is this: when starting nodes / bringing them online and
> when stopping them / putting them into standby, all constraint
> chains work as they should, and they do so even when I directly stop
> the troublesome fundamental resources, the DRBD and DLM resources,
> which are the bases of my constraint chains. On a failure, however,
> the chains are not respected. E.g. when a failure occurs in the
> DRBD resource for MySQL's DataDir, the cluster should first stop the
> MySQL resource group (MySQL + IP address), then stop the MySQL mount
> resource, then demote and finally stop the DRBD resource.
> But when I test the cluster's behaviour upon such a failure via
> "crm_resource -F -r drbdMysql:0 -H nde28", the cluster first tries to
> demote the DRBD resource, then already tries to stop it as well, and
> only then stops the MySQL IP, the MySQL mount and finally MySQL.
> The result of such a test is not hard to guess, since demote and stop
> fail for the DRBD resource: the DRBD resource is left in "started
> (unmanaged) failed" and the rest of the involved resources are stopped.
>  
> I'm running Pacemaker 1.0.6, delivered with and running on SLES 11 with  
> HAE, both kept up-to-date with official update repositories (due to  
> company's directives). 
> SLES 11 SP1 is due to be released in a few days, and I hope it will
> also bring more recent versions of Pacemaker, DRBD (I still have to
> run 8.2.7) and other HA-cluster-related software.
>  
> I have also already posted about these issues in Novell's support forum,
> with many more details:
> http://forums.novell.com/novell-product-support-forums/suse-linux-enterprise-server-sles/sles-configure-administer/411152-constraint-issues-upon-failure-drbd-resource-suse-linux-enterprise-hae-11-a.html
>  
> So I'm wondering: 
> 1) When constraint chains are defined, aren't they also implicitly
> defined in exactly the inverse order for stopping resources?

Yes, but see below for a note on scores.

> 2) After my testing for workarounds: why, in the case of the "failing"
> fundamental resources, do order constraints on the MS resources'
> stop action (seem to) have an effect, but neither those on the MS
> resources' demote action nor those on the (primitives'/?)clones'
> stop action? Or is that only because the MS resources' stop action is
> the second command anyway, and therefore just happens to follow my
> additional constraint?!

I'm not sure about that.

> Current constraints: 
> colocation TEST_colocO2cb inf: cloneO2cb cloneDlm 
> colocation colocGrpMysql inf: grpMysql cloneMountMysql 
> colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master 
> colocation colocMountMysql_o2cb inf: cloneMountMysql cloneO2cb 
> colocation colocMountOpencms_drbd inf: cloneMountOpencms msDrbdOpencms:Master
> colocation colocMountOpencms_o2cb inf: cloneMountOpencms cloneO2cb 
> colocation colocTomcat inf: cloneTomcat cloneMountOpencms:Started 
> order TEST_orderO2cb 0: cloneDlm cloneO2cb 
> order orderGrpMysql 0: cloneMountMysql:start grpMysql 
> order orderMountMysql_drbd 0: msDrbdMysql:promote cloneMountMysql:start 
> order orderMountMysql_o2cb 0: cloneO2cb cloneMountMysql 
> order orderMountOpencms_drbd 0: msDrbdOpencms:promote cloneMountOpencms:start
> order orderMountOpencms_o2cb 0: cloneO2cb cloneMountOpencms 
> order orderTomcat 0: cloneMountOpencms:start cloneTomcat 

Try specifying "inf" for those ordering scores rather than zero.
Ordering constraints with score="0" are considered optional and only
have an effect when both resources are starting or stopping.  You
should also be able to leave out the ":start" specifiers as this is
implicit.
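
As a rough, untested sketch of what that suggestion could look like,
reusing the constraint names from the configuration quoted above: every
score becomes "inf", and ":start" is dropped where the first resource
names no other action. On the two DRBD constraints the ":start" is kept
on purpose. As far as I know, when only the first resource names an
action (here "promote"), the second resource inherits that action
rather than defaulting to start, so treat that part as an assumption to
verify against your Pacemaker version.

```
order TEST_orderO2cb inf: cloneDlm cloneO2cb
order orderGrpMysql inf: cloneMountMysql grpMysql
order orderMountMysql_drbd inf: msDrbdMysql:promote cloneMountMysql:start
order orderMountMysql_o2cb inf: cloneO2cb cloneMountMysql
order orderMountOpencms_drbd inf: msDrbdOpencms:promote cloneMountOpencms:start
order orderMountOpencms_o2cb inf: cloneO2cb cloneMountOpencms
order orderTomcat inf: cloneMountOpencms cloneTomcat
```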

> Constraints added to "work around" at least the DRBD-resources left in  
> state "started (unmanaged) failed": 
> order GNAH_orderDrbdMysql_stop 0: cloneMountMysql:stop msDrbdMysql:stop 
> order GNAH_orderDrbdOpencms_stop 0: cloneMountOpencms:stop msDrbdOpencms:stop
> (I also tried similar constraints for msDrbd*:demote and cloneDlm:stop,
> but neither seemed to have an effect)

Those shouldn't be necessary (I never tried putting ordering
constraints on stop ops before...)

Regards,

Tim

-- 
Tim Serong <tserong at novell.com>
Senior Clustering Engineer, OPS Engineering, Novell Inc.