[Pacemaker] Issues with constraints - working for start/stop, being ignored on "failures"

Mon May 31 11:50:16 UTC 2010

Hi,

On Sun, May 30, 2010 at 09:47:37PM -0600, Tim Serong wrote:
> On 5/31/2010 at 12:57 PM, Cnut Jansen <work at cnutjansen.eu> wrote: 
> > Hi, 
> >  
> > I'm not sure whether this is a bug (perhaps already widely known and
> > even fixed in more recent versions) or simply a misconfiguration
> > caused by my lack of knowledge and experience (I'm still quite new to
> > HA computing), but I have issues with the order constraints I defined
> > in Pacemaker. I can't get rid of these issues and can only partially
> > work around them, and such workarounds don't really seem "as
> > intended/designed" to me...
> >  
> > The problem: when starting / switching to online and stopping /
> > switching to standby the nodes / cluster, all constraint chains work
> > as they should, and they do so even when I directly stop the
> > troublesome fundamental resources, the DRBD and DLM resources, which
> > are the bases of my constraint chains. On failures, however, the
> > chains are not honoured. When, e.g., a failure occurs in the DRBD
> > resource for MySQL's DataDir, the cluster should first stop the MySQL
> > resource group (MySQL + IP address), then stop the MySQL mount
> > resource, then demote and finally stop the DRBD resource.
> > But when I test the cluster's behaviour on such a failure via
> > "crm_resource -F -r drbdMysql:0 -H nde28", the cluster first tries to
> > demote the DRBD resource, then to stop it right away, then stops the
> > MySQL IP, the MySQL mount and only finally MySQL.
> > Because the demote and stop of the DRBD resource fail, the result of
> > such a test isn't hard to guess: the DRBD resource is left in "started
> > (unmanaged) failed", and the rest of the involved resources are stopped.
> >  
> > I'm running Pacemaker 1.0.6, delivered with and running on SLES 11 with  
> > HAE, both kept up-to-date with official update repositories (due to  
> > company's directives). 
> > SLES 11 SP1 is due to be released in a few days, and I hope it will
> > bring more recent versions of Pacemaker, DRBD (I still have to run
> > 8.2.7) and other HA-cluster-related software.
> >  
> > I have also already posted about these issues in Novell's support
> > forum, with many more details:
> > http://forums.novell.com/novell-product-support-forums/suse-linux-enterprise-server-sles/sles-configure-administer/411152-constraint-issues-upon-failure-drbd-resource-suse-linux-enterprise-hae-11-a.html
> >  
> > So I'm wondering: 
> > 1) When constraint chains are defined, aren't they also implicitly
> > defined in exactly the inverse order for stopping resources?
> 
> Yes, but see below for a note on scores.
> 
> > 2) After my testing for workarounds: why, when the fundamental
> > resources "fail", do order constraints on the MS resources' stop
> > action (seem to) have an effect, while neither those on the MS
> > resources' demote action nor those on the (primitives'/?)clones'
> > stop action do? Or is that just because the MS resources' stop
> > action is only the second command anyway, and therefore merely
> > happens to follow my additional constraint?!
> 
> I'm not sure about that.
> 
> > Current constraints: 
> > colocation TEST_colocO2cb inf: cloneO2cb cloneDlm 
> > colocation colocGrpMysql inf: grpMysql cloneMountMysql 
> > colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master 
> > colocation colocMountMysql_o2cb inf: cloneMountMysql cloneO2cb 
> > colocation colocMountOpencms_drbd inf: cloneMountOpencms msDrbdOpencms:Master
> > colocation colocMountOpencms_o2cb inf: cloneMountOpencms cloneO2cb 
> > colocation colocTomcat inf: cloneTomcat cloneMountOpencms:Started 
> > order TEST_orderO2cb 0: cloneDlm cloneO2cb 
> > order orderGrpMysql 0: cloneMountMysql:start grpMysql 
> > order orderMountMysql_drbd 0: msDrbdMysql:promote cloneMountMysql:start 
> > order orderMountMysql_o2cb 0: cloneO2cb cloneMountMysql 
> > order orderMountOpencms_drbd 0: msDrbdOpencms:promote cloneMountOpencms:start
> > order orderMountOpencms_o2cb 0: cloneO2cb cloneMountOpencms 
> > order orderTomcat 0: cloneMountOpencms:start cloneTomcat 
> 
> Try specifying "inf" for those ordering scores rather than zero.
> Ordering constraints with score="0" are considered optional and only
> have an effect when both resources are starting or stopping.  You
> should also be able to leave out the ":start" specifiers as this is
> implicit.

Actually not in cases where it is the second resource and the
first resource has a different action specified. The second
action (or state) defaults to the first action (or state).
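
A minimal sketch of how the two remarks combine, reusing the constraint
names from the quoted configuration. Whether "inf" scores alone cure the
failure case on Pacemaker 1.0.6 is an assumption and has not been
verified here:

```
# Mandatory ("inf") ordering instead of advisory ("0").
# The ":start" on the second resource has to stay: without it the action
# would default to the first resource's action, i.e. "promote".
order orderMountMysql_drbd inf: msDrbdMysql:promote cloneMountMysql:start
order orderMountOpencms_drbd inf: msDrbdOpencms:promote cloneMountOpencms:start
# No action is given for the first resource here, so both default to
# "start" and no specifier is needed.
order orderMountMysql_o2cb inf: cloneO2cb cloneMountMysql
order orderGrpMysql inf: cloneMountMysql grpMysql
```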

> > Constraints added to "work around" at least the DRBD-resources left in  
> > state "started (unmanaged) failed": 
> > order GNAH_orderDrbdMysql_stop 0: cloneMountMysql:stop msDrbdMysql:stop 
> > order GNAH_orderDrbdOpencms_stop 0: cloneMountOpencms:stop msDrbdOpencms:stop
> > (I also tried similar constraints for msDrbd*:demote and cloneDlm:stop,
> > but neither seemed to have an effect)
> 
> Those shouldn't be necessary (I never tried putting ordering
> constraints on stop ops before...)

Right. They are implied by the other constraints.
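
As a sketch of what "implied" means, assuming the default symmetrical
behaviour of order constraints (the inverse shown in the comment is
written out for illustration only and is not something to configure):

```
# Configured constraint, as in the list quoted above:
order orderMountMysql_drbd 0: msDrbdMysql:promote cloneMountMysql:start
# Implied inverse, derived by the cluster itself:
#   cloneMountMysql:stop happens before msDrbdMysql:demote

# The explicit stop workarounds could then be dropped again
# (entered at the "crm configure" level):
delete GNAH_orderDrbdMysql_stop GNAH_orderDrbdOpencms_stop
```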

Thanks,

Dejan

> Regards,
> 
> Tim
> 
> 
> -- 
> Tim Serong <tserong at novell.com>
> Senior Clustering Engineer, OPS Engineering, Novell Inc.