[Pacemaker] Issues with constraints - working for start/stop, being ignored on "failures"

Tim Serong tserong at novell.com
Sun Jun 6 21:07:32 EDT 2010


On 6/2/2010 at 11:10 AM, Cnut Jansen <work at cnutjansen.eu> wrote: 
> On 31.05.2010 05:47, Tim Serong wrote:
> > On 5/31/2010 at 12:57 PM, Cnut Jansen<work at cnutjansen.eu>  wrote: 
> > 
> >> Current constraints: 
> >> colocation TEST_colocO2cb inf: cloneO2cb cloneDlm 
> >> colocation colocGrpMysql inf: grpMysql cloneMountMysql 
> >> colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master 
> >> colocation colocMountMysql_o2cb inf: cloneMountMysql cloneO2cb 
> >> colocation colocMountOpencms_drbd inf: cloneMountOpencms msDrbdOpencms:Master
> >> colocation colocMountOpencms_o2cb inf: cloneMountOpencms cloneO2cb 
> >> colocation colocTomcat inf: cloneTomcat cloneMountOpencms:Started 
> >> order TEST_orderO2cb 0: cloneDlm cloneO2cb 
> >> order orderGrpMysql 0: cloneMountMysql:start grpMysql 
> >> order orderMountMysql_drbd 0: msDrbdMysql:promote cloneMountMysql:start 
> >> order orderMountMysql_o2cb 0: cloneO2cb cloneMountMysql 
> >> order orderMountOpencms_drbd 0: msDrbdOpencms:promote cloneMountOpencms:start
> >> order orderMountOpencms_o2cb 0: cloneO2cb cloneMountOpencms 
> >> order orderTomcat 0: cloneMountOpencms:start cloneTomcat 
> >> 
> > Try specifying "inf" for those ordering scores rather than zero. 
> > Ordering constraints with score="0" are considered optional and only 
> > have an effect when both resources are starting or stopping.  You 
> > should also be able to leave out the ":start" specifiers as this is 
> > implicit. 
> > 
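
To illustrate the suggestion quoted above, here is a minimal sketch of how
the ordering constraints from the quoted configuration might look with
mandatory scores. It only reuses the constraint and resource names already
shown and has not been tested on this cluster. One assumption to flag: the
":start" is kept on the two DRBD constraints, because when the first
resource names an action such as "promote", an omitted action on the second
resource would, as far as I know, default to that same action rather than
to "start".

```
# Sketch only, untested: scores changed from 0 to inf.
# ":start" is deliberately kept after ":promote" (see the assumption above).
order TEST_orderO2cb inf: cloneDlm cloneO2cb
order orderGrpMysql inf: cloneMountMysql grpMysql
order orderMountMysql_drbd inf: msDrbdMysql:promote cloneMountMysql:start
order orderMountMysql_o2cb inf: cloneO2cb cloneMountMysql
order orderMountOpencms_drbd inf: msDrbdOpencms:promote cloneMountOpencms:start
order orderMountOpencms_o2cb inf: cloneO2cb cloneMountOpencms
order orderTomcat inf: cloneMountOpencms cloneTomcat
```
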
> You're of course right about those ":start" specifiers on the mount
> resources' order constraints, and I already knew about that. They're
> just leftovers from some tests (probably a search for other
> workarounds) which, since to my knowledge they are harmlessly
> redundant, I have so far always forgotten to remove when making other,
> more important changes. You know, because the crm shell (which I
> currently use for editing my configuration) cancels all resource
> monitor operations on the node it is started on, I prefer to start it
> as rarely as possible, since afterwards I always have to make sure all
> monitor operations are running again (i.e. switch the cluster's
> maintenance-mode on/off, or switch the node to standby and back
> online).

Say what?  The CRM shell shouldn't be canceling ops...

> About those 0 scores: unfortunately they're necessary, since they're,
> afaik, the official workaround to prevent instances of clone resources
> from also being restarted on nodes where it's unnecessary to do so.
> With scores set to "inf" instead, when I for example put one node into
> standby and/or back online, most clone resources would also be
> restarted on the other node. That's not acceptable for production.
> From what I remember having read, this behaviour only changed in
> Pacemaker 1.0.7, which isn't shipped with SLES 11 yet. I'm hoping
> SLES 11 SP1 will change that, but I haven't found any reliable
> information about its version of Pacemaker yet.

SLES 11 SP1 and the SLE High Availability Extension 11 SP1 are now
available for download from http://download.novell.com/ - this includes
Pacemaker 1.1.2.

> >> Constraints added to "work around" at least the DRBD-resources left in 
> >> state "started (unmanaged) failed": 
> >> order GNAH_orderDrbdMysql_stop 0: cloneMountMysql:stop msDrbdMysql:stop 
> >> order GNAH_orderDrbdOpencms_stop 0: cloneMountOpencms:stop msDrbdOpencms:stop
> >> (Also tried similar constraints for msDrbd*:demote and cloneDlm:stop,
> >> but neither seemed to have an effect)
> >> 
> > Those shouldn't be necessary (I never tried putting ordering 
> > constraints on stop ops before...) 
> > 
> They shouldn't, right; that's also what I had expected. But as I
> reported in my post above, they actually DO have an effect, for
> whatever reason! I simply don't know why yet, and hope others may
> have a clue. Anyway, so far they're the most acceptable workaround I
> know of for those strange constraint issues that made me write here.
> (Another workaround is start-delays on stop operations, but since those
> depend on each node's system and resource performance, they are not
> acceptable for production.)
> I still don't know whether it's just a case of misconfiguration and/or
> lack of knowledge/experience on my side, or whether it's really a bug
> in Pacemaker; maybe even one already fixed in versions more recent
> than SLES 11's Pacemaker 1.0.6.

Curious...  I'd suggest seeing if you can reproduce on SLE 11 SP1.

Regards,

Tim

> In case someone would like to have a look at it, I attached the
> complete cluster configuration, with and without the workaround, both
> as XML and as output of "crm configure show".
> (Please don't be surprised by some quite high monitor operation
> intervals; they were just set that way when dumping the config. The
> tests done and configs dumped when posting in Novell's support forum
> used timings 1/100 of those, and it made no difference.)
>  
>  
> Here are also some grepped syslog excerpts:
>  
> "Failure" of DRBD resource, without workaround: 
> May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_asyncmon_0 (call=2449, rc=1, cib-update=4731,  
> confirmed=false) unknown error 
> May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_notify_0 (call=2450, rc=0, cib-update=4735,  
> confirmed=true) ok 
> May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_monitor_10000 (call=2442, status=1, cib-update=0,  
> confirmed=true) Cancelled 
> May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_demote_0 (call=2451, rc=1, cib-update=4736,  
> confirmed=true) unknown error 
> May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_notify_0 (call=2452, rc=0, cib-update=4737,  
> confirmed=true) ok 
> May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql-ip_monitor_2000 (call=2446, status=1, cib-update=0,  
> confirmed=true) Cancelled 
> May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_notify_0 (call=2454, rc=0, cib-update=4739,  
> confirmed=true) ok 
> May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql-ip_stop_0 (call=2453, rc=0, cib-update=4740,  
> confirmed=true) ok 
> May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql_monitor_1000000 (call=2444, status=1, cib-update=0,  
> confirmed=true) Cancelled 
> May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_stop_0 (call=2455, rc=1, cib-update=4741,  
> confirmed=true) unknown error 
> May 28 21:15:44 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql_stop_0 (call=2456, rc=0, cib-update=4742, confirmed=true) ok 
> May 28 21:15:44 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mountMysql:0_monitor_1000000 (call=2421, status=1,  
> cib-update=0, confirmed=true) Cancelled 
> May 28 21:15:44 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mountMysql:0_stop_0 (call=2457, rc=0, cib-update=4744,  
> confirmed=true) ok 
>  
> "Failure" of O2CB resource, without workaround: 
> May 28 21:20:32 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation dlm:0_asyncmon_0 (call=2476, rc=1, cib-update=4774,  
> confirmed=false) unknown error 
> May 28 21:20:32 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation dlm:0_monitor_10000 (call=2405, status=1, cib-update=0,  
> confirmed=true) Cancelled 
> May 28 21:20:52 nde28 crmd: [2846]: ERROR: process_lrm_event: LRM  
> operation dlm:0_stop_0 (2477) Timed Out (timeout=20000ms) 
> May 28 21:20:52 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql-ip_monitor_2000 (call=2475, status=1, cib-update=0,  
> confirmed=true) Cancelled 
> May 28 21:20:52 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation tomcat:0_monitor_500000 (call=2448, status=1, cib-update=0,  
> confirmed=true) Cancelled 
> May 28 21:20:53 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql-ip_stop_0 (call=2478, rc=0, cib-update=4782,  
> confirmed=true) ok 
> May 28 21:20:53 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql_monitor_1000000 (call=2473, status=1, cib-update=0,  
> confirmed=true) Cancelled 
> May 28 21:20:56 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation tomcat:0_stop_0 (call=2479, rc=0, cib-update=4783,  
> confirmed=true) ok 
> May 28 21:20:56 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mountOpencms:0_monitor_1000000 (call=2423, status=1,  
> cib-update=0, confirmed=true) Cancelled 
> May 28 21:20:56 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mountOpencms:0_stop_0 (call=2481, rc=0, cib-update=4784,  
> confirmed=true) ok 
> May 28 21:20:57 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql_stop_0 (call=2480, rc=0, cib-update=4785, confirmed=true) ok 
> May 28 21:20:57 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mountMysql:1_monitor_1000000 (call=2471, status=1,  
> cib-update=0, confirmed=true) Cancelled 
> May 28 21:20:57 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mountMysql:1_stop_0 (call=2482, rc=0, cib-update=4786,  
> confirmed=true) ok 
> May 28 21:20:57 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation o2cb:0_monitor_1000000 (call=2418, status=1, cib-update=0,  
> confirmed=true) Cancelled 
> May 28 21:20:58 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation o2cb:0_stop_0 (call=2483, rc=0, cib-update=4787,  
> confirmed=true) ok 
>  
> Constraints used, without workaround: 
> colocation TEST_colocO2cb inf: cloneO2cb cloneDlm 
> colocation colocGrpMysql inf: grpMysql cloneMountMysql 
> colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master 
> colocation colocMountMysql_o2cb inf: cloneMountMysql cloneO2cb 
> colocation colocMountOpencms_drbd inf: cloneMountOpencms msDrbdOpencms:Master
> colocation colocMountOpencms_o2cb inf: cloneMountOpencms cloneO2cb 
> colocation colocTomcat inf: cloneTomcat cloneMountOpencms:Started 
> order TEST_orderO2cb 0: cloneDlm cloneO2cb 
> order orderGrpMysql 0: cloneMountMysql:start grpMysql 
> order orderMountMysql_drbd 0: msDrbdMysql:promote cloneMountMysql:start 
> order orderMountMysql_o2cb 0: cloneO2cb cloneMountMysql 
> order orderMountOpencms_drbd 0: msDrbdOpencms:promote cloneMountOpencms:start
> order orderMountOpencms_o2cb 0: cloneO2cb cloneMountOpencms 
> order orderTomcat 0: cloneMountOpencms:start cloneTomcat 
>  
> "Failure" of DRBD resource, with workaround: 
> May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_asyncmon_0 (call=2355, rc=1, cib-update=4556,  
> confirmed=false) unknown error 
> May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_notify_0 (call=2356, rc=0, cib-update=4560,  
> confirmed=true) ok 
> May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_monitor_10000 (call=2330, status=1, cib-update=0,  
> confirmed=true) Cancelled 
> May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_demote_0 (call=2357, rc=1, cib-update=4561,  
> confirmed=true) unknown error 
> May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_notify_0 (call=2358, rc=0, cib-update=4562,  
> confirmed=true) ok 
> May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql-ip_monitor_2000 (call=2344, status=1, cib-update=0,  
> confirmed=true) Cancelled 
> May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_notify_0 (call=2360, rc=0, cib-update=4564,  
> confirmed=true) ok 
> May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql-ip_stop_0 (call=2359, rc=0, cib-update=4565,  
> confirmed=true) ok 
> May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql_monitor_1000000 (call=2342, status=1, cib-update=0,  
> confirmed=true) Cancelled 
> May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql_stop_0 (call=2361, rc=0, cib-update=4566, confirmed=true) ok 
> May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mountMysql:0_monitor_1000000 (call=2337, status=1,  
> cib-update=0, confirmed=true) Cancelled 
> May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mountMysql:0_stop_0 (call=2362, rc=0, cib-update=4567,  
> confirmed=true) ok 
> May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_stop_0 (call=2363, rc=0, cib-update=4568,  
> confirmed=true) ok 
> May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_start_0 (call=2364, rc=0, cib-update=4570,  
> confirmed=true) ok 
> May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_notify_0 (call=2365, rc=0, cib-update=4571,  
> confirmed=true) ok 
> May 28 20:57:20 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_notify_0 (call=2366, rc=0, cib-update=4573,  
> confirmed=true) ok 
> May 28 20:57:21 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_promote_0 (call=2367, rc=0, cib-update=4574,  
> confirmed=true) ok 
> May 28 20:57:21 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_notify_0 (call=2368, rc=0, cib-update=4575,  
> confirmed=true) ok 
> May 28 20:57:23 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation drbdMysql:0_monitor_10000 (call=2369, rc=8, cib-update=4577,  
> confirmed=false) master 
> May 28 20:57:23 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mountMysql:1_start_0 (call=2370, rc=0, cib-update=4578,  
> confirmed=true) ok 
> May 28 20:57:23 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mountMysql:1_monitor_1000000 (call=2371, rc=0,  
> cib-update=4579, confirmed=false) ok 
> May 28 20:57:27 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql_start_0 (call=2372, rc=0, cib-update=4580,  
> confirmed=true) ok 
> May 28 20:57:27 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql_monitor_1000000 (call=2373, rc=0, cib-update=4581,  
> confirmed=false) ok 
> May 28 20:57:27 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql-ip_start_0 (call=2374, rc=0, cib-update=4582,  
> confirmed=true) ok 
> May 28 20:57:28 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql-ip_monitor_2000 (call=2375, rc=0, cib-update=4583,  
> confirmed=false) ok 
>  
> "Failure" of O2CB resource, with workaround (not working for O2CB; maybe  
> because it is not an MS resource):
> May 28 21:01:31 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation dlm:0_asyncmon_0 (call=2376, rc=1, cib-update=4584,  
> confirmed=false) unknown error 
> May 28 21:01:32 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation dlm:0_monitor_10000 (call=2321, status=1, cib-update=0,  
> confirmed=true) Cancelled 
> May 28 21:01:52 nde28 crmd: [2846]: ERROR: process_lrm_event: LRM  
> operation dlm:0_stop_0 (2377) Timed Out (timeout=20000ms) 
> May 28 21:01:52 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql-ip_monitor_2000 (call=2375, status=1, cib-update=0,  
> confirmed=true) Cancelled 
> May 28 21:01:52 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation tomcat:0_monitor_500000 (call=2341, status=1, cib-update=0,  
> confirmed=true) Cancelled 
> May 28 21:01:52 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql-ip_stop_0 (call=2378, rc=0, cib-update=4592,  
> confirmed=true) ok 
> May 28 21:01:52 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql_monitor_1000000 (call=2373, status=1, cib-update=0,  
> confirmed=true) Cancelled 
> May 28 21:01:56 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation tomcat:0_stop_0 (call=2379, rc=0, cib-update=4593,  
> confirmed=true) ok 
> May 28 21:01:56 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mysql_stop_0 (call=2380, rc=0, cib-update=4594, confirmed=true) ok 
> May 28 21:01:56 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mountMysql:1_monitor_1000000 (call=2371, status=1,  
> cib-update=0, confirmed=true) Cancelled 
> May 28 21:01:57 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mountMysql:1_stop_0 (call=2381, rc=0, cib-update=4595,  
> confirmed=true) ok 
> May 28 21:01:58 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mountOpencms:0_monitor_1000000 (call=2339, status=1,  
> cib-update=0, confirmed=true) Cancelled 
> May 28 21:01:58 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation mountOpencms:0_stop_0 (call=2382, rc=0, cib-update=4596,  
> confirmed=true) ok 
> May 28 21:01:58 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation o2cb:0_monitor_1000000 (call=2334, status=1, cib-update=0,  
> confirmed=true) Cancelled 
> May 28 21:01:59 nde28 crmd: [2846]: info: process_lrm_event: LRM  
> operation o2cb:0_stop_0 (call=2383, rc=0, cib-update=4597,  
> confirmed=true) ok 
>  
> Constraints used, with workaround (*GNAH_*) for DRBD resource and tried  
> "workaround" for O2CB resource: 
> colocation TEST_colocO2cb inf: cloneO2cb cloneDlm 
> colocation colocGrpMysql inf: grpMysql cloneMountMysql 
> colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master 
> colocation colocMountMysql_o2cb inf: cloneMountMysql cloneO2cb 
> colocation colocMountOpencms_drbd inf: cloneMountOpencms msDrbdOpencms:Master
> colocation colocMountOpencms_o2cb inf: cloneMountOpencms cloneO2cb 
> colocation colocTomcat inf: cloneTomcat cloneMountOpencms:Started 
> order GNAH_orderDrbdMysql_stop 0: cloneMountMysql:stop msDrbdMysql:stop 
> order GNAH_orderDrbdOpencms_stop 0: cloneMountOpencms:stop msDrbdOpencms:stop
> order TESTGNAH_orderDlm_stop 0: cloneO2cb:stop cloneDlm:stop 
> order TEST_orderO2cb 0: cloneDlm cloneO2cb 
> order orderGrpMysql 0: cloneMountMysql:start grpMysql 
> order orderMountMysql_drbd 0: msDrbdMysql:promote cloneMountMysql:start 
> order orderMountMysql_o2cb 0: cloneO2cb cloneMountMysql 
> order orderMountOpencms_drbd 0: msDrbdOpencms:promote cloneMountOpencms:start
> order orderMountOpencms_o2cb 0: cloneO2cb cloneMountOpencms 
> order orderTomcat 0: cloneMountOpencms:start cloneTomcat 
>  





