[Pacemaker] Issues with constraints - working for start/stop, being ignored on "failures"
Tim Serong
tserong at novell.com
Mon Jun 7 01:07:32 UTC 2010
On 6/2/2010 at 11:10 AM, Cnut Jansen <work at cnutjansen.eu> wrote:
> On 31.05.2010 05:47, Tim Serong wrote:
> > On 5/31/2010 at 12:57 PM, Cnut Jansen<work at cnutjansen.eu> wrote:
> >
> >> Current constraints:
> >> colocation TEST_colocO2cb inf: cloneO2cb cloneDlm
> >> colocation colocGrpMysql inf: grpMysql cloneMountMysql
> >> colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
> >> colocation colocMountMysql_o2cb inf: cloneMountMysql cloneO2cb
> >> colocation colocMountOpencms_drbd inf: cloneMountOpencms msDrbdOpencms:Master
> >> colocation colocMountOpencms_o2cb inf: cloneMountOpencms cloneO2cb
> >> colocation colocTomcat inf: cloneTomcat cloneMountOpencms:Started
> >> order TEST_orderO2cb 0: cloneDlm cloneO2cb
> >> order orderGrpMysql 0: cloneMountMysql:start grpMysql
> >> order orderMountMysql_drbd 0: msDrbdMysql:promote cloneMountMysql:start
> >> order orderMountMysql_o2cb 0: cloneO2cb cloneMountMysql
> >> order orderMountOpencms_drbd 0: msDrbdOpencms:promote cloneMountOpencms:start
> >> order orderMountOpencms_o2cb 0: cloneO2cb cloneMountOpencms
> >> order orderTomcat 0: cloneMountOpencms:start cloneTomcat
> >>
> > Try specifying "inf" for those ordering scores rather than zero.
> > Ordering constraints with score="0" are considered optional and only
> > have an effect when both resources are starting or stopping. You
> > should also be able to leave out the ":start" specifiers as this is
> > implicit.
> >
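For illustration, a sketch of how the suggestion above might look when
applied to the ordering constraints quoted above (crm shell syntax,
untested; all resource and constraint names are taken from the quoted
configuration). One assumption worth flagging: where the first resource
carries an explicit action such as ":promote", the ":start" on the second
resource is kept, because the second action otherwise defaults to the
first one rather than to "start".

```
# Mandatory ordering: score "inf" instead of the advisory "0"
order TEST_orderO2cb inf: cloneDlm cloneO2cb
order orderGrpMysql inf: cloneMountMysql grpMysql
# ":start" kept here so the mount is started (not "promoted") after the promote
order orderMountMysql_drbd inf: msDrbdMysql:promote cloneMountMysql:start
order orderMountMysql_o2cb inf: cloneO2cb cloneMountMysql
order orderMountOpencms_drbd inf: msDrbdOpencms:promote cloneMountOpencms:start
order orderMountOpencms_o2cb inf: cloneO2cb cloneMountOpencms
order orderTomcat inf: cloneMountOpencms cloneTomcat
```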
> About those ":start" specifiers on the mount resources' order
> constraints you're of course right, and I already knew about that.
> They're just leftovers from some tests (probably a search for other
> workarounds) which, because they are (to my knowledge) harmlessly
> redundant, I have so far always forgotten to remove when making
> other, more important changes. You know, because the crm shell (which
> I currently use for editing my configuration) cancels all resource
> monitor operations on the node it is started on, I prefer to avoid
> starting it as much as possible, since I always have to make sure
> afterwards that all monitor operations are running again (i.e. switch
> the cluster's maintenance-mode on/off, or switch the node to standby
> and back online).
Say what? The CRM shell shouldn't be canceling ops...
> About those 0-scores, unfortunately they're necessary, since they're,
> afaik, the official workaround to prevent instances of clone
> resources from also being restarted on nodes where it's unnecessary.
> With scores set to "inf" instead, when I for example put one node
> into standby and/or back online, most clone resources would also be
> restarted on the other node. That's not acceptable for production.
> According to what I remember having read, this behaviour was only
> changed in Pacemaker 1.0.7, which isn't shipped with SLES 11 yet. I'm
> hoping SLES 11 SP1 will change that, but haven't found any reliable
> information about its version of Pacemaker yet.
SLES 11 SP1 and the SLE High Availability Extension 11 SP1 are now
available for download from http://download.novell.com/ - this includes
Pacemaker 1.1.2.
> >> Constraints added to "work around" at least the DRBD-resources left in
> >> state "started (unmanaged) failed":
> >> order GNAH_orderDrbdMysql_stop 0: cloneMountMysql:stop msDrbdMysql:stop
> >> order GNAH_orderDrbdOpencms_stop 0: cloneMountOpencms:stop msDrbdOpencms:stop
> >> (Also tried similar constraints for msDrbd*:demote and cloneDlm:stop,
> >> but neither seemed to have an effect)
> >>
> > Those shouldn't be necessary (I never tried putting ordering
> > constraints on stop ops before...)
> >
> They shouldn't, right; that's also what I had expected. But as I
> reported in my post above, they, for whatever reason, actually DO
> have an effect! I simply don't know why yet, and hope others may
> have a clue. Anyway, so far they're the most acceptable workaround I
> know of for those strange constraint issues that made me write here.
> (Another workaround is start-delays on stop operations, but since
> those depend on the individual node's system and resource
> performance, they are not acceptable for production.)
> I just still don't know if it's a case of misconfiguration and/or
> lack of knowledge/experience on my side, or if it's really a bug in
> Pacemaker; maybe even one already fixed in versions more recent than
> SLES 11's Pacemaker 1.0.6.
Curious... I'd suggest seeing if you can reproduce on SLE 11 SP1.
Regards,
Tim
> In case someone would like to have a look at it, I attached the
> complete cluster configuration, with and without the workaround, both
> as XML and as output of "crm configure show".
> (Please don't be surprised by some quite high monitor operation
> intervals, which were just set that way when dumping the config; the
> tests done and configs dumped when posting in Novell's support forum
> used timings 1/100 of those, and it made no difference.)
>
>
> Here are also some grep'ed Syslogs:
>
> "Failure" of DRBD resource, without workaround:
> May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_asyncmon_0 (call=2449, rc=1, cib-update=4731,
> confirmed=false) unknown error
> May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_notify_0 (call=2450, rc=0, cib-update=4735,
> confirmed=true) ok
> May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_monitor_10000 (call=2442, status=1, cib-update=0,
> confirmed=true) Cancelled
> May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_demote_0 (call=2451, rc=1, cib-update=4736,
> confirmed=true) unknown error
> May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_notify_0 (call=2452, rc=0, cib-update=4737,
> confirmed=true) ok
> May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql-ip_monitor_2000 (call=2446, status=1, cib-update=0,
> confirmed=true) Cancelled
> May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_notify_0 (call=2454, rc=0, cib-update=4739,
> confirmed=true) ok
> May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql-ip_stop_0 (call=2453, rc=0, cib-update=4740,
> confirmed=true) ok
> May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql_monitor_1000000 (call=2444, status=1, cib-update=0,
> confirmed=true) Cancelled
> May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_stop_0 (call=2455, rc=1, cib-update=4741,
> confirmed=true) unknown error
> May 28 21:15:44 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql_stop_0 (call=2456, rc=0, cib-update=4742, confirmed=true) ok
> May 28 21:15:44 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mountMysql:0_monitor_1000000 (call=2421, status=1,
> cib-update=0, confirmed=true) Cancelled
> May 28 21:15:44 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mountMysql:0_stop_0 (call=2457, rc=0, cib-update=4744,
> confirmed=true) ok
>
> "Failure" of O2CB resource, without workaround:
> May 28 21:20:32 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation dlm:0_asyncmon_0 (call=2476, rc=1, cib-update=4774,
> confirmed=false) unknown error
> May 28 21:20:32 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation dlm:0_monitor_10000 (call=2405, status=1, cib-update=0,
> confirmed=true) Cancelled
> May 28 21:20:52 nde28 crmd: [2846]: ERROR: process_lrm_event: LRM
> operation dlm:0_stop_0 (2477) Timed Out (timeout=20000ms)
> May 28 21:20:52 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql-ip_monitor_2000 (call=2475, status=1, cib-update=0,
> confirmed=true) Cancelled
> May 28 21:20:52 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation tomcat:0_monitor_500000 (call=2448, status=1, cib-update=0,
> confirmed=true) Cancelled
> May 28 21:20:53 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql-ip_stop_0 (call=2478, rc=0, cib-update=4782,
> confirmed=true) ok
> May 28 21:20:53 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql_monitor_1000000 (call=2473, status=1, cib-update=0,
> confirmed=true) Cancelled
> May 28 21:20:56 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation tomcat:0_stop_0 (call=2479, rc=0, cib-update=4783,
> confirmed=true) ok
> May 28 21:20:56 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mountOpencms:0_monitor_1000000 (call=2423, status=1,
> cib-update=0, confirmed=true) Cancelled
> May 28 21:20:56 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mountOpencms:0_stop_0 (call=2481, rc=0, cib-update=4784,
> confirmed=true) ok
> May 28 21:20:57 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql_stop_0 (call=2480, rc=0, cib-update=4785, confirmed=true) ok
> May 28 21:20:57 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mountMysql:1_monitor_1000000 (call=2471, status=1,
> cib-update=0, confirmed=true) Cancelled
> May 28 21:20:57 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mountMysql:1_stop_0 (call=2482, rc=0, cib-update=4786,
> confirmed=true) ok
> May 28 21:20:57 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation o2cb:0_monitor_1000000 (call=2418, status=1, cib-update=0,
> confirmed=true) Cancelled
> May 28 21:20:58 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation o2cb:0_stop_0 (call=2483, rc=0, cib-update=4787,
> confirmed=true) ok
>
> Constraints used, without workaround:
> colocation TEST_colocO2cb inf: cloneO2cb cloneDlm
> colocation colocGrpMysql inf: grpMysql cloneMountMysql
> colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
> colocation colocMountMysql_o2cb inf: cloneMountMysql cloneO2cb
> colocation colocMountOpencms_drbd inf: cloneMountOpencms msDrbdOpencms:Master
> colocation colocMountOpencms_o2cb inf: cloneMountOpencms cloneO2cb
> colocation colocTomcat inf: cloneTomcat cloneMountOpencms:Started
> order TEST_orderO2cb 0: cloneDlm cloneO2cb
> order orderGrpMysql 0: cloneMountMysql:start grpMysql
> order orderMountMysql_drbd 0: msDrbdMysql:promote cloneMountMysql:start
> order orderMountMysql_o2cb 0: cloneO2cb cloneMountMysql
> order orderMountOpencms_drbd 0: msDrbdOpencms:promote cloneMountOpencms:start
> order orderMountOpencms_o2cb 0: cloneO2cb cloneMountOpencms
> order orderTomcat 0: cloneMountOpencms:start cloneTomcat
>
> "Failure" of DRBD resource, with workaround:
> May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_asyncmon_0 (call=2355, rc=1, cib-update=4556,
> confirmed=false) unknown error
> May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_notify_0 (call=2356, rc=0, cib-update=4560,
> confirmed=true) ok
> May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_monitor_10000 (call=2330, status=1, cib-update=0,
> confirmed=true) Cancelled
> May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_demote_0 (call=2357, rc=1, cib-update=4561,
> confirmed=true) unknown error
> May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_notify_0 (call=2358, rc=0, cib-update=4562,
> confirmed=true) ok
> May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql-ip_monitor_2000 (call=2344, status=1, cib-update=0,
> confirmed=true) Cancelled
> May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_notify_0 (call=2360, rc=0, cib-update=4564,
> confirmed=true) ok
> May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql-ip_stop_0 (call=2359, rc=0, cib-update=4565,
> confirmed=true) ok
> May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql_monitor_1000000 (call=2342, status=1, cib-update=0,
> confirmed=true) Cancelled
> May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql_stop_0 (call=2361, rc=0, cib-update=4566, confirmed=true) ok
> May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mountMysql:0_monitor_1000000 (call=2337, status=1,
> cib-update=0, confirmed=true) Cancelled
> May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mountMysql:0_stop_0 (call=2362, rc=0, cib-update=4567,
> confirmed=true) ok
> May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_stop_0 (call=2363, rc=0, cib-update=4568,
> confirmed=true) ok
> May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_start_0 (call=2364, rc=0, cib-update=4570,
> confirmed=true) ok
> May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_notify_0 (call=2365, rc=0, cib-update=4571,
> confirmed=true) ok
> May 28 20:57:20 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_notify_0 (call=2366, rc=0, cib-update=4573,
> confirmed=true) ok
> May 28 20:57:21 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_promote_0 (call=2367, rc=0, cib-update=4574,
> confirmed=true) ok
> May 28 20:57:21 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_notify_0 (call=2368, rc=0, cib-update=4575,
> confirmed=true) ok
> May 28 20:57:23 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation drbdMysql:0_monitor_10000 (call=2369, rc=8, cib-update=4577,
> confirmed=false) master
> May 28 20:57:23 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mountMysql:1_start_0 (call=2370, rc=0, cib-update=4578,
> confirmed=true) ok
> May 28 20:57:23 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mountMysql:1_monitor_1000000 (call=2371, rc=0,
> cib-update=4579, confirmed=false) ok
> May 28 20:57:27 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql_start_0 (call=2372, rc=0, cib-update=4580,
> confirmed=true) ok
> May 28 20:57:27 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql_monitor_1000000 (call=2373, rc=0, cib-update=4581,
> confirmed=false) ok
> May 28 20:57:27 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql-ip_start_0 (call=2374, rc=0, cib-update=4582,
> confirmed=true) ok
> May 28 20:57:28 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql-ip_monitor_2000 (call=2375, rc=0, cib-update=4583,
> confirmed=false) ok
>
> "Failure" of O2CB resource, with workaround (not working for O2CB; maybe
> because it is not an MS resource):
> May 28 21:01:31 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation dlm:0_asyncmon_0 (call=2376, rc=1, cib-update=4584,
> confirmed=false) unknown error
> May 28 21:01:32 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation dlm:0_monitor_10000 (call=2321, status=1, cib-update=0,
> confirmed=true) Cancelled
> May 28 21:01:52 nde28 crmd: [2846]: ERROR: process_lrm_event: LRM
> operation dlm:0_stop_0 (2377) Timed Out (timeout=20000ms)
> May 28 21:01:52 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql-ip_monitor_2000 (call=2375, status=1, cib-update=0,
> confirmed=true) Cancelled
> May 28 21:01:52 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation tomcat:0_monitor_500000 (call=2341, status=1, cib-update=0,
> confirmed=true) Cancelled
> May 28 21:01:52 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql-ip_stop_0 (call=2378, rc=0, cib-update=4592,
> confirmed=true) ok
> May 28 21:01:52 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql_monitor_1000000 (call=2373, status=1, cib-update=0,
> confirmed=true) Cancelled
> May 28 21:01:56 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation tomcat:0_stop_0 (call=2379, rc=0, cib-update=4593,
> confirmed=true) ok
> May 28 21:01:56 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mysql_stop_0 (call=2380, rc=0, cib-update=4594, confirmed=true) ok
> May 28 21:01:56 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mountMysql:1_monitor_1000000 (call=2371, status=1,
> cib-update=0, confirmed=true) Cancelled
> May 28 21:01:57 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mountMysql:1_stop_0 (call=2381, rc=0, cib-update=4595,
> confirmed=true) ok
> May 28 21:01:58 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mountOpencms:0_monitor_1000000 (call=2339, status=1,
> cib-update=0, confirmed=true) Cancelled
> May 28 21:01:58 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation mountOpencms:0_stop_0 (call=2382, rc=0, cib-update=4596,
> confirmed=true) ok
> May 28 21:01:58 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation o2cb:0_monitor_1000000 (call=2334, status=1, cib-update=0,
> confirmed=true) Cancelled
> May 28 21:01:59 nde28 crmd: [2846]: info: process_lrm_event: LRM
> operation o2cb:0_stop_0 (call=2383, rc=0, cib-update=4597,
> confirmed=true) ok
>
> Constraints used, with workaround (*GNAH_*) for DRBD resource and tried
> "workaround" for O2CB resource:
> colocation TEST_colocO2cb inf: cloneO2cb cloneDlm
> colocation colocGrpMysql inf: grpMysql cloneMountMysql
> colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
> colocation colocMountMysql_o2cb inf: cloneMountMysql cloneO2cb
> colocation colocMountOpencms_drbd inf: cloneMountOpencms msDrbdOpencms:Master
> colocation colocMountOpencms_o2cb inf: cloneMountOpencms cloneO2cb
> colocation colocTomcat inf: cloneTomcat cloneMountOpencms:Started
> order GNAH_orderDrbdMysql_stop 0: cloneMountMysql:stop msDrbdMysql:stop
> order GNAH_orderDrbdOpencms_stop 0: cloneMountOpencms:stop msDrbdOpencms:stop
> order TESTGNAH_orderDlm_stop 0: cloneO2cb:stop cloneDlm:stop
> order TEST_orderO2cb 0: cloneDlm cloneO2cb
> order orderGrpMysql 0: cloneMountMysql:start grpMysql
> order orderMountMysql_drbd 0: msDrbdMysql:promote cloneMountMysql:start
> order orderMountMysql_o2cb 0: cloneO2cb cloneMountMysql
> order orderMountOpencms_drbd 0: msDrbdOpencms:promote cloneMountOpencms:start
> order orderMountOpencms_o2cb 0: cloneO2cb cloneMountOpencms
> order orderTomcat 0: cloneMountOpencms:start cloneTomcat
>