[Pacemaker] Issues with constraints - working for start/stop, being ignored on "failures"
Cnut Jansen
work at cnutjansen.eu
Wed Jun 2 01:10:13 UTC 2010
On 31.05.2010 05:47, Tim Serong wrote:
> On 5/31/2010 at 12:57 PM, Cnut Jansen<work at cnutjansen.eu> wrote:
>
>> Current constraints:
>> colocation TEST_colocO2cb inf: cloneO2cb cloneDlm
>> colocation colocGrpMysql inf: grpMysql cloneMountMysql
>> colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
>> colocation colocMountMysql_o2cb inf: cloneMountMysql cloneO2cb
>> colocation colocMountOpencms_drbd inf: cloneMountOpencms msDrbdOpencms:Master
>> colocation colocMountOpencms_o2cb inf: cloneMountOpencms cloneO2cb
>> colocation colocTomcat inf: cloneTomcat cloneMountOpencms:Started
>> order TEST_orderO2cb 0: cloneDlm cloneO2cb
>> order orderGrpMysql 0: cloneMountMysql:start grpMysql
>> order orderMountMysql_drbd 0: msDrbdMysql:promote cloneMountMysql:start
>> order orderMountMysql_o2cb 0: cloneO2cb cloneMountMysql
>> order orderMountOpencms_drbd 0: msDrbdOpencms:promote cloneMountOpencms:start
>> order orderMountOpencms_o2cb 0: cloneO2cb cloneMountOpencms
>> order orderTomcat 0: cloneMountOpencms:start cloneTomcat
>>
> Try specifying "inf" for those ordering scores rather than zero.
> Ordering constraints with score="0" are considered optional and only
> have an effect when both resources are starting or stopping. You
> should also be able to leave out the ":start" specifiers as this is
> implicit.
>
About those ":start" specifiers on the mount resources' order
constraints you're of course right, and I already knew about that.
They're just leftovers from some tests (probably a search for other
workarounds) which, since they are harmlessly redundant as far as I
know, I have so far always forgotten to remove when making other, more
important changes. The crm shell (which I currently use for editing my
configuration) cancels all resource monitor operations on the node it is
started on, so I prefer to start it as rarely as possible; otherwise I
always have to make sure afterwards that all monitor operations are
running again (i.e. switch the cluster's maintenance-mode on and off, or
switch the node to standby and back online).
About those 0 scores: unfortunately they're necessary, since they are,
as far as I know, the official workaround to prevent instances of clone
resources from also being restarted on nodes where that is unnecessary.
With scores set to "inf" instead, when I for example put one node into
standby and/or bring it back online, most clone resources would also be
restarted on the other node. That's not acceptable for production.
From what I remember having read, this behaviour only changed in
Pacemaker 1.0.7, which isn't shipped with SLES 11 yet. I'm hoping SLES 11
SP1 will change that, but I haven't found any reliable information
about its version of Pacemaker yet.
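Since the suggestion only concerns the ordering constraints with score
0, it can help to list exactly those from a pasted "crm configure show"
dump. The following is only a minimal sketch, assuming Python 3 and the
single-line "order <id> <score>: <first> <then>" form used in this
thread; the function names are hypothetical and nothing here is part of
Pacemaker or the crm shell.

```python
import re

# Matches one ordering constraint as written in this thread:
#   order <id> <score>: <first>[:<action>] <then>[:<action>]
ORDER = re.compile(r"^order (\S+) ([^\s:]+): (\S+) (\S+)$")


def join_wrapped(text):
    """Re-join statements that a mail client wrapped onto a second line."""
    statements = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(("order ", "colocation ")) or not statements:
            statements.append(line)
        else:
            statements[-1] += " " + line
    return statements


def advisory_orders(text):
    """Return (id, first, then) for every ordering constraint with score 0."""
    found = []
    for statement in join_wrapped(text):
        match = ORDER.match(statement)
        if match and match.group(2) == "0":
            found.append((match.group(1), match.group(3), match.group(4)))
    return found


# Three constraints copied from this thread, one of them wrapped.
CONFIG = """\
colocation colocTomcat inf: cloneTomcat cloneMountOpencms:Started
order TEST_orderO2cb 0: cloneDlm cloneO2cb
order orderMountOpencms_drbd 0: msDrbdOpencms:promote
cloneMountOpencms:start
"""

for constraint_id, first, then in advisory_orders(CONFIG):
    print(f"{constraint_id}: {first} -> {then} (score 0)")
```

Colocation constraints and ordering constraints with any other score are
simply skipped, so the output is the list of constraints that would
change if the scores were switched to "inf".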
>> Constraints added to "work around" at least the DRBD-resources left in
>> state "started (unmanaged) failed":
>> order GNAH_orderDrbdMysql_stop 0: cloneMountMysql:stop msDrbdMysql:stop
>> order GNAH_orderDrbdOpencms_stop 0: cloneMountOpencms:stop msDrbdOpencms:stop
>> (Also tried similar constraints for msDrbd*:demote and cloneDlm:stop,
>> but neither seemed to have an effect)
>>
> Those shouldn't be necessary (I never tried putting ordering
> constraints on stop ops before...)
>
They shouldn't, right; that's also what I had expected. But as I
reported in my post above, they actually DO have an effect, for
whatever reason! I simply don't know why yet, and hope others may have
a clue. Anyway, so far they're the most acceptable workaround I know of
for those strange constraint issues that made me write here.
(Another workaround is start-delays on stop operations, but since those
depend on each node's system and resource performance, they are not
acceptable for production.)
I still don't know whether it's just a case of misconfiguration and/or
lack of knowledge/experience on my side, or whether it's really a bug in
Pacemaker; maybe even one already fixed in versions more recent than
SLES 11's Pacemaker 1.0.6.
In case someone would like to have a look at it, I have attached the
complete cluster configuration, with and without the workaround, both
as XML and as output of "crm configure show".
(Please don't wonder about some quite high monitor operation intervals;
they were just set that way when dumping the config. The tests done and
configs dumped when posting in Novell's support forum used timings
1/100 of these, and that made no difference.)
Here are also some grep'ed syslog excerpts:
"Failure" of DRBD resource, without workaround:
May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_asyncmon_0 (call=2449, rc=1, cib-update=4731,
confirmed=false) unknown error
May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_notify_0 (call=2450, rc=0, cib-update=4735,
confirmed=true) ok
May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_monitor_10000 (call=2442, status=1, cib-update=0,
confirmed=true) Cancelled
May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_demote_0 (call=2451, rc=1, cib-update=4736,
confirmed=true) unknown error
May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_notify_0 (call=2452, rc=0, cib-update=4737,
confirmed=true) ok
May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql-ip_monitor_2000 (call=2446, status=1, cib-update=0,
confirmed=true) Cancelled
May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_notify_0 (call=2454, rc=0, cib-update=4739,
confirmed=true) ok
May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql-ip_stop_0 (call=2453, rc=0, cib-update=4740,
confirmed=true) ok
May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql_monitor_1000000 (call=2444, status=1, cib-update=0,
confirmed=true) Cancelled
May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_stop_0 (call=2455, rc=1, cib-update=4741,
confirmed=true) unknown error
May 28 21:15:44 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql_stop_0 (call=2456, rc=0, cib-update=4742, confirmed=true) ok
May 28 21:15:44 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mountMysql:0_monitor_1000000 (call=2421, status=1,
cib-update=0, confirmed=true) Cancelled
May 28 21:15:44 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mountMysql:0_stop_0 (call=2457, rc=0, cib-update=4744,
confirmed=true) ok
"Failure" of O2CB resource, without workaround:
May 28 21:20:32 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation dlm:0_asyncmon_0 (call=2476, rc=1, cib-update=4774,
confirmed=false) unknown error
May 28 21:20:32 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation dlm:0_monitor_10000 (call=2405, status=1, cib-update=0,
confirmed=true) Cancelled
May 28 21:20:52 nde28 crmd: [2846]: ERROR: process_lrm_event: LRM
operation dlm:0_stop_0 (2477) Timed Out (timeout=20000ms)
May 28 21:20:52 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql-ip_monitor_2000 (call=2475, status=1, cib-update=0,
confirmed=true) Cancelled
May 28 21:20:52 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation tomcat:0_monitor_500000 (call=2448, status=1, cib-update=0,
confirmed=true) Cancelled
May 28 21:20:53 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql-ip_stop_0 (call=2478, rc=0, cib-update=4782,
confirmed=true) ok
May 28 21:20:53 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql_monitor_1000000 (call=2473, status=1, cib-update=0,
confirmed=true) Cancelled
May 28 21:20:56 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation tomcat:0_stop_0 (call=2479, rc=0, cib-update=4783,
confirmed=true) ok
May 28 21:20:56 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mountOpencms:0_monitor_1000000 (call=2423, status=1,
cib-update=0, confirmed=true) Cancelled
May 28 21:20:56 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mountOpencms:0_stop_0 (call=2481, rc=0, cib-update=4784,
confirmed=true) ok
May 28 21:20:57 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql_stop_0 (call=2480, rc=0, cib-update=4785, confirmed=true) ok
May 28 21:20:57 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mountMysql:1_monitor_1000000 (call=2471, status=1,
cib-update=0, confirmed=true) Cancelled
May 28 21:20:57 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mountMysql:1_stop_0 (call=2482, rc=0, cib-update=4786,
confirmed=true) ok
May 28 21:20:57 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation o2cb:0_monitor_1000000 (call=2418, status=1, cib-update=0,
confirmed=true) Cancelled
May 28 21:20:58 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation o2cb:0_stop_0 (call=2483, rc=0, cib-update=4787,
confirmed=true) ok
Constraints used, without workaround:
colocation TEST_colocO2cb inf: cloneO2cb cloneDlm
colocation colocGrpMysql inf: grpMysql cloneMountMysql
colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
colocation colocMountMysql_o2cb inf: cloneMountMysql cloneO2cb
colocation colocMountOpencms_drbd inf: cloneMountOpencms msDrbdOpencms:Master
colocation colocMountOpencms_o2cb inf: cloneMountOpencms cloneO2cb
colocation colocTomcat inf: cloneTomcat cloneMountOpencms:Started
order TEST_orderO2cb 0: cloneDlm cloneO2cb
order orderGrpMysql 0: cloneMountMysql:start grpMysql
order orderMountMysql_drbd 0: msDrbdMysql:promote cloneMountMysql:start
order orderMountMysql_o2cb 0: cloneO2cb cloneMountMysql
order orderMountOpencms_drbd 0: msDrbdOpencms:promote cloneMountOpencms:start
order orderMountOpencms_o2cb 0: cloneO2cb cloneMountOpencms
order orderTomcat 0: cloneMountOpencms:start cloneTomcat
"Failure" of DRBD resource, with workaround:
May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_asyncmon_0 (call=2355, rc=1, cib-update=4556,
confirmed=false) unknown error
May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_notify_0 (call=2356, rc=0, cib-update=4560,
confirmed=true) ok
May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_monitor_10000 (call=2330, status=1, cib-update=0,
confirmed=true) Cancelled
May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_demote_0 (call=2357, rc=1, cib-update=4561,
confirmed=true) unknown error
May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_notify_0 (call=2358, rc=0, cib-update=4562,
confirmed=true) ok
May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql-ip_monitor_2000 (call=2344, status=1, cib-update=0,
confirmed=true) Cancelled
May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_notify_0 (call=2360, rc=0, cib-update=4564,
confirmed=true) ok
May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql-ip_stop_0 (call=2359, rc=0, cib-update=4565,
confirmed=true) ok
May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql_monitor_1000000 (call=2342, status=1, cib-update=0,
confirmed=true) Cancelled
May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql_stop_0 (call=2361, rc=0, cib-update=4566, confirmed=true) ok
May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mountMysql:0_monitor_1000000 (call=2337, status=1,
cib-update=0, confirmed=true) Cancelled
May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mountMysql:0_stop_0 (call=2362, rc=0, cib-update=4567,
confirmed=true) ok
May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_stop_0 (call=2363, rc=0, cib-update=4568,
confirmed=true) ok
May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_start_0 (call=2364, rc=0, cib-update=4570,
confirmed=true) ok
May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_notify_0 (call=2365, rc=0, cib-update=4571,
confirmed=true) ok
May 28 20:57:20 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_notify_0 (call=2366, rc=0, cib-update=4573,
confirmed=true) ok
May 28 20:57:21 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_promote_0 (call=2367, rc=0, cib-update=4574,
confirmed=true) ok
May 28 20:57:21 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_notify_0 (call=2368, rc=0, cib-update=4575,
confirmed=true) ok
May 28 20:57:23 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_monitor_10000 (call=2369, rc=8, cib-update=4577,
confirmed=false) master
May 28 20:57:23 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mountMysql:1_start_0 (call=2370, rc=0, cib-update=4578,
confirmed=true) ok
May 28 20:57:23 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mountMysql:1_monitor_1000000 (call=2371, rc=0,
cib-update=4579, confirmed=false) ok
May 28 20:57:27 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql_start_0 (call=2372, rc=0, cib-update=4580,
confirmed=true) ok
May 28 20:57:27 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql_monitor_1000000 (call=2373, rc=0, cib-update=4581,
confirmed=false) ok
May 28 20:57:27 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql-ip_start_0 (call=2374, rc=0, cib-update=4582,
confirmed=true) ok
May 28 20:57:28 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql-ip_monitor_2000 (call=2375, rc=0, cib-update=4583,
confirmed=false) ok
"Failure" of O2CB resource, with workaround (not working for O2CB; maybe
because, unlike DRBD, it is not a master/slave (MS) resource):
May 28 21:01:31 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation dlm:0_asyncmon_0 (call=2376, rc=1, cib-update=4584,
confirmed=false) unknown error
May 28 21:01:32 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation dlm:0_monitor_10000 (call=2321, status=1, cib-update=0,
confirmed=true) Cancelled
May 28 21:01:52 nde28 crmd: [2846]: ERROR: process_lrm_event: LRM
operation dlm:0_stop_0 (2377) Timed Out (timeout=20000ms)
May 28 21:01:52 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql-ip_monitor_2000 (call=2375, status=1, cib-update=0,
confirmed=true) Cancelled
May 28 21:01:52 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation tomcat:0_monitor_500000 (call=2341, status=1, cib-update=0,
confirmed=true) Cancelled
May 28 21:01:52 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql-ip_stop_0 (call=2378, rc=0, cib-update=4592,
confirmed=true) ok
May 28 21:01:52 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql_monitor_1000000 (call=2373, status=1, cib-update=0,
confirmed=true) Cancelled
May 28 21:01:56 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation tomcat:0_stop_0 (call=2379, rc=0, cib-update=4593,
confirmed=true) ok
May 28 21:01:56 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql_stop_0 (call=2380, rc=0, cib-update=4594, confirmed=true) ok
May 28 21:01:56 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mountMysql:1_monitor_1000000 (call=2371, status=1,
cib-update=0, confirmed=true) Cancelled
May 28 21:01:57 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mountMysql:1_stop_0 (call=2381, rc=0, cib-update=4595,
confirmed=true) ok
May 28 21:01:58 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mountOpencms:0_monitor_1000000 (call=2339, status=1,
cib-update=0, confirmed=true) Cancelled
May 28 21:01:58 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mountOpencms:0_stop_0 (call=2382, rc=0, cib-update=4596,
confirmed=true) ok
May 28 21:01:58 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation o2cb:0_monitor_1000000 (call=2334, status=1, cib-update=0,
confirmed=true) Cancelled
May 28 21:01:59 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation o2cb:0_stop_0 (call=2383, rc=0, cib-update=4597,
confirmed=true) ok
Constraints used, with workaround (*GNAH_*) for the DRBD resource and
the attempted "workaround" for the O2CB resource:
colocation TEST_colocO2cb inf: cloneO2cb cloneDlm
colocation colocGrpMysql inf: grpMysql cloneMountMysql
colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
colocation colocMountMysql_o2cb inf: cloneMountMysql cloneO2cb
colocation colocMountOpencms_drbd inf: cloneMountOpencms msDrbdOpencms:Master
colocation colocMountOpencms_o2cb inf: cloneMountOpencms cloneO2cb
colocation colocTomcat inf: cloneTomcat cloneMountOpencms:Started
order GNAH_orderDrbdMysql_stop 0: cloneMountMysql:stop msDrbdMysql:stop
order GNAH_orderDrbdOpencms_stop 0: cloneMountOpencms:stop msDrbdOpencms:stop
order TESTGNAH_orderDlm_stop 0: cloneO2cb:stop cloneDlm:stop
order TEST_orderO2cb 0: cloneDlm cloneO2cb
order orderGrpMysql 0: cloneMountMysql:start grpMysql
order orderMountMysql_drbd 0: msDrbdMysql:promote cloneMountMysql:start
order orderMountMysql_o2cb 0: cloneO2cb cloneMountMysql
order orderMountOpencms_drbd 0: msDrbdOpencms:promote cloneMountOpencms:start
order orderMountOpencms_o2cb 0: cloneO2cb cloneMountOpencms
order orderTomcat 0: cloneMountOpencms:start cloneTomcat
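
The visible difference between the two DRBD runs above is where
drbdMysql:0_stop_0 sits and what it returns: without the workaround it
runs before mysql and mountMysql:0 are stopped and returns rc=1, with
the workaround it runs after mountMysql:0_stop_0 and returns rc=0. To
pull that sequence out of longer logs, something like the following
could be used. It is only a minimal sketch, assuming Python 3 and the
crmd "process_lrm_event" line format shown above (including the
mail-wrapped lines); the function names are hypothetical and not part of
Pacemaker.

```python
import re

# A new syslog entry starts with a timestamp such as "May 28 21:15:41 ".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} +\d+ \d{2}:\d{2}:\d{2} ")

# "operation <resource>_<op>_<interval> (call=<n>, rc=<n>, ..."
# Timed-out operations are logged as "(<n>) Timed Out" without an rc.
OPERATION = re.compile(
    r"operation (?P<rsc>\S+)_(?P<op>[a-z]+)_(?P<interval>\d+) "
    r"\((?:call=)?(?P<call>\d+)(?:, rc=(?P<rc>\d+))?"
)


def join_wrapped(lines):
    """Re-join log entries that were wrapped across several lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def stop_results(lines):
    """Return (resource, rc) for each stop operation, in log order.

    rc is None when the entry carries no rc, e.g. for a timeout.
    """
    results = []
    for entry in join_wrapped(lines):
        match = OPERATION.search(entry)
        if match and match.group("op") == "stop":
            rc = match.group("rc")
            results.append((match.group("rsc"), int(rc) if rc is not None else None))
    return results


# Excerpt copied from the "without workaround" DRBD log above.
WITHOUT_WORKAROUND = """\
May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation drbdMysql:0_stop_0 (call=2455, rc=1, cib-update=4741,
confirmed=true) unknown error
May 28 21:15:44 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mysql_stop_0 (call=2456, rc=0, cib-update=4742, confirmed=true) ok
May 28 21:15:44 nde28 crmd: [2846]: info: process_lrm_event: LRM
operation mountMysql:0_stop_0 (call=2457, rc=0, cib-update=4744,
confirmed=true) ok
"""

for resource, rc in stop_results(WITHOUT_WORKAROUND.splitlines()):
    print(resource, "timed out / no rc" if rc is None else f"rc={rc}")
```

Running the same function over the "with workaround" excerpt should list
mountMysql:0 before drbdMysql:0, which makes the changed stop order easy
to spot without reading every line.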
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: crm - config without workaround
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100602/76d0a4d3/attachment-0004.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: crm - config with workaround
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100602/76d0a4d3/attachment-0005.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: XML - config without workaround
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100602/76d0a4d3/attachment-0006.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: XML - config with workaround
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100602/76d0a4d3/attachment-0007.ksh>