[Pacemaker] Issues with constraints - working for start/stop, being ignored on "failures"

Cnut Jansen work at cnutjansen.eu
Tue Jun 1 21:10:13 EDT 2010


On 31.05.2010 05:47, Tim Serong wrote:
> On 5/31/2010 at 12:57 PM, Cnut Jansen<work at cnutjansen.eu>  wrote:
>
>> Current constraints:
>> colocation TEST_colocO2cb inf: cloneO2cb cloneDlm
>> colocation colocGrpMysql inf: grpMysql cloneMountMysql
>> colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
>> colocation colocMountMysql_o2cb inf: cloneMountMysql cloneO2cb
>> colocation colocMountOpencms_drbd inf: cloneMountOpencms msDrbdOpencms:Master
>> colocation colocMountOpencms_o2cb inf: cloneMountOpencms cloneO2cb
>> colocation colocTomcat inf: cloneTomcat cloneMountOpencms:Started
>> order TEST_orderO2cb 0: cloneDlm cloneO2cb
>> order orderGrpMysql 0: cloneMountMysql:start grpMysql
>> order orderMountMysql_drbd 0: msDrbdMysql:promote cloneMountMysql:start
>> order orderMountMysql_o2cb 0: cloneO2cb cloneMountMysql
>> order orderMountOpencms_drbd 0: msDrbdOpencms:promote cloneMountOpencms:start
>> order orderMountOpencms_o2cb 0: cloneO2cb cloneMountOpencms
>> order orderTomcat 0: cloneMountOpencms:start cloneTomcat
>>
> Try specifying "inf" for those ordering scores rather than zero.
> Ordering constraints with score="0" are considered optional and only
> have an effect when both resources are starting or stopping.  You
> should also be able to leave out the ":start" specifiers as this is
> implicit.
>
About those ":start" specifiers on the mount resources' order 
constraints: you're right of course, and I already knew that. 
They're just leftovers from some tests (probably a search for other 
workarounds) which, since they are (to my knowledge) harmlessly 
redundant, I have so far always forgotten to remove while making 
other, more important changes. The crm shell (which I currently use 
for editing my configuration) cancels all resource monitor operations 
on the node it is started on, so I avoid starting it as much as 
possible; otherwise I always have to make sure afterwards that all 
monitor operations are running again (i.e. switch the cluster's 
maintenance-mode on/off, or switch the node to standby and back 
online).

About those 0 scores: unfortunately they're necessary, since they are, 
afaik, the official workaround to prevent instances of clone 
resources from also being restarted on nodes where that is unnecessary. 
With scores set to "inf" instead, when I for example put one node 
into standby and/or bring it back online, most clone resources would 
also be restarted on the other node. That's not acceptable for 
production.
As far as I remember having read, this behaviour was only changed 
in Pacemaker 1.0.7, which isn't shipped with SLES 11 yet. I'm hoping 
SLES 11 SP1 will change that, but I haven't found any reliable 
information about its Pacemaker version yet.

>> Constraints added to "work around" at least the DRBD-resources left in
>> state "started (unmanaged) failed":
>> order GNAH_orderDrbdMysql_stop 0: cloneMountMysql:stop msDrbdMysql:stop
>> order GNAH_orderDrbdOpencms_stop 0: cloneMountOpencms:stop msDrbdOpencms:stop
>> (Also tried similar constraints for msDrbd*:demote and cloneDlm:stop,
>> but neither seemed to have an effect)
>>
> Those shouldn't be necessary (I never tried putting ordering
> constraints on stop ops before...)
>
They shouldn't, right; that's also what I had expected. But as I 
reported in my post above, they actually DO have an effect, for 
whatever reason! I simply don't know why yet, and hope others may 
have a clue. Anyway, so far they're the most acceptable workaround I 
know of for the strange constraint issues that made me write here. 
(Another workaround is start-delays on stop operations, but those 
depend on each node's system and resource performance and are 
therefore not acceptable for production.)
I still don't know whether it's just a case of misconfiguration and/or 
lack of knowledge/experience on my side, or whether it's really a bug 
in Pacemaker; maybe even one already fixed in versions more recent 
than SLES 11's Pacemaker 1.0.6.


In case someone would like to have a look at it, I have attached the 
complete cluster configuration, with and without the workaround, both 
as XML and as output of "crm configure show".
(Please don't wonder about some quite high monitor operation intervals; 
they were just set that way when I dumped the config. The tests done and 
configs dumped when posting in Novell's support forum used timings of 
1/100 of these, and that made no difference.)


Here are also some grep'ed syslog excerpts:

"Failure" of DRBD resource, without workaround:
May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_asyncmon_0 (call=2449, rc=1, cib-update=4731, 
confirmed=false) unknown error
May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_notify_0 (call=2450, rc=0, cib-update=4735, 
confirmed=true) ok
May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_monitor_10000 (call=2442, status=1, cib-update=0, 
confirmed=true) Cancelled
May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_demote_0 (call=2451, rc=1, cib-update=4736, 
confirmed=true) unknown error
May 28 21:15:40 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_notify_0 (call=2452, rc=0, cib-update=4737, 
confirmed=true) ok
May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql-ip_monitor_2000 (call=2446, status=1, cib-update=0, 
confirmed=true) Cancelled
May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_notify_0 (call=2454, rc=0, cib-update=4739, 
confirmed=true) ok
May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql-ip_stop_0 (call=2453, rc=0, cib-update=4740, 
confirmed=true) ok
May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql_monitor_1000000 (call=2444, status=1, cib-update=0, 
confirmed=true) Cancelled
May 28 21:15:41 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_stop_0 (call=2455, rc=1, cib-update=4741, 
confirmed=true) unknown error
May 28 21:15:44 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql_stop_0 (call=2456, rc=0, cib-update=4742, confirmed=true) ok
May 28 21:15:44 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mountMysql:0_monitor_1000000 (call=2421, status=1, 
cib-update=0, confirmed=true) Cancelled
May 28 21:15:44 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mountMysql:0_stop_0 (call=2457, rc=0, cib-update=4744, 
confirmed=true) ok

"Failure" of O2CB resource, without workaround:
May 28 21:20:32 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation dlm:0_asyncmon_0 (call=2476, rc=1, cib-update=4774, 
confirmed=false) unknown error
May 28 21:20:32 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation dlm:0_monitor_10000 (call=2405, status=1, cib-update=0, 
confirmed=true) Cancelled
May 28 21:20:52 nde28 crmd: [2846]: ERROR: process_lrm_event: LRM 
operation dlm:0_stop_0 (2477) Timed Out (timeout=20000ms)
May 28 21:20:52 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql-ip_monitor_2000 (call=2475, status=1, cib-update=0, 
confirmed=true) Cancelled
May 28 21:20:52 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation tomcat:0_monitor_500000 (call=2448, status=1, cib-update=0, 
confirmed=true) Cancelled
May 28 21:20:53 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql-ip_stop_0 (call=2478, rc=0, cib-update=4782, 
confirmed=true) ok
May 28 21:20:53 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql_monitor_1000000 (call=2473, status=1, cib-update=0, 
confirmed=true) Cancelled
May 28 21:20:56 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation tomcat:0_stop_0 (call=2479, rc=0, cib-update=4783, 
confirmed=true) ok
May 28 21:20:56 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mountOpencms:0_monitor_1000000 (call=2423, status=1, 
cib-update=0, confirmed=true) Cancelled
May 28 21:20:56 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mountOpencms:0_stop_0 (call=2481, rc=0, cib-update=4784, 
confirmed=true) ok
May 28 21:20:57 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql_stop_0 (call=2480, rc=0, cib-update=4785, confirmed=true) ok
May 28 21:20:57 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mountMysql:1_monitor_1000000 (call=2471, status=1, 
cib-update=0, confirmed=true) Cancelled
May 28 21:20:57 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mountMysql:1_stop_0 (call=2482, rc=0, cib-update=4786, 
confirmed=true) ok
May 28 21:20:57 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation o2cb:0_monitor_1000000 (call=2418, status=1, cib-update=0, 
confirmed=true) Cancelled
May 28 21:20:58 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation o2cb:0_stop_0 (call=2483, rc=0, cib-update=4787, 
confirmed=true) ok
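
As a rough aid for reading excerpts like the ones above, here is a 
minimal Python sketch that lists only the operations that failed or 
timed out. The function name failed_operations and the two patterns are 
made up for this illustration and only match the process_lrm_event 
format shown in this mail; treating rc=0 and rc=8 as "not a failure" is 
an assumption based on these excerpts.

```python
import re

# Completed operation with a return code, e.g.
# "LRM operation drbdMysql:0_stop_0 (call=2455, rc=1, ..."
# \s+ is used between tokens because the mailed excerpts are line-wrapped.
RESULT = re.compile(r"LRM\s+operation\s+(\S+)\s+\(call=\d+,\s+rc=(\d+),")

# Timed-out operation, e.g. "LRM operation dlm:0_stop_0 (2477) Timed Out"
TIMED_OUT = re.compile(r"LRM\s+operation\s+(\S+)\s+\(\d+\)\s+Timed\s+Out")

# Assumption: rc=0 ("ok") and rc=8 (logged as "master" in these excerpts)
# are not failures. Cancelled monitors carry "status=1" instead of "rc="
# and are therefore never matched by RESULT.
OK_CODES = {0, 8}


def failed_operations(log_text):
    """Return [(operation, reason)] for failed or timed-out operations, in log order."""
    hits = []
    for m in RESULT.finditer(log_text):
        rc = int(m.group(2))
        if rc not in OK_CODES:
            hits.append((m.start(), m.group(1), "rc=%d" % rc))
    for m in TIMED_OUT.finditer(log_text):
        hits.append((m.start(), m.group(1), "timed out"))
    # Sort by position in the text so both kinds of hit come out in log order.
    return [(op, reason) for _pos, op, reason in sorted(hits)]
```

Fed the O2CB excerpt directly above, this should report 
dlm:0_asyncmon_0 with rc=1 followed by dlm:0_stop_0 as timed out; all 
other operations there either returned rc=0 or were cancelled monitors.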

Constraints used, without workaround:
colocation TEST_colocO2cb inf: cloneO2cb cloneDlm
colocation colocGrpMysql inf: grpMysql cloneMountMysql
colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
colocation colocMountMysql_o2cb inf: cloneMountMysql cloneO2cb
colocation colocMountOpencms_drbd inf: cloneMountOpencms msDrbdOpencms:Master
colocation colocMountOpencms_o2cb inf: cloneMountOpencms cloneO2cb
colocation colocTomcat inf: cloneTomcat cloneMountOpencms:Started
order TEST_orderO2cb 0: cloneDlm cloneO2cb
order orderGrpMysql 0: cloneMountMysql:start grpMysql
order orderMountMysql_drbd 0: msDrbdMysql:promote cloneMountMysql:start
order orderMountMysql_o2cb 0: cloneO2cb cloneMountMysql
order orderMountOpencms_drbd 0: msDrbdOpencms:promote cloneMountOpencms:start
order orderMountOpencms_o2cb 0: cloneO2cb cloneMountOpencms
order orderTomcat 0: cloneMountOpencms:start cloneTomcat

"Failure" of DRBD resource, with workaround:
May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_asyncmon_0 (call=2355, rc=1, cib-update=4556, 
confirmed=false) unknown error
May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_notify_0 (call=2356, rc=0, cib-update=4560, 
confirmed=true) ok
May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_monitor_10000 (call=2330, status=1, cib-update=0, 
confirmed=true) Cancelled
May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_demote_0 (call=2357, rc=1, cib-update=4561, 
confirmed=true) unknown error
May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_notify_0 (call=2358, rc=0, cib-update=4562, 
confirmed=true) ok
May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql-ip_monitor_2000 (call=2344, status=1, cib-update=0, 
confirmed=true) Cancelled
May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_notify_0 (call=2360, rc=0, cib-update=4564, 
confirmed=true) ok
May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql-ip_stop_0 (call=2359, rc=0, cib-update=4565, 
confirmed=true) ok
May 28 20:57:12 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql_monitor_1000000 (call=2342, status=1, cib-update=0, 
confirmed=true) Cancelled
May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql_stop_0 (call=2361, rc=0, cib-update=4566, confirmed=true) ok
May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mountMysql:0_monitor_1000000 (call=2337, status=1, 
cib-update=0, confirmed=true) Cancelled
May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mountMysql:0_stop_0 (call=2362, rc=0, cib-update=4567, 
confirmed=true) ok
May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_stop_0 (call=2363, rc=0, cib-update=4568, 
confirmed=true) ok
May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_start_0 (call=2364, rc=0, cib-update=4570, 
confirmed=true) ok
May 28 20:57:16 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_notify_0 (call=2365, rc=0, cib-update=4571, 
confirmed=true) ok
May 28 20:57:20 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_notify_0 (call=2366, rc=0, cib-update=4573, 
confirmed=true) ok
May 28 20:57:21 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_promote_0 (call=2367, rc=0, cib-update=4574, 
confirmed=true) ok
May 28 20:57:21 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_notify_0 (call=2368, rc=0, cib-update=4575, 
confirmed=true) ok
May 28 20:57:23 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation drbdMysql:0_monitor_10000 (call=2369, rc=8, cib-update=4577, 
confirmed=false) master
May 28 20:57:23 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mountMysql:1_start_0 (call=2370, rc=0, cib-update=4578, 
confirmed=true) ok
May 28 20:57:23 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mountMysql:1_monitor_1000000 (call=2371, rc=0, 
cib-update=4579, confirmed=false) ok
May 28 20:57:27 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql_start_0 (call=2372, rc=0, cib-update=4580, 
confirmed=true) ok
May 28 20:57:27 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql_monitor_1000000 (call=2373, rc=0, cib-update=4581, 
confirmed=false) ok
May 28 20:57:27 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql-ip_start_0 (call=2374, rc=0, cib-update=4582, 
confirmed=true) ok
May 28 20:57:28 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql-ip_monitor_2000 (call=2375, rc=0, cib-update=4583, 
confirmed=false) ok

"Failure" of O2CB resource, with workaround (not working for O2CB; maybe 
because it is not an MS resource):
May 28 21:01:31 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation dlm:0_asyncmon_0 (call=2376, rc=1, cib-update=4584, 
confirmed=false) unknown error
May 28 21:01:32 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation dlm:0_monitor_10000 (call=2321, status=1, cib-update=0, 
confirmed=true) Cancelled
May 28 21:01:52 nde28 crmd: [2846]: ERROR: process_lrm_event: LRM 
operation dlm:0_stop_0 (2377) Timed Out (timeout=20000ms)
May 28 21:01:52 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql-ip_monitor_2000 (call=2375, status=1, cib-update=0, 
confirmed=true) Cancelled
May 28 21:01:52 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation tomcat:0_monitor_500000 (call=2341, status=1, cib-update=0, 
confirmed=true) Cancelled
May 28 21:01:52 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql-ip_stop_0 (call=2378, rc=0, cib-update=4592, 
confirmed=true) ok
May 28 21:01:52 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql_monitor_1000000 (call=2373, status=1, cib-update=0, 
confirmed=true) Cancelled
May 28 21:01:56 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation tomcat:0_stop_0 (call=2379, rc=0, cib-update=4593, 
confirmed=true) ok
May 28 21:01:56 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mysql_stop_0 (call=2380, rc=0, cib-update=4594, confirmed=true) ok
May 28 21:01:56 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mountMysql:1_monitor_1000000 (call=2371, status=1, 
cib-update=0, confirmed=true) Cancelled
May 28 21:01:57 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mountMysql:1_stop_0 (call=2381, rc=0, cib-update=4595, 
confirmed=true) ok
May 28 21:01:58 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mountOpencms:0_monitor_1000000 (call=2339, status=1, 
cib-update=0, confirmed=true) Cancelled
May 28 21:01:58 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation mountOpencms:0_stop_0 (call=2382, rc=0, cib-update=4596, 
confirmed=true) ok
May 28 21:01:58 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation o2cb:0_monitor_1000000 (call=2334, status=1, cib-update=0, 
confirmed=true) Cancelled
May 28 21:01:59 nde28 crmd: [2846]: info: process_lrm_event: LRM 
operation o2cb:0_stop_0 (call=2383, rc=0, cib-update=4597, 
confirmed=true) ok

Constraints used, with workaround (*GNAH_*) for DRBD resource and tried 
"workaround" for O2CB resource:
colocation TEST_colocO2cb inf: cloneO2cb cloneDlm
colocation colocGrpMysql inf: grpMysql cloneMountMysql
colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
colocation colocMountMysql_o2cb inf: cloneMountMysql cloneO2cb
colocation colocMountOpencms_drbd inf: cloneMountOpencms msDrbdOpencms:Master
colocation colocMountOpencms_o2cb inf: cloneMountOpencms cloneO2cb
colocation colocTomcat inf: cloneTomcat cloneMountOpencms:Started
order GNAH_orderDrbdMysql_stop 0: cloneMountMysql:stop msDrbdMysql:stop
order GNAH_orderDrbdOpencms_stop 0: cloneMountOpencms:stop msDrbdOpencms:stop
order TESTGNAH_orderDlm_stop 0: cloneO2cb:stop cloneDlm:stop
order TEST_orderO2cb 0: cloneDlm cloneO2cb
order orderGrpMysql 0: cloneMountMysql:start grpMysql
order orderMountMysql_drbd 0: msDrbdMysql:promote cloneMountMysql:start
order orderMountMysql_o2cb 0: cloneO2cb cloneMountMysql
order orderMountOpencms_drbd 0: msDrbdOpencms:promote cloneMountOpencms:start
order orderMountOpencms_o2cb 0: cloneO2cb cloneMountOpencms
order orderTomcat 0: cloneMountOpencms:start cloneTomcat
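
What the GNAH_* constraints change is easiest to see from the order in 
which the stop operations complete. The following Python sketch is only 
an illustration (the names stop_sequence and stopped_before are 
hypothetical, and the pattern assumes the process_lrm_event format of 
the excerpts in this mail); it pulls the completed stops out of an 
excerpt and checks which of two resources was stopped first.

```python
import re

# Completed stop operation, e.g.
# "LRM operation mountMysql:0_stop_0 (call=2457, rc=0, ..."
# \s+ tolerates the line wrapping of the mailed excerpts. Timed-out stops
# are logged in a different format and are deliberately not matched here.
STOP = re.compile(r"LRM\s+operation\s+(\S+)_stop_0\s+\(call=\d+,\s+rc=(\d+),")


def stop_sequence(log_text):
    """Return [(resource, rc)] for completed stop operations, in log order."""
    return [(res, int(rc)) for res, rc in STOP.findall(log_text)]


def stopped_before(log_text, first, then):
    """True if the stop of `first` is logged before the stop of `then`.

    Raises ValueError if either resource has no completed stop in the text.
    """
    order = [res for res, _rc in stop_sequence(log_text)]
    return order.index(first) < order.index(then)
```

For the DRBD excerpt without the workaround, the stop of drbdMysql:0 is 
logged with rc=1 before mountMysql:0 has been stopped; with the 
workaround, mountMysql:0 is stopped first and the stop of drbdMysql:0 
then returns rc=0.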
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: crm - config without workaround
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100602/76d0a4d3/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: crm - config with workaround
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100602/76d0a4d3/attachment-0001.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: XML - config without workaround
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100602/76d0a4d3/attachment-0002.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: XML - config with workaround
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100602/76d0a4d3/attachment-0003.ksh>