[Pacemaker] migration-threshold causing unnecessary restart of underlying resources
Dejan Muhamedagic
dejanmm at fastmail.fm
Tue Aug 17 11:51:14 UTC 2010
On Tue, Aug 17, 2010 at 04:14:17AM +0200, Cnut Jansen wrote:
> On 16.08.2010 13:29, Dejan Muhamedagic wrote:
> >On Sat, Aug 14, 2010 at 06:26:58AM +0200, Cnut Jansen wrote:
> >>On 12.08.2010 18:46, Dejan Muhamedagic wrote:
> >>>The migration-threshold shouldn't in any way influence resources
> >>>which don't depend on the resource which fails over. Couldn't
> >>>reproduce it here with our example RAs.
>
> >>So it seems that, for whatever reason, those constrained
> >>resources are considered and treated just as if they were in a
> >>resource group, because they move to wherever they can all run,
> >>instead of the "eat or die" behaviour of the dependent resource
> >>(mysql) towards the underlying resource (mount) that I had
> >>expected from the constraints as I set them... shouldn't I?! o_O
> >Yes, those two constraints are equivalent to a group.
> So in fact migration-threshold actually does influence resources
> that are neither grouped with nor dependent on the failing resource,
> when the failing resource depends on them?!
>
> Of course I already knew that from groups, and there it - imho -
> also makes sense, since defining a group is like saying "I want
> all these resources to run together on one node, no matter how
> and where". But when setting constraints, i.e. defining
> dependencies, I at least understand a "dependency" as one-sided,
> not mutual; meaning the underlying resource is independent of its
> dependent, so it can do whatever it wants and doesn't have to
> care about its dependent at all, while the dependent shall only
> start when and where the underlying resource it depends on is
> started.
> So did I understand you correctly that for Pacemaker this is
> actually the intended way of working for both groups and
> constraints, i.e. that they are mutual dependencies?
>
> And if so: is there also any way to define one-sided
> dependencies/influences?
Take a look at mandatory vs. advisory constraints in the
Configuration Explained doc. A group is equivalent to a set of
order/colocation constraints with an infinite score (inf).
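For illustration, here is roughly how the two flavours look in
crm shell syntax, using the resource names from your setup (an
untested sketch; the scores are the point, not the exact values):

# mandatory (score inf): together with the order below, this pair
# is equivalent to a group, so a forced move of mysql drags mount along
colocation colocMysql_mount inf: mysql mount
order orderMysql_mount inf: mount mysql

# advisory alternative: finite/zero scores are only preferences,
# so other scores (e.g. a location constraint) can outweigh them
colocation colocMysql_mount 500: mysql mount
order orderMysql_mount 0: mount mysql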
> >>And - concerning the failure-timeout - quite a while later,
> >>without having reset mysql's failure counter or having done
> >>anything else in the meantime:
> >>
> >>4) alpha: FC(mysql)=3, crm_resource -F -r mysql -H alpha
> >>Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM
> >>operation mysql_asyncmon_0 (call=59, rc=1, cib-update=592,
> >>confirmed=false) unknown error
> >>Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM
> >>operation mysql_stop_0 (call=60, rc=0, cib-update=596,
> >>confirmed=true) ok
> >>Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM
> >>operation mount_stop_0 (call=61, rc=0, cib-update=597,
> >>confirmed=true) ok
> >>beta: FC(mysql)=0
> >>Aug 14 04:44:47 beta crmd: [868]: info: process_lrm_event: LRM
> >>operation mount_start_0 (call=40, rc=0, cib-update=96,
> >>confirmed=true) ok
> >>Aug 14 04:44:47 beta crmd: [868]: info: process_lrm_event: LRM
> >>operation mysql_start_0 (call=41, rc=0, cib-update=97,
> >>confirmed=true) ok
> >>Aug 14 04:47:17 beta crmd: [868]: info: process_lrm_event: LRM
> >>operation mysql_stop_0 (call=42, rc=0, cib-update=98,
> >>confirmed=true) ok
> >>Aug 14 04:47:17 beta crmd: [868]: info: process_lrm_event: LRM
> >>operation mount_stop_0 (call=43, rc=0, cib-update=99,
> >>confirmed=true) ok
> >>alpha: FC(mysql)=4
> >>Aug 14 04:47:17 alpha crmd: [900]: info: process_lrm_event: LRM
> >>operation mount_start_0 (call=62, rc=0, cib-update=599,
> >>confirmed=true) ok
> >>Aug 14 04:47:17 alpha crmd: [900]: info: process_lrm_event: LRM
> >>operation mysql_start_0 (call=63, rc=0, cib-update=600,
> >>confirmed=true) ok
> >This worked as expected, i.e. after the 150s cluster-recheck
> >interval the resources were started at alpha.
> Is it really "as expected" that many(!) minutes - and even
> cluster-rechecks - after the last picking-on, and with a
> failure-timeout of 45 seconds, the failure counter is not only
> still showing a count of 3, but obviously really still is 3 (not
> 0 after being reset), so that the resources now migrate already
> on the first following picking-on?!
Of course, that's not how it should work. If you observe such a
case, please file a bugzilla and attach an hb_report. I just
commented on what was shown above: 04:47:17 - 04:44:47 = 150.
Perhaps I missed something happening earlier?
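For reference, these are the knobs involved, in crm shell syntax
(an illustrative sketch; the RA and values just mirror the ones
mentioned in this thread):

# how often the PE re-evaluates the cluster even without events;
# this is what delays acting on an expired failure-timeout
property cluster-recheck-interval="150s"

# per-resource failure handling
primitive mysql ocf:heartbeat:mysql \
    meta migration-threshold="3" failure-timeout="45s"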
> >>>BTW, what's the point of cloneMountMysql? If it can run only
> >>>where drbd is master, then it can run on one node only:
> >>>
> >>>colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
> >>>order orderMountMysql_drbd inf: msDrbdMysql:promote cloneMountMysql:start
> >>It's a dual-primary DRBD configuration, so when everything is
> >>ok (-; there are actually 2 masters of each DRBD multi-state
> >>resource... even though I admit that the dual primary, or
> >>rather master, for msDrbdMysql is currently (quite) redundant,
> >>since in the current cluster configuration there's only one
> >>primitive MySQL resource, and thus there'd be no real need for
> >>MySQL's data dir to be mounted on both nodes all the time.
> >>But since it's not harmful to have it mounted on the other node
> >>too, and since msDrbdOpencms and msDrbdShared need to be
> >>mounted on both nodes, and since I put the complete
> >>installation and configuration of the cluster into flexibly
> >>configurable shell scripts, it's easier, i.e. done with less
> >>typing, to just put all DRBD and mount resources' configuration
> >>into one common loop. (-;
> >OK. It did cross my mind that it may be a dual-master drbd.
> >
> >Your configuration is large. If you are going to run that in
> >production and don't really need a dual-master, then it'd be
> >good to get rid of the ocfs2 bits to make maintenance easier.
> Well, there are 3 DRBD resources, and the 2 DRBD resources other
> than the one for MySQL's datadir must be dual-primary already
> now, since they need to be mounted on all nodes for the
> Apache/Tomcat/Opencms teams. Therefore it's indeed easier for
> maintenance to just keep all 3 DRBDs' configurations in sync,
> which only requires one little line more for cloning mountMysql. (-;
Right.
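For anyone following along, a dual-master drbd resource is
typically configured along these lines in crm shell (a sketch
with assumed parameters, not your actual config):

primitive drbdMysql ocf:linbit:drbd \
    params drbd_resource="mysql"
# master-max=2 is what makes it dual-primary
ms msDrbdMysql drbdMysql \
    meta master-max="2" clone-max="2" notify="true" interleave="true"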
> >>>>d) I also have the impression that fail counters don't get
> >>>>reset after their failure-timeout, because when
> >>>>migration-threshold=3 is set, those issues occur upon
> >>>>every(!) following picking-on, even when I've waited for
> >>>>nearly 5 minutes (with failure-timeout=90) without touching
> >>>>the cluster at all
> >>>That seems to be a bug, though I couldn't reproduce it with a
> >>>simple configuration.
> >>I also just tested this once again: it seems that
> >>failure-timeout only sets scores back from -inf to around 0
> >>(wherever they should normally be), allowing the resources to
> >>return to the node. I tested with a location constraint set
> >>for the underlying resource (see configuration): after the
> >>failure-timeout has expired, on the next cluster-recheck (and
> >>only then!) the underlying resource and its dependants return
> >>to the underlying resource's preferred location, as you can see
> >>in the logs above.
> >The count gets reset, but the cluster acts on it only after the
> >cluster-recheck-interval, unless something else makes the cluster
> >calculate new scores.
> See above, picking-on #4: More than 26 minutes after the last
Hmm, sorry, couldn't see anything going on for 26 mins. I
probably didn't look carefully enough.
> picking-on with settings of migration-threshold=3,
> failure-timeout=40 and cluster-recheck-interval=150, resources
> already get migrated upon the first picking-on (and the shown
> failure counter rises to 4). To me that doesn't look like the
> failure counter being reset to 0 after failure-timeout, but just
> the scores being reset.
failure-timeout serves explicitly to reset the number of
failures, not the scores.
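You can check that directly. In crm shell, something like this
should work (node and resource names taken from your logs; the
exact syntax may differ slightly between versions):

# show the current fail count of mysql on node alpha
crm resource failcount mysql show alpha
# reset it by hand
crm resource failcount mysql delete alpha
# or wipe the resource's failed-op status entirely
crm resource cleanup mysql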
> Actually -
> except maybe by tricks/force - it shouldn't be possible at all to
> get the resource running again on the node it failed on for as long
> as its failure counter there is still at migration-threshold's
> limit, right?
Right.
> How can the failure counter then ever reach counts beyond
> migration-threshold's limit at all (ok, I could still imagine
> reasons for that),
It shouldn't. I now see "alpha: FC(mysql)=4" above; I guess
that shouldn't have happened.
> and especially why does migration-threshold from then
> on behave on every failure as if it were set to 1, even when
> it's set to, e.g., 3?
I don't quite understand what you mean by "behave". An attribute
cannot really behave. Well, obviously you ran into some unusual
behaviour, so it'd be best to make an hb_report for the incident
and open a bugzilla.
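Something along these lines should capture the relevant period
(an illustrative invocation; adjust the time span and the
destination to suit, of course):

# collect logs, PE inputs and the configuration from all nodes
# for the window around the incident
hb_report -f "2010/08/14 04:40" -t "2010/08/14 04:50" /tmp/mysql-failcount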
Thanks,
Dejan
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker