[Pacemaker] resources not migrating when some are not runnable on one node, maybe because of groups or master/slave clones?
David Vossel
dvossel at redhat.com
Thu Jun 28 17:29:09 UTC 2012
----- Original Message -----
> From: "Phil Frost" <phil at macprofessionals.com>
> To: pacemaker at oss.clusterlabs.org
> Sent: Tuesday, June 26, 2012 9:23:51 AM
> Subject: Re: [Pacemaker] resources not migrating when some are not runnable on one node, maybe because of groups or
> master/slave clones?
>
> On 06/22/2012 04:40 AM, Andreas Kurz wrote:
> >> I took a look at the cib in case2 and saw this in the status for
> >> storage02.
> >> >
> >> > <transient_attributes id="storage02">
> >> > <instance_attributes id="status-storage02">
> >> > <nvpair id="status-storage02-probe_complete"
> >> > name="probe_complete" value="true"/>
> >> > <nvpair id="status-storage02-master-drbd_nfsexports.1"
> >> > name="master-drbd_nfsexports:1" value="10"/>
> >> > </instance_attributes>
> >> > </transient_attributes>
> >> >
> >> >storage02 will not give up the drbd master since it has a higher
> >> >score than storage01. This, coupled with the colocation rule
> >> >between test and the drbd master and the location rule to never
> >> >run "test" on storage02, causes the "test" resource to never
> >> >run.... "test" has to run with the drbd master, and the drbd
> >> >master is stuck because of the transient attributes on a node
> >> >"test" can't run on, so "test" can't start.
> >> >
> >> >I don't understand why the transient attribute is there, or where
> >> >it came from yet.
> > This is added by the RA with the crm_master command. For example, the
> > drbd RA chooses this value from the current state of drbd to let
> > Pacemaker promote the best candidate.
>
> I'm not really sure I understand this transient attribute business. Is
> this suggesting there's a configuration problem, or a problem with the
> RA? It looks to me that the colocation constraints aren't being
> considered at all in calculating the promotion scores (at least, that's
> what crm_simulate suggests). Can this transient attribute explain that,
> or is there something else in play?
I've been looking into multistate resource colocations quite a bit this week. I have a branch I'm working with that may improve this situation for you.
If you are feeling brave, test this branch out with your configuration and see if it fares better.
https://github.com/davidvossel/pacemaker/tree/master_colo_fixes
If you want to try to apply the patch to your own source, this is the commit to use: https://github.com/davidvossel/pacemaker/commit/0062eab18f96d3f75462e0a889e4175f02552d92
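To make the deadlock from the quoted analysis above concrete, here is a rough toy model in Python. It is only a sketch of the reasoning, not Pacemaker's actual allocation code: the helpers pick_master and place_test are hypothetical names, and storage01's promotion score of 5 is an assumed value (the quoted CIB only shows storage02's score of 10).

```python
# Toy model only: this is NOT how Pacemaker's policy engine is implemented.
NEG_INF = float("-inf")


def pick_master(promotion_scores):
    """Return the node with the highest promotion score.

    Ties go to the alphabetically first node name (an arbitrary choice
    made for this sketch).
    """
    return max(sorted(promotion_scores), key=lambda node: promotion_scores[node])


def place_test(master_node, location_scores):
    """'test' is colocated with the master, so it can only run there.

    Return the node it runs on, or None if a -INFINITY location rule
    bans it from the master's node.
    """
    if location_scores.get(master_node, 0) == NEG_INF:
        return None
    return master_node


# storage02's score of 10 is the transient attribute set via crm_master;
# storage01's score of 5 is assumed for illustration.
promotion_scores = {"storage01": 5, "storage02": 10}
location_scores = {"storage02": NEG_INF}  # "test" may never run on storage02

master = pick_master(promotion_scores)
test_node = place_test(master, location_scores)
print(master, test_node)
```

In this model the promotion score is evaluated without looking at what is colocated with the master, so the master lands on the one node where "test" is banned and "test" has nowhere to run.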
-- Vossel