[Pacemaker] resources not migrating when some are not runnable on one node, maybe because of groups or master/slave clones?

David Vossel dvossel at redhat.com
Thu Jun 28 13:29:09 EDT 2012


----- Original Message -----
> From: "Phil Frost" <phil at macprofessionals.com>
> To: pacemaker at oss.clusterlabs.org
> Sent: Tuesday, June 26, 2012 9:23:51 AM
> Subject: Re: [Pacemaker] resources not migrating when some are not runnable on one node, maybe because of groups or master/slave clones?
> 
> On 06/22/2012 04:40 AM, Andreas Kurz wrote:
> >> I took a look at the cib in case2 and saw this in the status for
> >> storage02.
> >> >
> >> >       <transient_attributes id="storage02">
> >> >         <instance_attributes id="status-storage02">
> >> >           <nvpair id="status-storage02-probe_complete"
> >> >           name="probe_complete" value="true"/>
> >> >           <nvpair id="status-storage02-master-drbd_nfsexports.1"
> >> >           name="master-drbd_nfsexports:1" value="10"/>
> >> >         </instance_attributes>
> >> >       </transient_attributes>
> >> >
> >> >storage02 will not give up the drbd master since it has a higher
> >> >score than storage01.  This coupled with the colocation rule
> >> >between test and the drbd master, and the location rule to never
> >> >run "test" on storage02 cause the "test" resource to never
> >> >run.... "test" has to run with the drbd master, and the drbd
> >> >master is stuck because of the transient attributes on a node
> >> >"test" can't run on, so "test" can't start.
> >> >
> >> >I don't understand why the transient attribute is there, or where
> >> >it came from yet.
> > This is added by the RA with the crm_master command. For example, the
> > drbd RA chooses this value from the current state of drbd to let
> > Pacemaker promote the best candidate.
> 
> I'm not really sure I understand this transient attribute business. Is
> this suggesting there's a configuration problem, or a problem with the
> RA? It looks to me that the colocation constraints aren't being
> considered at all in calculating the promotion scores (at least, that's
> what crm_simulate suggests). Can this transient attribute explain that,
> or is there something else in play?
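
To make the quoted explanation concrete, here is a rough toy model of the situation. It is not Pacemaker's allocation code, only a sketch under stated assumptions: the node names and storage02's score of 10 come from the CIB excerpt quoted above, while storage01's score of 5, the `place` function, and the simplified rules are hypothetical and chosen purely for illustration. It contrasts picking the master from promotion scores alone with picking it after folding in the location preferences of the colocated "test" resource, which is the behavior Phil describes expecting.

```python
import math

NEG_INF = -math.inf

# Promotion scores per node. storage02's 10 is taken from the quoted
# transient attribute; storage01's 5 is an ASSUMED lower value.
promotion_scores = {"storage01": 5, "storage02": 10}

# Location rule from the thread: "test" must never run on storage02.
test_location = {"storage01": 0, "storage02": NEG_INF}


def place(promotion_scores, test_location, colocation_aware):
    """Hypothetical toy placement: returns (master_node, test_node)."""
    scores = dict(promotion_scores)
    if colocation_aware:
        # Fold the dependent resource's location preferences into the
        # promotion score of each node before choosing the master.
        for node, preference in test_location.items():
            scores[node] += preference
    master = max(scores, key=scores.get)
    # "test" is colocated with the master, so it can only run there,
    # and only if its own location rule allows that node.
    test_node = master if test_location[master] > NEG_INF else None
    return master, test_node


print(place(promotion_scores, test_location, colocation_aware=False))
print(place(promotion_scores, test_location, colocation_aware=True))
```

With promotion scores alone, the master stays on storage02 and "test" has nowhere to run. Once the location preference of "test" is taken into account, storage01 wins the promotion and "test" can start there.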

I've been looking into multistate resource colocations quite a bit this week.  I have a branch I'm working with that may improve this situation for you. 

If you are feeling brave, test this branch out with your configuration and see if it fares better.

https://github.com/davidvossel/pacemaker/tree/master_colo_fixes

If you want to try to apply the patch to your own source, this is the commit to use: https://github.com/davidvossel/pacemaker/commit/0062eab18f96d3f75462e0a889e4175f02552d92

-- Vossel

