[Pacemaker] resources not migrating when some are not runnable on one node, maybe because of groups or master/slave clones?

Sun Jul 29 23:15:56 EDT 2012

On Sat, Jun 30, 2012 at 1:59 AM, Phil Frost <phil at macprofessionals.com> wrote:
> On 06/28/2012 01:29 PM, David Vossel wrote:
>>
>> I've been looking into multistate resource colocations quite a bit this
>> week.  I have a branch I'm working with that may improve this situation for
>> you.
>>
>> If you are feeling brave, test this branch out with your configuration and
>> see if it fares better.
>>
>> https://github.com/davidvossel/pacemaker/tree/master_colo_fixes
>>
>> If you want to try and apply the patch to your own src, this is the commit to
>> use: https://github.com/davidvossel/pacemaker/commit/0062eab18f96d3f75462e0a889e4175f02552d92
>
>
> I could be doing something wrong, but that commit doesn't seem to fix my
> problem. I applied that commit against the pacemaker packages from debian
> squeeze-backports (they call it pacemaker-1.1.7) to test. I tweaked my
> production configuration to use Dummy and Stateful resources, and introduced
> a location constraint (named "foo") to simulate a failure of
> nfs_kernel_server on storage02. As before, the nfs export (called
> export_test) stopped, presumably due to its colocation with
> nfs_kernel_server. At this point I ran cibadmin -Q to generate the attached
> file.
>
> I'm expecting pacemaker to migrate the DRBD master and all the other
> services (all of which are one way or another colocated with the DRBD
> master) to storage01, since they could all run there. If I run:
>
>   crm_simulate -x cib.xml -S
>
> (cib.xml attached), crm_simulate outputs nothing in the "Transition
> Summary", indicating it's happy with the way things are. I was expecting
> this to indicate a migration to storage01 so that export_test can run. If
> it's not too much trouble, could you try this on your development version? I
> wonder if maybe some changes besides the one commit you referenced above are
> needed, but making a build of your branch head is a bit more work than I
> have time to do now.

If I run:

tools/crm_simulate -x ~/Dropbox/phil.xml -Ss | grep "promotion score"

I see:

drbd_exports:1 promotion score on storage02: 110
drbd_exports:0 promotion score on storage01: 6

The 100 comes from one of your rules, which says:

           <!--# storage02 is a much more capable machine, so prefer that.-->

So I'm not really understanding why you think we'd migrate everything
to storage01.
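
If it helps to compare the scores without reading them by eye, here is a minimal sketch in Python that parses "promotion score" lines like the two above and reports the node with the highest score. The line format is assumed from those two lines only, and parse_scores and preferred_node are hypothetical helper names, not part of Pacemaker. It looks only at promotion scores and says nothing about colocation or location constraints on other resources.

```python
import re

# Sample lines copied from the crm_simulate output quoted above.
SAMPLE = """\
drbd_exports:1 promotion score on storage02: 110
drbd_exports:0 promotion score on storage01: 6
"""

# Assumption: the pattern is inferred from the two lines above; other
# Pacemaker versions may format this output differently.
LINE_RE = re.compile(r"^(\S+) promotion score on (\S+): ([+-]?\d+|[+-]?INFINITY)$")

# Pacemaker treats INFINITY as a score of 1000000.
INFINITY = 1000000


def to_int(raw):
    """Convert a score string such as '110' or '-INFINITY' to an int."""
    if raw.endswith("INFINITY"):
        return -INFINITY if raw.startswith("-") else INFINITY
    return int(raw)


def parse_scores(text):
    """Return {node: promotion score} for every matching line in text."""
    scores = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a promotion score line
        _instance, node, raw = match.groups()
        scores[node] = to_int(raw)
    return scores


def preferred_node(scores):
    """Return (node, score) for the highest promotion score."""
    node = max(scores, key=scores.get)
    return node, scores[node]


if __name__ == "__main__":
    node, score = preferred_node(parse_scores(SAMPLE))
    print(f"highest promotion score: {node} ({score})")
```

On the two lines above this picks storage02, whose score of 110 includes the 100 from the rule.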