[Pacemaker] resources not migrating when some are not runnable on one node, maybe because of groups or master/slave clones?

Andreas Kurz andreas at hastexo.com
Mon Jun 18 16:46:55 CEST 2012


On 06/18/2012 04:14 PM, Vladislav Bogdanov wrote:
> 18.06.2012 16:39, Phil Frost wrote:
>> I'm attempting to configure an NFS cluster, and I've observed that under
>> some failure conditions, resources that depend on a failed resource
>> simply stop, and no migration to another node is attempted, even though
>> a manual migration demonstrates the other node can run all resources,
>> and the resources will remain on the good node even after the migration
>> constraint is removed.
>>
>> I was able to reduce the configuration to this:
>>
>> node storage01
>> node storage02
>> primitive drbd_nfsexports ocf:pacemaker:Stateful
>> primitive fs_test ocf:pacemaker:Dummy
>> primitive vg_nfsexports ocf:pacemaker:Dummy
>> group test fs_test
>> ms drbd_nfsexports_ms drbd_nfsexports \
>>         meta master-max="1" master-node-max="1" \
>>         clone-max="2" clone-node-max="1" \
>>         notify="true" target-role="Started"
>> location l fs_test -inf: storage02
>> colocation colo_drbd_master inf: ( test ) ( vg_nfsexports ) ( drbd_nfsexports_ms:Master )
> 
> Sets (constraints with more than two members) are evaluated in a
> different order.

_Between_ several resource-sets (and in this example three sets are
created), evaluation works as for simple colocation constraints, so the
last/rightmost set is the most significant.

_Within_ a single default colocation resource-set the resources are
evaluated as in a group, so the most significant resource is the
first/leftmost one.
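
Applied to the constraint from the original configuration, those two
rules read roughly like this (the comments are only an annotation of
the rules above, not new syntax):

# three parenthesized sets: the rightmost set (the Master) is the most
# significant, vg_nfsexports is placed relative to it, and the set
# containing the group "test" is the least significant
colocation colo_drbd_master inf: ( test ) ( vg_nfsexports ) ( drbd_nfsexports_ms:Master )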

Try this (for a real DRBD scenario, where the ordering also matters):

colocation colo_drbd_master inf: vg_nfsexports test drbd_nfsexports_ms:Master

order order_drbd_promote_first inf: drbd_nfsexports_ms:promote vg_nfsexports:start test:start

These examples will each automatically create two sets, because of the
differing roles/actions. I prefer to look at the resulting XML syntax to
be sure the shell created what I intended ;-)
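
For example, for the colocation above the shell should end up with
something roughly like this (a sketch only; the generated set ids may
differ), which you can check with "crm configure show xml
colo_drbd_master":

<rsc_colocation id="colo_drbd_master" score="INFINITY">
  <resource_set id="colo_drbd_master-0">
    <resource_ref id="vg_nfsexports"/>
    <resource_ref id="test"/>
  </resource_set>
  <resource_set id="colo_drbd_master-1" role="Master">
    <resource_ref id="drbd_nfsexports_ms"/>
  </resource_set>
</rsc_colocation>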

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> Try
> colocation colo_drbd_master inf: ( drbd_nfsexports_ms:Master ) ( vg_nfsexports ) ( test )
> 
> 
>> property $id="cib-bootstrap-options" \
>>         no-quorum-policy="ignore" \
>>         stonith-enabled="false" \
>>         dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>>         cluster-infrastructure="openais" \
>>         expected-quorum-votes="2" \
>>         last-lrm-refresh="1339793579"
>>
>> The location constraint "l" exists only to demonstrate the problem; I
>> added it to simulate the NFS server being unrunnable on one node.
>>
>> To see the issue I'm experiencing, put storage01 in standby to force
>> everything onto storage02. fs_test will not be able to run. Now bring
>> storage01 back out of standby (it can satisfy all the constraints) and see that no
>> migration takes place. Put storage02 in standby, and everything will
>> migrate to storage01 and start successfully. Take storage02 out of
>> standby, and the services remain on storage01. This demonstrates that
>> even though there is a clear "best" solution where all resources can
>> run, Pacemaker isn't finding it.
>>
>> So far, I've noticed that any of the following changes will "fix" the problem:
>>
>> - removing colo_drbd_master
>> - removing any one resource from colo_drbd_master
>> - eliminating the group "test" and referencing fs_test directly in
>> constraints
>> - using a simple clone instead of a master/slave pair for
>> drbd_nfsexports_ms
>>
>> My current understanding is that if there exists a way to run all
>> resources, Pacemaker should find it and prefer it. Is that not the case?
>> Maybe I need to restructure my colocation constraint somehow? Obviously
>> this is a much reduced version of a more complex practical
>> configuration, so I'm trying to understand the underlying mechanisms
>> more than just the solution to this particular scenario.
>>
>> In particular, I'm not really sure how to inspect what Pacemaker is
>> thinking when it places resources. I've tried running crm_simulate -LRs,
>> but I'm a little bit unclear on how to interpret the results. In the
>> output, I do see this:
>>
>> drbd_nfsexports:1 promotion score on storage02: 10
>> drbd_nfsexports:0 promotion score on storage01: 5
>>
>> those numbers seem to account for the default stickiness of 1 for
>> master/slave resources, but don't seem to incorporate the colocation
>> constraints at all. Is that expected?
>>





