[Pacemaker] resources not rebalancing
Andrew Beekhof
andrew at beekhof.net
Thu Jun 5 02:15:22 CEST 2014
On 5 Jun 2014, at 12:57 am, Patrick Hemmer <pacemaker at feystorm.net> wrote:
> From: Andrew Beekhof <andrew at beekhof.net>
> Sent: 2014-06-04 04:15:48 E
> To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> Subject: Re: [Pacemaker] resources not rebalancing
>
>> On 4 Jun 2014, at 4:22 pm, Patrick Hemmer <pacemaker at feystorm.net>
>> wrote:
>>
>>
>>> I'm testing some different scenarios, and after bringing a node back online, none of the resources move to it unless they are restarted. However, default-resource-stickiness is set to 0, so they should be able to move around freely.
>>>
>>> # pcs status
>>> Cluster name: docker
>>> Last updated: Wed Jun 4 06:09:26 2014
>>> Last change: Wed Jun 4 06:08:40 2014 via cibadmin on i-093f1f55
>>> Stack: corosync
>>> Current DC: i-083f1f54 (3) - partition with quorum
>>> Version: 1.1.11-1.fc20-9d39a6b
>>> 3 Nodes configured
>>> 8 Resources configured
>>>
>>>
>>> Online: [ i-053f1f59 i-083f1f54 i-093f1f55 ]
>>>
>>> Full list of resources:
>>>
>>> dummy2 (ocf::pacemaker:Dummy): Started i-083f1f54
>>> Clone Set: dummy1-clone [dummy1] (unique)
>>> dummy1:0 (ocf::pacemaker:Dummy): Started i-083f1f54
>>> dummy1:1 (ocf::pacemaker:Dummy): Started i-093f1f55
>>> dummy1:2 (ocf::pacemaker:Dummy): Started i-093f1f55
>>> dummy1:3 (ocf::pacemaker:Dummy): Started i-083f1f54
>>> dummy1:4 (ocf::pacemaker:Dummy): Started i-093f1f55
>>>
>>> # pcs resource show --all
>>> Resource: dummy2 (class=ocf provider=pacemaker type=Dummy)
>>> Clone: dummy1-clone
>>> Meta Attrs: clone-max=5 clone-node-max=5 globally-unique=true
>>> Resource: dummy1 (class=ocf provider=pacemaker type=Dummy)
>>>
>>> # pcs property show --all | grep default-resource-stickiness
>>> default-resource-stickiness: 0
>>>
>>> Notice how i-053f1f59 isn't running anything. I feel like I'm missing something obvious, but it escapes me.
>>>
>> Clones are ever so slightly sticky by default. Try setting resource-stickiness=0 for the clone resource
>> (and unset it once everything has moved back).
>>
>>
>
> Thanks, that did indeed fix it. But how come dummy2 didn't move? It's not a clone, but it didn't move either?
Do you have a location constraint that says it should prefer i-053f1f59?
>
> And now a separate follow-up question: the resources didn't balance as they should. I've got several utilization attributes set, and the resources aren't balanced according to the placement-strategy.
>
> # pcs property show placement-strategy
> Cluster Properties:
> placement-strategy: balanced
>
> # crm_simulate -URL
>
> Current cluster status:
> Online: [ i-053f1f59 i-083f1f54 i-093f1f55 ]
>
> dummy2 (ocf::pacemaker:Dummy): Started i-053f1f59
> Clone Set: dummy1-clone [dummy1] (unique)
> dummy1:0 (ocf::pacemaker:Dummy): Started i-053f1f59
> dummy1:1 (ocf::pacemaker:Dummy): Started i-093f1f55
> dummy1:2 (ocf::pacemaker:Dummy): Started i-083f1f54
> dummy1:3 (ocf::pacemaker:Dummy): Started i-083f1f54
> dummy1:4 (ocf::pacemaker:Dummy): Started i-093f1f55
>
> Utilization information:
> Original: i-053f1f59 capacity: cpu=5000000 mem=3840332000
> Original: i-083f1f54 capacity: cpu=5000000 mem=3840332000
> Original: i-093f1f55 capacity: cpu=5000000 mem=3840332000
> calculate_utilization: dummy2 utilization on i-053f1f59: cpu=10000
> calculate_utilization: dummy1:2 utilization on i-083f1f54: cpu=1000
> calculate_utilization: dummy1:1 utilization on i-093f1f55: cpu=1000
> calculate_utilization: dummy1:0 utilization on i-053f1f59: cpu=1000
> calculate_utilization: dummy1:3 utilization on i-083f1f54: cpu=1000
> calculate_utilization: dummy1:4 utilization on i-093f1f55: cpu=1000
> Remaining: i-053f1f59 capacity: cpu=4989000 mem=3840332000
> Remaining: i-083f1f54 capacity: cpu=4998000 mem=3840332000
> Remaining: i-093f1f55 capacity: cpu=4998000 mem=3840332000
>
>
>
> The "balanced" strategy is defined as: "the node that has more free capacity gets consumed first".
> Notice that dummy2 consumes cpu=10000, while dummy1 is only 1000 (10x less). After dummy2 was placed on i-053f1f59, that should have consumed enough "cpu" resource to keep dummy1 off it and on the other 2 nodes, but dummy1:0 got placed on the node.
But i-053f1f59 still has orders of magnitude more cpu capacity left to run things.
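To make the arithmetic concrete, here is a small sketch that recomputes the "Remaining" cpu figures from the crm_simulate output quoted above. It is plain Python for illustration, not Pacemaker code, and the variable names are made up.

```python
# Recompute the "Remaining" cpu figures from the crm_simulate output quoted above.
# Illustrative arithmetic only; this is not how the policy engine is implemented.

capacity = {"i-053f1f59": 5000000, "i-083f1f54": 5000000, "i-093f1f55": 5000000}

# (resource, node, cpu) taken from the calculate_utilization lines
placements = [
    ("dummy2", "i-053f1f59", 10000),
    ("dummy1:2", "i-083f1f54", 1000),
    ("dummy1:1", "i-093f1f55", 1000),
    ("dummy1:0", "i-053f1f59", 1000),
    ("dummy1:3", "i-083f1f54", 1000),
    ("dummy1:4", "i-093f1f55", 1000),
]

remaining = dict(capacity)
for _resource, node, cpu in placements:
    remaining[node] -= cpu

for node in sorted(remaining):
    used_pct = 100.0 * (capacity[node] - remaining[node]) / capacity[node]
    print(f"{node}: remaining cpu={remaining[node]} ({used_pct:.2f}% used)")
```

Even with dummy2 on it, i-053f1f59 has used well under 1% of its cpu capacity, so the gap between the three nodes is tiny compared with what is still free.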
>
> Also how difficult is it to add a strategy?
It might be challenging; the policy engine is deep voodoo :)
Can you create an entry at bugs.clusterlabs.org and include the result of 'cibadmin -Q' when the cluster is in the state you describe above?
It won't make it into 1.1.12, but we can look at it for .13
> I'd be interested in having a strategy which places a resource on the node with the least amount of capacity used, kind of the inverse of "balanced". The docs say balanced looks at how much capacity is free. The 2 strategies would be equivalent if all nodes have the same capacity, but if one node has 10x the capacity of the other nodes, I want the resources to be distributed evenly (based on the capacity each uses), and not over-utilize that one node.
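The difference between the two rules described here can be shown with a toy sketch. This is a simplified illustration only: neither function is Pacemaker code, "most free" is just a naive reading of the quoted description of "balanced", and the node names and numbers are invented. It also ignores whether a node would be overcommitted.

```python
# Toy comparison of two node-selection rules (hypothetical, not Pacemaker code).

def pick_most_free(capacity, used):
    """Naive reading of "balanced": choose the node with the most free capacity."""
    return max(sorted(capacity), key=lambda n: capacity[n] - used[n])

def pick_least_used(capacity, used):
    """The proposed inverse: choose the node with the least capacity already used."""
    return min(sorted(capacity), key=lambda n: used[n])

def place_all(pick, capacity, costs):
    """Place each resource cost in turn and return the capacity used per node."""
    used = {n: 0 for n in capacity}
    for cost in costs:
        node = pick(capacity, used)
        used[node] += cost
    return used

# One node with 10x the capacity of the others, six identical resources.
capacity = {"big": 10000, "small1": 1000, "small2": 1000}
costs = [100] * 6

most_free = place_all(pick_most_free, capacity, costs)
least_used = place_all(pick_least_used, capacity, costs)

print("most free :", most_free)
print("least used:", least_used)
```

With these made-up numbers, the "most free" rule puts all six resources on the big node, while the "least used" rule spreads them two per node.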
>
> Thanks
>
> -Patrick
>