[Pacemaker] resources not rebalancing

Tue Jun 10 02:25:09 EDT 2014

On 5 Jun 2014, at 10:38 am, Patrick Hemmer <pacemaker at feystorm.net> wrote:

> From: Andrew Beekhof <andrew at beekhof.net>
> Sent: 2014-06-04 20:15:22 EDT
> To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> Subject: Re: [Pacemaker] resources not rebalancing
> 
>> On 5 Jun 2014, at 12:57 am, Patrick Hemmer <pacemaker at feystorm.net> wrote:
>> 
>>> From: Andrew Beekhof <andrew at beekhof.net>
>>> Sent: 2014-06-04 04:15:48 E
>>> To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
>>> Subject: Re: [Pacemaker] resources not rebalancing
>>> 
>>>> On 4 Jun 2014, at 4:22 pm, Patrick Hemmer <pacemaker at feystorm.net> wrote:
>>>> 
>>>>> I've been testing some different scenarios, and after bringing a node back online, none of the resources move to it unless they are restarted. However, default-resource-stickiness is set to 0, so they should be able to move around freely.
>>>>> 
>>>>> # pcs status
>>>>> Cluster name: docker
>>>>> Last updated: Wed Jun  4 06:09:26 2014
>>>>> Last change: Wed Jun  4 06:08:40 2014 via cibadmin on i-093f1f55
>>>>> Stack: corosync
>>>>> Current DC: i-083f1f54 (3) - partition with quorum
>>>>> Version: 1.1.11-1.fc20-9d39a6b
>>>>> 3 Nodes configured
>>>>> 8 Resources configured
>>>>> 
>>>>> 
>>>>> Online: [ i-053f1f59 i-083f1f54 i-093f1f55 ]
>>>>> 
>>>>> Full list of resources:
>>>>> 
>>>>>  dummy2    (ocf::pacemaker:Dummy):    Started i-083f1f54 
>>>>>  Clone Set: dummy1-clone [dummy1] (unique)
>>>>>      dummy1:0    (ocf::pacemaker:Dummy):    Started i-083f1f54 
>>>>>      dummy1:1    (ocf::pacemaker:Dummy):    Started i-093f1f55 
>>>>>      dummy1:2    (ocf::pacemaker:Dummy):    Started i-093f1f55 
>>>>>      dummy1:3    (ocf::pacemaker:Dummy):    Started i-083f1f54 
>>>>>      dummy1:4    (ocf::pacemaker:Dummy):    Started i-093f1f55 
>>>>> 
>>>>> # pcs resource show --all 
>>>>>  Resource: dummy2 (class=ocf provider=pacemaker type=Dummy)
>>>>>  Clone: dummy1-clone
>>>>>   Meta Attrs: clone-max=5 clone-node-max=5 globally-unique=true 
>>>>>   Resource: dummy1 (class=ocf provider=pacemaker type=Dummy)
>>>>> 
>>>>> # pcs property show --all | grep default-resource-stickiness
>>>>>  default-resource-stickiness: 0
>>>>> 
>>>>> Notice how i-053f1f59 isn't running anything. I feel like I'm missing something obvious, but it escapes me.
>>>>> 
>>>>> 
>>>> Clones are ever so slightly sticky by default; try setting resource-stickiness=0 for the clone resource
>>>> (and unset it once everything has moved back).
>>>> 
>>>> 
>>>> 
>>> Thanks, that did indeed fix it. But how come dummy2 didn't move? It's not a clone, but it didn't move either?
>>> 
>> Do you have a location constraint that says it should prefer i-053f1f59?
> No location constraint.
> 
>>> And now a separate follow-up question: the resources didn't balance as they should. I've got several utilization attributes set, and the resources aren't balanced according to the placement-strategy.
>>> 
>>> # pcs property show placement-strategy
>>> Cluster Properties:
>>>  placement-strategy: balanced
>>> 
>>> # crm_simulate -URL
>>> 
>>> Current cluster status:
>>> Online: [ i-053f1f59 i-083f1f54 i-093f1f55 ]
>>> 
>>>  dummy2    (ocf::pacemaker:Dummy):    Started i-053f1f59 
>>>  Clone Set: dummy1-clone [dummy1] (unique)
>>>      dummy1:0    (ocf::pacemaker:Dummy):    Started i-053f1f59 
>>>      dummy1:1    (ocf::pacemaker:Dummy):    Started i-093f1f55 
>>>      dummy1:2    (ocf::pacemaker:Dummy):    Started i-083f1f54 
>>>      dummy1:3    (ocf::pacemaker:Dummy):    Started i-083f1f54 
>>>      dummy1:4    (ocf::pacemaker:Dummy):    Started i-093f1f55 
>>> 
>>> Utilization information:
>>> Original: i-053f1f59 capacity: cpu=5000000 mem=3840332000
>>> Original: i-083f1f54 capacity: cpu=5000000 mem=3840332000
>>> Original: i-093f1f55 capacity: cpu=5000000 mem=3840332000
>>> calculate_utilization: dummy2 utilization on i-053f1f59: cpu=10000
>>> calculate_utilization: dummy1:2 utilization on i-083f1f54: cpu=1000
>>> calculate_utilization: dummy1:1 utilization on i-093f1f55: cpu=1000
>>> calculate_utilization: dummy1:0 utilization on i-053f1f59: cpu=1000
>>> calculate_utilization: dummy1:3 utilization on i-083f1f54: cpu=1000
>>> calculate_utilization: dummy1:4 utilization on i-093f1f55: cpu=1000
>>> Remaining: i-053f1f59 capacity: cpu=4989000 mem=3840332000
>>> Remaining: i-083f1f54 capacity: cpu=4998000 mem=3840332000
>>> Remaining: i-093f1f55 capacity: cpu=4998000 mem=3840332000
>>> 
>>> 
>>> 
>>> The "balanced" strategy is defined as: "the node that has more free capacity gets consumed first".
>>> Notice that dummy2 consumes cpu=10000, while dummy1 is only 1000 (10x less). After dummy2 was placed on i-053f1f59, that should have consumed enough "cpu" resource to keep dummy1 off it and on the other 2 nodes, but dummy1:0 got placed on the node.
>>> 
>> But i-053f1f59 still has orders of magnitude more cpu capacity left to run things. 
> 
> I don't follow. They're all equal in terms of total "cpu" capacity.

Right. But each node still has at least 4989000 units with which to accommodate something that only requires 10000.
That's about 0.2% of the remaining capacity, so wherever it starts, it's hardly making a dent.
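
As a quick check of the arithmetic above, here is a minimal Python sketch that recomputes what share of each node's remaining "cpu" capacity a resource would take. The capacity figures are copied from the "Remaining" lines of the crm_simulate output quoted earlier; the helper name percent_of_remaining is made up for illustration and is not part of Pacemaker.

```python
# Remaining "cpu" capacity per node, copied from the crm_simulate output.
remaining_cpu = {
    "i-053f1f59": 4989000,
    "i-083f1f54": 4998000,
    "i-093f1f55": 4998000,
}

# "cpu" utilization of the two resource types, as reported by crm_simulate.
DUMMY2_CPU = 10000
DUMMY1_CPU = 1000


def percent_of_remaining(required, remaining):
    """Return `required` as a percentage of `remaining` (hypothetical helper)."""
    return 100.0 * required / remaining


for node, free in sorted(remaining_cpu.items()):
    print(
        f"{node}: dummy2 would use {percent_of_remaining(DUMMY2_CPU, free):.3f}%, "
        f"one dummy1 instance {percent_of_remaining(DUMMY1_CPU, free):.3f}% "
        f"of the remaining cpu"
    )
```

On every node the larger resource comes out at roughly 0.2% of what is left, which is the point being made: at these scales the difference between nodes is tiny.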

>  And at the bottom of the simulate output, the "Remaining" even shows i-053f1f59 has less remaining than the other nodes.
> 
> However, after playing with it some more, this appears to be an issue with clones. When I created 5 separate resources instead, this does work as expected. The dummy2 resource gets put on a node by itself, and the other resources get distributed among the remaining nodes (at least until the "cpu" used balances out).
> 
> Since this smells like a bug, I can enter it on the bug tracker you mention below.

It's probably a result of clone stickiness (they have a default of 1) and the hoops we have to jump through to avoid them needlessly shuffling around.

> 
>> 
>>> Also how difficult is it to add a strategy?
>>> 
>> It might be challenging, the policy engine is deep voodoo :)
>> Can you create an entry at bugs.clusterlabs.org and include the result of 'cibadmin -Q' when the cluster is in the state you describe above?
>> 
>> It won't make it into 1.1.12 but we can look at it for .13
>> 
> 
> Will ponder possible scenarios and then enter it. Another thought occurred to me: you might want to balance based on percentage of capacity used. So now you've got three options: balanced by amount of capacity used, balanced by amount of capacity free, and balanced by percentage of capacity used. All 3 of them are probably similar enough in logic that the same algorithm could take care of them; it would just need a way to tune that algorithm (this would be my guess anyway, no clue what the code looks like).
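
To make the three ideas above concrete, here is a toy Python sketch of a greedy placer that differs only in the key it uses to pick a node: most free capacity, least capacity used, or lowest percentage used. This is an illustration under invented assumptions, not the policy engine's algorithm: the node names, capacities, and function names are all hypothetical, it considers a single "cpu" attribute, ignores stickiness and constraints, and does not check that a resource actually fits.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity: int
    used: int = 0

    @property
    def free(self):
        return self.capacity - self.used

    @property
    def percent_used(self):
        return 100.0 * self.used / self.capacity


# Three hypothetical selection rules; ties go to the first node listed.
def pick_most_free(nodes):
    return max(nodes, key=lambda n: n.free)


def pick_least_used(nodes):
    return min(nodes, key=lambda n: n.used)


def pick_lowest_percent(nodes):
    return min(nodes, key=lambda n: n.percent_used)


def place(node_specs, demands, pick):
    """Greedily place each (resource, demand) pair on the node chosen by `pick`."""
    nodes = [Node(name, capacity) for name, capacity in node_specs]
    placement = {}
    for resource, demand in demands:
        node = pick(nodes)
        node.used += demand
        placement[resource] = node.name
    return placement


# One node with 10x the capacity of the others, and six equal resources.
NODES = [("big", 50000), ("small-a", 5000), ("small-b", 5000)]
DEMANDS = [(f"r{i}", 1000) for i in range(6)]

for rule in (pick_most_free, pick_least_used, pick_lowest_percent):
    counts = Counter(place(NODES, DEMANDS, rule).values())
    print(rule.__name__, dict(counts))
```

With these made-up numbers, picking by most free capacity puts all six resources on the large node, picking by least used spreads them two per node, and picking by lowest percentage used lands in between. Only the selection key changes between the three, which fits the guess that one tunable algorithm could cover them.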
> 
> 
>> 
>>> I'd be interested in having a strategy which places a resource on the node with the least amount of capacity used. Kind of the inverse of "balanced". The docs say balanced looks at how much capacity is free. The 2 strategies would be equivalent if all nodes have the same capacity, but if one node has 10x the capacity of the other nodes, I want the resources to be distributed evenly (based on the capacity each uses), and not over-utilize that one node.
>>> 
>>> Thanks
>>> 
>>> -Patrick
>>> 
>>> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
