[ClusterLabs] Help understanding resource placement

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Apr 26 04:36:22 EDT 2018


Hi!

I have a problem which I don't understand:
On a 3-node Xen host cluster (h01, h05, h10) there are about 10 Xen guest VMs.
I rebooted two of them, and after the reboot both VMs were started on a different host than before: both had been running on h01, but they were started on h10.

However, this is against expectations, because the resulting load of the hosts looks like this (VMs per host):
h01: 1
h05: 3
h10: 5

So I expected that at least the last VM rebooted on h01 would be restarted there.
I checked for errors, but there were none.

I almost suspect that some recent change to the cluster software introduced a bug.
We also use resource utilization (CPUs and RAM in obscure units) to limit the VMs per host.
All three hosts have the same capacity:
Original: h01 capacity: utl_cpu=200 utl_ram=1240
Original: h05 capacity: utl_cpu=200 utl_ram=1240
Original: h10 capacity: utl_cpu=200 utl_ram=1240

The current state leaves these capacities:
Remaining: h01 capacity: utl_cpu=180 utl_ram=1076
Remaining: h05 capacity: utl_cpu=130 utl_ram=748
Remaining: h10 capacity: utl_cpu=100 utl_ram=728

Obviously h01 has the most capacity left, and h10 has the least.
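
As a cross-check of the figures above, the capacity in use per host is simply original minus remaining. The following is a minimal Python sketch, not anything the cluster runs itself; the names ORIGINAL, REMAINING, VM_NEED, used_capacity and fits are made up for illustration, and VM_NEED is taken from the native_assign_node lines reported further down (utl_cpu=20 utl_ram=10 per restarted VM).

```python
# Capacity figures copied from the listings above (units as configured in the cluster).
ORIGINAL = {
    "h01": {"utl_cpu": 200, "utl_ram": 1240},
    "h05": {"utl_cpu": 200, "utl_ram": 1240},
    "h10": {"utl_cpu": 200, "utl_ram": 1240},
}
REMAINING = {
    "h01": {"utl_cpu": 180, "utl_ram": 1076},
    "h05": {"utl_cpu": 130, "utl_ram": 748},
    "h10": {"utl_cpu": 100, "utl_ram": 728},
}
# Utilization of one restarted VM, as reported by native_assign_node (illustrative name).
VM_NEED = {"utl_cpu": 20, "utl_ram": 10}


def used_capacity(original, remaining):
    """Return the capacity consumed per host as original minus remaining."""
    return {
        host: {attr: original[host][attr] - remaining[host][attr]
               for attr in original[host]}
        for host in original
    }


def fits(need, free):
    """True if every required attribute is available in the free capacity."""
    return all(free.get(attr, 0) >= amount for attr, amount in need.items())


used = used_capacity(ORIGINAL, REMAINING)
hosts_with_room = [h for h in sorted(REMAINING) if fits(VM_NEED, REMAINING[h])]
most_free = max(REMAINING,
                key=lambda h: (REMAINING[h]["utl_cpu"], REMAINING[h]["utl_ram"]))

for host in sorted(used):
    print(host, "used:", used[host])
print("hosts with room for one more VM:", hosts_with_room)
print("host with most free capacity:", most_free)
```

With these numbers every host still has room for another VM of that size, so utilization limits alone do not rule out h01.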

I don't understand the scores for v04 (the last VM being restarted):
native_color: prm_xen_v04 allocation score on h01: 0
native_color: prm_xen_v04 allocation score on h05: 0
native_color: prm_xen_v04 allocation score on h10: 100
native_assign_node: prm_xen_v04 utilization on h10: utl_cpu=20 utl_ram=10

v07 is the second-to-last VM being restarted:
native_color: prm_xen_v07 allocation score on h01: 0
native_color: prm_xen_v07 allocation score on h05: 0
native_color: prm_xen_v07 allocation score on h10: 100
native_assign_node: prm_xen_v07 utilization on h10: utl_cpu=20 utl_ram=10
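
To compare such score lines side by side, they can be tabulated per resource. The following is a rough Python sketch that only parses the log lines quoted above; the regular expression and the helper names (to_int, parse_scores) are illustrative, not part of Pacemaker. Picking the node with the highest score is a simplification that ignores ties and any further placement logic, and it says nothing about where the score of 100 on h10 comes from.

```python
import re

# Score lines copied verbatim from the log excerpts above.
LOG = """\
native_color: prm_xen_v04 allocation score on h01: 0
native_color: prm_xen_v04 allocation score on h05: 0
native_color: prm_xen_v04 allocation score on h10: 100
native_color: prm_xen_v07 allocation score on h01: 0
native_color: prm_xen_v07 allocation score on h05: 0
native_color: prm_xen_v07 allocation score on h10: 100
"""

# Pacemaker treats INFINITY as 1,000,000 in score arithmetic.
INFINITY = 1000000

SCORE_RE = re.compile(
    r"native_color: (\S+) allocation score on (\S+): ([+-]?(?:INFINITY|\d+))"
)


def to_int(text):
    """Convert a score string such as '100' or '-INFINITY' to an integer."""
    sign = -1 if text.startswith("-") else 1
    body = text.lstrip("+-")
    return sign * (INFINITY if body == "INFINITY" else int(body))


def parse_scores(log):
    """Return {resource: {node: score}} for all allocation score lines."""
    scores = {}
    for match in SCORE_RE.finditer(log):
        resource, node, raw = match.groups()
        scores.setdefault(resource, {})[node] = to_int(raw)
    return scores


scores = parse_scores(LOG)
# Simplification: assume the node with the highest score is preferred.
preferred = {res: max(per_node, key=per_node.get)
             for res, per_node in scores.items()}

for res in sorted(scores):
    print(res, scores[res], "->", preferred[res])
```

For both VMs this yields h10 as the preferred node, which matches where they were started.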

Unfortunately I'm not good at using crm_simulate to get the details or reasons for the scores.

The only thing I noticed in the crm history log was that a score evaluation was done when stopping v04 ("info: native_color:    Resource prm_xen_v04 cannot run anywhere"), but obviously none was logged before the decision to start v04 on h10.

The version in use is "Version: 1.1.12-f47ea56" from SLES11 SP4...

Any insights or ideas?

Regards,
Ulrich




