[Pacemaker] Managing a big number of globally-unique clone instances

Mon Jul 21 07:09:58 CEST 2014

21.07.2014 06:21, Andrew Beekhof wrote:
> 
> On 18 Jul 2014, at 5:16 pm, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
> 
>> Hi Andrew, all,
>>
>> I have a task which seems to be easily solvable with the use of a
>> globally-unique clone: start a huge number of specific virtual machines to
>> provide load for a connection multiplexer.
>>
>> I decided to look at how Pacemaker behaves in such a setup with the Dummy
>> resource agent, and found that handling of every instance in an
>> "initial" transition (probe+start) slows down as clone-max increases.
> 
> "yep"
> 
> For non-unique clones the number of probes needed is N, where N is the number of nodes.
> For unique clones, we must test every instance and node combination, or N*M, where M is clone-max.
> 
> And that's just the running of the probes... just figuring out which nodes need to be
> probed is incredibly resource intensive (run crm_simulate and it will be painfully obvious). 
> 
>>
>> E.g. for 256 instances the transition took 225 seconds, ~0.88s per instance.
>> After I added 768 more instances (set clone-max to 1024) together with
>> increasing batch-limit to 512, the transition took almost an hour (3507
>> seconds), or ~4.57s per added instance. Even taking into account that
>> monitoring of already started instances consumes some resources, the last
>> number seems rather big.

I believe this ^ is the main point.
If with N instances the probe/start of _each_ instance takes X time slots,
then with 4*N instances the probe/start of _each_ instance takes ~5*X time
slots. In an ideal world, I would expect it to remain constant.
Otherwise we have a scalability issue in this direction.

>>
>> The main CPU consumer on the DC while the transition is running is crmd. Its memory
>> footprint is around 85MB, and the resulting CIB size, including the status
>> section, is around 2MB.
> 
> You said CPU and then listed RAM...

Something wrong with that? :)
Those are just three distinct facts.

> 
>>
>> In your opinion, could this use case be optimized with minimal
>> effort? Could it be optimized with just configuration? Or might
>> it be some trivial development task, e.g. replacing one GList with a
>> GHashTable somewhere?
> 
> Optimize: yes, Minimal: no
> 
>>
>> Sure, I can look deeper and get any additional information, e.g.
>> crmd profiling results, if it is hard to answer off the top of your head.
> 
> Perhaps start looking in clone_create_probe()

Got it, thanks for the pointer!

> 
>>
>> Best,
>> Vladislav
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org