[Pacemaker] pacemaker processes RSS growth

Mon Dec 10 21:08:12 EST 2012

On Mon, Dec 10, 2012 at 5:56 PM, Vladislav Bogdanov
<bubble at hoster-ok.com> wrote:
> 10.12.2012 04:29, Andrew Beekhof wrote:
>> On Fri, Dec 7, 2012 at 5:37 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>> 06.12.2012 09:04, Vladislav Bogdanov wrote:
>>>> 06.12.2012 06:05, Andrew Beekhof wrote:
>>>>> I wonder what the growth looks like with the recent libqb fix.
>>>>> That could be an explanation.
>>>>
>>>> Valid point. I will watch.
>>>
>>> On an almost static cluster, the only change in memory state during 24
>>> hours is +700kb of shared memory in crmd on the DC. I will keep watching
>>> that one for a while longer.
>
> It still grows, ~650-700k per day. I sampled the 'maps' and 'smaps' content
> from crmd's /proc entry and will look at what differs there over time.
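A minimal sketch of that comparison, assuming two saved copies of `/proc/<pid>/smaps` taken some hours apart. It sums the `Rss` field per mapping pathname (anonymous mappings are lumped under a made-up `[anon]` label) and reports only the mappings whose total changed. The function names are hypothetical and not part of Pacemaker or libqb.

```python
import re
from collections import defaultdict

# A mapping header looks like:
# "7f12a000-7f12b000 rw-s 00000000 00:11 1234   /dev/shm/qb-..."
HEADER_RE = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")


def parse_smaps(text, field="Rss"):
    """Sum one smaps field (in kB) per mapping pathname."""
    totals = defaultdict(int)
    current = None
    for line in text.splitlines():
        if HEADER_RE.match(line):
            parts = line.split(None, 5)
            # The pathname column is optional; anonymous mappings have none.
            current = parts[5].strip() if len(parts) > 5 else "[anon]"
        elif current is not None and line.startswith(field + ":"):
            # Field lines look like "Rss:       4 kB".
            totals[current] += int(line.split()[1])
    return dict(totals)


def diff_smaps(before, after, field="Rss"):
    """Return {pathname: delta_kB} for mappings whose total changed."""
    a = parse_smaps(before, field)
    b = parse_smaps(after, field)
    deltas = {}
    for name in set(a) | set(b):
        delta = b.get(name, 0) - a.get(name, 0)
        if delta != 0:
            deltas[name] = delta
    return deltas
```

Saving crmd's `smaps` once a day and passing the old and new copies to `diff_smaps` should show which `/dev/shm/qb-*` mapping, if any, accounts for the growth.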

:frown:

>> The blackbox was disabled?
>
> I did not enable it. If it is disabled by default, then it should be
> disabled.
> There are only two files in /dev/shm which have 'blackbox' in a
> filename: qb-corosync-blackbox-data and qb-corosync-blackbox-header.
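The check above amounts to a name filter over `/dev/shm`. A small sketch, assuming blackbox segments keep "blackbox" in their file names as in the two corosync entries listed; the helper names are hypothetical.

```python
import os


def blackbox_names(names):
    """Keep only entries whose name mentions 'blackbox', sorted for stable output."""
    return sorted(n for n in names if "blackbox" in n)


def list_shm_blackbox(shm_dir="/dev/shm"):
    """List blackbox-related files in a shared-memory directory (Linux)."""
    return blackbox_names(os.listdir(shm_dir))
```

On the node described here this should return only the two corosync entries, i.e. no blackbox files for the Pacemaker daemons.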

OK, that's good.

>
>
>>
>>>
>>> RSS-SHR (actual malloc'ed memory) remains the same on all nodes for all
>>> processes.
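RSS-SHR here should be the same "RES minus SHR" figure that `top` shows. A minimal sketch of computing it from `/proc/<pid>/statm`, whose second and third fields are resident and shared pages. Treating that difference as malloc'ed memory is an approximation, and the function names are hypothetical.

```python
import os


def rss_minus_shr_kb(statm_text, page_size=None):
    """Compute (resident - shared) in kB from the contents of /proc/<pid>/statm."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")  # usually 4096 on x86
    fields = statm_text.split()
    # statm fields (in pages): size resident shared text lib data dt
    resident, shared = int(fields[1]), int(fields[2])
    return (resident - shared) * page_size // 1024


def read_rss_minus_shr_kb(pid):
    """Read statm for a live process (Linux only) and return RSS-SHR in kB."""
    with open(f"/proc/{pid}/statm") as f:
        return rss_minus_shr_kb(f.read())
```

Sampling `read_rss_minus_shr_kb` for each daemon's PID on every node gives numbers comparable to the per-process figures in this thread.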
>>
>> That's encouraging.
>>
>>> There is some difference, though, in how much memory a given process
>>> consumes on different nodes. Here is the analysis:
>>> pacemakerd takes from 1184 to 2964 kb of RSS-SHR (about 2.5 times bigger).
>>> cib takes from 7772 (on DC) to 9692 kb.
>>> crmd takes from 2640 to 3056 kb on non-DC nodes.
>>> stonithd takes from 1664 (on DC where 1 stonith resource runs) to 2936
>>> kb (on a node with no local stonith resources).
>>>
>>> pengine and crmd take much more memory on a DC (expected).
>>>
>>> lrmd has the same size everywhere (+/-4k, depending on the number of locally
>>> running resources and the size of their parameters?).
>>>
>>> pengine has the same size on all non-DC nodes (expected).
>>>
>>> attrd differs by no more than 12 kb.
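The per-node differences above can be put side by side with a few lines of arithmetic. The numbers are copied from the list; the ratio column is simply max/min.

```python
# Min/max RSS-SHR in kB per process, as reported in the list above.
REPORTED_KB = {
    "pacemakerd": (1184, 2964),
    "cib": (7772, 9692),
    "crmd (non-DC)": (2640, 3056),
    "stonithd": (1664, 2936),
}


def spread(lo, hi):
    """Return (absolute difference in kB, max/min ratio)."""
    return hi - lo, hi / lo


for name, (lo, hi) in REPORTED_KB.items():
    diff, ratio = spread(lo, hi)
    print(f"{name:14} +{diff:5d} kB  x{ratio:.2f}")
```

This gives a ratio of about 2.50 for pacemakerd, 1.25 for cib, 1.16 for crmd on non-DC nodes and 1.76 for stonithd.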
>>>
>>>
>>> The node where I observe the maximum values is the same for all processes
>>> (that may be related to the fact that I run a long-lived CIB client
>>> there, although I shut it down for the measurements).
>>>
>>> The fact that some processes take less memory on the DC may be due to
>>> differences between client-side and server-side memory consumption for
>>> some inter-node connections.
>>
>> Hmmm... still sounds odd to me.
>
> That may be additional glib types needed for client-side communication
> support, or something else from glib. Anyway, it does not grow, so it is OK.
>
>>
>>> Anyway, I think the remaining issues from the original report are now
>>> fully fixed with libqb-master and pacemaker-master (with one patch from
>>> your private repo, 4124d27).
>>>
>>> I can send the spreadsheet with the values if you need it.
>>>
>>> One more thing I'd like to do is put some "load" on the cluster
>>> (restart/migrate resources, put nodes into standby/online). I may do
>>> it later.
>>
>>
>> That would be interesting.
>
> OK, I will do that after I understand why crmd's shared memory on the DC keeps growing.
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org