[Pacemaker] pacemaker processes RSS growth

Fri Dec 7 01:37:25 EST 2012

06.12.2012 09:04, Vladislav Bogdanov wrote:
> 06.12.2012 06:05, Andrew Beekhof wrote:
>> I wonder what the growth looks like with the recent libqb fix.
>> That could be an explanation.
> 
> Valid point. I will watch.

On an almost static cluster, the only change in memory state over 24
hours is +700kb of shared memory in crmd on the DC. I will keep watching
that one for a while longer.

RSS-SHR (actual malloc'ed memory) remains the same on all nodes for all
processes.
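
For anyone who wants to reproduce the RSS-SHR figure without htop, here is a
minimal Python sketch. It assumes a Linux /proc filesystem, where
/proc/<pid>/statm reports the resident and shared page counts as its second
and third fields. The function names are made up for illustration, and
treating RSS-SHR as "malloc'ed memory" is only the approximation used in this
thread, not an exact accounting.

```python
import os

def parse_statm(statm_line, page_size_kb):
    """Return (rss_kb, shr_kb, rss_minus_shr_kb) from one /proc/<pid>/statm line."""
    fields = statm_line.split()
    resident_pages = int(fields[1])  # 2nd field: resident set size, in pages
    shared_pages = int(fields[2])    # 3rd field: resident shared pages
    rss_kb = resident_pages * page_size_kb
    shr_kb = shared_pages * page_size_kb
    return rss_kb, shr_kb, rss_kb - shr_kb

def read_rss_minus_shr(pid):
    """Read the live counters for a pid (Linux only)."""
    page_size_kb = os.sysconf("SC_PAGE_SIZE") // 1024
    with open(f"/proc/{pid}/statm") as f:
        return parse_statm(f.read(), page_size_kb)

# Illustrative statm line with made-up numbers, assuming 4 kb pages
print(parse_statm("25000 1200 500 30 0 900 0", 4))
```

Calling read_rss_minus_shr() for each daemon's pid at fixed intervals and
logging the third value should give the same kind of series discussed here.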

There is some difference in how much memory a given process consumes on
different nodes, though. Here is the analysis:
pacemakerd takes from 1184 to 2964 kb of RSS-SHR (about 2.5 times bigger).
cib takes from 7772 (on DC) to 9692 kb.
crmd takes from 2640 to 3056 kb on non-DC nodes.
stonithd takes from 1664 (on DC where 1 stonith resource runs) to 2936
kb (on a node with no local stonith resources).
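
To put the per-node spread into comparable terms, the ranges above can be
turned into absolute differences and max/min ratios. This is only a small
arithmetic sketch in Python; the numbers are copied from the list above, and
the helper name spread() is made up for illustration.

```python
# (min_kb, max_kb) of RSS-SHR per daemon, copied from the figures above
ranges_kb = {
    "pacemakerd": (1184, 2964),
    "cib": (7772, 9692),
    "crmd (non-DC)": (2640, 3056),
    "stonithd": (1664, 2936),
}

def spread(min_kb, max_kb):
    """Return (absolute difference in kb, max/min ratio rounded to 2 places)."""
    return max_kb - min_kb, round(max_kb / min_kb, 2)

spreads = {name: spread(lo, hi) for name, (lo, hi) in ranges_kb.items()}
for name, (diff_kb, ratio) in spreads.items():
    print(f"{name:14s} +{diff_kb:5d} kb  x{ratio:.2f}")
```

pacemakerd and stonithd show the largest relative spread, while cib shows
the largest absolute one.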

pengine and crmd take much more memory on a DC (expected).

lrmd has the same size everywhere (within ±4 kb, possibly depending on the
number of locally running resources and the size of their parameters).

pengine has the same size on all non-DC nodes (expected).

attrd differs by no more than 12 kb.

The node where I observe the maximum values is the same for all processes
(that may be related to the fact that I run a long-lived CIB client
there, although I shut it down for the measurements).

The fact that some processes take less memory on the DC may be due to
differences between client and server memory consumption for some
inter-node connections.

Anyway, I think that the remaining issues from the original report are now
fully fixed with libqb-master and pacemaker-master (with one patch from
your private repo, 4124d27).

I can send the spreadsheet with the values if you need it.

One more thing I'd like to do is put some "load" on the cluster
(restart/migrate resources, switch nodes between standby and online).
Maybe I will do it later.

Best,
Vladislav.

> 
>>
>> On Sat, Sep 15, 2012 at 5:23 AM, Vladislav Bogdanov
>> <bubble at hoster-ok.com> wrote:
>>> 14.09.2012 09:54, Vladislav Bogdanov wrote:
>>>> 13.09.2012 15:18, Vladislav Bogdanov wrote:
>>>>
>>>> ...
>>>>
>>>>> and now it runs on my testing cluster.
>>>>>
>>>>> IPC-related memory problems seem to be completely fixed now: the
>>>>> processes' own memory (RES-SHR in htop terms) no longer grows (after 40
>>>>> minutes), although I see that both RES and SHR counters sometimes
>>>>> increase synchronously. lrmd does not grow at all. Will look again after
>>>>> a few hours.
>>>>
>>>>
>>>> So, lrmd is ok. I see only 4kb growth in RES-SHR on one node (current
>>>> DC). Other instances are of the constant size for almost a day.
>>>>
>>>> I see RES-SHR growth in pacemakerd (>100kb per day). So I expect some
>>>> leakage here. Should I run it under valgrind?
>>>
>>> Valgrind doesn't find anything significant here (1-hour and 9-hour runs).
>>>
>>> ==23851== LEAK SUMMARY:
>>> ==23851==    definitely lost: 528 bytes in 3 blocks
>>> ==23851==    indirectly lost: 17,361 bytes in 36 blocks
>>> ==23851==      possibly lost: 234 bytes in 8 blocks
>>> ==23851==    still reachable: 17,458 bytes in 163 blocks
>>> ==23851==         suppressed: 0 bytes in 0 blocks
>>>
>>>>
>>>> And I see that both RES and SHR synchronously grow in crmd (600-700kb
>>>> per day on member nodes, 6Mb on DC), while RES-SHR is reduced by 24kb on
>>>> DC.
>>>>
>>>> And I see cib growth in both RES and SHR in range 12-340 kb, and 4kb
>>>> growth in RES-SHR on nodes except DC.
>>>>
>>>> I can't say for sure what causes growth of shared pages.
>>>> Maybe it is /dev/shm. There are a lot of files there. I'll check whether it grows.
>>>>
>>>
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>