[Pacemaker] pacemaker processes RSS growth

Mon Dec 10 01:56:04 EST 2012

10.12.2012 04:29, Andrew Beekhof wrote:
> On Fri, Dec 7, 2012 at 5:37 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>> 06.12.2012 09:04, Vladislav Bogdanov wrote:
>>> 06.12.2012 06:05, Andrew Beekhof wrote:
>>>> I wonder what the growth looks like with the recent libqb fix.
>>>> That could be an explanation.
>>>
>>> Valid point. I will watch.
>>
>> On an almost static cluster, the only change in memory state during 24
>> hours is +700kb of shared memory for crmd on the DC. I will keep watching
>> that one for longer.

It still grows, by ~650-700 kB per day. I sampled the 'maps' and 'smaps'
contents from crmd's /proc entry and will look at what changes there over time.
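
A rough sketch of how two saved 'smaps' samples could be compared. It sums
one field (Rss by default) per mapped pathname and reports what changed
between the samples. The function names and the file names in the usage
comment are only illustrative, not something from pacemaker or libqb.

```python
import re
from collections import defaultdict

# Mapping header: "start-end perms offset dev inode [pathname]"
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\d+\s*(.*)$")
# Per-mapping counter line, e.g. "Rss:                 460 kB"
FIELD = re.compile(r"^(\w+):\s+(\d+)\s+kB$")


def parse_smaps(text, field="Rss"):
    """Sum one smaps field (in kB) per pathname; unnamed mappings go to '[anon]'."""
    totals = defaultdict(int)
    current = None
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            current = m.group(1).strip() or "[anon]"
            continue
        m = FIELD.match(line)
        if m and current is not None and m.group(1) == field:
            totals[current] += int(m.group(2))
    return dict(totals)


def diff_smaps(old, new):
    """Return (pathname, delta_kB) pairs for entries that changed, largest first."""
    names = set(old) | set(new)
    deltas = ((n, new.get(n, 0) - old.get(n, 0)) for n in names)
    return sorted((p for p in deltas if p[1] != 0), key=lambda p: -abs(p[1]))


# Usage (hypothetical file names, each a copy of /proc/<pid>/smaps):
#   old = parse_smaps(open("smaps.day1").read())
#   new = parse_smaps(open("smaps.day2").read())
#   for name, delta in diff_smaps(old, new):
#       print(f"{delta:+8d} kB  {name}")
```

Grouping by pathname rather than by address range keeps the comparison
meaningful if a mapping is resized or remapped between samples.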

> The blackbox was disabled?

I did not enable it. If it is disabled by default, then it should be
disabled.
There are only two files in /dev/shm which have 'blackbox' in their
filenames: qb-corosync-blackbox-data and qb-corosync-blackbox-header.

> 
>>
>> RSS-SHR (actual malloc'ed memory) remains the same on all nodes for all
>> processes.
> 
> That's encouraging.
> 
>> There is some difference in how much memory a given process consumes
>> on different nodes, though. Here is the analysis:
>> pacemakerd takes from 1184 to 2964 kb of RSS-SHR (about 2.5 times bigger).
>> cib takes from 7772 (on DC) to 9692 kb.
>> crmd takes from 2640 to 3056 kb on non-DC nodes.
>> stonithd takes from 1664 (on DC where 1 stonith resource runs) to 2936
>> kb (on a node with no local stonith resources).
>>
>> pengine and crmd take much more memory on a DC (expected).
>>
>> lrmd has the same size everywhere (+-4k, depending on number of locally
>> running resources and size of their parameters?).
>>
>> pengine has the same size on all non-DC nodes (expected).
>>
>> attrd differs by no more than 12 kb.
>>
>>
>> The node where I observe the maximum values is the same for all processes
>> (that may be related to the fact that I run a long-lived CIB client
>> there, although I shut it down for the measurements).
>>
>> The fact that some processes take less memory on the DC may be due to
>> differences between client and server memory consumption for some
>> inter-node connections.
> 
> Hmmm... still sounds odd to me.

That may be additional glib types needed for client-side communications
support, or something else from glib. Anyway, it does not grow, so it is OK.
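
For anyone who wants to track the same RSS-SHR figure per process, here is
a minimal sketch. It assumes the value corresponds to the 'resident' minus
'shared' fields of /proc/<pid>/statm (both counted in pages), which is what
top's RES and SHR columns are based on. The function names are illustrative.

```python
import os


def private_rss_kb(statm_text, page_size=None):
    """Compute (resident - shared) in kB from the contents of /proc/<pid>/statm."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    fields = statm_text.split()
    resident_pages = int(fields[1])  # second field: resident set size
    shared_pages = int(fields[2])    # third field: resident shared pages
    return (resident_pages - shared_pages) * page_size // 1024


def read_private_rss_kb(pid):
    """Read the statm file of a live process and return RSS-SHR in kB."""
    with open(f"/proc/{pid}/statm") as f:
        return private_rss_kb(f.read())
```

Sampling this value periodically for each daemon and comparing it with the
shared figure shows whether growth is in malloc'ed memory or in shared
mappings.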

> 
>> Anyway, I think that the remaining issues from the original report are now
>> fully fixed with libqb-master and pacemaker-master (with one patch from
>> your private repo, 4124d27).
>>
>> I can send the spreadsheet with values if you need.
>>
>> One more thing I'd like to do is to put some "load" on the cluster
>> (restart/migrate resources, put nodes into standby/online). Maybe I will
>> do that later.
> 
> 
> That would be interesting

OK, I will do that after I understand why crmd on the DC grows in SHM.