[Pacemaker] lrmd Memory Usage

Tue May 6 11:47:14 CEST 2014

Here you go - I’ve only run lrmd for 30 minutes since installing the debug
package, but hopefully that’s enough - if not, let me know and I’ll do a
longer capture.

On 06/05/2014 10:08, "Andrew Beekhof" <andrew at beekhof.net> wrote:

>Oh, any chance you could install the debug packages? It will make the
>output even more useful :-)
>
>On 6 May 2014, at 7:06 pm, Andrew Beekhof <andrew at beekhof.net> wrote:
>
>> 
>> On 6 May 2014, at 6:05 pm, Greg Murphy <greg.murphy at gamesparks.com> wrote:
>> 
>>> Attached are the valgrind outputs from two separate runs of lrmd with the
>>> suggested variables set. Do they help narrow the issue down?
>> 
>> They do somewhat.  I'll investigate.  But much of the memory is still reachable:
>> 
>> ==26203==    indirectly lost: 17,945,950 bytes in 642,546 blocks
>> ==26203==      possibly lost: 2,805 bytes in 60 blocks
>> ==26203==    still reachable: 26,104,781 bytes in 544,782 blocks
>> ==26203==         suppressed: 8,652 bytes in 176 blocks
>> ==26203== Reachable blocks (those to which a pointer was found) are not shown.
>> ==26203== To see them, rerun with: --leak-check=full --show-reachable=yes
>> 
>> Could you add --show-reachable=yes to the VALGRIND_OPTS variable?
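
For reference, a sketch of what the amended setting might look like, assuming the options suggested earlier in the thread are kept as they are and only --show-reachable=yes is appended. Where these variables are set depends on the distribution and is not shown here.

```shell
# Same VALGRIND_OPTS as suggested earlier in the thread, with
# --show-reachable=yes appended so that still-reachable blocks are listed
# in the valgrind log.
# Assumption: every other option is left unchanged.
VALGRIND_OPTS="--leak-check=full --trace-children=no --num-callers=25 \
--log-file=/var/lib/pacemaker/valgrind-%p \
--suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions \
--gen-suppressions=all --show-reachable=yes"
```

Pacemaker would need to be restarted for lrmd to pick up the new options.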
>> 
>>> 
>>> 
>>> Thanks
>>> 
>>> Greg
>>> 
>>> 
>>> On 02/05/2014 03:01, "Andrew Beekhof" <andrew at beekhof.net> wrote:
>>> 
>>>> 
>>>> On 30 Apr 2014, at 9:01 pm, Greg Murphy <greg.murphy at gamesparks.com>
>>>> wrote:
>>>> 
>>>>> Hi
>>>>> 
>>>>> I'm running a two-node Pacemaker cluster on Ubuntu Saucy (13.10),
>>>>> kernel 3.11.0-17-generic and the Ubuntu Pacemaker package, version
>>>>> 1.1.10+git20130802-1ubuntu1.
>>>> 
>>>> The problem is that I have no way of knowing what code is/isn't included
>>>> in '1.1.10+git20130802-1ubuntu1'.
>>>> You could try setting the following in your environment before starting
>>>> pacemaker though
>>>> 
>>>> # Variables for running child daemons under valgrind and/or checking for
>>>> # memory problems
>>>> G_SLICE=always-malloc
>>>> MALLOC_PERTURB_=221 # or 0
>>>> MALLOC_CHECK_=3     # or 0,1,2
>>>> PCMK_valgrind_enabled=lrmd
>>>> VALGRIND_OPTS="--leak-check=full --trace-children=no --num-callers=25
>>>> --log-file=/var/lib/pacemaker/valgrind-%p
>>>> --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions
>>>> --gen-suppressions=all"
>>>> 
>>>> 
>>>>> The cluster is configured with a DRBD master/slave set and then a
>>>>> failover resource group containing MySQL (along with its DRBD
>>>>> filesystem) and a Zabbix Proxy and Agent.
>>>>> 
>>>>> Since I built the cluster around two months ago I've noticed that on
>>>>> the active node the memory footprint of lrmd gradually grows to
>>>>> quite a significant size. The cluster was last restarted three weeks
>>>>> ago, and now lrmd has over 1GB of mapped memory on the active node and
>>>>> only 151MB on the passive node. Current excerpts from /proc/PID/status
>>>>> are:
>>>>> 
>>>>> Active node
>>>>> VmPeak:  1146740 kB
>>>>> VmSize:  1146740 kB
>>>>> VmLck:         0 kB
>>>>> VmPin:         0 kB
>>>>> VmHWM:    267680 kB
>>>>> VmRSS:    188764 kB
>>>>> VmData:  1065860 kB
>>>>> VmStk:       136 kB
>>>>> VmExe:        32 kB
>>>>> VmLib:     10416 kB
>>>>> VmPTE:      2164 kB
>>>>> VmSwap:   822752 kB
>>>>> 
>>>>> Passive node
>>>>> VmPeak:   220832 kB
>>>>> VmSize:   155428 kB
>>>>> VmLck:         0 kB
>>>>> VmPin:         0 kB
>>>>> VmHWM:      4568 kB
>>>>> VmRSS:      3880 kB
>>>>> VmData:    74548 kB
>>>>> VmStk:       136 kB
>>>>> VmExe:        32 kB
>>>>> VmLib:     10416 kB
>>>>> VmPTE:       172 kB
>>>>> VmSwap:        0 kB
>>>>> 
>>>>> During the last week or so I've taken a couple of snapshots of
>>>>> /proc/PID/smaps on the active node, and the heap particularly stands out
>>>>> as growing (I have the full outputs captured if they'll help):
>>>>> 
>>>>> 20140422
>>>>> 7f92e1578000-7f92f218b000 rw-p 00000000 00:00 0                  [heap]
>>>>> Size:             274508 kB
>>>>> Rss:              180152 kB
>>>>> Pss:              180152 kB
>>>>> Shared_Clean:          0 kB
>>>>> Shared_Dirty:          0 kB
>>>>> Private_Clean:         0 kB
>>>>> Private_Dirty:    180152 kB
>>>>> Referenced:       120472 kB
>>>>> Anonymous:        180152 kB
>>>>> AnonHugePages:         0 kB
>>>>> Swap:              91568 kB
>>>>> KernelPageSize:        4 kB
>>>>> MMUPageSize:           4 kB
>>>>> Locked:                0 kB
>>>>> VmFlags: rd wr mr mw me ac
>>>>> 
>>>>> 
>>>>> 20140423
>>>>> 7f92e1578000-7f92f305e000 rw-p 00000000 00:00 0                  [heap]
>>>>> Size:             289688 kB
>>>>> Rss:              184136 kB
>>>>> Pss:              184136 kB
>>>>> Shared_Clean:          0 kB
>>>>> Shared_Dirty:          0 kB
>>>>> Private_Clean:         0 kB
>>>>> Private_Dirty:    184136 kB
>>>>> Referenced:        69748 kB
>>>>> Anonymous:        184136 kB
>>>>> AnonHugePages:         0 kB
>>>>> Swap:             103112 kB
>>>>> KernelPageSize:        4 kB
>>>>> MMUPageSize:           4 kB
>>>>> Locked:                0 kB
>>>>> VmFlags: rd wr mr mw me ac
>>>>> 
>>>>> 20140430
>>>>> 7f92e1578000-7f92fc01d000 rw-p 00000000 00:00 0                  [heap]
>>>>> Size:             436884 kB
>>>>> Rss:              140812 kB
>>>>> Pss:              140812 kB
>>>>> Shared_Clean:          0 kB
>>>>> Shared_Dirty:          0 kB
>>>>> Private_Clean:       744 kB
>>>>> Private_Dirty:    140068 kB
>>>>> Referenced:        43600 kB
>>>>> Anonymous:        140812 kB
>>>>> AnonHugePages:         0 kB
>>>>> Swap:             287392 kB
>>>>> KernelPageSize:        4 kB
>>>>> MMUPageSize:           4 kB
>>>>> Locked:                0 kB
>>>>> VmFlags: rd wr mr mw me ac
>>>>> 
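
To compare snapshots like the ones above without reading each dump by hand, the [heap] figures can be pulled out with a few lines of awk. This is only a sketch: heap_summary is a made-up helper name, and it assumes the input is in the normal /proc/PID/smaps layout, where [heap] is the last field of the mapping's header line.

```shell
#!/bin/sh
# heap_summary FILE
# Print the Size, Rss and Swap lines (in kB) of the [heap] mapping found in
# an smaps-format file, e.g. a saved copy of /proc/PID/smaps.
# Note: heap_summary is a hypothetical helper, not part of Pacemaker.
heap_summary() {
    awk '
        # A mapping header starts with an address range such as
        # 7f92e1578000-7f92f218b000; remember whether this one is the heap.
        /^[0-9a-f]+-[0-9a-f]+ / { in_heap = ($NF == "[heap]") }
        # Inside the heap mapping, keep only the three fields of interest.
        in_heap && /^(Size|Rss|Swap):/ { printf "%s %s kB\n", $1, $2 }
    ' "$1"
}
```

It could be run against a saved snapshot file, or directly against /proc/PID/smaps for the lrmd process (reading that file may require root).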
>>>>> I noticed in the release notes for 1.1.10-rc1
>>>>> (https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.10-rc1)
>>>>> that there was work done to fix "crmd: lrmd: stonithd: fixed memory
>>>>> leaks" but I'm not sure which particular bug this was related to. (And
>>>>> those fixes should be in the version I'm running anyway).
>>>>> 
>>>>> I've also spotted a few memory leak fixes in
>>>>> https://github.com/beekhof/pacemaker, but I'm not sure whether they
>>>>> relate to my issue (assuming I have a memory leak and this isn't
>>>>> expected behaviour).
>>>>> 
>>>>> Is there additional debugging that I can perform to check whether I
>>>>> have a leak, or is there enough evidence to justify upgrading to 1.1.11?
>>>>> 
>>>>> Thanks in advance
>>>>> 
>>>>> Greg Murphy
>>>>> _______________________________________________
>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>> 
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org
>>>> 
>>> 
>>> <lrmd.tgz>
>> 
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: lrmd-dbg.tgz
Type: application/octet-stream
Size: 61898 bytes
Desc: lrmd-dbg.tgz
URL: <http://oss.clusterlabs.org/pipermail/pacemaker/attachments/20140506/1271c01d/attachment-0001.obj>