[Pacemaker] lrmd Memory Usage

Andrew Beekhof andrew at beekhof.net
Tue May 6 05:08:16 EDT 2014


Oh, any chance you could install the debug packages? It will make the output even more useful :-)
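
(On Ubuntu the debug symbols typically come from a corresponding -dbg
package; a sketch, assuming the package follows the usual Debian naming
convention:

  # Package name is an assumption -- check with: apt-cache search pacemaker
  sudo apt-get install pacemaker-dbg

With the symbols installed, valgrind's stack traces resolve to function
names and line numbers.)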

On 6 May 2014, at 7:06 pm, Andrew Beekhof <andrew at beekhof.net> wrote:

> 
> On 6 May 2014, at 6:05 pm, Greg Murphy <greg.murphy at gamesparks.com> wrote:
> 
>> Attached are the valgrind outputs from two separate runs of lrmd with the
>> suggested variables set. Do they help narrow the issue down?
> 
> They do somewhat.  I'll investigate.  But much of the memory is still reachable:
> 
> ==26203==    indirectly lost: 17,945,950 bytes in 642,546 blocks
> ==26203==      possibly lost: 2,805 bytes in 60 blocks
> ==26203==    still reachable: 26,104,781 bytes in 544,782 blocks
> ==26203==         suppressed: 8,652 bytes in 176 blocks
> ==26203== Reachable blocks (those to which a pointer was found) are not shown.
> ==26203== To see them, rerun with: --leak-check=full --show-reachable=yes
> 
> Could you add --show-reachable=yes to the VALGRIND_OPTS variable?
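> 
> That would make the full set of options something like (the same
> options as before, with the one flag appended):
> 
> VALGRIND_OPTS="--leak-check=full --show-reachable=yes \
>     --trace-children=no --num-callers=25 \
>     --log-file=/var/lib/pacemaker/valgrind-%p \
>     --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions \
>     --gen-suppressions=all"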
> 
>> 
>> 
>> Thanks
>> 
>> Greg
>> 
>> 
>> On 02/05/2014 03:01, "Andrew Beekhof" <andrew at beekhof.net> wrote:
>> 
>>> 
>>> On 30 Apr 2014, at 9:01 pm, Greg Murphy <greg.murphy at gamesparks.com>
>>> wrote:
>>> 
>>>> Hi
>>>> 
>>>> I'm running a two-node Pacemaker cluster on Ubuntu Saucy (13.10),
>>>> kernel 3.11.0-17-generic and the Ubuntu Pacemaker package, version
>>>> 1.1.10+git20130802-1ubuntu1.
>>> 
>>> The problem is that I have no way of knowing what code is/isn't included
>>> in '1.1.10+git20130802-1ubuntu1'.
>>> You could try setting the following in your environment before
>>> starting Pacemaker, though:
>>> 
>>> # Variables for running child daemons under valgrind and/or
>>> # checking for memory problems
>>> G_SLICE=always-malloc
>>> MALLOC_PERTURB_=221 # or 0
>>> MALLOC_CHECK_=3     # or 0,1,2
>>> PCMK_valgrind_enabled=lrmd
>>> VALGRIND_OPTS="--leak-check=full --trace-children=no --num-callers=25 \
>>>     --log-file=/var/lib/pacemaker/valgrind-%p \
>>>     --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions \
>>>     --gen-suppressions=all"
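>>> 
>>> A sketch of how to apply them, assuming the Debian/Ubuntu packaging
>>> sources /etc/default/pacemaker from its init script (verify that
>>> path for your package):
>>> 
>>>   # /etc/default/pacemaker is an assumption -- check what
>>>   # /etc/init.d/pacemaker actually sources
>>>   sudoedit /etc/default/pacemaker   # add the variables above
>>>   sudo service pacemaker restart    # re-spawns lrmd under valgrind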
>>> 
>>> 
>>>> The cluster is configured with a DRBD master/slave set and then a
>>>> failover resource group containing MySQL (along with its DRBD
>>>> filesystem) and a Zabbix Proxy and Agent.
>>>> 
>>>> Since I built the cluster around two months ago I've noticed that on
>>>> the active node the memory footprint of lrmd gradually grows to
>>>> quite a significant size. The cluster was last restarted three weeks
>>>> ago, and now lrmd has over 1GB of mapped memory on the active node and
>>>> only 151MB on the passive node. Current excerpts from /proc/PID/status
>>>> are:
>>>> 
>>>> Active node
>>>> VmPeak:  1146740 kB
>>>> VmSize:  1146740 kB
>>>> VmLck:         0 kB
>>>> VmPin:         0 kB
>>>> VmHWM:    267680 kB
>>>> VmRSS:    188764 kB
>>>> VmData:  1065860 kB
>>>> VmStk:       136 kB
>>>> VmExe:        32 kB
>>>> VmLib:     10416 kB
>>>> VmPTE:      2164 kB
>>>> VmSwap:   822752 kB
>>>> 
>>>> Passive node
>>>> VmPeak:   220832 kB
>>>> VmSize:   155428 kB
>>>> VmLck:         0 kB
>>>> VmPin:         0 kB
>>>> VmHWM:      4568 kB
>>>> VmRSS:      3880 kB
>>>> VmData:    74548 kB
>>>> VmStk:       136 kB
>>>> VmExe:        32 kB
>>>> VmLib:     10416 kB
>>>> VmPTE:       172 kB
>>>> VmSwap:        0 kB
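>>>> 
>>>> (Those excerpts are just the Vm* lines; for the record, they can be
>>>> pulled with something like:
>>>> 
>>>>   grep '^Vm' /proc/$(pidof lrmd)/status   # assumes a single lrmd
>>>> )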
>>>> 
>>>> During the last week or so I've taken a couple of snapshots of
>>>> /proc/PID/smaps on the active node, and the heap particularly stands
>>>> out as growing (I have the full outputs captured if they'll help; a
>>>> capture one-liner is sketched after the snapshots):
>>>> 
>>>> 20140422
>>>> 7f92e1578000-7f92f218b000 rw-p 00000000 00:00 0    [heap]
>>>> Size:             274508 kB
>>>> Rss:              180152 kB
>>>> Pss:              180152 kB
>>>> Shared_Clean:          0 kB
>>>> Shared_Dirty:          0 kB
>>>> Private_Clean:         0 kB
>>>> Private_Dirty:    180152 kB
>>>> Referenced:       120472 kB
>>>> Anonymous:        180152 kB
>>>> AnonHugePages:         0 kB
>>>> Swap:              91568 kB
>>>> KernelPageSize:        4 kB
>>>> MMUPageSize:           4 kB
>>>> Locked:                0 kB
>>>> VmFlags: rd wr mr mw me ac
>>>> 
>>>> 
>>>> 20140423
>>>> 7f92e1578000-7f92f305e000 rw-p 00000000 00:00 0    [heap]
>>>> Size:             289688 kB
>>>> Rss:              184136 kB
>>>> Pss:              184136 kB
>>>> Shared_Clean:          0 kB
>>>> Shared_Dirty:          0 kB
>>>> Private_Clean:         0 kB
>>>> Private_Dirty:    184136 kB
>>>> Referenced:        69748 kB
>>>> Anonymous:        184136 kB
>>>> AnonHugePages:         0 kB
>>>> Swap:             103112 kB
>>>> KernelPageSize:        4 kB
>>>> MMUPageSize:           4 kB
>>>> Locked:                0 kB
>>>> VmFlags: rd wr mr mw me ac
>>>> 
>>>> 20140430
>>>> 7f92e1578000-7f92fc01d000 rw-p 00000000 00:00 0    [heap]
>>>> Size:             436884 kB
>>>> Rss:              140812 kB
>>>> Pss:              140812 kB
>>>> Shared_Clean:          0 kB
>>>> Shared_Dirty:          0 kB
>>>> Private_Clean:       744 kB
>>>> Private_Dirty:    140068 kB
>>>> Referenced:        43600 kB
>>>> Anonymous:        140812 kB
>>>> AnonHugePages:         0 kB
>>>> Swap:             287392 kB
>>>> KernelPageSize:        4 kB
>>>> MMUPageSize:           4 kB
>>>> Locked:                0 kB
>>>> VmFlags: rd wr mr mw me ac
>>>> 
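>>>> (Each snapshot above is the [heap] entry from smaps; a one-liner to
>>>> capture it, assuming a single lrmd process:
>>>> 
>>>>   awk '/\[heap\]/,/^VmFlags/' /proc/$(pidof lrmd)/smaps \
>>>>       > smaps-heap-$(date +%Y%m%d)
>>>> )
>>>> 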
>>>> I noticed in the release notes for 1.1.10-rc1
>>>> (https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.10-rc1)
>>>> that there was work done to fix "crmd: lrmd: stonithd: fixed memory
>>>> leaks" but I'm not sure which particular bug this was related to. (And
>>>> those fixes should be in the version I'm running anyway).
>>>> 
>>>> I've also spotted a few memory leak fixes in
>>>> https://github.com/beekhof/pacemaker, but I'm not sure whether they
>>>> relate to my issue (assuming I have a memory leak and this isn't
>>>> expected behaviour).
>>>> 
>>>> Is there additional debugging that I can perform to check whether I
>>>> have a leak, or is there enough evidence to justify upgrading to 1.1.11?
>>>> 
>>>> Thanks in advance
>>>> 
>>>> Greg Murphy
>>> 
>> 
>> <lrmd.tgz>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
> 
