[Pacemaker] lrmd Memory Usage
Andrew Beekhof
andrew at beekhof.net
Tue May 6 09:06:14 UTC 2014
On 6 May 2014, at 6:05 pm, Greg Murphy <greg.murphy at gamesparks.com> wrote:
> Attached are the valgrind outputs from two separate runs of lrmd with the
> suggested variables set. Do they help narrow the issue down?
They do, somewhat. I'll investigate. But much of the memory is still reachable:
==26203== indirectly lost: 17,945,950 bytes in 642,546 blocks
==26203== possibly lost: 2,805 bytes in 60 blocks
==26203== still reachable: 26,104,781 bytes in 544,782 blocks
==26203== suppressed: 8,652 bytes in 176 blocks
==26203== Reachable blocks (those to which a pointer was found) are not shown.
==26203== To see them, rerun with: --leak-check=full --show-reachable=yes
Could you add --show-reachable=yes to the VALGRIND_OPTS variable?
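
For reference, the full setting would then look something like this (same
options as before, just with --show-reachable=yes added; adjust the paths to
your install):

  VALGRIND_OPTS="--leak-check=full --show-reachable=yes --trace-children=no \
      --num-callers=25 --log-file=/var/lib/pacemaker/valgrind-%p \
      --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions \
      --gen-suppressions=all"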
>
>
> Thanks
>
> Greg
>
>
> On 02/05/2014 03:01, "Andrew Beekhof" <andrew at beekhof.net> wrote:
>
>>
>> On 30 Apr 2014, at 9:01 pm, Greg Murphy <greg.murphy at gamesparks.com>
>> wrote:
>>
>>> Hi
>>>
>>> I'm running a two-node Pacemaker cluster on Ubuntu Saucy (13.10),
>>> kernel 3.11.0-17-generic and the Ubuntu Pacemaker package, version
>>> 1.1.10+git20130802-1ubuntu1.
>>
>> The problem is that I have no way of knowing what code is/isn't included
>> in '1.1.10+git20130802-1ubuntu1'.
>> You could try setting the following in your environment before starting
>> pacemaker, though:
>>
>> # Variables for running child daemons under valgrind and/or checking for
>> memory problems
>> G_SLICE=always-malloc
>> MALLOC_PERTURB_=221 # or 0
>> MALLOC_CHECK_=3 # or 0,1,2
>> PCMK_valgrind_enabled=lrmd
>> VALGRIND_OPTS="--leak-check=full --trace-children=no --num-callers=25
>> --log-file=/var/lib/pacemaker/valgrind-%p
>> --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions
>> --gen-suppressions=all"
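
(In case it saves a hunt: on Debian/Ubuntu these would normally live in the
environment file the init script sources -- I'd expect /etc/default/pacemaker,
but verify against your packaging -- so that pacemakerd exports them to the
child daemons, e.g.:

  # /etc/default/pacemaker   (path assumed -- check your package)
  G_SLICE=always-malloc
  MALLOC_PERTURB_=221
  MALLOC_CHECK_=3
  PCMK_valgrind_enabled=lrmd
  VALGRIND_OPTS="--leak-check=full --trace-children=no --num-callers=25 \
      --log-file=/var/lib/pacemaker/valgrind-%p \
      --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions \
      --gen-suppressions=all"

then restart pacemaker at a convenient moment so lrmd is re-spawned with them.)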
>>
>>
>>> The cluster is configured with a DRBD master/slave set and then a
>>> failover resource group containing MySQL (along with its DRBD
>>> filesystem) and a Zabbix Proxy and Agent.
>>>
>>> Since I built the cluster around two months ago I've noticed that on
>>> the active node the memory footprint of lrmd gradually grows to
>>> quite a significant size. The cluster was last restarted three weeks
>>> ago, and now lrmd has over 1GB of mapped memory on the active node and
>>> only 151MB on the passive node. Current excerpts from /proc/PID/status
>>> are:
>>>
>>> Active node
>>> VmPeak:  1146740 kB
>>> VmSize:  1146740 kB
>>> VmLck:         0 kB
>>> VmPin:         0 kB
>>> VmHWM:    267680 kB
>>> VmRSS:    188764 kB
>>> VmData:  1065860 kB
>>> VmStk:       136 kB
>>> VmExe:        32 kB
>>> VmLib:     10416 kB
>>> VmPTE:      2164 kB
>>> VmSwap:   822752 kB
>>>
>>> Passive node
>>> VmPeak:   220832 kB
>>> VmSize:   155428 kB
>>> VmLck:         0 kB
>>> VmPin:         0 kB
>>> VmHWM:      4568 kB
>>> VmRSS:      3880 kB
>>> VmData:    74548 kB
>>> VmStk:       136 kB
>>> VmExe:        32 kB
>>> VmLib:     10416 kB
>>> VmPTE:       172 kB
>>> VmSwap:        0 kB
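
(If you want to keep tracking this without hand-copying, a rough sketch --
untested, and it assumes a single lrmd process on the node:

  # append a timestamped Vm* summary for lrmd, e.g. from cron
  pid=$(pgrep -x lrmd | head -n1)
  { date; grep '^Vm' /proc/$pid/status; echo; } >> /var/tmp/lrmd-status.log

Appending with a timestamp gives you a history to line up against cluster
events.)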
>>>
>>> During the last week or so I've taken a couple of snapshots of
>>> /proc/PID/smaps on the active node, and the heap particularly stands out
>>> as growing: (I have the full outputs captured if they'll help)
>>>
>>> 20140422
>>> 7f92e1578000-7f92f218b000 rw-p 00000000 00:00 0          [heap]
>>> Size: 274508 kB
>>> Rss: 180152 kB
>>> Pss: 180152 kB
>>> Shared_Clean: 0 kB
>>> Shared_Dirty: 0 kB
>>> Private_Clean: 0 kB
>>> Private_Dirty: 180152 kB
>>> Referenced: 120472 kB
>>> Anonymous: 180152 kB
>>> AnonHugePages: 0 kB
>>> Swap: 91568 kB
>>> KernelPageSize: 4 kB
>>> MMUPageSize: 4 kB
>>> Locked: 0 kB
>>> VmFlags: rd wr mr mw me ac
>>>
>>>
>>> 20140423
>>> 7f92e1578000-7f92f305e000 rw-p 00000000 00:00 0          [heap]
>>> Size: 289688 kB
>>> Rss: 184136 kB
>>> Pss: 184136 kB
>>> Shared_Clean: 0 kB
>>> Shared_Dirty: 0 kB
>>> Private_Clean: 0 kB
>>> Private_Dirty: 184136 kB
>>> Referenced: 69748 kB
>>> Anonymous: 184136 kB
>>> AnonHugePages: 0 kB
>>> Swap: 103112 kB
>>> KernelPageSize: 4 kB
>>> MMUPageSize: 4 kB
>>> Locked: 0 kB
>>> VmFlags: rd wr mr mw me ac
>>>
>>> 20140430
>>> 7f92e1578000-7f92fc01d000 rw-p 00000000 00:00 0          [heap]
>>> Size: 436884 kB
>>> Rss: 140812 kB
>>> Pss: 140812 kB
>>> Shared_Clean: 0 kB
>>> Shared_Dirty: 0 kB
>>> Private_Clean: 744 kB
>>> Private_Dirty: 140068 kB
>>> Referenced: 43600 kB
>>> Anonymous: 140812 kB
>>> AnonHugePages: 0 kB
>>> Swap: 287392 kB
>>> KernelPageSize: 4 kB
>>> MMUPageSize: 4 kB
>>> Locked: 0 kB
>>> VmFlags: rd wr mr mw me ac
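
(Same idea for the heap mapping, if capturing these by hand gets tedious --
again just a sketch under the same single-process assumption:

  # snapshot the [heap] entry from lrmd's smaps
  pid=$(pgrep -x lrmd | head -n1)
  { date; grep -A 15 '\[heap\]' /proc/$pid/smaps; echo; } >> /var/tmp/lrmd-heap.log
)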
>>>
>>> I noticed in the release notes for 1.1.10-rc1
>>> (https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.10-r
>>> c1) that there was work done to fix "crmd: lrmd: stonithd: fixed memory
>>> leaks" but I'm not sure which particular bug this was related to. (And
>>> those fixes should be in the version I'm running anyway).
>>>
>>> I've also spotted a few memory leak fixes in
>>> https://github.com/beekhof/pacemaker, but I'm not sure whether they
>>> relate to my issue (assuming I have a memory leak and this isn't
>>> expected behaviour).
>>>
>>> Is there additional debugging that I can perform to check whether I
>>> have a leak, or is there enough evidence to justify upgrading to 1.1.11?
>>>
>>> Thanks in advance
>>>
>>> Greg Murphy
>>
>
> <lrmd.tgz>