[Pacemaker] lrmd Memory Usage

Andrew Beekhof andrew at beekhof.net
Wed May 7 06:37:51 EDT 2014


On 7 May 2014, at 8:31 pm, Greg Murphy <greg.murphy at gamesparks.com> wrote:

> Thanks Andrew, much appreciated.
> 
> I’ll try upgrading to 1.11 and report back with how it goes.

At this point it may even be worth trying the 1.1.12 release candidate.

> 
> 
> 
> On 07/05/2014 01:20, "Andrew Beekhof" <andrew at beekhof.net> wrote:
> 
>> 
>> On 6 May 2014, at 7:47 pm, Greg Murphy <greg.murphy at gamesparks.com> wrote:
>> 
>>> Here you go - I’ve only run lrmd for 30 minutes since installing the
>>> debug
>>> package, but hopefully that’s enough - if not, let me know and I’ll do a
>>> longer capture.
>>> 
>> 
>> I'll keep looking, but almost everything so far seems to be from or
>> related to the g_dbus API:
>> 
>> ...
>> ==37625==    by 0x6F20E30: g_dbus_proxy_new_for_bus_sync (in
>> /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.3800.1)
>> ==37625==    by 0x507B90B: get_proxy (upstart.c:66)
>> ==37625==    by 0x507B9BF: upstart_init (upstart.c:85)
>> ==37625==    by 0x507C88E: upstart_job_exec (upstart.c:429)
>> ==37625==    by 0x10CE03: lrmd_rsc_dispatch (lrmd.c:879)
>> ==37625==    by 0x4E5F112: crm_trigger_dispatch (mainloop.c:105)
>> ==37625==    by 0x58A13B5: g_main_context_dispatch (in
>> /lib/x86_64-linux-gnu/libglib-2.0.so.0.3800.1)
>> ==37625==    by 0x58A1707: ??? (in
>> /lib/x86_64-linux-gnu/libglib-2.0.so.0.3800.1)
>> ==37625==    by 0x58A1B09: g_main_loop_run (in
>> /lib/x86_64-linux-gnu/libglib-2.0.so.0.3800.1)
>> ==37625==    by 0x10AC3A: main (main.c:314)
>> 
>> Which is going to be called every time an upstart job is run (i.e. on
>> every recurring monitor of an upstart resource).
>> 
>> There were several problems with that API and we removed all use of it in
>> 1.1.11.
>> I'm quite confident that most, if not all, of the memory issues would go
>> away if you upgraded.
>> 
>> 
>>> 
>>> 
>>> On 06/05/2014 10:08, "Andrew Beekhof" <andrew at beekhof.net> wrote:
>>> 
>>>> Oh, any chance you could install the debug packages? It will make
>>>> the output even more useful :-)
>>>> 
>>>> On 6 May 2014, at 7:06 pm, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>> 
>>>>> 
>>>>> On 6 May 2014, at 6:05 pm, Greg Murphy <greg.murphy at gamesparks.com>
>>>>> wrote:
>>>>> 
>>>>>> Attached are the valgrind outputs from two separate runs of lrmd with
>>>>>> the
>>>>>> suggested variables set. Do they help narrow the issue down?
>>>>> 
>>>>> They do somewhat.  I'll investigate.  But much of the memory is still
>>>>> reachable:
>>>>> 
>>>>> ==26203==    indirectly lost: 17,945,950 bytes in 642,546 blocks
>>>>> ==26203==      possibly lost: 2,805 bytes in 60 blocks
>>>>> ==26203==    still reachable: 26,104,781 bytes in 544,782 blocks
>>>>> ==26203==         suppressed: 8,652 bytes in 176 blocks
>>>>> ==26203== Reachable blocks (those to which a pointer was found) are
>>>>> not
>>>>> shown.
>>>>> ==26203== To see them, rerun with: --leak-check=full
>>>>> --show-reachable=yes
>>>>> 
>>>>> Could you add --show-reachable=yes to the VALGRIND_OPTS variable?
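[A sketch of how that option might be appended, assuming VALGRIND_OPTS is already exported as suggested earlier in the thread:]

```shell
# Sketch: append --show-reachable=yes to whatever VALGRIND_OPTS already
# holds, then restart pacemaker so lrmd picks up the new setting.
export VALGRIND_OPTS="${VALGRIND_OPTS} --show-reachable=yes"
```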
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>> Greg
>>>>>> 
>>>>>> 
>>>>>> On 02/05/2014 03:01, "Andrew Beekhof" <andrew at beekhof.net> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> On 30 Apr 2014, at 9:01 pm, Greg Murphy <greg.murphy at gamesparks.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi
>>>>>>>> 
>>>>>>>> I'm running a two-node Pacemaker cluster on Ubuntu Saucy (13.10),
>>>>>>>> kernel 3.11.0-17-generic and the Ubuntu Pacemaker package, version
>>>>>>>> 1.1.10+git20130802-1ubuntu1.
>>>>>>> 
>>>>>>> The problem is that I have no way of knowing what code is/isn't
>>>>>>> included
>>>>>>> in '1.1.10+git20130802-1ubuntu1'.
>>>>>>> You could try setting the following in your environment before
>>>>>>> starting
>>>>>>> pacemaker though
>>>>>>> 
>>>>>>> # Variables for running child daemons under valgrind and/or checking
>>>>>>> for
>>>>>>> memory problems
>>>>>>> G_SLICE=always-malloc
>>>>>>> MALLOC_PERTURB_=221 # or 0
>>>>>>> MALLOC_CHECK_=3     # or 0,1,2
>>>>>>> PCMK_valgrind_enabled=lrmd
>>>>>>> VALGRIND_OPTS="--leak-check=full --trace-children=no
>>>>>>> --num-callers=25
>>>>>>> --log-file=/var/lib/pacemaker/valgrind-%p
>>>>>>> --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions
>>>>>>> --gen-suppressions=all"
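[A sketch of exporting those variables in a shell before starting pacemaker; putting them in a file such as /etc/default/pacemaker on Debian/Ubuntu is an assumption — adjust for your init system:]

```shell
# Sketch: export the suggested debug variables so they are inherited by
# pacemaker's child daemons (here, lrmd running under valgrind).
export G_SLICE=always-malloc
export MALLOC_PERTURB_=221          # or 0
export MALLOC_CHECK_=3              # or 0, 1, 2
export PCMK_valgrind_enabled=lrmd
export VALGRIND_OPTS="--leak-check=full --trace-children=no \
--num-callers=25 \
--log-file=/var/lib/pacemaker/valgrind-%p \
--suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions \
--gen-suppressions=all"
```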
>>>>>>> 
>>>>>>> 
>>>>>>>> The cluster is configured with a DRBD master/slave set and then a
>>>>>>>> failover resource group containing MySQL (along with its DRBD
>>>>>>>> filesystem) and a Zabbix Proxy and Agent.
>>>>>>>> 
>>>>>>>> Since I built the cluster around two months ago I've noticed that
>>>>>>>> on the active node the memory footprint of lrmd gradually grows to
>>>>>>>> quite a significant size. The cluster was last restarted three
>>>>>>>> weeks
>>>>>>>> ago, and now lrmd has over 1GB of mapped memory on the active node
>>>>>>>> and
>>>>>>>> only 151MB on the passive node. Current excerpts from
>>>>>>>> /proc/PID/status
>>>>>>>> are:
>>>>>>>> 
>>>>>>>> Active node
>>>>>>>> VmPeak:  1146740 kB
>>>>>>>> VmSize:  1146740 kB
>>>>>>>> VmLck:         0 kB
>>>>>>>> VmPin:         0 kB
>>>>>>>> VmHWM:    267680 kB
>>>>>>>> VmRSS:    188764 kB
>>>>>>>> VmData:  1065860 kB
>>>>>>>> VmStk:       136 kB
>>>>>>>> VmExe:        32 kB
>>>>>>>> VmLib:     10416 kB
>>>>>>>> VmPTE:      2164 kB
>>>>>>>> VmSwap:   822752 kB
>>>>>>>> 
>>>>>>>> Passive node
>>>>>>>> VmPeak:   220832 kB
>>>>>>>> VmSize:   155428 kB
>>>>>>>> VmLck:         0 kB
>>>>>>>> VmPin:         0 kB
>>>>>>>> VmHWM:      4568 kB
>>>>>>>> VmRSS:      3880 kB
>>>>>>>> VmData:    74548 kB
>>>>>>>> VmStk:       136 kB
>>>>>>>> VmExe:        32 kB
>>>>>>>> VmLib:     10416 kB
>>>>>>>> VmPTE:       172 kB
>>>>>>>> VmSwap:        0 kB
>>>>>>>> 
>>>>>>>> During the last week or so I've taken a couple of snapshots of
>>>>>>>> /proc/PID/smaps on the active node, and the heap particularly
>>>>>>>> stands out as growing (I have the full outputs captured if
>>>>>>>> they'll help):
>>>>>>>> 
>>>>>>>> 20140422
>>>>>>>> 7f92e1578000-7f92f218b000 rw-p 00000000 00:00 0    [heap]
>>>>>>>> Size:             274508 kB
>>>>>>>> Rss:              180152 kB
>>>>>>>> Pss:              180152 kB
>>>>>>>> Shared_Clean:          0 kB
>>>>>>>> Shared_Dirty:          0 kB
>>>>>>>> Private_Clean:         0 kB
>>>>>>>> Private_Dirty:    180152 kB
>>>>>>>> Referenced:       120472 kB
>>>>>>>> Anonymous:        180152 kB
>>>>>>>> AnonHugePages:         0 kB
>>>>>>>> Swap:              91568 kB
>>>>>>>> KernelPageSize:        4 kB
>>>>>>>> MMUPageSize:           4 kB
>>>>>>>> Locked:                0 kB
>>>>>>>> VmFlags: rd wr mr mw me ac
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 20140423
>>>>>>>> 7f92e1578000-7f92f305e000 rw-p 00000000 00:00 0    [heap]
>>>>>>>> Size:             289688 kB
>>>>>>>> Rss:              184136 kB
>>>>>>>> Pss:              184136 kB
>>>>>>>> Shared_Clean:          0 kB
>>>>>>>> Shared_Dirty:          0 kB
>>>>>>>> Private_Clean:         0 kB
>>>>>>>> Private_Dirty:    184136 kB
>>>>>>>> Referenced:        69748 kB
>>>>>>>> Anonymous:        184136 kB
>>>>>>>> AnonHugePages:         0 kB
>>>>>>>> Swap:             103112 kB
>>>>>>>> KernelPageSize:        4 kB
>>>>>>>> MMUPageSize:           4 kB
>>>>>>>> Locked:                0 kB
>>>>>>>> VmFlags: rd wr mr mw me ac
>>>>>>>> 
>>>>>>>> 20140430
>>>>>>>> 7f92e1578000-7f92fc01d000 rw-p 00000000 00:00 0    [heap]
>>>>>>>> Size:             436884 kB
>>>>>>>> Rss:              140812 kB
>>>>>>>> Pss:              140812 kB
>>>>>>>> Shared_Clean:          0 kB
>>>>>>>> Shared_Dirty:          0 kB
>>>>>>>> Private_Clean:       744 kB
>>>>>>>> Private_Dirty:    140068 kB
>>>>>>>> Referenced:        43600 kB
>>>>>>>> Anonymous:        140812 kB
>>>>>>>> AnonHugePages:         0 kB
>>>>>>>> Swap:             287392 kB
>>>>>>>> KernelPageSize:        4 kB
>>>>>>>> MMUPageSize:           4 kB
>>>>>>>> Locked:                0 kB
>>>>>>>> VmFlags: rd wr mr mw me ac
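[A small sketch of how such snapshots can be captured and compared; the extract_heap helper and snapshot filenames are hypothetical, not from the original message:]

```shell
# Hypothetical helper: print the [heap] entry from a saved smaps snapshot
# (everything from the mapping header line through its VmFlags line).
extract_heap() {
    awk '/\[heap\]/ { in_heap = 1 }
         in_heap    { print }
         in_heap && /^VmFlags/ { exit }' "$1"
}

# Example usage with assumed snapshot filenames:
#   extract_heap smaps-20140422.txt
#   extract_heap smaps-20140430.txt
```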
>>>>>>>> 
>>>>>>>> I noticed in the release notes for 1.1.10-rc1
>>>>>>>> (https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.10-rc1)
>>>>>>>> that there was work done to fix "crmd: lrmd: stonithd: fixed
>>>>>>>> memory leaks" but I'm not sure which particular bug this was
>>>>>>>> related to. (And those fixes should be in the version I'm
>>>>>>>> running anyway.)
>>>>>>>> 
>>>>>>>> I've also spotted a few memory leak fixes in
>>>>>>>> https://github.com/beekhof/pacemaker, but I'm not sure whether they
>>>>>>>> relate to my issue (assuming I have a memory leak and this isn't
>>>>>>>> expected behaviour).
>>>>>>>> 
>>>>>>>> Is there additional debugging that I can perform to check whether I
>>>>>>>> have a leak, or is there enough evidence to justify upgrading to
>>>>>>>> 1.1.11?
>>>>>>>> 
>>>>>>>> Thanks in advance
>>>>>>>> 
>>>>>>>> Greg Murphy
>>>>>>>> _______________________________________________
>>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>> 
>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>> Getting started:
>>>>>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>> 
>>>>>> 
>>>>>> <lrmd.tgz>
>>>>> 
>>>> 
>>> 
>>> <lrmd-dbg.tgz>
>> 
> 
