[Pacemaker] lrmd Memory Usage
Greg Murphy
greg.murphy at gamesparks.com
Wed May 7 10:31:30 UTC 2014
Thanks Andrew, much appreciated.
I’ll try upgrading to 1.11 and report back with how it goes.
On 07/05/2014 01:20, "Andrew Beekhof" <andrew at beekhof.net> wrote:
>
>On 6 May 2014, at 7:47 pm, Greg Murphy <greg.murphy at gamesparks.com> wrote:
>
>> Here you go - I’ve only run lrmd for 30 minutes since installing the
>> debug package, but hopefully that’s enough - if not, let me know and
>> I’ll do a longer capture.
>>
>
>I'll keep looking, but almost everything so far seems to be from or
>related to the g_dbus API:
>
>...
>==37625== by 0x6F20E30: g_dbus_proxy_new_for_bus_sync (in
>/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.3800.1)
>==37625== by 0x507B90B: get_proxy (upstart.c:66)
>==37625== by 0x507B9BF: upstart_init (upstart.c:85)
>==37625== by 0x507C88E: upstart_job_exec (upstart.c:429)
>==37625== by 0x10CE03: lrmd_rsc_dispatch (lrmd.c:879)
>==37625== by 0x4E5F112: crm_trigger_dispatch (mainloop.c:105)
>==37625== by 0x58A13B5: g_main_context_dispatch (in
>/lib/x86_64-linux-gnu/libglib-2.0.so.0.3800.1)
>==37625== by 0x58A1707: ??? (in
>/lib/x86_64-linux-gnu/libglib-2.0.so.0.3800.1)
>==37625== by 0x58A1B09: g_main_loop_run (in
>/lib/x86_64-linux-gnu/libglib-2.0.so.0.3800.1)
>==37625== by 0x10AC3A: main (main.c:314)
>
>Which is going to be called every time an upstart job is run (i.e. a
>recurring monitor of an upstart resource).
>
>There were several problems with that API and we removed all use of it in
>1.1.11.
>I'm quite confident that most, if not all, of the memory issues would go
>away if you upgraded.
>
>
>>
>>
>> On 06/05/2014 10:08, "Andrew Beekhof" <andrew at beekhof.net> wrote:
>>
>>> Oh, and any chance you could install the debug packages? It will make
>>> the output even more useful :-)
>>>
>>> On 6 May 2014, at 7:06 pm, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>
>>>>
>>>> On 6 May 2014, at 6:05 pm, Greg Murphy <greg.murphy at gamesparks.com>
>>>> wrote:
>>>>
>>>>> Attached are the valgrind outputs from two separate runs of lrmd with
>>>>> the
>>>>> suggested variables set. Do they help narrow the issue down?
>>>>
>>>> They do somewhat. I'll investigate. But much of the memory is still
>>>> reachable:
>>>>
>>>> ==26203== indirectly lost: 17,945,950 bytes in 642,546 blocks
>>>> ==26203== possibly lost: 2,805 bytes in 60 blocks
>>>> ==26203== still reachable: 26,104,781 bytes in 544,782 blocks
>>>> ==26203== suppressed: 8,652 bytes in 176 blocks
>>>> ==26203== Reachable blocks (those to which a pointer was found) are
>>>>not
>>>> shown.
>>>> ==26203== To see them, rerun with: --leak-check=full
>>>> --show-reachable=yes
>>>>
>>>> Could you add the --show-reachable=yes to VALGRIND_OPTS variable?
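>>>> That is, the options line would become something like this (same flags
>>>> and paths as in the earlier suggestion, with only --show-reachable=yes
>>>> added):

```shell
# Amended VALGRIND_OPTS; all flags and paths are from the earlier
# suggestion, with --show-reachable=yes added so reachable blocks
# are reported too.
VALGRIND_OPTS="--leak-check=full --show-reachable=yes --trace-children=no \
  --num-callers=25 --log-file=/var/lib/pacemaker/valgrind-%p \
  --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions \
  --gen-suppressions=all"
```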
>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Greg
>>>>>
>>>>>
>>>>> On 02/05/2014 03:01, "Andrew Beekhof" <andrew at beekhof.net> wrote:
>>>>>
>>>>>>
>>>>>> On 30 Apr 2014, at 9:01 pm, Greg Murphy <greg.murphy at gamesparks.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> I'm running a two-node Pacemaker cluster on Ubuntu Saucy (13.10),
>>>>>>> kernel 3.11.0-17-generic and the Ubuntu Pacemaker package, version
>>>>>>> 1.1.10+git20130802-1ubuntu1.
>>>>>>
>>>>>> The problem is that I have no way of knowing what code is/isn't
>>>>>> included
>>>>>> in '1.1.10+git20130802-1ubuntu1'.
>>>>>> You could try setting the following in your environment before
>>>>>> starting
>>>>>> pacemaker though
>>>>>>
>>>>>> # Variables for running child daemons under valgrind and/or checking
>>>>>> for
>>>>>> memory problems
>>>>>> G_SLICE=always-malloc
>>>>>> MALLOC_PERTURB_=221 # or 0
>>>>>> MALLOC_CHECK_=3 # or 0,1,2
>>>>>> PCMK_valgrind_enabled=lrmd
>>>>>> VALGRIND_OPTS="--leak-check=full --trace-children=no --num-callers=25
>>>>>> --log-file=/var/lib/pacemaker/valgrind-%p
>>>>>> --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions
>>>>>> --gen-suppressions=all"
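>>>>>> On Ubuntu the usual place for these is /etc/default/pacemaker, which
>>>>>> the init script sources before starting the daemons (worth verifying
>>>>>> for this particular package); as exports it would look like:

```shell
# Sketch, assuming /etc/default/pacemaker is sourced by the init
# script on this Ubuntu package. All values are from the suggestion
# above.
export G_SLICE=always-malloc
export MALLOC_PERTURB_=221       # or 0
export MALLOC_CHECK_=3           # or 0,1,2
export PCMK_valgrind_enabled=lrmd
export VALGRIND_OPTS="--leak-check=full --trace-children=no \
  --num-callers=25 --log-file=/var/lib/pacemaker/valgrind-%p \
  --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions \
  --gen-suppressions=all"
```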
>>>>>>
>>>>>>
>>>>>>> The cluster is configured with a DRBD master/slave set and then a
>>>>>>> failover resource group containing MySQL (along with its DRBD
>>>>>>> filesystem) and a Zabbix Proxy and Agent.
>>>>>>>
>>>>>>> Since I built the cluster around two months ago I've noticed that
>>>>>>> on the active node the memory footprint of lrmd gradually grows to
>>>>>>> quite a significant size. The cluster was last restarted three
>>>>>>> weeks ago, and now lrmd has over 1GB of mapped memory on the
>>>>>>> active node and only 151MB on the passive node. Current excerpts
>>>>>>> from /proc/PID/status are:
>>>>>>>
>>>>>>> Active node
>>>>>>> VmPeak:  1146740 kB
>>>>>>> VmSize:  1146740 kB
>>>>>>> VmLck:         0 kB
>>>>>>> VmPin:         0 kB
>>>>>>> VmHWM:    267680 kB
>>>>>>> VmRSS:    188764 kB
>>>>>>> VmData:  1065860 kB
>>>>>>> VmStk:       136 kB
>>>>>>> VmExe:        32 kB
>>>>>>> VmLib:     10416 kB
>>>>>>> VmPTE:      2164 kB
>>>>>>> VmSwap:   822752 kB
>>>>>>>
>>>>>>> Passive node
>>>>>>> VmPeak:   220832 kB
>>>>>>> VmSize:   155428 kB
>>>>>>> VmLck:         0 kB
>>>>>>> VmPin:         0 kB
>>>>>>> VmHWM:      4568 kB
>>>>>>> VmRSS:      3880 kB
>>>>>>> VmData:    74548 kB
>>>>>>> VmStk:       136 kB
>>>>>>> VmExe:        32 kB
>>>>>>> VmLib:     10416 kB
>>>>>>> VmPTE:       172 kB
>>>>>>> VmSwap:        0 kB
>>>>>>>
>>>>>>> During the last week or so I've taken a couple of snapshots of
>>>>>>> /proc/PID/smaps on the active node, and the heap particularly
>>>>>>> stands out as growing (I have the full outputs captured if
>>>>>>> they'll help):
>>>>>>>
>>>>>>> 20140422
>>>>>>> 7f92e1578000-7f92f218b000 rw-p 00000000 00:00 0          [heap]
>>>>>>> Size: 274508 kB
>>>>>>> Rss: 180152 kB
>>>>>>> Pss: 180152 kB
>>>>>>> Shared_Clean: 0 kB
>>>>>>> Shared_Dirty: 0 kB
>>>>>>> Private_Clean: 0 kB
>>>>>>> Private_Dirty: 180152 kB
>>>>>>> Referenced: 120472 kB
>>>>>>> Anonymous: 180152 kB
>>>>>>> AnonHugePages: 0 kB
>>>>>>> Swap: 91568 kB
>>>>>>> KernelPageSize: 4 kB
>>>>>>> MMUPageSize: 4 kB
>>>>>>> Locked: 0 kB
>>>>>>> VmFlags: rd wr mr mw me ac
>>>>>>>
>>>>>>>
>>>>>>> 20140423
>>>>>>> 7f92e1578000-7f92f305e000 rw-p 00000000 00:00 0          [heap]
>>>>>>> Size: 289688 kB
>>>>>>> Rss: 184136 kB
>>>>>>> Pss: 184136 kB
>>>>>>> Shared_Clean: 0 kB
>>>>>>> Shared_Dirty: 0 kB
>>>>>>> Private_Clean: 0 kB
>>>>>>> Private_Dirty: 184136 kB
>>>>>>> Referenced: 69748 kB
>>>>>>> Anonymous: 184136 kB
>>>>>>> AnonHugePages: 0 kB
>>>>>>> Swap: 103112 kB
>>>>>>> KernelPageSize: 4 kB
>>>>>>> MMUPageSize: 4 kB
>>>>>>> Locked: 0 kB
>>>>>>> VmFlags: rd wr mr mw me ac
>>>>>>>
>>>>>>> 20140430
>>>>>>> 7f92e1578000-7f92fc01d000 rw-p 00000000 00:00 0          [heap]
>>>>>>> Size: 436884 kB
>>>>>>> Rss: 140812 kB
>>>>>>> Pss: 140812 kB
>>>>>>> Shared_Clean: 0 kB
>>>>>>> Shared_Dirty: 0 kB
>>>>>>> Private_Clean: 744 kB
>>>>>>> Private_Dirty: 140068 kB
>>>>>>> Referenced: 43600 kB
>>>>>>> Anonymous: 140812 kB
>>>>>>> AnonHugePages: 0 kB
>>>>>>> Swap: 287392 kB
>>>>>>> KernelPageSize: 4 kB
>>>>>>> MMUPageSize: 4 kB
>>>>>>> Locked: 0 kB
>>>>>>> VmFlags: rd wr mr mw me ac
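>>>>>>> A small helper makes it easier to capture just the heap mapping
>>>>>>> between snapshots; heap_usage below is a hypothetical name, and
>>>>>>> the awk range simply prints from the [heap] header line through
>>>>>>> its VmFlags line:

```shell
# Sketch of a hypothetical helper (not part of Pacemaker): print only
# the [heap] mapping from an smaps file, so Size:/Swap: growth can be
# compared between snapshots.
heap_usage() {
  # $1: path to an smaps file, e.g. /proc/<lrmd-pid>/smaps
  # The awk range starts at the line naming [heap] and stops at the
  # mapping's closing VmFlags: line.
  awk '/\[heap\]/,/^VmFlags:/' "$1"
}
```

Run it periodically on the active node, e.g. `heap_usage
/proc/$(pidof lrmd)/smaps`, and diff the output between days.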
>>>>>>>
>>>>>>> I noticed in the release notes for 1.1.10-rc1
>>>>>>> (https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.10-rc1)
>>>>>>> that there was work done to fix "crmd: lrmd: stonithd: fixed
>>>>>>> memory leaks" but I'm not sure which particular bug this was
>>>>>>> related to. (And those fixes should be in the version I'm running
>>>>>>> anyway.)
>>>>>>>
>>>>>>> I've also spotted a few memory leak fixes in
>>>>>>> https://github.com/beekhof/pacemaker, but I'm not sure whether they
>>>>>>> relate to my issue (assuming I have a memory leak and this isn't
>>>>>>> expected behaviour).
>>>>>>>
>>>>>>> Is there additional debugging that I can perform to check whether I
>>>>>>> have a leak, or is there enough evidence to justify upgrading to
>>>>>>> 1.1.11?
>>>>>>>
>>>>>>> Thanks in advance
>>>>>>>
>>>>>>> Greg Murphy
>>>>>>> _______________________________________________
>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>
>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>> Getting started:
>>>>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>
>>>>>
>>>>> <lrmd.tgz>
>>>>
>>>
>>
>> <lrmd-dbg.tgz>
>