[Pacemaker] Pacemaker still may include memory leaks
Andrew Beekhof
andrew at beekhof.net
Wed May 29 08:01:34 UTC 2013
On 28/05/2013, at 4:30 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
>
> On 28/05/2013, at 10:12 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>
>>
>> On 27/05/2013, at 5:08 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>
>>> 27.05.2013 04:20, Yuichi SEINO wrote:
>>>> Hi,
>>>>
>>>> 2013/5/24 Vladislav Bogdanov <bubble at hoster-ok.com>:
>>>>> 24.05.2013 06:34, Andrew Beekhof wrote:
>>>>>> Any help figuring out where the leaks might be would be very much appreciated :)
>>>>>
>>>>> One (and the only) suspect is unfortunately crmd itself.
>>>>> It has private heap grown from 2708 to 3680 kB.
>>>>>
>>>>> All other relevant differences are in qb shm buffers, which are
>>>>> controlled and may grow until they reach configured size.
>>>>>
>>>>> @Yuichi
>>>>> I would recommend running under valgrind on a test cluster to figure
>>>>> out whether this is a memory leak (lost memory) or some history data
>>>>> (referenced memory). The latter may still be a logical memory leak,
>>>>> though. See /etc/sysconfig/pacemaker for details.
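>>>>> (For reference, enabling that via /etc/sysconfig/pacemaker looks roughly
>>>>> like the following; treat the exact option values as illustrative:
>>>>>
>>>>>   # in /etc/sysconfig/pacemaker
>>>>>   export PCMK_valgrind_enabled=crmd     # daemon(s) to wrap in valgrind
>>>>>   export VALGRIND_OPTS="--leak-check=full --log-file=/var/lib/pacemaker/valgrind-%p"
>>>>>
>>>>> and then restart pacemaker on that node.)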
>>>>
>>>> I ran valgrind for about 2 days, and I have attached the valgrind
>>>> output for the ACT node and the SBY node.
>>>
>>>
>>> I do not see any "direct" memory leaks (repeated 'definitely-lost'
>>> allocations) there.
>>>
>>> So what we see is probably one of:
>>> * Cache/history/etc., which grows up to some limit (or expires at some
>>> point in time).
>>> * Unbounded/non-expirable lists/hashes of data structures, which are
>>> correctly freed at exit
>>
>> There are still plenty of memory chunks not free'd at exit; I'm slowly working through those.
>
> I've pushed the following to my repo:
>
> + Andrew Beekhof (2 hours ago) d070092: Test: More glib suppressions
> + Andrew Beekhof (2 hours ago) ec74bf0: Fix: Fencing: Ensure API object is consistently free'd
> + Andrew Beekhof (2 hours ago) 6130d23: Fix: Free additional memory at exit
> + Andrew Beekhof (2 hours ago) b76d6be: Refactor: crmd: Allocate a mainloop before doing anything to help valgrind
> + Andrew Beekhof (3 hours ago) d4041de: Log: init: Remove unnecessary detail from shutdown message
> + Andrew Beekhof (3 hours ago) 282032b: Fix: Clean up internal mainloop structures at exit
> + Andrew Beekhof (4 hours ago) 0947721: Fix: Core: Correctly unreference GSource inputs
> + Andrew Beekhof (25 hours ago) d94140d: Fix: crmd: Clean up more memory before exit
> + Andrew Beekhof (25 hours ago) b44257c: Test: cman: Ignore additional valgrind errors
>
> If someone would like to run the cluster (no valgrind needed) for a while with
>
> export PCMK_trace_functions=mainloop_gio_destroy,mainloop_add_fd,mainloop_del_fd,crmd_exit,crm_peer_destroy,empty_uuid_cache,lrm_state_destroy_all,internal_lrm_state_destroy,do_stop,mainloop_destroy_trigger,mainloop_setup_trigger,do_startup,stonith_api_delete
>
> and then (after grabbing smaps) shut it down, we should have some information about any lists/hashes that are growing too large.
>
> Also, be sure to run with:
>
> export G_SLICE=always-malloc
>
> which will prevent glib from accumulating pools of memory and distorting any results.
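> For concreteness, one round of that experiment might look like the following
> (the exports normally go into /etc/sysconfig/pacemaker so the daemons inherit
> them; paths and service names are just examples):
>
>   export G_SLICE=always-malloc
>   export PCMK_trace_functions=mainloop_gio_destroy,...   # the full list from above
>   service pacemaker start
>   # ... let the cluster run / exercise resources for a while ...
>   cat /proc/$(pidof crmd)/smaps > smaps.before-stop      # grab smaps first
>   service pacemaker stop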
I did this today with 2747e25, and it looks to me like there is no leak (anymore?).
For context, between smaps.5 and smaps.6, the 4-node cluster ran over 120 "standby" tests (lots of PE runs and resource activity).
So unless someone can show me otherwise, I'm going to move on :)
Note that the [heap] changes are actually the memory usage going _backwards_.
Raw results below.
[root at corosync-host-1 ~]# cat /proc/`pidof crmd`/smaps > smaps.6 ; diff -u smaps.5 smaps.6;
--- smaps.5 2013-05-29 02:39:25.032940230 -0400
+++ smaps.6 2013-05-29 03:48:51.278940819 -0400
@@ -40,16 +40,16 @@
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
-0226b000-02517000 rw-p 00000000 00:00 0 [heap]
-Size: 2736 kB
-Rss: 2268 kB
-Pss: 2268 kB
+0226b000-02509000 rw-p 00000000 00:00 0 [heap]
+Size: 2680 kB
+Rss: 2212 kB
+Pss: 2212 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
-Private_Dirty: 2268 kB
-Referenced: 2268 kB
-Anonymous: 2268 kB
+Private_Dirty: 2212 kB
+Referenced: 2212 kB
+Anonymous: 2212 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
@@ -112,13 +112,13 @@
MMUPageSize: 4 kB
7f0c6e918000-7f0c6ee18000 rw-s 00000000 00:10 522579 /dev/shm/qb-pengine-event-27411-27412-6-data
Size: 5120 kB
-Rss: 3572 kB
-Pss: 1785 kB
+Rss: 4936 kB
+Pss: 2467 kB
Shared_Clean: 0 kB
-Shared_Dirty: 3572 kB
+Shared_Dirty: 4936 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
-Referenced: 3572 kB
+Referenced: 4936 kB
Anonymous: 0 kB
AnonHugePages: 0 kB
Swap: 0 kB
@@ -841,7 +841,7 @@
7f0c72b00000-7f0c72b1d000 r-xp 00000000 fd:00 119 /lib64/libselinux.so.1
Size: 116 kB
Rss: 36 kB
-Pss: 5 kB
+Pss: 4 kB
Shared_Clean: 36 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
@@ -1401,7 +1401,7 @@
7f0c740c6000-7f0c74250000 r-xp 00000000 fd:00 45 /lib64/libc-2.12.so
Size: 1576 kB
Rss: 588 kB
-Pss: 20 kB
+Pss: 19 kB
Shared_Clean: 588 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
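For reference, the totals for a given snapshot can be summed straight from the
smaps file, for example:

  awk '/^Rss:/ {rss += $2} /^Pss:/ {pss += $2} END {print rss " kB Rss, " pss " kB Pss"}' smaps.6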
>
>
>> Once we know all memory is being cleaned up, the next step is to check the size of things beforehand.
>>
>> I'm hoping one or more of them show up as unnaturally large, indicating things are being added but not removed.
>>
>>> (e.g. like dlm_controld has (had???) for a
>>> debugging buffer, or like the glibc resolver had in EL3). This cannot be
>>> caught with valgrind if you use it in the standard way.
>>>
>>> I believe we have the former. To prove that, it would be very
>>> interesting to run under the valgrind *debugger* (--vgdb=yes|full) for a
>>> long enough period of time (2-3 weeks) and periodically get the memory
>>> allocation state from there (with the 'monitor leak_check full reachable
>>> any' gdb command). I wanted to do that a long time ago, but
>>> unfortunately did not have enough spare time to even try it (although
>>> I have valgrinded other programs that way).
>>>
>>> This is described in valgrind documentation:
>>> http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver
>>>
>>> We probably do not need to specify '--vgdb-error=0', because we do not
>>> need to install watchpoints at the start (and we do not need/want to
>>> connect to crmd with gdb immediately to tell it to continue); we just
>>> need to periodically get the status of memory allocations (a
>>> stop-leak_check-cont sequence). That should probably be done quickly,
>>> so crmd does not stop for too long and the rest of pacemaker does not
>>> see it as 'hung'. Again, I did not try that, and I do not know if it is
>>> even possible to do with crmd.
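>>> Untested, but roughly: add '--vgdb=yes' to the valgrind options used to
>>> start crmd, and then periodically, from another shell:
>>>
>>>   # sends the monitor command without needing an interactive gdb session
>>>   vgdb --pid=<crmd-pid> leak_check full reachable any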
>>>
>>> And, as pacemaker heavily utilizes glib, which has its own memory
>>> allocator (slices), it is better to switch it to 'standard' malloc/free
>>> for debugging with the G_SLICE=always-malloc environment variable.
>>>
>>> Lastly, I did memory-leak checks on a 'static' cluster (i.e. no
>>> operations except monitors are performed) for ~1.1.8, and did not find
>>> any. It would be interesting to see whether that is also true for an
>>> 'active' one, which starts/stops resources, handles failures, etc.
>>>
>>>>
>>>> Sincerely,
>>>> Yuichi
>>>>
>>>>>
>>>>>>
>>>>>> Also, the measurements are in pages... could you run "getconf PAGESIZE" and let us know the result?
>>>>>> I'm guessing 4096 bytes.
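>>>>>> (If those figures are page counts, converting to kB is just
>>>>>> pages * PAGESIZE / 1024, e.g.:
>>>>>>
>>>>>>   echo $(( 7396 * $(getconf PAGESIZE) / 1024 )) kB
>>>>>> )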
>>>>>>
>>>>>> On 23/05/2013, at 5:47 PM, Yuichi SEINO <seino.cluster2 at gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I retried the test after we updated the packages to the latest tags
>>>>>>> and the OS. glue and booth are the latest.
>>>>>>>
>>>>>>> * Environment
>>>>>>> OS:RHEL 6.4
>>>>>>> cluster-glue:latest(commit:2755:8347e8c9b94f) +
>>>>>>> patch[detail:http://www.gossamer-threads.com/lists/linuxha/dev/85787]
>>>>>>> resource-agent:v3.9.5
>>>>>>> libqb:v0.14.4
>>>>>>> corosync:v2.3.0
>>>>>>> pacemaker:v1.1.10-rc2
>>>>>>> crmsh:v1.2.5
>>>>>>> booth:latest(commit:67e1208973de728958432aaba165766eac1ce3a0)
>>>>>>>
>>>>>>> * Test procedure
>>>>>>> We regularly switch a ticket. The previous test also used the same
>>>>>>> procedure. And there was no memory leak when we tested pacemaker-1.1,
>>>>>>> before pacemaker used libqb.
>>>>>>>
>>>>>>> * Result
>>>>>>> As a result, I think that crmd may have a memory leak.
>>>>>>>
>>>>>>> crmd smaps (totals summed over all address ranges)
>>>>>>> In detail, we attached the smaps from the start and the end, and I
>>>>>>> recorded smaps every 1 minute.
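>>>>>>> (i.e. something along the lines of the following; the file naming is
>>>>>>> illustrative:
>>>>>>>
>>>>>>>   while sleep 60; do
>>>>>>>     cat /proc/$(pidof crmd)/smaps > smaps.$(date +%Y%m%d-%H%M)
>>>>>>>   done
>>>>>>> )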
>>>>>>>
>>>>>>> Start
>>>>>>> RSS: 7396
>>>>>>> SHR(Shared_Clean+Shared_Dirty):3560
>>>>>>> Private(Private_Clean+Private_Dirty):3836
>>>>>>>
>>>>>>> Interval (about 30h later)
>>>>>>> RSS:18464
>>>>>>> SHR:14276
>>>>>>> Private:4188
>>>>>>>
>>>>>>> End(about 70h later)
>>>>>>> RSS:19104
>>>>>>> SHR:14336
>>>>>>> Private:4768
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Yuichi
>>>>>>>
>>>>>>> 2013/5/15 Yuichi SEINO <seino.cluster2 at gmail.com>:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I ran the test for about two days.
>>>>>>>>
>>>>>>>> Environment
>>>>>>>>
>>>>>>>> OS:RHEL 6.3
>>>>>>>> pacemaker-1.1.9-devel (commit 138556cb0b375a490a96f35e7fbeccc576a22011)
>>>>>>>> corosync-2.3.0
>>>>>>>> cluster-glue latest+patch(detail:http://www.gossamer-threads.com/lists/linuxha/dev/85787)
>>>>>>>> libqb- 0.14.4
>>>>>>>>
>>>>>>>> There may be a memory leak in crmd and lrmd. I regularly recorded the RSS reported by ps.
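>>>>>>>> (e.g. with something like:
>>>>>>>>
>>>>>>>>   ps -o rss= -p "$(pidof crmd)"    # RSS in kB
>>>>>>>> )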
>>>>>>>>
>>>>>>>> start-up
>>>>>>>> crmd:5332
>>>>>>>> lrmd:3625
>>>>>>>>
>>>>>>>> interval(about 30h later)
>>>>>>>> crmd:7716
>>>>>>>> lrmd:3744
>>>>>>>>
>>>>>>>> ending(about 60h later)
>>>>>>>> crmd:8336
>>>>>>>> lrmd:3780
>>>>>>>>
>>>>>>>> I have not yet run a test with pacemaker-1.1.10-rc2, so I will run that test.
>>>>>>>>
>>>>>>>> Sincerely,
>>>>>>>> Yuichi
>>>>>>>>
>>>>>>>> --
>>>>>>>> Yuichi SEINO
>>>>>>>> METROSYSTEMS CORPORATION
>>>>>>>> E-mail:seino.cluster2 at gmail.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Yuichi SEINO
>>>>>>> METROSYSTEMS CORPORATION
>>>>>>> E-mail:seino.cluster2 at gmail.com
>>>>>>> <smaps_log.tar.gz>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Yuichi SEINO
>>>> METROSYSTEMS CORPORATION
>>>> E-mail:seino.cluster2 at gmail.com
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>