[Pacemaker] Pacemaker still may include memory leaks

Yuichi SEINO seino.cluster2 at gmail.com
Tue Jun 4 10:26:17 UTC 2013


2013/6/4 Andrew Beekhof <andrew at beekhof.net>:
>
> On 03/06/2013, at 8:55 PM, Yuichi SEINO <seino.cluster2 at gmail.com> wrote:
>
>> Hi,
>>
>> I ran the test again after we updated pacemaker.
>>
>> I tested the same way as in the previous test. However, I think that the
>> memory leak may still be occurring.
>>
>> I attached the results (smaps, crm_mon and env). I also made a
>> chart of the totals over all address ranges:
>> RSS, SHR (Shared_Clean+Shared_Dirty) and PRI (Private_Clean+Private_Dirty).
>>
>> The change in PRI comes from [heap]: the only difference in Private_Dirty
>> is in [heap], and there is no difference in Private_Clean.
>>
>>>> --- smaps.5     2013-05-29 02:39:25.032940230 -0400
>>>> +++ smaps.6     2013-05-29 03:48:51.278940819 -0400
>>
>> I think that your test ran for about 1h. However, when I tested, there were
>> intervals in which the memory size did not change,
>> and some of those intervals lasted more than 1h.
>>
>> The change of PRI
>> ...
>> Time:2013/5/30 12:28 PRI:3740
>> ...
>> Time:2013/5/30 14:16 PRI:3740
>> ...
>>
>> There are also periods in which the memory size fluctuates a little.
>> However, as a whole,
>> the memory size continues to increase.
>>
>> The change of PRI
>> ...
>> Time:2013/5/30 17:51 PRI:3792
>
> Ok, so what happened at this time?  Logs?
>
> There is no timer in pacemaker that runs this long (and the 1 hour of my test was equivalent to a few months in real life).

I attached the log to bugzilla because the log is large.
http://bugs.clusterlabs.org/show_bug.cgi?id=5161
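
For reference, the RSS/SHR/PRI totals quoted above can be computed from a
saved smaps file roughly as follows. This is only a minimal sketch in Python,
not the script used for the attached chart. The function names (parse_smaps,
totals, private_dirty_changes) are made up for illustration, and it assumes
the usual /proc/<pid>/smaps layout with values in kB.

```python
import re
import sys
from collections import defaultdict

# Field line, e.g. "Private_Dirty:      2268 kB".
FIELD_RE = re.compile(r"^(\w+):\s+(\d+) kB$")
# Mapping header, e.g. "0226b000-02517000 rw-p 00000000 00:00 0   [heap]".
HEADER_RE = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\d+\s*(.*)$")


def parse_smaps(text):
    """Return {mapping_name: {field: kB}}, summed over mappings with the same name."""
    mappings = defaultdict(lambda: defaultdict(int))
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            # Mappings without a path are grouped under a hypothetical "[anon]" label.
            name = header.group(1) or "[anon]"
            continue
        field = FIELD_RE.match(line)
        if field and name is not None:
            mappings[name][field.group(1)] += int(field.group(2))
    return mappings


def totals(mappings):
    """Sum RSS, SHR (Shared_*) and PRI (Private_*) over all mappings, in kB."""
    rss = shr = pri = 0
    for fields in mappings.values():
        rss += fields.get("Rss", 0)
        shr += fields.get("Shared_Clean", 0) + fields.get("Shared_Dirty", 0)
        pri += fields.get("Private_Clean", 0) + fields.get("Private_Dirty", 0)
    return {"RSS": rss, "SHR": shr, "PRI": pri}


def private_dirty_changes(before, after):
    """Return {mapping_name: delta_kB} for mappings whose Private_Dirty changed."""
    changes = {}
    for name in set(before) | set(after):
        old = before.get(name, {}).get("Private_Dirty", 0)
        new = after.get(name, {}).get("Private_Dirty", 0)
        if new != old:
            changes[name] = new - old
    return changes


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (file names are examples): python smaps_totals.py smaps.5 smaps.6
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        old_maps, new_maps = parse_smaps(f_old.read()), parse_smaps(f_new.read())
    print("before:", totals(old_maps))
    print("after: ", totals(new_maps))
    print("Private_Dirty changes:", private_dirty_changes(old_maps, new_maps))
```

Comparing two snapshots per mapping, rather than only the grand totals, makes
it easier to see whether growth is in [heap] or in the qb shared-memory buffers.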

Sincerely,
Yuichi

>
>> ...
>> Time:2013/5/30 17:53 PRI:3844
>> ...
>> Time:2013/5/30 17:55 PRI:3792
>> ...
>>
>> Perhaps the differences in the resource configuration and the test method
>> affect the result.
>> I want to run the same test as you. Would you tell me the details of your test?
>
> I ran cts with:
>
>   cts clean run --stack cman --stonith rhevm --ip 11.0.0.1 --choose Standby 500
>
> Your stonith would be different though.
>
>>
>> Sincerely,
>> Yuichi
>>
>> 2013/5/29 Yuichi SEINO <seino.cluster2 at gmail.com>:
>>> 2013/5/29 Andrew Beekhof <andrew at beekhof.net>:
>>>>
>>>> On 28/05/2013, at 4:30 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>>
>>>>>
>>>>> On 28/05/2013, at 10:12 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>>>
>>>>>>
>>>>>> On 27/05/2013, at 5:08 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>>>>>
>>>>>>> 27.05.2013 04:20, Yuichi SEINO wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> 2013/5/24 Vladislav Bogdanov <bubble at hoster-ok.com>:
>>>>>>>>> 24.05.2013 06:34, Andrew Beekhof wrote:
>>>>>>>>>> Any help figuring out where the leaks might be would be very much appreciated :)
>>>>>>>>>
>>>>>>>>> One (and the only) suspect is unfortunately crmd itself.
>>>>>>>>> It has private heap grown from 2708 to 3680 kB.
>>>>>>>>>
>>>>>>>>> All other relevant differences are in qb shm buffers, which are
>>>>>>>>> controlled and may grow until they reach configured size.
>>>>>>>>>
>>>>>>>>> @Yuichi
>>>>>>>>> I would recommend running under valgrind on a testing cluster to
>>>>>>>>> figure out whether that is a memleak (lost memory) or some history data
>>>>>>>>> (referenced memory). The latter may be a logical memleak though. You may
>>>>>>>>> look in /etc/sysconfig/pacemaker for details.
>>>>>>>>
>>>>>>>> I ran valgrind for about 2 days, and I attached the valgrind output
>>>>>>>> from the ACT node and the SBY node.
>>>>>>>
>>>>>>>
>>>>>>> I do not see any "direct" memory leaks (repeating 'definitely-lost'
>>>>>>> allocations) there.
>>>>>>>
>>>>>>> So what we see is probably one of:
>>>>>>> * Cache/history/etc, which grows up to some limit (or is expired at
>>>>>>> some point in time).
>>>>>>> * Unlimited/not-expirable lists/hashes of data structures, which are
>>>>>>> correctly freed at exit
>>>>>>
>>>>>> There are still plenty of memory chunks not free'd at exit, I'm slowly working through those.
>>>>>
>>>>> I've pushed the following to my repo:
>>>>>
>>>>> + Andrew Beekhof (2 hours ago) d070092: Test: More glib suppressions
>>>>> + Andrew Beekhof (2 hours ago) ec74bf0: Fix: Fencing: Ensure API object is consistently free'd
>>>>> + Andrew Beekhof (2 hours ago) 6130d23: Fix: Free additional memory at exit
>>>>> + Andrew Beekhof (2 hours ago) b76d6be: Refactor: crmd: Allocate a mainloop before doing anything to help valgrind
>>>>> + Andrew Beekhof (3 hours ago) d4041de: Log: init: Remove unnecessary detail from shutdown message
>>>>> + Andrew Beekhof (3 hours ago) 282032b: Fix: Clean up internal mainloop structures at exit
>>>>> + Andrew Beekhof (4 hours ago) 0947721: Fix: Core: Correctly unreference GSource inputs
>>>>> + Andrew Beekhof (25 hours ago) d94140d: Fix: crmd: Clean up more memory before exit
>>>>> + Andrew Beekhof (25 hours ago) b44257c: Test: cman: Ignore additional valgrind errors
>>>>>
>>>>> If someone would like to run the cluster (no valgrind needed) for a while with
>>>>>
>>>>> export PCMK_trace_functions=mainloop_gio_destroy,mainloop_add_fd,mainloop_del_fd,crmd_exit,crm_peer_destroy,empty_uuid_cache,lrm_state_destroy_all,internal_lrm_state_destroy,do_stop,mainloop_destroy_trigger,mainloop_setup_trigger,do_startup,stonith_api_delete
>>>>>
>>>>> and then (after grabbing smaps) shut it down, we should have some information about any lists/hashes that are growing too large.
>>>>>
>>>>> Also, be sure to run with:
>>>>>
>>>>> export G_SLICE=always-malloc
>>>>>
>>>>> which will prevent glib from accumulating pools of memory and distorting any results.
>>>>
>>>>
>>>> I did this today with 2747e25 and it looks to me like there is no leak (anymore?)
>>>> For context, between smaps.5 and smaps.6, the 4 node cluster ran over 120 "standby" tests (lots of PE runs and resource activity).
>>>> So unless someone can show me otherwise, I'm going to move on :)
>>>
>>> I see. I also want to test for a leak. I will report the result after the test.
>>>
>>>>
>>>> Note that the [heap] changes are actually the memory usage going _backwards_.
>>>>
>>>> Raw results below.
>>>>
>>>> [root at corosync-host-1 ~]# cat /proc/`pidof crmd`/smaps  > smaps.6 ; diff -u smaps.5 smaps.6;
>>>> --- smaps.5     2013-05-29 02:39:25.032940230 -0400
>>>> +++ smaps.6     2013-05-29 03:48:51.278940819 -0400
>>>> @@ -40,16 +40,16 @@
>>>> Swap:                  0 kB
>>>> KernelPageSize:        4 kB
>>>> MMUPageSize:           4 kB
>>>> -0226b000-02517000 rw-p 00000000 00:00 0                                  [heap]
>>>> -Size:               2736 kB
>>>> -Rss:                2268 kB
>>>> -Pss:                2268 kB
>>>> +0226b000-02509000 rw-p 00000000 00:00 0                                  [heap]
>>>> +Size:               2680 kB
>>>> +Rss:                2212 kB
>>>> +Pss:                2212 kB
>>>> Shared_Clean:          0 kB
>>>> Shared_Dirty:          0 kB
>>>> Private_Clean:         0 kB
>>>> -Private_Dirty:      2268 kB
>>>> -Referenced:         2268 kB
>>>> -Anonymous:          2268 kB
>>>> +Private_Dirty:      2212 kB
>>>> +Referenced:         2212 kB
>>>> +Anonymous:          2212 kB
>>>> AnonHugePages:         0 kB
>>>> Swap:                  0 kB
>>>> KernelPageSize:        4 kB
>>>> @@ -112,13 +112,13 @@
>>>> MMUPageSize:           4 kB
>>>> 7f0c6e918000-7f0c6ee18000 rw-s 00000000 00:10 522579                     /dev/shm/qb-pengine-event-27411-27412-6-data
>>>> Size:               5120 kB
>>>> -Rss:                3572 kB
>>>> -Pss:                1785 kB
>>>> +Rss:                4936 kB
>>>> +Pss:                2467 kB
>>>> Shared_Clean:          0 kB
>>>> -Shared_Dirty:       3572 kB
>>>> +Shared_Dirty:       4936 kB
>>>> Private_Clean:         0 kB
>>>> Private_Dirty:         0 kB
>>>> -Referenced:         3572 kB
>>>> +Referenced:         4936 kB
>>>> Anonymous:             0 kB
>>>> AnonHugePages:         0 kB
>>>> Swap:                  0 kB
>>>> @@ -841,7 +841,7 @@
>>>> 7f0c72b00000-7f0c72b1d000 r-xp 00000000 fd:00 119                        /lib64/libselinux.so.1
>>>> Size:                116 kB
>>>> Rss:                  36 kB
>>>> -Pss:                   5 kB
>>>> +Pss:                   4 kB
>>>> Shared_Clean:         36 kB
>>>> Shared_Dirty:          0 kB
>>>> Private_Clean:         0 kB
>>>> @@ -1401,7 +1401,7 @@
>>>> 7f0c740c6000-7f0c74250000 r-xp 00000000 fd:00 45                         /lib64/libc-2.12.so
>>>> Size:               1576 kB
>>>> Rss:                 588 kB
>>>> -Pss:                  20 kB
>>>> +Pss:                  19 kB
>>>> Shared_Clean:        588 kB
>>>> Shared_Dirty:          0 kB
>>>> Private_Clean:         0 kB
>>>>
>>>>
>>>>>
>>>>>
>>>>>> Once we know all memory is being cleaned up, the next step is to check the size of things beforehand.
>>>>>>
>>>>>> I'm hoping one or more of them show up as unnaturally large, indicating things are being added but not removed.
>>>>>>
>>>>>>> (f.e like dlm_controld has(had???) for a
>>>>>>> debugging buffer or like glibc resolver had in EL3). This cannot be
>>>>>>> caught with valgrind if you use it in a standard way.
>>>>>>>
>>>>>>> I believe we have the former. To prove that, it would be very
>>>>>>> interesting to run under the valgrind *debugger* (--vgdb=yes|full) for a
>>>>>>> long enough period of time (2-3 weeks) and periodically get the memory
>>>>>>> allocation state from there (with 'monitor leak_check full reachable
>>>>>>> any' gdb command). I wanted to do that a long time ago, but
>>>>>>> unfortunately did not have enough spare time to even try that (although
>>>>>>> I tried to valgrind other programs that way).
>>>>>>>
>>>>>>> This is described in valgrind documentation:
>>>>>>> http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver
>>>>>>>
>>>>>>> We probably do not need to specify '--vgdb-error=0' because we do not
>>>>>>> need to install watchpoints at the start (and we do not need/want to
>>>>>>> immediately connect to crmd with gdb to tell it to continue), we just
>>>>>>> need to periodically get status of memory allocations
>>>>>>> (stop-leak_check-cont sequence). Probably that should be done in a
>>>>>>> 'fast' manner, so crmd does not stop for a long time, and the rest of
>>>>>>> pacemaker does not see it 'hanged'. Again, I did not try that, and I do
>>>>>>> not know if it's even possible to do that with crmd.
>>>>>>>
>>>>>>> And, as pacemaker heavily utilizes glib, which has its own memory allocator
>>>>>>> (slices), it is better to switch it to a 'standard' malloc/free for
>>>>>>> debugging with G_SLICE=always-malloc env var.
>>>>>>>
>>>>>>> Last, I did memleak checks for a 'static' (i.e. no operations except
>>>>>>> monitors are performed) cluster for ~1.1.8, and did not find any. It
>>>>>>> would be interesting to see if that is true for an 'active' one, which
>>>>>>> starts/stops resources, handles failures, etc.
>>>>>>>
>>>>>>>>
>>>>>>>> Sincerely,
>>>>>>>> Yuichi
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Also, the measurements are in pages... could you run "getconf PAGESIZE" and let us know the result?
>>>>>>>>>> I'm guessing 4096 bytes.
>>>>>>>>>>
>>>>>>>>>> On 23/05/2013, at 5:47 PM, Yuichi SEINO <seino.cluster2 at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I retried the test after we updated the packages to the latest tags and updated the OS.
>>>>>>>>>>> glue and booth are at the latest commit.
>>>>>>>>>>>
>>>>>>>>>>> * Environment
>>>>>>>>>>> OS:RHEL 6.4
>>>>>>>>>>> cluster-glue:latest(commit:2755:8347e8c9b94f) +
>>>>>>>>>>> patch[detail:http://www.gossamer-threads.com/lists/linuxha/dev/85787]
>>>>>>>>>>> resource-agent:v3.9.5
>>>>>>>>>>> libqb:v0.14.4
>>>>>>>>>>> corosync:v2.3.0
>>>>>>>>>>> pacemaker:v1.1.10-rc2
>>>>>>>>>>> crmsh:v1.2.5
>>>>>>>>>>> booth:latest(commit:67e1208973de728958432aaba165766eac1ce3a0)
>>>>>>>>>>>
>>>>>>>>>>> * Test procedure
>>>>>>>>>>> We regularly switch a ticket. The previous test used the same method.
>>>>>>>>>>> There was no memory leak when we tested pacemaker-1.1 before
>>>>>>>>>>> pacemaker used libqb.
>>>>>>>>>>>
>>>>>>>>>>> * Result
>>>>>>>>>>> As a result, I think that crmd may cause the memory leak.
>>>>>>>>>>>
>>>>>>>>>>> crmd smaps (totals over all addresses)
>>>>>>>>>>> For details, we attached the smaps from the start and the end. I recorded smaps
>>>>>>>>>>> every minute.
>>>>>>>>>>>
>>>>>>>>>>> Start
>>>>>>>>>>> RSS: 7396
>>>>>>>>>>> SHR(Shared_Clean+Shared_Dirty):3560
>>>>>>>>>>> Private(Private_Clean+Private_Dirty):3836
>>>>>>>>>>>
>>>>>>>>>>> Interval (about 30h later)
>>>>>>>>>>> RSS:18464
>>>>>>>>>>> SHR:14276
>>>>>>>>>>> Private:4188
>>>>>>>>>>>
>>>>>>>>>>> End(about 70h later)
>>>>>>>>>>> RSS:19104
>>>>>>>>>>> SHR:14336
>>>>>>>>>>> Private:4768
>>>>>>>>>>>
>>>>>>>>>>> Sincerely,
>>>>>>>>>>> Yuichi
>>>>>>>>>>>
>>>>>>>>>>> 2013/5/15 Yuichi SEINO <seino.cluster2 at gmail.com>:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I ran the test for about two days.
>>>>>>>>>>>>
>>>>>>>>>>>> Environment
>>>>>>>>>>>>
>>>>>>>>>>>> OS:RHEL 6.3
>>>>>>>>>>>> pacemaker-1.1.9-devel (commit 138556cb0b375a490a96f35e7fbeccc576a22011)
>>>>>>>>>>>> corosync-2.3.0
>>>>>>>>>>>> cluster-glue latest+patch(detail:http://www.gossamer-threads.com/lists/linuxha/dev/85787)
>>>>>>>>>>>> libqb- 0.14.4
>>>>>>>>>>>>
>>>>>>>>>>>> There may be a memory leak in crmd and lrmd. I regularly recorded the RSS reported by ps.
>>>>>>>>>>>>
>>>>>>>>>>>> start-up
>>>>>>>>>>>> crmd:5332
>>>>>>>>>>>> lrmd:3625
>>>>>>>>>>>>
>>>>>>>>>>>> interval(about 30h later)
>>>>>>>>>>>> crmd:7716
>>>>>>>>>>>> lrmd:3744
>>>>>>>>>>>>
>>>>>>>>>>>> ending(about 60h later)
>>>>>>>>>>>> crmd:8336
>>>>>>>>>>>> lrmd:3780
>>>>>>>>>>>>
>>>>>>>>>>>> I have not yet run a test with pacemaker-1.1.10-rc2, so I will run that test next.
>>>>>>>>>>>>
>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>> Yuichi
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Yuichi SEINO
>>>>>>>>>>>> METROSYSTEMS CORPORATION
>>>>>>>>>>>> E-mail:seino.cluster2 at gmail.com
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> <smaps_log.tar.gz>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>>>>>
>>>>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>> <test_info.tar.bz>
>
>



--
Yuichi SEINO
METROSYSTEMS CORPORATION
E-mail:seino.cluster2 at gmail.com



