[Pacemaker] Pacemaker may still include memory leaks

Mon Jul 22 00:28:57 EDT 2013

Hi, Andrew

Since Seino has left the project, I have taken over this problem.

I have added a comment to Bugzilla; could you please check it?
http://bugs.clusterlabs.org/show_bug.cgi?id=5161#c10

Regards,
Yusuke

2013/6/4 Yuichi SEINO <seino.cluster2 at gmail.com>:
> 2013/6/4 Andrew Beekhof <andrew at beekhof.net>:
>>
>> On 03/06/2013, at 8:55 PM, Yuichi SEINO <seino.cluster2 at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I ran the test again after we updated Pacemaker.
>>>
>>> I tested the same way as in the previous test. However, I think a
>>> memory leak may still be present.
>>>
>>> I have attached the results (smaps, crm_mon and env). I also made a
>>> chart of the totals over all address ranges for:
>>> RSS, SHR (Shared_Clean+Shared_Dirty) and PRI (Private_Clean+Private_Dirty)
>>>
>>> The change in PRI comes from [heap]: [heap] is the only mapping whose
>>> Private_Dirty changed, and Private_Clean did not change at all.
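A minimal Python sketch of how the RSS/SHR/PRI totals described above can be computed from a saved /proc/<pid>/smaps file, plus a per-mapping Private_Dirty breakdown to check which mapping (here [heap]) accounts for a PRI change. The function names are hypothetical; the field names are the ones that appear in the smaps excerpts in this thread, all reported in kB.

```python
import re

# Matches field lines such as "Rss:                2268 kB".
FIELD_RE = re.compile(r"^(\w+):\s+(\d+) kB$")

def smaps_totals(text):
    """Sum fields over every mapping and return RSS, SHR and PRI in kB."""
    sums = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line.strip())
        if m:
            sums[m.group(1)] = sums.get(m.group(1), 0) + int(m.group(2))
    return {
        "RSS": sums.get("Rss", 0),
        "SHR": sums.get("Shared_Clean", 0) + sums.get("Shared_Dirty", 0),
        "PRI": sums.get("Private_Clean", 0) + sums.get("Private_Dirty", 0),
    }

def private_dirty_by_mapping(text):
    """Private_Dirty (kB) per mapping name; unnamed mappings are merged as [anon]."""
    result = {}
    current = None
    for line in text.splitlines():
        parts = line.split()
        if parts and "-" in parts[0] and not parts[0].endswith(":"):
            # Header line: address range, perms, offset, dev, inode, [pathname]
            current = " ".join(parts[5:]) if len(parts) > 5 else "[anon]"
        elif parts[:1] == ["Private_Dirty:"] and current is not None:
            result[current] = result.get(current, 0) + int(parts[1])
    return result
```

Running both functions on two snapshots and comparing the results shows both the overall PRI change and which mapping it came from.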
>>>
>>>>> --- smaps.5     2013-05-29 02:39:25.032940230 -0400
>>>>> +++ smaps.6     2013-05-29 03:48:51.278940819 -0400
>>>
>>> I think your test ran for about 1h. However, in my test there were
>>> intervals during which the memory size did not change, and some of
>>> those intervals lasted more than 1h.
>>>
>>> The change in PRI:
>>> ...
>>> Time:2013/5/30 12:28 PRI:3740
>>> ...
>>> Time:2013/5/30 14:16 PRI:3740
>>> ...
>>>
>>> There are also periods in which the memory size fluctuates a little.
>>> However, overall, the memory size keeps increasing.
>>>
>>> The change in PRI:
>>> ...
>>> Time:2013/5/30 17:51 PRI:3792
>>
>> Ok, so what happened at this time?  Logs?
>>
>> There is no timer in pacemaker that runs this long (and the 1 hour of my test was equivalent to a few months in real life).
>
> I attached the log to Bugzilla because it is too large to post here.
> http://bugs.clusterlabs.org/show_bug.cgi?id=5161
>
> Sincerely,
> Yuichi
>
>>
>>> ...
>>> Time:2013/5/30 17:53 PRI:3844
>>> ...
>>> Time:2013/5/30 17:55 PRI:3792
>>> ...
>>>
>>> Perhaps differences in the resource configuration and the test method
>>> affect the result.
>>> I would like to run the same test as you. Could you tell me the details of your test?
>>
>> I ran cts with:
>>
>>   cts clean run --stack cman --stonith rhevm --ip 11.0.0.1 --choose Standby 500
>>
>> Your stonith would be different though.
>>
>>>
>>> Sincerely,
>>> Yuichi
>>>
>>> 2013/5/29 Yuichi SEINO <seino.cluster2 at gmail.com>:
>>>> 2013/5/29 Andrew Beekhof <andrew at beekhof.net>:
>>>>>
>>>>> On 28/05/2013, at 4:30 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>>>
>>>>>>
>>>>>> On 28/05/2013, at 10:12 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>>>>
>>>>>>>
>>>>>>> On 27/05/2013, at 5:08 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>>>>>>
>>>>>>>> 27.05.2013 04:20, Yuichi SEINO wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> 2013/5/24 Vladislav Bogdanov <bubble at hoster-ok.com>:
>>>>>>>>>> 24.05.2013 06:34, Andrew Beekhof wrote:
>>>>>>>>>>> Any help figuring out where the leaks might be would be very much appreciated :)
>>>>>>>>>>
>>>>>>>>>> The one (and only) suspect is, unfortunately, crmd itself.
>>>>>>>>>> Its private heap grew from 2708 to 3680 kB.
>>>>>>>>>>
>>>>>>>>>> All other relevant differences are in qb shm buffers, which are
>>>>>>>>>> controlled and may grow until they reach configured size.
>>>>>>>>>>
>>>>>>>>>> @Yuichi
>>>>>>>>>> I would recommend running it under valgrind on a test cluster to
>>>>>>>>>> figure out whether this is a real memleak (lost memory) or some history
>>>>>>>>>> data (still-referenced memory). The latter may be a logical memleak,
>>>>>>>>>> though. See /etc/sysconfig/pacemaker for details.
>>>>>>>>>
>>>>>>>>> I ran valgrind for about 2 days, and I have attached the valgrind
>>>>>>>>> output from the ACT node and the SBY node.
>>>>>>>>
>>>>>>>>
>>>>>>>> I do not see any "direct" memory leaks (repeating 'definitely-lost'
>>>>>>>> allocations) there.
>>>>>>>>
>>>>>>>> So what we see is probably one of:
>>>>>>>> * Cache/history/etc, which grows up to some limit (or expires at
>>>>>>>> some point in time).
>>>>>>>> * Unlimited/not-expirable lists/hashes of data structures, which are
>>>>>>>> correctly freed at exit
>>>>>>>
>>>>>>> There is still plenty of memory chunks not free'd at exit, I'm slowly working through those.
>>>>>>
>>>>>> I've pushed the following to my repo:
>>>>>>
>>>>>> + Andrew Beekhof (2 hours ago) d070092: Test: More glib suppressions
>>>>>> + Andrew Beekhof (2 hours ago) ec74bf0: Fix: Fencing: Ensure API object is consistently free'd
>>>>>> + Andrew Beekhof (2 hours ago) 6130d23: Fix: Free additional memory at exit
>>>>>> + Andrew Beekhof (2 hours ago) b76d6be: Refactor: crmd: Allocate a mainloop before doing anything to help valgrind
>>>>>> + Andrew Beekhof (3 hours ago) d4041de: Log: init: Remove unnecessary detail from shutdown message
>>>>>> + Andrew Beekhof (3 hours ago) 282032b: Fix: Clean up internal mainloop structures at exit
>>>>>> + Andrew Beekhof (4 hours ago) 0947721: Fix: Core: Correctly unreference GSource inputs
>>>>>> + Andrew Beekhof (25 hours ago) d94140d: Fix: crmd: Clean up more memory before exit
>>>>>> + Andrew Beekhof (25 hours ago) b44257c: Test: cman: Ignore additional valgrind errors
>>>>>>
>>>>>> If someone would like to run the cluster (no valgrind needed) for a while with
>>>>>>
>>>>>> export PCMK_trace_functions=mainloop_gio_destroy,mainloop_add_fd,mainloop_del_fd,crmd_exit,crm_peer_destroy,empty_uuid_cache,lrm_state_destroy_all,internal_lrm_state_destroy,do_stop,mainloop_destroy_trigger,mainloop_setup_trigger,do_startup,stonith_api_delete
>>>>>>
>>>>>> and then (after grabbing smaps) shut it down, we should have some information about any lists/hashes that are growing too large.
>>>>>>
>>>>>> Also, be sure to run with:
>>>>>>
>>>>>> export G_SLICE=always-malloc
>>>>>>
>>>>>> which will prevent glib from accumulating pools of memory and distorting any results.
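For the "grab smaps" step, and for the per-minute recording used in the other tests in this thread, here is a small Python sketch that copies /proc/<pid>/smaps into numbered files (smaps.0, smaps.1, ...) that can then be compared with `diff -u`. The function names and output layout are hypothetical, and the PCMK_trace_functions and G_SLICE exports above still have to be present in crmd's environment when it starts; the script does not handle that.

```python
import os
import time

def snapshot_smaps(pid, out_dir, index, proc_root="/proc"):
    """Copy <proc_root>/<pid>/smaps to <out_dir>/smaps.<index> and return the path."""
    src = os.path.join(proc_root, str(pid), "smaps")
    dst = os.path.join(out_dir, f"smaps.{index}")
    # /proc files report a size of 0, so read the content as a stream
    # instead of relying on the file size.
    with open(src) as f:
        data = f.read()
    with open(dst, "w") as f:
        f.write(data)
    return dst

def record_smaps(pid, out_dir, interval_s=60, count=1, proc_root="/proc"):
    """Take `count` snapshots, `interval_s` seconds apart; return their paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i in range(count):
        paths.append(snapshot_smaps(pid, out_dir, i, proc_root))
        if i + 1 < count:
            time.sleep(interval_s)
    return paths
```

For example, `record_smaps(crmd_pid, "crmd-smaps", interval_s=60, count=1440)` would take one snapshot per minute for a day, where `crmd_pid` is a placeholder for the value of `pidof crmd`.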
>>>>>
>>>>>
>>>>> I did this today with 2747e25 and it looks to me like there is no leak (anymore?)
>>>>> For context, between smaps.5 and smaps.6, the 4 node cluster ran over 120 "standby" tests (lots of PE runs and resource activity).
>>>>> So unless someone can show me otherwise, I'm going to move on :)
>>>>
>>>> I see. I would also like to test for the leak myself, and I will report the results afterwards.
>>>>
>>>>>
>>>>> Note that the [heap] changes are actually the memory usage going _backwards_.
>>>>>
>>>>> Raw results below.
>>>>>
>>>>> [root at corosync-host-1 ~]# cat /proc/`pidof crmd`/smaps  > smaps.6 ; diff -u smaps.5 smaps.6;
>>>>> --- smaps.5     2013-05-29 02:39:25.032940230 -0400
>>>>> +++ smaps.6     2013-05-29 03:48:51.278940819 -0400
>>>>> @@ -40,16 +40,16 @@
>>>>> Swap:                  0 kB
>>>>> KernelPageSize:        4 kB
>>>>> MMUPageSize:           4 kB
>>>>> -0226b000-02517000 rw-p 00000000 00:00 0                                  [heap]
>>>>> -Size:               2736 kB
>>>>> -Rss:                2268 kB
>>>>> -Pss:                2268 kB
>>>>> +0226b000-02509000 rw-p 00000000 00:00 0                                  [heap]
>>>>> +Size:               2680 kB
>>>>> +Rss:                2212 kB
>>>>> +Pss:                2212 kB
>>>>> Shared_Clean:          0 kB
>>>>> Shared_Dirty:          0 kB
>>>>> Private_Clean:         0 kB
>>>>> -Private_Dirty:      2268 kB
>>>>> -Referenced:         2268 kB
>>>>> -Anonymous:          2268 kB
>>>>> +Private_Dirty:      2212 kB
>>>>> +Referenced:         2212 kB
>>>>> +Anonymous:          2212 kB
>>>>> AnonHugePages:         0 kB
>>>>> Swap:                  0 kB
>>>>> KernelPageSize:        4 kB
>>>>> @@ -112,13 +112,13 @@
>>>>> MMUPageSize:           4 kB
>>>>> 7f0c6e918000-7f0c6ee18000 rw-s 00000000 00:10 522579                     /dev/shm/qb-pengine-event-27411-27412-6-data
>>>>> Size:               5120 kB
>>>>> -Rss:                3572 kB
>>>>> -Pss:                1785 kB
>>>>> +Rss:                4936 kB
>>>>> +Pss:                2467 kB
>>>>> Shared_Clean:          0 kB
>>>>> -Shared_Dirty:       3572 kB
>>>>> +Shared_Dirty:       4936 kB
>>>>> Private_Clean:         0 kB
>>>>> Private_Dirty:         0 kB
>>>>> -Referenced:         3572 kB
>>>>> +Referenced:         4936 kB
>>>>> Anonymous:             0 kB
>>>>> AnonHugePages:         0 kB
>>>>> Swap:                  0 kB
>>>>> @@ -841,7 +841,7 @@
>>>>> 7f0c72b00000-7f0c72b1d000 r-xp 00000000 fd:00 119                        /lib64/libselinux.so.1
>>>>> Size:                116 kB
>>>>> Rss:                  36 kB
>>>>> -Pss:                   5 kB
>>>>> +Pss:                   4 kB
>>>>> Shared_Clean:         36 kB
>>>>> Shared_Dirty:          0 kB
>>>>> Private_Clean:         0 kB
>>>>> @@ -1401,7 +1401,7 @@
>>>>> 7f0c740c6000-7f0c74250000 r-xp 00000000 fd:00 45                         /lib64/libc-2.12.so
>>>>> Size:               1576 kB
>>>>> Rss:                 588 kB
>>>>> -Pss:                  20 kB
>>>>> +Pss:                  19 kB
>>>>> Shared_Clean:        588 kB
>>>>> Shared_Dirty:          0 kB
>>>>> Private_Clean:         0 kB
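The Size of an smaps mapping follows directly from its address range, so the heap shrinkage in the diff above can be checked by hand. A small Python sketch (the helper name is hypothetical):

```python
def mapping_size_kb(address_range):
    """Size in kB of an smaps header range such as '0226b000-02517000'."""
    start, end = (int(x, 16) for x in address_range.split("-"))
    return (end - start) // 1024

before = mapping_size_kb("0226b000-02517000")  # [heap] in smaps.5
after = mapping_size_kb("0226b000-02509000")   # [heap] in smaps.6
print(before, after, after - before)  # 2736 2680 -56
```

Both values match the Size lines in the diff, so the end of the heap moved down by 56 kB, the same amount by which Rss and Private_Dirty dropped (2268 to 2212 kB).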
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> Once we know all memory is being cleaned up, the next step is to check the size of things beforehand.
>>>>>>>
>>>>>>> I'm hoping one or more of them show up as unnaturally large, indicating things are being added but not removed.
>>>>>>>
>>>>>>>> (e.g. like dlm_controld has (had???) for a
>>>>>>>> debugging buffer, or like the glibc resolver had in EL3). This cannot be
>>>>>>>> caught with valgrind if you use it in the standard way.
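To illustrate the second case above (memory that is still referenced and correctly freed at exit, yet keeps growing while the process runs), here is a toy Python sketch. It is not Pacemaker code (crmd is written in C); `BoundedHistory` and the 100-entry limit are hypothetical. An exit-time valgrind leak check would report neither structure as lost, but only one of them is bounded.

```python
from collections import OrderedDict

class BoundedHistory:
    """Keeps at most `limit` entries, evicting the oldest first."""

    def __init__(self, limit):
        self.limit = limit
        self.entries = OrderedDict()

    def add(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)  # re-adding a key makes it the newest
        while len(self.entries) > self.limit:
            self.entries.popitem(last=False)  # drop the oldest entry

unbounded = {}                 # grows for as long as the process runs
bounded = BoundedHistory(100)  # never holds more than 100 entries
for op_id in range(1000):      # e.g. 1000 hypothetical resource operations
    result = f"op {op_id} done"
    unbounded[op_id] = result
    bounded.add(op_id, result)
print(len(unbounded), len(bounded.entries))  # 1000 100
```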
>>>>>>>>
>>>>>>>> I believe we have the former one. To prove that, it would be very
>>>>>>>> interesting to run under the valgrind *debugger* (--vgdb=yes|full) for a
>>>>>>>> long enough period of time (2-3 weeks) and periodically get the memory
>>>>>>>> allocation state from there (with the 'monitor leak_check full reachable
>>>>>>>> any' gdb command). I wanted to do that a long time ago, but
>>>>>>>> unfortunately did not have enough spare time to even try it (although
>>>>>>>> I have used valgrind on other programs that way).
>>>>>>>>
>>>>>>>> This is described in valgrind documentation:
>>>>>>>> http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver
>>>>>>>>
>>>>>>>> We probably do not need to specify '--vgdb-error=0' because we do not
>>>>>>>> need to install watchpoints at the start (and we do not need/want to
>>>>>>>> immediately connect to crmd with gdb to tell it to continue), we just
>>>>>>>> need to periodically get status of memory allocations
>>>>>>>> (stop-leak_check-cont sequence). Probably that should be done in a
>>>>>>>> 'fast' manner, so crmd does not stop for a long time, and the rest of
>>>>>>>> pacemaker does not see it as 'hung'. Again, I did not try that, and I do
>>>>>>>> not know if it's even possible to do that with crmd.
>>>>>>>>
>>>>>>>> And since pacemaker heavily utilizes glib, which has its own memory allocator
>>>>>>>> (slices), it is better to switch it to a 'standard' malloc/free for
>>>>>>>> debugging with G_SLICE=always-malloc env var.
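A sketch of the periodic stop-leak_check-cont idea, assuming crmd was started under valgrind with --vgdb=yes (the default) or --vgdb=full. Standalone vgdb sends the words after its options to the valgrind gdbserver as a monitor command and prints the reply, so no interactive gdb session is needed. The Python wrapper, its names and the log path are hypothetical; this has not been tried against crmd, and each call still pauses crmd while the leak check runs.

```python
import subprocess
import time

def leak_check_cmd(pid, mode="any"):
    """Build a standalone vgdb call running memcheck's leak_check monitor command."""
    # Words after the vgdb options are sent as one monitor command to the
    # valgrind gdbserver of `pid`; the reply is printed on vgdb's stdout.
    return ["vgdb", f"--pid={pid}", "leak_check", "full", "reachable", mode]

def poll_leaks(pid, interval_s, rounds, log_path, mode="any"):
    """Append `rounds` leak reports to log_path, interval_s seconds apart."""
    with open(log_path, "a") as log:
        for i in range(rounds):
            log.write(f"=== {time.strftime('%Y-%m-%d %H:%M:%S')}\n")
            log.flush()  # keep the header ahead of vgdb's output in the file
            subprocess.run(leak_check_cmd(pid, mode), stdout=log,
                           stderr=subprocess.STDOUT, check=False)
            if i + 1 < rounds:
                time.sleep(interval_s)
```

For example, `poll_leaks(crmd_pid, 3600, 24 * 21, "crmd-leaks.log")` would append an hourly report for three weeks (`crmd_pid` is a placeholder). Passing `mode="increased"` instead of `"any"` asks memcheck to report only loss records that grew since the previous leak check, which may keep each entry short over such a long run.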
>>>>>>>>
>>>>>>>> Last, I did memleak checks for a 'static' (i.e. no operations except
>>>>>>>> monitors are performed) cluster for ~1.1.8, and did not find any. It
>>>>>>>> would be interesting to see if that is true for an 'active' one, which
>>>>>>>> starts/stops resources, handles failures, etc.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sincerely,
>>>>>>>>> Yuichi
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Also, the measurements are in pages... could you run "getconf PAGESIZE" and let us know the result?
>>>>>>>>>>> I'm guessing 4096 bytes.
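If a figure is a page count, it only becomes comparable with the kB values in smaps once it is multiplied by the page size. A minimal Python sketch; on Linux, `os.sysconf("SC_PAGE_SIZE")` returns the same value as `getconf PAGESIZE`, and the helper name is hypothetical.

```python
import os

def pages_to_kib(pages, page_size=None):
    """Convert a page count to KiB; page_size defaults to this host's page size."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")  # same as `getconf PAGESIZE`
    return pages * page_size // 1024

print(os.sysconf("SC_PAGE_SIZE"))
```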
>>>>>>>>>>>
>>>>>>>>>>> On 23/05/2013, at 5:47 PM, Yuichi SEINO <seino.cluster2 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I retried the test after updating the packages to the latest tags and the OS.
>>>>>>>>>>>> cluster-glue and booth are at their latest commits.
>>>>>>>>>>>>
>>>>>>>>>>>> * Environment
>>>>>>>>>>>> OS:RHEL 6.4
>>>>>>>>>>>> cluster-glue:latest(commit:2755:8347e8c9b94f) +
>>>>>>>>>>>> patch[detail:http://www.gossamer-threads.com/lists/linuxha/dev/85787]
>>>>>>>>>>>> resource-agent:v3.9.5
>>>>>>>>>>>> libqb:v0.14.4
>>>>>>>>>>>> corosync:v2.3.0
>>>>>>>>>>>> pacemaker:v1.1.10-rc2
>>>>>>>>>>>> crmsh:v1.2.5
>>>>>>>>>>>> booth:latest(commit:67e1208973de728958432aaba165766eac1ce3a0)
>>>>>>>>>>>>
>>>>>>>>>>>> * Test procedure
>>>>>>>>>>>> We regularly switch a ticket; the previous test used the same method.
>>>>>>>>>>>> There was no memory leak when we tested pacemaker-1.1 before
>>>>>>>>>>>> Pacemaker started using libqb.
>>>>>>>>>>>>
>>>>>>>>>>>> * Result
>>>>>>>>>>>> From the results, I think crmd may be leaking memory.
>>>>>>>>>>>>
>>>>>>>>>>>> crmd smaps (totals over all address ranges)
>>>>>>>>>>>> For details, we attached the smaps from the start and the end. I recorded
>>>>>>>>>>>> smaps every minute.
>>>>>>>>>>>>
>>>>>>>>>>>> Start
>>>>>>>>>>>> RSS: 7396
>>>>>>>>>>>> SHR (Shared_Clean+Shared_Dirty): 3560
>>>>>>>>>>>> Private (Private_Clean+Private_Dirty): 3836
>>>>>>>>>>>>
>>>>>>>>>>>> Interval (about 30h later)
>>>>>>>>>>>> RSS: 18464
>>>>>>>>>>>> SHR: 14276
>>>>>>>>>>>> Private: 4188
>>>>>>>>>>>>
>>>>>>>>>>>> End (about 70h later)
>>>>>>>>>>>> RSS: 19104
>>>>>>>>>>>> SHR: 14336
>>>>>>>>>>>> Private: 4768
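A quick consistency check on the three snapshots above, in Python: RSS equals SHR + Private in each one, and splitting the run into two periods shows where the growth happens. The hour values are the approximate figures given in the message, so the rates are rough; units are the same as in the message.

```python
# (name, approximate hours since start, RSS, SHR, Private) from the message above.
SAMPLES = [
    ("start", 0, 7396, 3560, 3836),
    ("interval", 30, 18464, 14276, 4188),
    ("end", 70, 19104, 14336, 4768),
]

def check_and_rates(samples):
    """Check RSS == SHR + PRI per snapshot; return per-hour growth between snapshots."""
    for name, _, rss, shr, pri in samples:
        if rss != shr + pri:
            raise ValueError(f"{name}: RSS {rss} != SHR {shr} + PRI {pri}")
    rates = []
    for (n0, h0, _, s0, p0), (n1, h1, _, s1, p1) in zip(samples, samples[1:]):
        hours = h1 - h0
        rates.append((f"{n0}->{n1}", (s1 - s0) / hours, (p1 - p0) / hours))
    return rates

for label, shr_rate, pri_rate in check_and_rates(SAMPLES):
    print(f"{label}: SHR {shr_rate:+.1f}/h, PRI {pri_rate:+.1f}/h")
# start->interval: SHR +357.2/h, PRI +11.7/h
# interval->end: SHR +1.5/h, PRI +14.5/h
```

SHR growth almost stops after the first ~30h, which would fit shared qb buffers filling up to their configured size, while Private keeps rising at a similar rate in both periods.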
>>>>>>>>>>>>
>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>> Yuichi
>>>>>>>>>>>>
>>>>>>>>>>>> 2013/5/15 Yuichi SEINO <seino.cluster2 at gmail.com>:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I ran the test for about two days.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Environment
>>>>>>>>>>>>>
>>>>>>>>>>>>> OS:RHEL 6.3
>>>>>>>>>>>>> pacemaker-1.1.9-devel (commit 138556cb0b375a490a96f35e7fbeccc576a22011)
>>>>>>>>>>>>> corosync-2.3.0
>>>>>>>>>>>>> cluster-glue latest+patch(detail:http://www.gossamer-threads.com/lists/linuxha/dev/85787)
>>>>>>>>>>>>> libqb-0.14.4
>>>>>>>>>>>>>
>>>>>>>>>>>>> There may be a memory leak in crmd and lrmd. I periodically recorded the RSS reported by ps.
>>>>>>>>>>>>>
>>>>>>>>>>>>> start-up
>>>>>>>>>>>>> crmd:5332
>>>>>>>>>>>>> lrmd:3625
>>>>>>>>>>>>>
>>>>>>>>>>>>> interval (about 30h later)
>>>>>>>>>>>>> crmd:7716
>>>>>>>>>>>>> lrmd:3744
>>>>>>>>>>>>>
>>>>>>>>>>>>> end (about 60h later)
>>>>>>>>>>>>> crmd:8336
>>>>>>>>>>>>> lrmd:3780
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have not yet run the test with pacemaker-1.1.10-rc2, so I will run it next.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>>> Yuichi
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Yuichi SEINO
>>>>>>>>>>>>> METROSYSTEMS CORPORATION
>>>>>>>>>>>>> E-mail:seino.cluster2 at gmail.com
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Yuichi SEINO
>>>>>>>>>>>> METROSYSTEMS CORPORATION
>>>>>>>>>>>> E-mail:seino.cluster2 at gmail.com
>>>>>>>>>>>> <smaps_log.tar.gz>_______________________________________________
>>>>>>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>>>>>>
>>>>>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Yuichi SEINO
>>>>>>>>> METROSYSTEMS CORPORATION
>>>>>>>>> E-mail:seino.cluster2 at gmail.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Yuichi SEINO
>>>> METROSYSTEMS CORPORATION
>>>> E-mail:seino.cluster2 at gmail.com
>>>
>>> --
>>> Yuichi SEINO
>>> METROSYSTEMS CORPORATION
>>> E-mail:seino.cluster2 at gmail.com
>>> <test_info.tar.bz>
>>
>>
>
>
>
> --
> Yuichi SEINO
> METROSYSTEMS CORPORATION
> E-mail:seino.cluster2 at gmail.com
>

-- 
----------------------------------------
METRO SYSTEMS CO., LTD

Yusuke Iida
Mail: yusk.iida at gmail.com
----------------------------------------