[Pacemaker] [corosync] Corosync memory usage rising

Mon Feb 4 23:36:50 UTC 2013

On Tue, Feb 5, 2013 at 9:26 AM, Yves Trudeau <y.trudeau at videotron.ca> wrote:
> Hi,
>
>
>> Are you running pacemaker (if so plugin or cpg version)? OpenAIS
>> services loaded? Is it clean corosync or corosync executed via cman?
>
> [root@mys001 ~]# rpm -qa | grep pacem
> pacemaker-libs-1.1.7-6.el6.x86_64
> pacemaker-cli-1.1.7-6.el6.x86_64
> pacemaker-1.1.7-6.el6.x86_64
> pacemaker-cluster-libs-1.1.7-6.el6.x86_64
>
> I am not using cman and openais, just regular Pacemaker setup with Corosync.

Oh, so you're loading the pacemaker plugin?
It's possible that this is the part that is leaking...
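
To confirm whether the plugin is actually loaded, one option is to look for a "service" block naming pacemaker in the corosync configuration. The helper below is only a rough sketch: the function name is made up, it assumes the usual multi-line block layout, and the block may live in /etc/corosync/corosync.conf or in a file under /etc/corosync/service.d/ depending on how the node was set up.

```shell
# Hypothetical helper: succeed if the given corosync 1.x config file contains
# a multi-line "service { ... name: pacemaker ... }" block, which is how the
# Pacemaker plugin is normally loaded. Single-line blocks are not handled.
has_pacemaker_plugin() {
    awk '
        /^[[:space:]]*service[[:space:]]*[{]/ { in_service = 1 }
        in_service && /^[[:space:]]*name:[[:space:]]*pacemaker[[:space:]]*$/ { found = 1 }
        /[}]/ { in_service = 0 }
        END { exit(found ? 0 : 1) }
    ' "$1"
}
```

For example: has_pacemaker_plugin /etc/corosync/corosync.conf && echo "plugin configured"
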

>
> [root@mys001 ~]# rpm -qa | grep openais
> [root@mys001 ~]#
>
> Although from crm status:
>
> [root@mys001 ~]# crm status
> ============
> Last updated: Mon Feb  4 14:24:03 2013
> Last change: Wed Jan 30 09:29:41 2013 via crm_attribute on mys002
> Stack: openais
> Current DC: mys001 - partition with quorum
> Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
>
> Regards,
>
> Yves
>
> On 2013-02-04 05:06, Jan Friesse wrote:
>
>> Andrew Beekhof wrote:
>>>
>>> On Thu, Jan 31, 2013 at 8:10 AM, Yves Trudeau <y.trudeau at videotron.ca>
>>> wrote:
>>>>
>>>> Hi,
>>>>     Is there any known memory leak issue in corosync 1.4.1? I have a setup
>>>> here where corosync eats memory at a few kB a minute:
>>
>>
>> 1.4.1 for sure. But it looks like you are using 1.4.1-7 (EL 6.3.z), and I
>> must say no, there is no known bug like this.
>>
>> Are you running pacemaker (if so plugin or cpg version)? OpenAIS
>> services loaded? Is it clean corosync or corosync executed via cman?
>>
>> Honza
>>
>>>>
>>>> [root@mys002 mysql]# while [ 1 ]; do ps faxu | grep corosync | grep -v grep; sleep 60; done
>>>> root     11071  0.2  0.0 624256  8840 ?        Ssl  09:14   0:02 corosync
>>>> root     11071  0.2  0.0 624344  9144 ?        Ssl  09:14   0:02 corosync
>>>> root     11071  0.2  0.0 624344  9424 ?        Ssl  09:14   0:02 corosync
>>>>
>>>> It goes on like that until memory runs out, which still takes a long time.
>>>> Another node has had corosync running for a long time:
>>>>
>>>> [root@mys001 mysql]# ps faxu | grep corosync | grep -v grep
>>>> root     15735  0.2 21.5 4038664 3429592 ?     Ssl   2012 184:19 corosync
>>>>
>>>> which is nearly 3.4GB.
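
For what it's worth, the three RSS samples quoted above (8840, 9144 and 9424 kB, taken 60 seconds apart) work out to growth of 304 and 280 kB per minute. Something along these lines would print the rate directly instead of eyeballing ps output. It is only a sketch: both function names are made up, and it assumes a procps-style ps that accepts "-o rss= -p PID".

```shell
# Hypothetical helper: average RSS growth in kB per minute between two samples.
# Arguments: earlier RSS (kB), later RSS (kB), elapsed seconds (must be > 0).
rss_growth_kb_per_min() {
    echo $(( ($2 - $1) * 60 / $3 ))
}

# Hypothetical helper: sample the RSS of one PID every INTERVAL seconds,
# COUNT times, printing the growth since the previous sample.
# Arguments: PID, INTERVAL (seconds, > 0), COUNT.
watch_rss() {
    pid=$1; interval=$2; count=$3
    prev=$(ps -o rss= -p "$pid" | tr -d ' ')
    [ -n "$prev" ] || return 1          # process not found
    i=0
    while [ "$i" -lt "$count" ]; do
        sleep "$interval"
        cur=$(ps -o rss= -p "$pid" | tr -d ' ')
        [ -n "$cur" ] || return 1       # process exited while sampling
        echo "rss=${cur} kB growth=$(rss_growth_kb_per_min "$prev" "$cur" "$interval") kB/min"
        prev=$cur
        i=$((i + 1))
    done
}
```

Usage would be something like: watch_rss "$(pidof corosync)" 60 10
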
>>>
>>>
>>> Holy heck!
>>> Bouncing to the corosync ML for comment.
>>>
>>>>
>>>> [root@mys002 mysql]# rpm -qa | grep -i coro
>>>> corosynclib-1.4.1-7.el6_3.1.x86_64
>>>> corosync-1.4.1-7.el6_3.1.x86_64
>>>> [root@mys002 mysql]# uname -a
>>>> Linux mys002 2.6.32-220.el6.x86_64 #1 SMP Tue Dec 6 19:48:22 GMT 2011 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>> Looking at the smaps of the process, I found this:
>>>>
>>>> 020b6000-d2b34000 rw-p 00000000 00:00 0
>>>> Size:            3418616 kB
>>>> Rss:             3417756 kB
>>>> Pss:             3417756 kB
>>>> Shared_Clean:          0 kB
>>>> Shared_Dirty:          0 kB
>>>> Private_Clean:         0 kB
>>>> Private_Dirty:   3417756 kB
>>>> Referenced:      3417064 kB
>>>> Anonymous:       3417756 kB
>>>> AnonHugePages:   3416064 kB
>>>> Swap:                  0 kB
>>>> KernelPageSize:        4 kB
>>>> MMUPageSize:           4 kB
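
The entry above is a single private, writable, anonymous mapping (no backing file, Anonymous equal to Rss) that accounts for almost all of the process's resident memory. To pick out that kind of mapping without reading the whole file by hand, a small awk filter over smaps can help. This is just a sketch with a made-up function name; it only looks at the "Rss:" field of each mapping.

```shell
# Hypothetical helper: print the mapping with the largest Rss from an smaps
# file. Argument: path to the file, e.g. /proc/<pid>/smaps.
largest_rss_mapping() {
    awk '
        # Mapping header lines start with an address range "start-end".
        /^[0-9a-f]+-[0-9a-f]+ / { header = $0 }
        # Remember the header of the mapping with the highest Rss seen so far.
        $1 == "Rss:" && $2 + 0 > max { max = $2 + 0; biggest = header }
        END { if (biggest != "") printf "%d kB  %s\n", max, biggest }
    ' "$1"
}
```

Usage would be something like: largest_rss_mapping /proc/"$(pidof corosync)"/smaps
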
>>>>
>>>>
>>>> this setup is using udpu
>>>>
>>>> totem {
>>>>         version: 2
>>>>         secauth: on
>>>>         threads: 0
>>>>
>>>>         window_size: 5
>>>>         max_messages: 5
>>>>         netmtu: 1000
>>>>
>>>>         token: 5000
>>>>         join: 1000
>>>>         consensus: 5000
>>>>
>>>>         interface {
>>>>                 member {
>>>>                         memberaddr: 10.103.7.91
>>>>                 }
>>>>                 member {
>>>>                         memberaddr: 10.103.7.92
>>>>                 }
>>>>                 ringnumber: 0
>>>>                 bindnetaddr: 10.103.7.91
>>>>                 mcastport: 5405
>>>>                 ttl: 1
>>>>         }
>>>>         transport: udpu
>>>> }
>>>>
>>>> with special timings because of issues with the VMware setup.
>>>>
>>>> Any idea of what could be causing this?
>>>>
>>>> Regards,
>>>>
>>>> Yves
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>
>>> _______________________________________________
>>> discuss mailing list
>>> discuss at corosync.org
>>> http://lists.corosync.org/mailman/listinfo/discuss