[ClusterLabs] Memory leak in crm_mon ?

Sun Aug 16 22:08:58 UTC 2015

> On 16 Aug 2015, at 9:41 pm, Attila Megyeri <amegyeri at minerva-soft.com> wrote:
> 
> Hi Andrew,
> 
> I managed to isolate and reproduce the issue. You might want to take a look, as it may be present in 1.1.12 as well.
> 
> I monitor my cluster from PuTTY, mainly like this:
> - I have a PuTTY (Windows client) session that connects to the box via SSH and authenticates with a public key as a non-root user.
> - It immediately sends a "sudo crm_mon -Af" command, so with a single click I have a nice view of what the cluster is doing.

Perhaps add -1 (one-shot mode: display the cluster status once and exit) to the option list.
The root cause seems to be that closing the PuTTY window doesn’t actually kill the process running inside it.
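A minimal Python sketch of the general failure mode and its fix, assuming (this is not confirmed anywhere in the thread) that the runaway process is an interactive refresh loop that keeps polling after its terminal has gone away. This is NOT crm_mon's code; `run_monitor`, `on_sighup`, `hangup_seen` and the `q`-to-quit key are hypothetical names for illustration only. The loop stops on a hangup signal, on end-of-input, or on a read error, instead of spinning:

```python
import signal
import sys
from typing import Callable, TextIO

# Hypothetical illustration only: this is NOT crm_mon's code. It shows an
# interactive refresh loop that stops when its terminal goes away instead of
# polling forever.

hangup_seen = False


def on_sighup(signum, frame) -> None:
    """Record that the controlling terminal hung up (e.g. the SSH client closed)."""
    global hangup_seen
    hangup_seen = True


def run_monitor(keys: TextIO, refresh: Callable[[], None],
                max_refreshes: int = 1000) -> str:
    """Redraw until the user quits, the terminal hangs up, or input ends.

    Returns the reason the loop stopped.
    """
    for _ in range(max_refreshes):
        if hangup_seen:
            return "hangup"
        refresh()
        try:
            key = keys.read(1)
        except OSError:
            # Reading from a terminal that has gone away can fail (e.g. EIO).
            return "io-error"
        if key == "":
            # EOF: nobody is at the other end any more. Ignoring this and
            # polling again immediately is how a loop ends up at 100% CPU.
            return "eof"
        if key == "q":
            return "quit"
    return "limit"


def main() -> None:
    # SIGHUP exists on POSIX systems only.
    signal.signal(signal.SIGHUP, on_sighup)
    reason = run_monitor(sys.stdin, lambda: print("... cluster status ...", flush=True))
    print(f"exiting: {reason}", file=sys.stderr)


if __name__ == "__main__":
    main()
```

Running the status command with -1 sidesteps this class of problem altogether, since no long-lived loop is left behind when the session closes.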

> 
> Whenever I close this PuTTY window (terminate the app), the crm_mon process goes to 100% CPU usage and starts to leak memory; within a few hours it consumes all memory and then brings down the whole cluster.
> This does not happen if I exit crm_mon with Ctrl-C.
> 
> I can reproduce this every time with crm_mon 1.1.10 from the stock Ubuntu Trusty packages.
> This might be related to how sudo executes crm_mon, and which signal it sends to crm_mon when it gets terminated.
> 
> Now I know what I need to pay attention to in order to avoid this problem, but you might want to check whether this issue is still present.
> 
> 
> Thanks,
> Attila 
> 
> -----Original Message-----
> From: Attila Megyeri [mailto:amegyeri at minerva-soft.com] 
> Sent: Friday, August 14, 2015 12:40 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> Subject: Re: [ClusterLabs] Memory leak in crm_mon ?
> 
> 
> 
> -----Original Message-----
> From: Andrew Beekhof [mailto:andrew at beekhof.net] 
> Sent: Tuesday, August 11, 2015 2:49 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> Subject: Re: [ClusterLabs] Memory leak in crm_mon ?
> 
> 
>> On 10 Aug 2015, at 5:33 pm, Attila Megyeri <amegyeri at minerva-soft.com> wrote:
>> 
>> Hi!
>> 
>> We are building a new cluster on top of pacemaker/corosync, and several times during the past few days we noticed that "crm_mon -Af" used up all the memory and swap and caused high CPU usage. Killing the process solves the issue.
>> 
>> We are using the binary package versions available in the latest ubuntu trusty, namely:
>> 
>> crmsh                 1.2.5+hg1034-1ubuntu4
>> pacemaker             1.1.10+git20130802-1ubuntu2.3
>> pacemaker-cli-utils   1.1.10+git20130802-1ubuntu2.3
>> corosync              2.3.3-1ubuntu1
>> 
>> Kernel is             3.13.0-46-generic
>> 
>> Looking back at some "atop" data, the CPU went to 100% many times during the last couple of days, at various times, but most often at exactly midnight (strange).
>> 
>> 08.05     14:00
>> 08.06     21:41
>> 08.07     00:00
>> 08.07     00:00
>> 08.08     00:00
>> 08.09     06:27
>> 
>> I checked the corosync log and syslog, but did not find any correlation between the log entries around those times.
>> Most of the time, the node running crm_mon was also the DC, and it does not run any resources (i.e. it is an unpaired node used only for quorum).
>> 
>> 
>> We have another running system where everything works perfectly, even though it is almost the same:
>> 
>> crmsh                 1.2.5+hg1034-1ubuntu4
>> pacemaker             1.1.10+git20130802-1ubuntu2.1
>> pacemaker-cli-utils   1.1.10+git20130802-1ubuntu2.1
>> corosync              2.3.3-1ubuntu1
>> 
>> Kernel is             3.13.0-8-generic
>> 
>> 
>> Is this perhaps a known issue?
> 
> Possibly, that version is over 2 years old.
> 
>> Any hints?
> 
> Getting something a little more recent would be the best place to start.
> 
> Thanks Andrew,
> 
> I tried to upgrade to 1.1.12 using the packages available at https://launchpad.net/~syseleven-platform . In the first attempt I upgraded a single node to see how it works out, but I ended up with errors like
> 
> Could not establish cib_rw connection: Connection refused (111)
> 
> I have disabled the firewall, but nothing changed. The node appears to be running but does not see any of the other nodes, and on the other nodes this node shows up as UNCLEAN. (I assume corosync is fine, but pacemaker is not.)
> I use udpu for the transport.
> 
> Am I doing something wrong? I looked for upgrade how-tos, but the only thing I found was the rather outdated http://clusterlabs.org/wiki/Upgrade
> 
> Could you please direct me to some howto/guide on how to perform the upgrade?
> 
> Or am I facing a compatibility issue, so that I should export the whole CIB, upgrade all nodes and reconfigure the cluster from scratch? (The cluster is meant to go live in 2 days... :) )
> 
> Thanks a lot in advance
> 
> 
>> 
>> Thanks!
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org 
>> http://clusterlabs.org/mailman/listinfo/users
>> 
>> Project Home: http://www.clusterlabs.org Getting started: 
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org