[ClusterLabs] Antw: Re: Memory leak in crm_mon ?

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon Aug 17 06:35:59 UTC 2015


>>> Andrew Beekhof <andrew at beekhof.net> wrote on 17.08.2015 at 00:08 in
message <FF78BE4F-173C-4A74-A989-92EA6C540A6B at beekhof.net>:

>> On 16 Aug 2015, at 9:41 pm, Attila Megyeri <amegyeri at minerva-soft.com> wrote:
>> 
>> Hi Andrew,
>> 
>> I managed to isolate / reproduce the issue. You might want to take a look,
>> as it might be present in 1.1.12 as well.
>> 
>> I monitor my cluster from putty, mainly this way:
>> - I have a putty (Windows client) session that connects via SSH to the
>>   box and authenticates using a public key as a non-root user.
>> - It immediately sends a "sudo crm_mon -Af" command, so with a single
>>   click I have a nice view of what the cluster is doing.
> 
> Perhaps add -1 to the option list.
> The root cause seems to be that closing the putty window doesn't actually
> kill the process running inside it.

Sorry, but the root cause seems to be that crm_mon happily writes to a closed
filehandle (I guess). If crm_mon handled that error by exiting the loop,
there would be no need for putty to kill any process.
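
To illustrate what "exiting the loop" could look like, here is a minimal
Python sketch. It is not crm_mon's actual code (crm_mon is written in C, and
I have not checked its source); the names refresh_loop and render are
hypothetical. It assumes that writing to a terminal or pipe whose other end
has gone away fails with EIO or EPIPE, and it treats that failure as the
reason to stop refreshing instead of retrying forever.

```python
import errno
import os


def refresh_loop(fd, render, max_iterations=None):
    """Write render() to fd repeatedly (hypothetical helper).

    Returns (successful_writes, errno_or_None). The loop stops as soon as
    a write fails because the output is gone, rather than spinning on it.
    """
    count = 0
    while max_iterations is None or count < max_iterations:
        try:
            os.write(fd, render().encode())
        except OSError as exc:
            # EPIPE: reader end of a pipe was closed.
            # EIO:   typical for a terminal that has been hung up.
            # EBADF: the descriptor itself is no longer valid.
            if exc.errno in (errno.EPIPE, errno.EIO, errno.EBADF):
                return count, exc.errno
            raise
        count += 1
    return count, None
```

A C program would additionally have to ignore or handle SIGPIPE to see the
EPIPE error at all; the Python interpreter ignores SIGPIPE by default.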

> 
>> 
>> Whenever I close this putty window (terminate the app), the crm_mon
>> process goes to 100% CPU usage, starts to leak, consumes all memory within
>> a few hours, and then destroys the whole cluster.
>> This does not happen if I leave crm_mon with Ctrl-C.
>> 
>> I can reproduce this 100% with crm_mon 1.1.10, with the mainstream ubuntu
>> trusty packages.
>> This might be related to how sudo executes crm_mon, and what it signals to
>> crm_mon when it gets terminated.
>> 
>> Now I know what I need to pay attention to in order to avoid this problem,
>> but you might want to check whether this issue is still present.
>> 
>> 
>> Thanks,
>> Attila 
>> 
>> 
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: Attila Megyeri [mailto:amegyeri at minerva-soft.com] 
>> Sent: Friday, August 14, 2015 12:40 AM
>> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>> Subject: Re: [ClusterLabs] Memory leak in crm_mon ?
>> 
>> 
>> 
>> -----Original Message-----
>> From: Andrew Beekhof [mailto:andrew at beekhof.net] 
>> Sent: Tuesday, August 11, 2015 2:49 AM
>> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>> Subject: Re: [ClusterLabs] Memory leak in crm_mon ?
>> 
>> 
>>> On 10 Aug 2015, at 5:33 pm, Attila Megyeri <amegyeri at minerva-soft.com> wrote:
>>> 
>>> Hi!
>>> 
>>> We are building a new cluster on top of pacemaker/corosync and several
>>> times during the past days we noticed that "crm_mon -Af" used up all the
>>> memory+swap and caused high CPU usage. Killing the process solves the
>>> issue.
>>> 
>>> We are using the binary package versions available in the latest ubuntu
>>> trusty, namely:
>>> 
>>> crmsh                 1.2.5+hg1034-1ubuntu4
>>> pacemaker             1.1.10+git20130802-1ubuntu2.3
>>> pacemaker-cli-utils   1.1.10+git20130802-1ubuntu2.3
>>> corosync              2.3.3-1ubuntu1
>>> 
>>> Kernel is             3.13.0-46-generic
>>> 
>>> Looking back at some "atop" data, the CPU went to 100% many times during
>>> the last couple of days, at various times, most often at exactly midnight
>>> (strange).
>>> 
>>> 08.05     14:00
>>> 08.06     21:41
>>> 08.07     00:00
>>> 08.07     00:00
>>> 08.08     00:00
>>> 08.09     06:27
>>> 
>>> I checked the corosync log and syslog, but did not find any correlation
>>> between the entries in the logs around the specific times.
>>> For most of the time, the node running crm_mon was the DC as well, not
>>> running any resources (e.g. a pairless node for quorum).
>>> 
>>> 
>>> We have another running system where everything works perfectly, although
>>> it is almost the same:
>>> 
>>> crmsh                 1.2.5+hg1034-1ubuntu4
>>> pacemaker             1.1.10+git20130802-1ubuntu2.1
>>> pacemaker-cli-utils   1.1.10+git20130802-1ubuntu2.1
>>> corosync              2.3.3-1ubuntu1
>>> 
>>> Kernel is             3.13.0-8-generic
>>> 
>>> 
>>> Is this perhaps a known issue?
>> 
>> Possibly, that version is over 2 years old.
>> 
>>> Any hints?
>> 
>> Getting something a little more recent would be the best place to start
>> 
>> Thanks Andrew,
>> 
>> I tried to upgrade to 1.1.12 using the packages available at
>> https://launchpad.net/~syseleven-platform . In the first attempt I upgraded
>> a single node to see how it works out, but I ended up with errors like
>> 
>> Could not establish cib_rw connection: Connection refused (111)
>> 
>> I have disabled the firewall, no changes. The node appears to be running
>> but does not see any of the other nodes. On the other nodes I see this node
>> as an UNCLEAN one. (I assume corosync is fine, but pacemaker is not.)
>> I use udpu for the transport.
>> 
>> Am I doing something wrong? I tried to look for some howtos on upgrading,
>> but the only thing I found was the rather outdated
>> http://clusterlabs.org/wiki/Upgrade
>> 
>> Could you please direct me to some howto/guide on how to perform the
>> upgrade?
>> 
>> Or am I facing some compatibility issue, so I should extract the whole cib,
>> upgrade all nodes and reconfigure the cluster from scratch? (The cluster
>> is meant to go live in 2 days... :) )
>> 
>> Thanks a lot in advance
>> 
>> 
>> 
>> 
>>> 
>>> Thanks!
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org 
>>> http://clusterlabs.org/mailman/listinfo/users 
>>> 
>>> Project Home: http://www.clusterlabs.org Getting started: 
>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>>> Bugs: http://bugs.clusterlabs.org 
>> 
>> 
> 
> 






