[ClusterLabs] Memory leak in crm_mon ?
Attila Megyeri
amegyeri at minerva-soft.com
Sun Aug 16 11:41:34 UTC 2015
Hi Andrew,
I managed to isolate / reproduce the issue. You might want to take a look, as it might be present in 1.1.12 as well.
I monitor my cluster from PuTTY, mainly this way:
- I have a PuTTY (Windows client) session that connects via SSH to the box and authenticates using a public key as a non-root user.
- It immediately sends a "sudo crm_mon -Af" command, so with a single click I have a nice view of what the cluster is doing.
Whenever I close this PuTTY window (terminate the app), the crm_mon process goes to 100% CPU usage and starts to leak memory; within a few hours it consumes all memory and takes down the whole cluster.
This does not happen if I exit crm_mon with Ctrl-C.
I can reproduce this 100% of the time with crm_mon 1.1.10, using the stock Ubuntu trusty packages.
This might be related to how sudo executes crm_mon, and what it signals to crm_mon when it gets terminated.
Now I know what I need to pay attention to in order to avoid this problem, but you might want to check whether this issue is still present.
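In case it helps anyone else catch this condition early, here is a minimal Python sketch that picks out suspicious crm_mon processes from "ps" output. It is not part of Pacemaker. The function names and the 50% threshold are made up for illustration, and it assumes that a crm_mon left behind by a closed terminal shows "?" in the TTY column, which I have not verified on every setup.

```python
import os
import subprocess


def read_ps():
    """Return ps output with one 'pid tty pcpu comm' row per process."""
    # Separate -o options with empty headers are used on purpose:
    # ps may treat a comma-separated 'pid=,tty=' list differently.
    cmd = ["ps", "-e", "-o", "pid=", "-o", "tty=", "-o", "pcpu=", "-o", "comm="]
    env = dict(os.environ, LC_ALL="C")  # keep '.' as the decimal separator
    return subprocess.run(cmd, capture_output=True, text=True,
                          check=True, env=env).stdout


def find_detached_crm_mon(ps_output, cpu_threshold=50.0):
    """Return PIDs of crm_mon processes without a TTY and with high CPU.

    Assumption: tty == '?' means the terminal is gone. Note that the
    pcpu column is CPU time divided by process lifetime, not an
    instantaneous reading, so a freshly stuck process may lag behind.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        pid, tty, pcpu, comm = fields
        if comm.strip() == "crm_mon" and tty == "?" and float(pcpu) >= cpu_threshold:
            pids.append(int(pid))
    return pids
```

Calling find_detached_crm_mon(read_ps()) from cron or a monitoring check would give the PIDs to inspect (or kill) before the process eats all the memory.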
Thanks,
Attila
-----Original Message-----
From: Attila Megyeri [mailto:amegyeri at minerva-soft.com]
Sent: Friday, August 14, 2015 12:40 AM
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Subject: Re: [ClusterLabs] Memory leak in crm_mon ?
-----Original Message-----
From: Andrew Beekhof [mailto:andrew at beekhof.net]
Sent: Tuesday, August 11, 2015 2:49 AM
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Subject: Re: [ClusterLabs] Memory leak in crm_mon ?
> On 10 Aug 2015, at 5:33 pm, Attila Megyeri <amegyeri at minerva-soft.com> wrote:
>
> Hi!
>
> We are building a new cluster on top of pacemaker/corosync, and several times during the past few days we noticed that "crm_mon -Af" used up all the memory and swap and caused high CPU usage. Killing the process resolves the issue.
>
> We are using the binary package versions available in the latest ubuntu trusty, namely:
>
> crmsh 1.2.5+hg1034-1ubuntu4
> pacemaker 1.1.10+git20130802-1ubuntu2.3
> pacemaker-cli-utils 1.1.10+git20130802-1ubuntu2.3
> corosync 2.3.3-1ubuntu1
>
> Kernel is 3.13.0-46-generic
>
> Looking back at some "atop" data, the CPU went to 100% many times during the last couple of days, at various times, most often at exactly midnight (strange).
>
> 08.05 14:00
> 08.06 21:41
> 08.07 00:00
> 08.07 00:00
> 08.08 00:00
> 08.09 06:27
>
> I checked the corosync log and syslog, but did not find any correlation with the log entries around those times.
> For most of the time, the node running the crm_mon was the DC as well – not running any resources (e.g. a pairless node for quorum).
>
>
> We have another running system where everything works perfectly, even though it is almost the same:
>
> crmsh 1.2.5+hg1034-1ubuntu4
> pacemaker 1.1.10+git20130802-1ubuntu2.1
> pacemaker-cli-utils 1.1.10+git20130802-1ubuntu2.1
> corosync 2.3.3-1ubuntu1
>
> Kernel is 3.13.0-8-generic
>
>
> Is this perhaps a known issue?
Possibly, that version is over 2 years old.
> Any hints?
Getting something a little more recent would be the best place to start
Thanks Andrew,
I tried to upgrade to 1.1.12 using the packages available at https://launchpad.net/~syseleven-platform . In the first attempt I upgraded a single node to see how it works out, but I ended up with errors like
Could not establish cib_rw connection: Connection refused (111)
I have disabled the firewall, with no change. The node appears to be running but does not see any of the other nodes. On the other nodes I see this node as UNCLEAN. (I assume corosync is fine, but pacemaker is not.)
I use udpu for the transport.
Am I doing something wrong? I tried to look for some howtos on upgrading, but the only thing I found was the rather outdated http://clusterlabs.org/wiki/Upgrade
Could you please direct me to some howto/guide on how to perform the upgrade?
Or am I facing some compatibility issue, so that I should extract the whole cib, upgrade all nodes and reconfigure the cluster from scratch? (The cluster is meant to go live in 2 days... :) )
Thanks a lot in advance
>
> Thanks!
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org