[ClusterLabs] Memory leak in crm_mon ?
Andrew Beekhof
andrew at beekhof.net
Sun Aug 16 22:08:58 UTC 2015
> On 16 Aug 2015, at 9:41 pm, Attila Megyeri <amegyeri at minerva-soft.com> wrote:
>
> Hi Andrew,
>
> I managed to isolate / reproduce the issue. You might want to take a look, as it might be present in 1.1.12 as well.
>
> I monitor my cluster from putty, mainly this way:
> - I have a putty (Windows client) session, that connects via SSH to the box, authenticates using public key as a non-root user.
> - It immediately sends a "sudo crm_mon -Af" command, so with a single click I have a nice view of what the cluster is doing.
Perhaps add -1 to the option list.
The root cause seems to be that closing the putty window doesn’t actually kill the process running inside it.
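
As a rough illustration of the -1 idea (a sketch, not a tested fix for this bug): instead of one long-lived interactive crm_mon, a small loop can call the one-shot mode repeatedly, so no crm_mon process outlives the terminal by more than one refresh. The names MON_CMD, INTERVAL and watch_cluster are made up for this example, and it assumes the loop itself either receives SIGHUP or sees the command fail once the SSH session is gone.

```shell
#!/bin/sh
# Sketch only: refresh a one-shot cluster status view in a loop.
# MON_CMD, INTERVAL and watch_cluster are hypothetical names for this example.
MON_CMD=${MON_CMD:-"crm_mon -A -f -1"}   # -1 = print status once and exit
INTERVAL=${INTERVAL:-5}                  # seconds between refreshes

# Exit quietly on hangup (closed SSH session), termination or Ctrl-C.
trap 'exit 0' HUP TERM INT

watch_cluster() {
    # $1: maximum number of refreshes; 0 (the default) means no limit.
    max=${1:-0}
    count=0
    while [ "$max" -eq 0 ] || [ "$count" -lt "$max" ]; do
        # Each run exits after printing; stop the loop if it fails.
        $MON_CMD || return 1
        count=$((count + 1))
        # Skip the pause after the final refresh.
        [ "$count" -eq "$max" ] || sleep "$INTERVAL"
    done
}
```

To use it, add a call to watch_cluster at the end of the script (or, say, "watch_cluster 10" for ten refreshes) and have putty run the script under sudo in place of crm_mon.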
>
> Whenever I close this putty window (terminate the app), the crm_mon process goes to 100% CPU usage, starts to leak memory, consumes all memory within a few hours, and then takes down the whole cluster.
> This does not happen if I exit crm_mon with Ctrl-C.
>
> I can reproduce this 100% with crm_mon 1.1.10, with the mainstream ubuntu trusty packages.
> This might be related to how sudo executes crm_mon, and what it signals to crm_mon when it gets terminated.
>
> Now I know what I need to pay attention to in order to avoid this problem, but you might want to check whether this issue is still present.
>
>
> Thanks,
> Attila
>
>
>
>
>
>
> -----Original Message-----
> From: Attila Megyeri [mailto:amegyeri at minerva-soft.com]
> Sent: Friday, August 14, 2015 12:40 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> Subject: Re: [ClusterLabs] Memory leak in crm_mon ?
>
>
>
> -----Original Message-----
> From: Andrew Beekhof [mailto:andrew at beekhof.net]
> Sent: Tuesday, August 11, 2015 2:49 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> Subject: Re: [ClusterLabs] Memory leak in crm_mon ?
>
>
>> On 10 Aug 2015, at 5:33 pm, Attila Megyeri <amegyeri at minerva-soft.com> wrote:
>>
>> Hi!
>>
>> We are building a new cluster on top of pacemaker/corosync, and several times over the past few days we noticed that "crm_mon -Af" used up all the memory and swap and caused high CPU usage. Killing the process solves the issue.
>>
>> We are using the binary package versions available in the latest ubuntu trusty, namely:
>>
>> crmsh 1.2.5+hg1034-1ubuntu4
>> pacemaker 1.1.10+git20130802-1ubuntu2.3
>> pacemaker-cli-utils 1.1.10+git20130802-1ubuntu2.3
>> corosync 2.3.3-1ubuntu1
>>
>> Kernel is 3.13.0-46-generic
>>
>> Looking back at some "atop" data, the CPU went to 100% many times during the last couple of days, at various times, most often at exactly midnight (strange).
>>
>> 08.05 14:00
>> 08.06 21:41
>> 08.07 00:00
>> 08.07 00:00
>> 08.08 00:00
>> 08.09 06:27
>>
>> I checked the corosync log and syslog, but did not find any correlation with the log entries around those times.
>> For most of the time, the node running the crm_mon was the DC as well – not running any resources (e.g. a pairless node for quorum).
>>
>>
>> We have another running system where everything works perfectly, even though it is almost the same:
>>
>> crmsh 1.2.5+hg1034-1ubuntu4
>> pacemaker 1.1.10+git20130802-1ubuntu2.1
>> pacemaker-cli-utils 1.1.10+git20130802-1ubuntu2.1
>> corosync 2.3.3-1ubuntu1
>>
>> Kernel is 3.13.0-8-generic
>>
>>
>> Is this perhaps a known issue?
>
> Possibly, that version is over 2 years old.
>
>> Any hints?
>
> Getting something a little more recent would be the best place to start
>
> Thanks Andrew,
>
> I tried to upgrade to 1.1.12 using the packages available at https://launchpad.net/~syseleven-platform . In the first attempt I upgraded a single node to see how it works out, but I ended up with errors like
>
> Could not establish cib_rw connection: Connection refused (111)
>
> I have disabled the firewall, no changes. The node appears to be running but does not see any of the other nodes. On the other nodes I see this node as an UNCLEAN one. (I assume corosync is fine, but pacemaker not)
> I use udpu for the transport.
>
> Am I doing something wrong? I tried to look for some howtos on upgrade, but the only thing I found was the rather outdated http://clusterlabs.org/wiki/Upgrade
>
> Could you please direct me to some howto/guide on how to perform the upgrade?
>
> Or am I facing some compatibility issue, so that I should extract the whole cib, upgrade all nodes and reconfigure the cluster from scratch? (The cluster is meant to go live in 2 days... :) )
>
> Thanks a lot in advance
>
>
>
>
>>
>> Thanks!
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
>