[ClusterLabs] Antw: Re: Memory leak in crm_mon ?
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Mon Aug 17 06:35:59 UTC 2015
>>> Andrew Beekhof <andrew at beekhof.net> wrote on 17.08.2015 at 00:08 in message
<FF78BE4F-173C-4A74-A989-92EA6C540A6B at beekhof.net>:
>> On 16 Aug 2015, at 9:41 pm, Attila Megyeri <amegyeri at minerva-soft.com> wrote:
>>
>> Hi Andrew,
>>
>> I managed to isolate / reproduce the issue. You might want to take a look,
>> as it might be present in 1.1.12 as well.
>>
>> I monitor my cluster from putty, mainly this way:
>> - I have a putty (Windows client) session, that connects via SSH to the
>>   box, authenticates using public key as a non-root user.
>> - It immediately sends a "sudo crm_mon -Af" command, so with a single click
>>   I have a nice view of what the cluster is doing.
>
> Perhaps add -1 to the option list.
> The root cause seems to be that closing the putty window doesn’t actually
> kill the process running inside it.
Sorry, the root cause seems to be that crm_mon happily writes to a closed
filehandle (I guess). If crm_mon handled that error by exiting the loop,
there would be no need for putty to kill any process.
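
To illustrate what I mean, here is a minimal sketch in Python. It is not
crm_mon's actual code (crm_mon is written in C, and I have not checked how its
refresh loop is built); the names monitor_loop and render are hypothetical, and
the set of errno values to treat as "output gone" is my assumption. The point is
only that a failed write ends the loop instead of being retried forever.

```python
import errno
import os
import time


def monitor_loop(fd, render, interval=0.0, max_iterations=None):
    """Write render() to fd repeatedly; return the reason the loop ended."""
    count = 0
    while max_iterations is None or count < max_iterations:
        try:
            os.write(fd, render().encode())
        except OSError as exc:
            # Assumed "output is gone" errors:
            #   EPIPE - reader side of a pipe/socket has been closed
            #   EIO   - the controlling terminal has hung up
            #   EBADF - the descriptor itself is already closed
            if exc.errno in (errno.EPIPE, errno.EIO, errno.EBADF):
                return "output closed"
            raise  # anything else is unexpected, so do not hide it
        count += 1
        time.sleep(interval)
    return "iteration limit reached"


if __name__ == "__main__":
    # Simulate the closed putty window with a pipe whose reader is gone.
    read_end, write_end = os.pipe()
    os.close(read_end)
    print(monitor_loop(write_end, lambda: "cluster status\n"))
    os.close(write_end)
```

Python ignores SIGPIPE by default, so the write fails with an error rather than
killing the process; a C program would have to ignore or handle SIGPIPE itself
and check the return value of write().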
>
>>
>> Whenever I close this putty window (terminate the app), the crm_mon process
>> goes to 100% CPU usage, starts to leak memory, consumes all memory within a
>> few hours, and then brings down the whole cluster.
>> This does not happen if I exit crm_mon with Ctrl-C.
>>
>> I can reproduce this 100% with crm_mon 1.1.10, with the mainstream ubuntu
>> trusty packages.
>> This might be related to how sudo executes crm_mon, and what it signals to
>> crm_mon when it gets terminated.
>>
>> Now I know what I need to pay attention to in order to avoid this problem,
>> but you might want to check whether this issue is still present.
>>
>>
>> Thanks,
>> Attila
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Attila Megyeri [mailto:amegyeri at minerva-soft.com]
>> Sent: Friday, August 14, 2015 12:40 AM
>> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>> Subject: Re: [ClusterLabs] Memory leak in crm_mon ?
>>
>>
>>
>> -----Original Message-----
>> From: Andrew Beekhof [mailto:andrew at beekhof.net]
>> Sent: Tuesday, August 11, 2015 2:49 AM
>> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>> Subject: Re: [ClusterLabs] Memory leak in crm_mon ?
>>
>>
>>> On 10 Aug 2015, at 5:33 pm, Attila Megyeri <amegyeri at minerva-soft.com> wrote:
>>>
>>> Hi!
>>>
>>> We are building a new cluster on top of pacemaker/corosync and several
>>> times during the past days we noticed that "crm_mon -Af" used up all the
>>> memory+swap and caused high CPU usage. Killing the process solves the
>>> issue.
>>>
>>> We are using the binary package versions available in the latest ubuntu
>>> trusty, namely:
>>>
>>> crmsh                1.2.5+hg1034-1ubuntu4
>>> pacemaker            1.1.10+git20130802-1ubuntu2.3
>>> pacemaker-cli-utils  1.1.10+git20130802-1ubuntu2.3
>>> corosync             2.3.3-1ubuntu1
>>>
>>> Kernel is 3.13.0-46-generic
>>>
>>> Looking back at some "atop" data, the CPU went to 100% many times during
>>> the last couple of days, at various times, more often exactly around
>>> midnight (strange).
>>>
>>> 08.05 14:00
>>> 08.06 21:41
>>> 08.07 00:00
>>> 08.07 00:00
>>> 08.08 00:00
>>> 08.09 06:27
>>>
>>> Checked the corosync log and syslog, but did not find any correlation
>>> between the entries in the logs around the specific times.
>>> For most of the time, the node running crm_mon was the DC as well, not
>>> running any resources (e.g. a pairless node for quorum).
>>>
>>>
>>> We have another running system, where everything works perfectly, although
>>> it is almost the same:
>>>
>>> crmsh                1.2.5+hg1034-1ubuntu4
>>> pacemaker            1.1.10+git20130802-1ubuntu2.1
>>> pacemaker-cli-utils  1.1.10+git20130802-1ubuntu2.1
>>> corosync             2.3.3-1ubuntu1
>>>
>>> Kernel is 3.13.0-8-generic
>>>
>>>
>>> Is this perhaps a known issue?
>>
>> Possibly, that version is over 2 years old.
>>
>>> Any hints?
>>
>> Getting something a little more recent would be the best place to start
>>
>> Thanks Andrew,
>>
>> I tried to upgrade to 1.1.12 using the packages available at
>> https://launchpad.net/~syseleven-platform . In the first attempt I upgraded a
>> single node, to see how it works out, but I ended up with errors like
>>
>> Could not establish cib_rw connection: Connection refused (111)
>>
>> I have disabled the firewall, no changes. The node appears to be running
>> but does not see any of the other nodes. On the other nodes I see this node
>> as an UNCLEAN one. (I assume corosync is fine, but pacemaker is not.)
>> I use udpu for the transport.
>>
>> Am I doing something wrong? I tried to look for some howtos on upgrade, but
>> the only thing I found was the rather outdated
>> http://clusterlabs.org/wiki/Upgrade
>>
>> Could you please direct me to some howto/guide on how to perform the
>> upgrade?
>>
>> Or am I facing some compatibility issue, so I should extract the whole cib,
>> upgrade all nodes and reconfigure the cluster from scratch? (The cluster
>> is meant to go live in 2 days... :) )
>>
>> Thanks a lot in advance
>>
>>
>>
>>
>>>
>>> Thanks!
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org
>>> http://clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org