[ClusterLabs] Re: Re: Memory leak in crm_mon ?
Andrew Beekhof
andrew at beekhof.net
Mon Aug 17 06:59:28 UTC 2015
> On 17 Aug 2015, at 4:35 pm, Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>
>>>> Andrew Beekhof <andrew at beekhof.net> wrote on 17.08.2015 at 00:08 in message
> <FF78BE4F-173C-4A74-A989-92EA6C540A6B at beekhof.net>:
>
>>> On 16 Aug 2015, at 9:41 pm, Attila Megyeri <amegyeri at minerva-soft.com> wrote:
>>>
>>> Hi Andrew,
>>>
>>> I managed to isolate/reproduce the issue. You might want to take a look,
>>> as it might be present in 1.1.12 as well.
>>>
>>> I monitor my cluster from PuTTY, mainly this way:
>>> - I have a PuTTY (Windows client) session that connects to the box via SSH
>>>   and authenticates with a public key as a non-root user.
>>> - It immediately sends a "sudo crm_mon -Af" command, so with a single click
>>>   I have a nice view of what the cluster is doing.
>>
>> Perhaps add -1 (one-shot mode: display the status once and exit) to the option list.
>> The root cause seems to be that closing the PuTTY window doesn’t actually
>> kill the process running inside it.
>
> Sorry, the root cause seems to be that crm_mon happily writes to a closed
> filehandle (I guess). If crm_mon handled that error by exiting the loop,
> there would be no need for PuTTY to kill any process.
No, if you want a process to die you need to kill it.
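Ulrich's proposal, that crm_mon should leave its loop once writing to its output fails, can be illustrated with a minimal C sketch. This is not crm_mon's actual code, and the thread does not establish why crm_mon spins; write_all(), refresh_loop() and ignore_sigpipe() are hypothetical names chosen for illustration. The sketch assumes a vanished terminal shows up as a failed write(), for example EIO on a hung-up terminal, or EPIPE on a pipe whose reader has gone away once SIGPIPE is ignored.

```c
/* Hypothetical sketch, not crm_mon source: a refresh loop that exits
 * when its output can no longer be written. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Ignore SIGPIPE so that writing to a pipe with no reader fails with
 * EPIPE instead of terminating the process (SIGPIPE's default action). */
static void ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* Write the whole buffer to fd, retrying on EINTR and short writes.
 * Returns 0 on success, or -1 on a real error with errno left set
 * (e.g. EIO on a hung-up terminal, EPIPE on a pipe without a reader). */
static int write_all(int fd, const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal handler: retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Print up to max_refreshes status lines to out_fd, but leave the loop
 * on the first failed write instead of carrying on regardless.
 * Returns the number of lines that were written successfully. */
static int refresh_loop(int out_fd, int max_refreshes)
{
    int written = 0;
    for (int i = 0; i < max_refreshes; i++) {
        char line[32];
        int len = snprintf(line, sizeof line, "refresh %d\n", i);
        if (write_all(out_fd, line, (size_t)len) < 0) {
            fprintf(stderr, "output is gone (%s), leaving loop\n", strerror(errno));
            break;
        }
        written++;
    }
    return written;
}
```

A real monitor would also sleep between refreshes and redraw the screen. Checking write errors complements, but does not replace, actually terminating the process as Andrew suggests.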
>
>>
>>>
>>> Whenever I close this PuTTY window (terminate the app), the crm_mon process
>>> goes to 100% CPU usage, starts to leak memory, consumes all memory within a
>>> few hours and then brings down the whole cluster.
>>> This does not happen if I exit crm_mon with Ctrl-C.
>>>
>>> I can reproduce this 100% of the time with crm_mon 1.1.10 from the mainstream
>>> Ubuntu Trusty packages.
>>> This might be related to how sudo executes crm_mon, and what it signals to
>>> crm_mon when it gets terminated.
>>>
>>> Now I know what I need to pay attention to in order to avoid this problem,
>>> but you might want to check whether this issue is still present.
>>>
>>>
>>> Thanks,
>>> Attila
>>>
>>> -----Original Message-----
>>> From: Attila Megyeri [mailto:amegyeri at minerva-soft.com]
>>> Sent: Friday, August 14, 2015 12:40 AM
>>> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>>> Subject: Re: [ClusterLabs] Memory leak in crm_mon ?
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Andrew Beekhof [mailto:andrew at beekhof.net]
>>> Sent: Tuesday, August 11, 2015 2:49 AM
>>> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>>> Subject: Re: [ClusterLabs] Memory leak in crm_mon ?
>>>
>>>
>>>> On 10 Aug 2015, at 5:33 pm, Attila Megyeri <amegyeri at minerva-soft.com> wrote:
>>>>
>>>> Hi!
>>>>
>>>> We are building a new cluster on top of pacemaker/corosync, and several times
>>>> over the past few days we noticed that "crm_mon -Af" used up all the
>>>> memory+swap and caused high CPU usage. Killing the process solves the issue.
>>>>
>>>> We are using the binary package versions available in the latest Ubuntu
>>>> Trusty, namely:
>>>>
>>>> crmsh               1.2.5+hg1034-1ubuntu4
>>>> pacemaker           1.1.10+git20130802-1ubuntu2.3
>>>> pacemaker-cli-utils 1.1.10+git20130802-1ubuntu2.3
>>>> corosync            2.3.3-1ubuntu1
>>>>
>>>> Kernel is 3.13.0-46-generic
>>>>
>>>> Looking back at some atop data, the CPU went to 100% many times during the
>>>> last couple of days, at various times, but most often at exactly midnight
>>>> (strange).
>>>>
>>>> 08.05 14:00
>>>> 08.06 21:41
>>>> 08.07 00:00
>>>> 08.07 00:00
>>>> 08.08 00:00
>>>> 08.09 06:27
>>>>
>>>> I checked the corosync log and syslog, but did not find any correlation
>>>> between the log entries around those times.
>>>> Most of the time, the node running crm_mon was also the DC, while not
>>>> running any resources (e.g. a pairless node for quorum).
>>>>
>>>>
>>>> We have another running system where everything works perfectly, and it is
>>>> almost the same:
>>>>
>>>> crmsh               1.2.5+hg1034-1ubuntu4
>>>> pacemaker           1.1.10+git20130802-1ubuntu2.1
>>>> pacemaker-cli-utils 1.1.10+git20130802-1ubuntu2.1
>>>> corosync            2.3.3-1ubuntu1
>>>>
>>>> Kernel is 3.13.0-8-generic
>>>>
>>>>
>>>> Is this perhaps a known issue?
>>>
>>> Possibly, that version is over 2 years old.
>>>
>>>> Any hints?
>>>
>>> Getting something a little more recent would be the best place to start
>>>
>>> Thanks Andrew,
>>>
>>> I tried to upgrade to 1.1.12 using the packages available at
>>> https://launchpad.net/~syseleven-platform . In the first attempt I upgraded a
>>> single node to see how it works out, but I ended up with errors like
>>>
>>> Could not establish cib_rw connection: Connection refused (111)
>>>
>>> I disabled the firewall, but nothing changed. The node appears to be running
>>> but does not see any of the other nodes. On the other nodes I see this node
>>> as UNCLEAN. (I assume corosync is fine, but pacemaker is not.)
>>> I use udpu for the transport.
>>>
>>> Am I doing something wrong? I looked for some how-tos on upgrading, but
>>> the only thing I found was the rather outdated
>>> http://clusterlabs.org/wiki/Upgrade
>>>
>>> Could you please point me to a how-to/guide on performing the upgrade?
>>>
>>> Or am I facing some compatibility issue, so that I should extract the whole
>>> CIB, upgrade all nodes and reconfigure the cluster from scratch? (The cluster
>>> is meant to go live in 2 days... :) )
>>>
>>> Thanks a lot in advance
>>>
>>>
>>>
>>>
>>>>
>>>> Thanks!
>>>> _______________________________________________
>>>> Users mailing list: Users at clusterlabs.org
>>>> http://clusterlabs.org/mailman/listinfo/users
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org