[Pacemaker] lrmd Memory Usage

Greg Murphy greg.murphy at gamesparks.com
Wed Apr 30 07:40:05 EDT 2014


I've also just started capturing the process's mapped size every 5 minutes, and I can see that on the active node this grows by exactly 132 KB every 10 minutes (or at least the growth shows up in every other sample):

2014.Apr.30 12:35:00    1175072768
2014.Apr.30 12:29:59    1175072768
2014.Apr.30 12:25:00    1174937600
2014.Apr.30 12:19:59    1174937600
2014.Apr.30 12:14:59    1174802432
2014.Apr.30 12:09:59    1174802432
2014.Apr.30 12:04:59    1174667264
2014.Apr.30 11:59:59    1174667264
2014.Apr.30 11:54:59    1174532096
2014.Apr.30 11:49:59    1174532096
2014.Apr.30 11:44:59    1174396928
2014.Apr.30 11:40:00    1174396928
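
For illustration, a minimal sketch of the kind of sampling loop that would produce figures like those above (the PID, interval and output format here are assumptions, not the exact script):

import time

PID = 1234  # placeholder; in practice taken from "pidof lrmd"

while True:
    with open("/proc/%d/status" % PID) as f:
        for line in f:
            if line.startswith("VmSize:"):
                kb = int(line.split()[1])               # value is reported in kB
                stamp = time.strftime("%Y.%b.%d %H:%M:%S")
                print("%s    %d" % (stamp, kb * 1024))  # log the size in bytes
                break
    time.sleep(300)  # sample every 5 minutes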


From: Greg Murphy <greg.murphy at gamesparks.com>
Reply-To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
Date: Wednesday, 30 April 2014 12:01
To: "pacemaker at oss.clusterlabs.org" <pacemaker at oss.clusterlabs.org>
Subject: [Pacemaker] lrmd Memory Usage

Hi

I'm running a two-node Pacemaker cluster on Ubuntu Saucy (13.10), kernel 3.11.0-17-generic and the Ubuntu Pacemaker package, version 1.1.10+git20130802-1ubuntu1. The cluster is configured with a DRBD master/slave set and then a failover resource group containing MySQL (along with its DRBD filesystem) and a Zabbix Proxy and Agent.

Since I built the cluster around two months ago I've noticed that on the active node the memory footprint of lrmd gradually grows to quite a significant size. The cluster was last restarted three weeks ago, and lrmd now has over 1 GB of mapped memory on the active node but only 151 MB on the passive node. Current excerpts from /proc/PID/status are:

Active node

VmPeak:  1146740 kB
VmSize:  1146740 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    267680 kB
VmRSS:    188764 kB
VmData:  1065860 kB
VmStk:       136 kB
VmExe:        32 kB
VmLib:     10416 kB
VmPTE:      2164 kB
VmSwap:   822752 kB


Passive node

VmPeak:   220832 kB
VmSize:   155428 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      4568 kB
VmRSS:      3880 kB
VmData:    74548 kB
VmStk:       136 kB
VmExe:        32 kB
VmLib:     10416 kB
VmPTE:       172 kB
VmSwap:        0 kB

During the last week or so I've taken a couple of snapshots of /proc/PID/smaps on the active node, and the heap in particular stands out as growing (I have the full outputs captured if they'll help):


20140422

7f92e1578000-7f92f218b000 rw-p 00000000 00:00 0                          [heap]
Size:             274508 kB
Rss:              180152 kB
Pss:              180152 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:    180152 kB
Referenced:       120472 kB
Anonymous:        180152 kB
AnonHugePages:         0 kB
Swap:              91568 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac



20140423

7f92e1578000-7f92f305e000 rw-p 00000000 00:00 0                          [heap]
Size:             289688 kB
Rss:              184136 kB
Pss:              184136 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:    184136 kB
Referenced:        69748 kB
Anonymous:        184136 kB
AnonHugePages:         0 kB
Swap:             103112 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac


20140430

7f92e1578000-7f92fc01d000 rw-p 00000000 00:00 0                          [heap]
Size:             436884 kB
Rss:              140812 kB
Pss:              140812 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:       744 kB
Private_Dirty:    140068 kB
Referenced:        43600 kB
Anonymous:        140812 kB
AnonHugePages:         0 kB
Swap:             287392 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac
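
For completeness, a minimal sketch of how the [heap] figures above could be pulled out of /proc/PID/smaps (the PID is again a placeholder):

def heap_stats(pid):
    """Return Size, Rss and Swap (kB) of the [heap] mapping in /proc/<pid>/smaps."""
    stats = {}
    in_heap = False
    with open("/proc/%d/smaps" % pid) as f:
        for line in f:
            if not in_heap:
                in_heap = line.rstrip().endswith("[heap]")
                continue
            if not line.split()[0].endswith(":"):
                break  # reached the next mapping's header line
            key, value = line.split(":", 1)
            if key in ("Size", "Rss", "Swap"):
                stats[key] = int(value.split()[0])
    return stats

print(heap_stats(1234))  # placeholder PID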

I noticed in the release notes for 1.1.10-rc1 (https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.10-rc1) that there was work done to fix "crmd: lrmd: stonithd: fixed memory leaks", but I'm not sure which particular bug this relates to (and those fixes should be in the version I'm running anyway).

I've also spotted a few memory leak fixes in https://github.com/beekhof/pacemaker, but I'm not sure whether they relate to my issue (assuming I have a memory leak and this isn't expected behaviour).

Is there additional debugging that I can perform to check whether I have a leak, or is there enough evidence to justify upgrading to 1.1.11?

Thanks in advance

Greg Murphy