[Pacemaker] lrmd Memory Usage

Greg Murphy greg.murphy at gamesparks.com
Wed Apr 30 07:01:46 EDT 2014


Hi

I'm running a two-node Pacemaker cluster on Ubuntu Saucy (13.10), kernel 3.11.0-17-generic, with the Ubuntu Pacemaker package, version 1.1.10+git20130802-1ubuntu1. The cluster is configured with a DRBD master/slave set and a failover resource group containing MySQL (along with its DRBD filesystem) plus a Zabbix Proxy and Agent.

Since I built the cluster around two months ago I've noticed that on the active node the memory footprint of lrmd gradually grows to quite a significant size. The cluster was last restarted three weeks ago, and lrmd now has over 1GB of mapped memory on the active node but only 151MB on the passive node. Current excerpts from /proc/PID/status are:

Active node

VmPeak: 1146740 kB
VmSize: 1146740 kB
VmLck:       0 kB
VmPin:       0 kB
VmHWM:   267680 kB
VmRSS:   188764 kB
VmData: 1065860 kB
VmStk:     136 kB
VmExe:       32 kB
VmLib:   10416 kB
VmPTE:     2164 kB
VmSwap:   822752 kB

Passive node

VmPeak:   220832 kB
VmSize:   155428 kB
VmLck:       0 kB
VmPin:       0 kB
VmHWM:     4568 kB
VmRSS:     3880 kB
VmData:   74548 kB
VmStk:     136 kB
VmExe:       32 kB
VmLib:   10416 kB
VmPTE:     172 kB
VmSwap:       0 kB
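
If it helps, these fields can be sampled over time with something like the following quick Python sketch (it just assumes the lrmd PID has already been looked up, e.g. with pidof lrmd, and the field names are the ones shown above):

import sys

FIELDS = ("VmPeak", "VmSize", "VmHWM", "VmRSS", "VmData", "VmSwap")

def vm_status(pid):
    """Return the interesting Vm* fields from /proc/<pid>/status, in kB."""
    result = {}
    with open("/proc/%s/status" % pid) as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in FIELDS:
                # Values look like "  1146740 kB"; keep just the number.
                result[key] = int(value.split()[0])
    return result

if __name__ == "__main__":
    # Usage: python vm_status.py <lrmd-pid>
    for key, kb in sorted(vm_status(sys.argv[1]).items()):
        print("%s: %d kB" % (key, kb))

Logging that output every few hours makes it easy to see whether the growth is steady rather than spiky.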

During the last week or so I've taken a few snapshots of /proc/PID/smaps on the active node, and the heap in particular stands out as growing (I have the full outputs captured if they'd help):


20140422

7f92e1578000-7f92f218b000 rw-p 00000000 00:00 0                          [heap]
Size:             274508 kB
Rss:              180152 kB
Pss:              180152 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:    180152 kB
Referenced:       120472 kB
Anonymous:        180152 kB
AnonHugePages:         0 kB
Swap:              91568 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac



20140423

7f92e1578000-7f92f305e000 rw-p 00000000 00:00 0                          [heap]
Size:             289688 kB
Rss:              184136 kB
Pss:              184136 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:    184136 kB
Referenced:        69748 kB
Anonymous:        184136 kB
AnonHugePages:         0 kB
Swap:             103112 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac


20140430

7f92e1578000-7f92fc01d000 rw-p 00000000 00:00 0                          [heap]
Size:             436884 kB
Rss:              140812 kB
Pss:              140812 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:       744 kB
Private_Dirty:    140068 kB
Referenced:        43600 kB
Anonymous:        140812 kB
AnonHugePages:         0 kB
Swap:             287392 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac
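
Since it's the [heap] mapping that keeps growing, the comparison between snapshots can be automated with a similar rough sketch; it just assumes each saved file is a verbatim copy of /proc/PID/smaps:

import sys

WANTED = ("Size", "Rss", "Private_Dirty", "Swap")

def heap_stats(path):
    """Return {field: kB} for the [heap] mapping in a saved smaps snapshot."""
    stats = {}
    in_heap = False
    with open(path) as f:
        for line in f:
            tokens = line.split()
            if not tokens:
                continue
            if ":" not in tokens[0]:
                # Mapping header line, e.g. "7f92e1578000-... rw-p ... [heap]"
                in_heap = tokens[-1] == "[heap]"
            elif in_heap and tokens[0].rstrip(":") in WANTED:
                stats[tokens[0].rstrip(":")] = int(tokens[1])
    return stats

if __name__ == "__main__":
    # Usage: python heap_growth.py smaps.20140422 smaps.20140423 ...
    for path in sys.argv[1:]:
        stats = heap_stats(path)
        print("%s: %s" % (path, "  ".join(
            "%s=%s kB" % (k, stats.get(k, "?")) for k in WANTED)))

Run against the three captures above it shows the heap Size climbing steadily (274508 kB to 436884 kB) even while Rss fluctuates.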

I noticed in the release notes for 1.1.10-rc1 (https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.10-rc1) that there was work done to fix "crmd: lrmd: stonithd: fixed memory leaks", but I'm not sure which particular bug that related to (and those fixes should be in the version I'm running anyway).

I've also spotted a few memory leak fixes in https://github.com/beekhof/pacemaker, but I'm not sure whether they relate to my issue (assuming I have a memory leak and this isn't expected behaviour).

Is there additional debugging that I can perform to check whether I have a leak, or is there enough evidence to justify upgrading to 1.1.11?

Thanks in advance

Greg Murphy