[Pacemaker] crmd used all its file descriptors

Piotr Jewiec piotr at jewiec.net
Fri Dec 7 16:54:38 UTC 2012


Hi,

I have a corosync/pacemaker cluster running on Ubuntu 10.04.2. The 
following error is getting appended to the syslog:

Dec  6 20:44:46 filer-1 crmd: [2970]: ERROR: socket_client_channel_new: socket: Too many open files
Dec  6 20:44:46 filer-1 crmd: [2970]: ERROR: init_client_ipc_comms_nodispatch: Could not access channel on: /var/run/crm/pengine
Dec  6 20:44:46 filer-1 crmd: [2970]: WARN: do_pe_control: Setup of client connection failed, not adding channel to mainloop
Dec  6 20:44:46 filer-1 crmd: [2970]: WARN: do_log: FSA: Input I_FAIL from do_pe_control() received in state S_INTEGRATION
Dec  6 20:44:46 filer-1 crmd: [2970]: info: do_dc_join_offer_all: join-24: Waiting on 2 outstanding join acks
Dec  6 20:44:46 filer-1 crmd: [2970]: info: do_dc_takeover: Taking over DC status for this partition


root@filer-1:~# lsof -p `pidof crmd` | grep socket | wc -l
1019

root@filer-1:~# cat /proc/2970/limits | grep 'open files'
Max open files            1024                 1024                 files
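
The two checks above can be combined into one helper for watching how
close a process is to its descriptor limit. This is only a rough sketch:
it assumes a Linux /proc filesystem, and fd_usage is a hypothetical name
chosen for illustration, not something shipped with Pacemaker. It counts
all open descriptors, not just sockets.

```shell
#!/bin/sh
# fd_usage PID: print "<open descriptors> <soft limit>" for a process.
# Hypothetical helper; relies only on /proc/PID/fd and /proc/PID/limits.
fd_usage() {
    pid=$1
    # Every entry under /proc/PID/fd is one open file descriptor.
    used=$(ls "/proc/$pid/fd" | wc -l)
    # In the "Max open files" row, the 4th field is the soft limit
    # and the 5th is the hard limit.
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "$used $soft"
}
```

Run as root, something like fd_usage "$(pidof crmd)" would print the two
numbers side by side, so a leak shows up as the first value creeping
toward the second.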

I almost fainted when I saw this one :)

crm(live)# status
============
Last updated: Fri Dec  7 06:38:48 2012
Stack: openais
Current DC: filer-1 - partition with quorum
Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
2 Nodes configured, 2 expected votes
11 Resources configured.
============

OFFLINE: [ filer-2 filer-1 ]

As far as I know, killall -9 crmd will release the used FDs. Does
anyone have any idea how this will work out? I tested killing crmd on
another cluster (without this problem) and all resources were migrated
to the second node. What could happen in this case, where cluster
communication is broken? Has anyone dealt with a similar problem?
Resources are currently running on filer-1, the node which had been
MASTER before this problem occurred.

Packages:

pacemaker - Version: 1.0.8+hg15494-2ubuntu2
corosync - Version: 1.2.0-0ubuntu1
cluster-glue - Version: 1.0.5-1
libcorosync4 - Version: 1.2.0-0ubuntu1
libheartbeat2 - Version: 1:3.0.3-1ubuntu1

Any help/advice would be really appreciated :)
-- 
Piotr Jewiec



