[Pacemaker] crmd used all its file descriptors
Piotr Jewiec
piotr at jewiec.net
Fri Dec 7 16:54:38 UTC 2012
Hi,
I have a corosync/pacemaker cluster running on Ubuntu 10.04.2. The
following error is getting appended to the syslog:
Dec 6 20:44:46 filer-1 crmd: [2970]: ERROR: socket_client_channel_new:
socket: Too many open files
Dec 6 20:44:46 filer-1 crmd: [2970]: ERROR:
init_client_ipc_comms_nodispatch: Could not access channel on:
/var/run/crm/pengine
Dec 6 20:44:46 filer-1 crmd: [2970]: WARN: do_pe_control: Setup of
client connection failed, not adding channel to mainloop
Dec 6 20:44:46 filer-1 crmd: [2970]: WARN: do_log: FSA: Input I_FAIL
from do_pe_control() received in state S_INTEGRATION
Dec 6 20:44:46 filer-1 crmd: [2970]: info: do_dc_join_offer_all:
join-24: Waiting on 2 outstanding join acks
Dec 6 20:44:46 filer-1 crmd: [2970]: info: do_dc_takeover: Taking over
DC status for this partition
root@filer-1:~# lsof -p `pidof crmd` | grep socket | wc -l
1019
root@filer-1:~# cat /proc/2970/limits | grep 'open files'
Max open files            1024                 1024                 files
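
For reference, the same comparison can be made from /proc alone, without lsof. This is only a rough sketch: the helper name fd_usage is made up for illustration, and it assumes a Linux procfs and enough privileges to read the target's /proc/PID/fd directory.

```shell
#!/bin/sh
# fd_usage PID: print "<open fds> <soft limit>" for one process.
# Hypothetical helper; assumes Linux /proc and read access to /proc/PID/fd
# (root is needed for a process owned by another user).
fd_usage() {
    pid=$1
    # Every entry under /proc/PID/fd is one open descriptor.
    open=$(ls "/proc/$pid/fd" | wc -l)
    # In /proc/PID/limits the soft limit is the 4th field of the
    # "Max open files" row.
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "$open $soft"
}

# Demonstrate on the current shell; for the case above the argument
# would be the crmd PID (2970).
fd_usage "$$"
```

Note that this counts all descriptors, not only sockets, so the first number would be slightly higher than the lsof | grep socket count.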
I almost fainted when I saw this one :)
crm(live)# status
============
Last updated: Fri Dec 7 06:38:48 2012
Stack: openais
Current DC: filer-1 - partition with quorum
Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
2 Nodes configured, 2 expected votes
11 Resources configured.
============
OFFLINE: [ filer-2 filer-1 ]
As far as I know, killall -9 crmd will release the used FDs. Does
anyone have any idea how this will work out? I tested killing crmd on
another cluster (without this problem) and all resources were migrated
to the second node. What can happen in this case, where cluster
communication is busted? Has anyone ever dealt with a similar problem?
Resources are currently running on filer-1, the node which had been
MASTER before this problem occurred.
Packages:
pacemaker - Version: 1.0.8+hg15494-2ubuntu2
corosync - Version: 1.2.0-0ubuntu1
cluster-glue - Version: 1.0.5-1
libcorosync4 - Version: 1.2.0-0ubuntu1
libheartbeat2 - Version: 1:3.0.3-1ubuntu1
Any help/advice would be really appreciated :)
--
Piotr Jewiec