[Pacemaker] crmd used all its file descriptors

emmanuel segura emi2fast at gmail.com
Fri Dec 7 12:11:01 EST 2012


If I remember correctly, this is an old bug that has since been fixed.

2012/12/7 Piotr Jewiec <piotr at jewiec.net>

> Hi,
>
> I have a corosync/pacemaker cluster running on Ubuntu 10.04.2. The
> following error is getting appended to the syslog:
>
> Dec  6 20:44:46 filer-1 crmd: [2970]: ERROR: socket_client_channel_new:
> socket: Too many open files
> Dec  6 20:44:46 filer-1 crmd: [2970]: ERROR: init_client_ipc_comms_nodispatch:
> Could not access channel on: /var/run/crm/pengine
> Dec  6 20:44:46 filer-1 crmd: [2970]: WARN: do_pe_control: Setup of client
> connection failed, not adding channel to mainloop
> Dec  6 20:44:46 filer-1 crmd: [2970]: WARN: do_log: FSA: Input I_FAIL from
> do_pe_control() received in state S_INTEGRATION
> Dec  6 20:44:46 filer-1 crmd: [2970]: info: do_dc_join_offer_all: join-24:
> Waiting on 2 outstanding join acks
> Dec  6 20:44:46 filer-1 crmd: [2970]: info: do_dc_takeover: Taking over DC
> status for this partition
>
>
> root@filer-1:~# lsof -p `pidof crmd` | grep socket | wc -l
> 1019
>
> root@filer-1:~# cat /proc/2970/limits | grep 'open files'
> Max open files            1024                 1024                 files
>
> I almost fainted when I saw this one :)
>
> crm(live)# status
> ============
> Last updated: Fri Dec  7 06:38:48 2012
> Stack: openais
> Current DC: filer-1 - partition with quorum
> Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
> 2 Nodes configured, 2 expected votes
> 11 Resources configured.
> ============
>
> OFFLINE: [ filer-2 filer-1 ]
>
> As far as I know, killall -9 crmd will release the used FDs. Does anyone
> have any idea how this will work? I tested killing crmd on another cluster
> (one without this problem) and all resources were migrated to the second
> node. What could happen in this case, where cluster communication is broken?
> Has anyone dealt with a similar problem? Resources are currently running on
> filer-1, the node that was MASTER before this problem occurred.
>
> Packages:
>
> pacemaker - Version: 1.0.8+hg15494-2ubuntu2
> corosync - Version: 1.2.0-0ubuntu1
> cluster-glue - Version: 1.0.5-1
> libcorosync4 - Version: 1.2.0-0ubuntu1
> libheartbeat2 - Version: 1:3.0.3-1ubuntu1
>
> Any help/advice would be really appreciated :)
> --
> Piotr Jewiec
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
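
For anyone who wants to repeat the descriptor check quoted above on another node, here is a rough sketch in plain shell. The helper names (count_fds, soft_nofile, fd_headroom, fd_report) are made up for illustration and are not part of Pacemaker or cluster-glue. It assumes a Linux /proc filesystem and that it is run as root or as the owner of the process.

```shell
#!/bin/sh
# Hypothetical helpers (not Pacemaker tools): compare a process's open
# descriptors with its soft "Max open files" limit. Linux /proc assumed.

# count_fds DIR: count the entries in a /proc/<pid>/fd directory.
count_fds() {
    ls -A "$1" | wc -l | tr -d ' '
}

# soft_nofile: read /proc/<pid>/limits on stdin and print the soft limit,
# which is the 4th whitespace-separated field of the "Max open files" row.
soft_nofile() {
    awk '/^Max open files/ { print $4 }'
}

# fd_headroom USED LIMIT: descriptors left before the limit is reached.
fd_headroom() {
    echo $(( $2 - $1 ))
}

# fd_report PID: one-line summary for the given process.
fd_report() {
    used=$(count_fds "/proc/$1/fd")
    limit=$(soft_nofile < "/proc/$1/limits")
    echo "pid=$1 open_fds=$used soft_limit=$limit headroom=$(fd_headroom "$used" "$limit")"
}
```

Calling fd_report "$(pidof crmd)" would print the summary for crmd. Note that the lsof count quoted above covers sockets only, while /proc/<pid>/fd lists every descriptor, so the two numbers can differ slightly. With the quoted figures (1019 sockets against a limit of 1024), at most 5 descriptors were left.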



-- 
this is my life and I live it for as long as God wills