[Pacemaker] Queue overflow between cib and stonith-ng

Yusuke Iida yusk.iida at gmail.com
Mon Jun 2 07:05:27 CEST 2014


Hi, Andrew

I am using the latest 1.1 branch and am testing with a cluster of
eight nodes.

Although this problem was resolved once, the queue overflow between
cib and stonithd has recurred.

As an example, here is the log from the DC node; the problem occurs
on all nodes.

Jun  2 11:34:02 vm04 cib[3940]:    error: crm_ipcs_flush_events:
Evicting slow client 0x250afe0[3941]: event queue reached 638 entries
Jun  2 11:34:02 vm04 stonith-ng[3941]:    error: crm_ipc_read:
Connection to cib_rw failed
Jun  2 11:34:02 vm04 stonith-ng[3941]:    error:
mainloop_gio_callback: Connection to cib_rw[0x662510] closed (I/O
condition=17)
Jun  2 11:34:02 vm04 stonith-ng[3941]:   notice:
cib_connection_destroy: Connection to the CIB terminated. Shutting
down.
Jun  2 11:34:02 vm04 stonith-ng[3941]:     info: stonith_shutdown:
Terminating with  2 clients
Jun  2 11:34:02 vm04 stonith-ng[3941]:     info: qb_ipcs_us_withdraw:
withdrawing server sockets
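
As far as I can tell, the eviction corresponds to a server-side limit
on the per-client event queue. The following is a minimal,
self-contained model of that behaviour, not the actual Pacemaker or
libqb code; QUEUE_LIMIT, struct client and queue_event are names I
made up for illustration:

#include <stdio.h>
#include <stdlib.h>

#define QUEUE_LIMIT 500            /* assumed eviction threshold */

struct client {
    int pid;
    size_t backlog;                /* events queued but not yet read */
    int connected;
};

/* Queue one event for the client; evict it if the backlog grows too large. */
static void queue_event(struct client *c)
{
    if (!c->connected) {
        return;
    }
    c->backlog++;
    if (c->backlog > QUEUE_LIMIT) {
        fprintf(stderr,
                "error: Evicting slow client [%d]: event queue reached %zu entries\n",
                c->pid, c->backlog);
        c->connected = 0;          /* server closes the IPC connection */
    }
}

int main(void)
{
    struct client stonithd = { .pid = 3941, .backlog = 0, .connected = 1 };

    /* Simulate ~43 CIB diff notifications/s for 15 s while the client
     * is busy rebuilding device information and reads nothing. */
    for (int i = 0; i < 43 * 15; i++) {
        queue_event(&stonithd);
    }
    return stonithd.connected ? 0 : 1;   /* 1 = the client was evicted */
}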

After a resource configuration is loaded, stonith-ng takes a long
time, about 15 seconds, to rebuild its device information. During
that time, the CIB diff notifications accumulate in the queue between
the two daemons; 638 entries over roughly 15 seconds works out to
about 40 diffs per second.
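
One idea, sketched below with hypothetical helpers
(read_one_notification and process_device are mine, not an existing
Pacemaker API), would be for the client to drain pending
notifications while it rebuilds its device information, so the
server-side queue never grows for the whole 15 seconds:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real IPC socket and rebuild work. */
static size_t pending = 0;              /* notifications waiting to be read */

static bool read_one_notification(void) /* false when nothing is pending */
{
    if (pending == 0) {
        return false;
    }
    pending--;
    return true;
}

static void process_device(size_t index)
{
    (void) index;
    pending += 3;   /* pretend ~3 diffs arrive per unit of rebuild work */
}

/* Interleave the slow rebuild with draining notifications, so the
 * backlog never accumulates for the whole rebuild. */
static void rebuild_device_information(size_t n_devices)
{
    for (size_t i = 0; i < n_devices; i++) {
        process_device(i);
        while (read_one_notification()) {
            ;       /* consume (and possibly coalesce) CIB diffs here */
        }
    }
    printf("backlog after rebuild: %zu\n", pending);  /* stays at 0 */
}

int main(void)
{
    rebuild_device_information(200);
    return 0;
}

Coalescing the drained diffs into a single refresh at the end would
keep the extra work small.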

Are there any plans to address this issue?

I have attached a report from when the problem occurred:
https://drive.google.com/file/d/0BwMFJItoO-fVUEFEN1NlelNWRjg/edit?usp=sharing

Regards,
Yusuke
-- 
----------------------------------------
METRO SYSTEMS CO., LTD

Yusuke Iida
Mail: yusk.iida at gmail.com
----------------------------------------


