[Pacemaker] Queue overflow between cib and stonith-ng
Andrew Beekhof
andrew at beekhof.net
Wed Jun 4 01:18:44 CEST 2014
On 4 Jun 2014, at 8:11 am, Andrew Beekhof <andrew at beekhof.net> wrote:
>
> On 3 Jun 2014, at 11:26 am, Yusuke Iida <yusk.iida at gmail.com> wrote:
>
>> Hi, Andrew
>>
>> The whole device construction takes about 15 seconds.
>> I think stonith-ng cannot receive messages from the cib during device
>> construction, since it does not return to the mainloop.
>
> I'm reasonably sure this is because we do synchronous metadata calls when a device is added.
> I'll have a patch which creates a per-agent metadata cache (instead of per device) for you to test later today.
Can you try this please:
http://paste.fedoraproject.org/106995/18374851
>
>>
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info: init_cib_cache_cb:
>> Updating device list from the cib: init
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info: stonith_level_remove:
>> Node vm01 not found (0 active entries)
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info:
>> stonith_level_register: Node vm01 has 1 active fencing levels
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info:
>> stonith_level_register: Node vm01 has 2 active fencing levels
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info: stonith_level_remove:
>> Node vm02 not found (1 active entries)
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info:
>> stonith_level_register: Node vm02 has 1 active fencing levels
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info:
>> stonith_level_register: Node vm02 has 2 active fencing levels
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info: stonith_level_remove:
>> Node vm03 not found (2 active entries)
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info:
>> stonith_level_register: Node vm03 has 1 active fencing levels
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info:
>> stonith_level_register: Node vm03 has 2 active fencing levels
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info: stonith_level_remove:
>> Node vm04 not found (3 active entries)
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info:
>> stonith_level_register: Node vm04 has 1 active fencing levels
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info:
>> stonith_level_register: Node vm04 has 2 active fencing levels
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info: stonith_level_remove:
>> Node vm05 not found (4 active entries)
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info:
>> stonith_level_register: Node vm05 has 1 active fencing levels
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info:
>> stonith_level_register: Node vm05 has 2 active fencing levels
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info: stonith_level_remove:
>> Node vm06 not found (5 active entries)
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info:
>> stonith_level_register: Node vm06 has 1 active fencing levels
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info:
>> stonith_level_register: Node vm06 has 2 active fencing levels
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info: stonith_level_remove:
>> Node vm07 not found (6 active entries)
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info:
>> stonith_level_register: Node vm07 has 1 active fencing levels
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info:
>> stonith_level_register: Node vm07 has 2 active fencing levels
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info: stonith_level_remove:
>> Node vm08 not found (7 active entries)
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info:
>> stonith_level_register: Node vm08 has 1 active fencing levels
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info:
>> stonith_level_register: Node vm08 has 2 active fencing levels
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: info: cib_devices_update:
>> Updating devices to version 0.4.937
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: notice: unpack_config: On
>> loss of CCM Quorum: Ignore
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: warning:
>> handle_startup_fencing: Blind faith: not fencing unseen nodes
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: warning:
>> handle_startup_fencing: Blind faith: not fencing unseen nodes
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: warning:
>> handle_startup_fencing: Blind faith: not fencing unseen nodes
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: warning:
>> handle_startup_fencing: Blind faith: not fencing unseen nodes
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: warning:
>> handle_startup_fencing: Blind faith: not fencing unseen nodes
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: warning:
>> handle_startup_fencing: Blind faith: not fencing unseen nodes
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: warning:
>> handle_startup_fencing: Blind faith: not fencing unseen nodes
>> Jun 2 11:34:02 vm04 stonith-ng[4891]: warning:
>> handle_startup_fencing: Blind faith: not fencing unseen nodes
>> Jun 2 11:34:03 vm04 stonith-ng[4891]: notice:
>> stonith_device_register: Added 'prmStonith_helper01' to the device
>> list (1 active devices)
>> Jun 2 11:34:04 vm04 stonith-ng[4891]: notice:
>> stonith_device_register: Added 'prmStonith_libvirt01' to the device
>> list (2 active devices)
>> Jun 2 11:34:05 vm04 stonith-ng[4891]: notice:
>> stonith_device_register: Added 'prmStonith_helper02' to the device
>> list (3 active devices)
>> Jun 2 11:34:06 vm04 stonith-ng[4891]: notice:
>> stonith_device_register: Added 'prmStonith_libvirt02' to the device
>> list (4 active devices)
>> Jun 2 11:34:07 vm04 stonith-ng[4891]: notice:
>> stonith_device_register: Added 'prmStonith_helper03' to the device
>> list (5 active devices)
>> Jun 2 11:34:08 vm04 stonith-ng[4891]: notice:
>> stonith_device_register: Added 'prmStonith_libvirt03' to the device
>> list (6 active devices)
>> Jun 2 11:34:08 vm04 stonith-ng[4891]: info: cib_device_update:
>> Device prmStonith_helper04 has been disabled on vm04: score=-INFINITY
>> Jun 2 11:34:08 vm04 stonith-ng[4891]: info: cib_device_update:
>> Device prmStonith_libvirt04 has been disabled on vm04: score=-INFINITY
>> Jun 2 11:34:09 vm04 stonith-ng[4891]: notice:
>> stonith_device_register: Added 'prmStonith_helper05' to the device
>> list (7 active devices)
>> Jun 2 11:34:10 vm04 stonith-ng[4891]: notice:
>> stonith_device_register: Added 'prmStonith_libvirt05' to the device
>> list (8 active devices)
>> Jun 2 11:34:11 vm04 stonith-ng[4891]: notice:
>> stonith_device_register: Added 'prmStonith_helper06' to the device
>> list (9 active devices)
>> Jun 2 11:34:12 vm04 stonith-ng[4891]: notice:
>> stonith_device_register: Added 'prmStonith_libvirt06' to the device
>> list (10 active devices)
>> Jun 2 11:34:13 vm04 stonith-ng[4891]: notice:
>> stonith_device_register: Added 'prmStonith_helper07' to the device
>> list (11 active devices)
>> Jun 2 11:34:14 vm04 stonith-ng[4891]: notice:
>> stonith_device_register: Added 'prmStonith_libvirt07' to the device
>> list (12 active devices)
>> Jun 2 11:34:15 vm04 stonith-ng[4891]: notice:
>> stonith_device_register: Added 'prmStonith_helper08' to the device
>> list (13 active devices)
>> Jun 2 11:34:16 vm04 stonith-ng[4891]: notice:
>> stonith_device_register: Added 'prmStonith_libvirt08' to the device
>> list (14 active devices)
>>
>> Regards,
>> Yusuke
>>
>> 2014-06-02 20:31 GMT+09:00 Andrew Beekhof <andrew at beekhof.net>:
>>>
>>> On 2 Jun 2014, at 3:05 pm, Yusuke Iida <yusk.iida at gmail.com> wrote:
>>>
>>>> Hi, Andrew
>>>>
>>>> I am using the latest 1.1 branch and testing with eight nodes.
>>>>
>>>> Although this problem was fixed once, the queue overflow between the
>>>> cib and stonithd has recurred.
>>>>
>>>> As an example, here is the log from the DC node.
>>>> The problem is occurring on all nodes.
>>>>
>>>> Jun 2 11:34:02 vm04 cib[3940]: error: crm_ipcs_flush_events:
>>>> Evicting slow client 0x250afe0[3941]: event queue reached 638 entries
>>>> Jun 2 11:34:02 vm04 stonith-ng[3941]: error: crm_ipc_read:
>>>> Connection to cib_rw failed
>>>> Jun 2 11:34:02 vm04 stonith-ng[3941]: error:
>>>> mainloop_gio_callback: Connection to cib_rw[0x662510] closed (I/O
>>>> condition=17)
>>>> Jun 2 11:34:02 vm04 stonith-ng[3941]: notice:
>>>> cib_connection_destroy: Connection to the CIB terminated. Shutting
>>>> down.
>>>> Jun 2 11:34:02 vm04 stonith-ng[3941]: info: stonith_shutdown:
>>>> Terminating with 2 clients
>>>> Jun 2 11:34:02 vm04 stonith-ng[3941]: info: qb_ipcs_us_withdraw:
>>>> withdrawing server sockets
>>>>
>>>> After the resource configuration is loaded, stonithd takes a long
>>>> time to build the device information: about 15 seconds.
>>>
>>> 15 seconds!! Yikes. I'll investigate tomorrow.
>>>
>>>> It seems that cib diff messages accumulate in the meantime.
>>>>
>>>> Are there any plans to address this issue?
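The failure mode in the log above (the cib keeps queueing diff events for a receiver that is stuck in synchronous work, until the queue limit trips and the client is evicted) can be sketched roughly like this; the class, the threshold value, and the event strings are illustrative stand-ins, not Pacemaker's actual internals:

```python
from collections import deque

MAX_QUEUE = 500  # illustrative limit; the real threshold lives in Pacemaker's IPC layer

class Client:
    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.connected = True

def send_event(client, event):
    # The sender keeps queueing diffs whether or not the receiver drains them.
    client.queue.append(event)
    if len(client.queue) > MAX_QUEUE:
        # "Evicting slow client ...: event queue reached N entries"
        client.connected = False

# A receiver blocked in synchronous device construction never drains,
# so the backlog only grows until the sender evicts it:
stonith = Client("stonith-ng")
for i in range(638):
    send_event(stonith, "cib-diff-%d" % i)
```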
>>>>
>>>> I attach a report when a problem occurs.
>>>> https://drive.google.com/file/d/0BwMFJItoO-fVUEFEN1NlelNWRjg/edit?usp=sharing
>>>>
>>>> Regards,
>>>> Yusuke
>>>> --
>>>> ----------------------------------------
>>>> METRO SYSTEMS CO., LTD
>>>>
>>>> Yusuke Iida
>>>> Mail: yusk.iida at gmail.com
>>>> ----------------------------------------
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>
>>
>