[Pacemaker] The problem where the queue between cib and stonith-ng overflows

Yusuke Iida yusk.iida at gmail.com
Tue Jun 3 23:57:14 EDT 2014


Hi, Andrew

I applied the patch on top of the latest of the 1.1 branches and tested it.
The metadata call for fence_legacy is now performed only once, and the
time spent building the device information dropped to about 2 seconds,
so the problem is solved.
A 16-node configuration was also tested and worked fine.

I would like this fix to be included in 1.1.12.
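
For context, my understanding of the patch is that it caches agent
metadata per agent type rather than per device, so registering many
devices that share one agent costs only one metadata call. A rough
sketch of that idea in C with GLib (the names below are illustrative
only, not the actual Pacemaker code):

#include <glib.h>
#include <stdio.h>

/* Stand-in for the synchronous call that asks a fence agent for its
 * metadata.  This is the expensive step; doing it once per device is
 * what made device construction slow. */
static char *
fetch_agent_metadata(const char *agent)
{
    printf("running '%s metadata' (expensive)\n", agent);
    return g_strdup("<resource-agent/>");
}

/* Cache keyed by agent type (e.g. "fence_legacy"), not by device id. */
static GHashTable *metadata_cache = NULL;

static const char *
get_agent_metadata(const char *agent)
{
    char *xml = NULL;

    if (metadata_cache == NULL) {
        metadata_cache = g_hash_table_new_full(g_str_hash, g_str_equal,
                                               g_free, g_free);
    }
    xml = g_hash_table_lookup(metadata_cache, agent);
    if (xml == NULL) {
        xml = fetch_agent_metadata(agent);   /* slow path, once per agent */
        g_hash_table_insert(metadata_cache, g_strdup(agent), xml);
    }
    return xml;
}

int
main(void)
{
    /* 14 devices but only two agent types -> only two metadata calls */
    const char *agents[] = { "fence_legacy", "fence_virsh" };
    int i;

    for (i = 0; i < 14; i++) {
        get_agent_metadata(agents[i % 2]);
    }
    return 0;
}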

Thanks,
Yusuke

Jun  4 10:47:08 vm02 stonith-ng[2971]:     info:
update_fencing_topology: Re-initializing fencing topology after
top-level create operation
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info: stonith_level_remove:
Node vm01 not found (0 active entries)
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info:
stonith_level_register: Node vm01 has 1 active fencing levels
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info:
stonith_level_register: Node vm01 has 2 active fencing levels
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info: stonith_level_remove:
Node vm02 not found (1 active entries)
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info:
stonith_level_register: Node vm02 has 1 active fencing levels
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info:
stonith_level_register: Node vm02 has 2 active fencing levels
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info: stonith_level_remove:
Node vm03 not found (2 active entries)
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info:
stonith_level_register: Node vm03 has 1 active fencing levels
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info:
stonith_level_register: Node vm03 has 2 active fencing levels
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info: stonith_level_remove:
Node vm04 not found (3 active entries)
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info:
stonith_level_register: Node vm04 has 1 active fencing levels
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info:
stonith_level_register: Node vm04 has 2 active fencing levels
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info: stonith_level_remove:
Node vm05 not found (4 active entries)
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info:
stonith_level_register: Node vm05 has 1 active fencing levels
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info:
stonith_level_register: Node vm05 has 2 active fencing levels
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info: stonith_level_remove:
Node vm06 not found (5 active entries)
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info:
stonith_level_register: Node vm06 has 1 active fencing levels
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info:
stonith_level_register: Node vm06 has 2 active fencing levels
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info: stonith_level_remove:
Node vm07 not found (6 active entries)
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info:
stonith_level_register: Node vm07 has 1 active fencing levels
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info:
stonith_level_register: Node vm07 has 2 active fencing levels
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info: stonith_level_remove:
Node vm08 not found (7 active entries)
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info:
stonith_level_register: Node vm08 has 1 active fencing levels
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info:
stonith_level_register: Node vm08 has 2 active fencing levels
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info:
update_cib_stonith_devices_v2: Updating device list from the cib:
create resources
Jun  4 10:47:08 vm02 stonith-ng[2971]:     info: cib_devices_update:
Updating devices to version 0.4.0
Jun  4 10:47:08 vm02 stonith-ng[2971]:   notice: unpack_config: On
loss of CCM Quorum: Ignore
Jun  4 10:47:08 vm02 stonith-ng[2971]:  warning:
handle_startup_fencing: Blind faith: not fencing unseen nodes
Jun  4 10:47:08 vm02 stonith-ng[2971]:  warning:
handle_startup_fencing: Blind faith: not fencing unseen nodes
Jun  4 10:47:08 vm02 stonith-ng[2971]:  warning:
handle_startup_fencing: Blind faith: not fencing unseen nodes
Jun  4 10:47:08 vm02 stonith-ng[2971]:  warning:
handle_startup_fencing: Blind faith: not fencing unseen nodes
Jun  4 10:47:08 vm02 stonith-ng[2971]:  warning:
handle_startup_fencing: Blind faith: not fencing unseen nodes
Jun  4 10:47:08 vm02 stonith-ng[2971]:  warning:
handle_startup_fencing: Blind faith: not fencing unseen nodes
Jun  4 10:47:08 vm02 stonith-ng[2971]:  warning:
handle_startup_fencing: Blind faith: not fencing unseen nodes
Jun  4 10:47:08 vm02 stonith-ng[2971]:  warning:
handle_startup_fencing: Blind faith: not fencing unseen nodes
Jun  4 10:47:09 vm02 stonith-ng[2971]:   notice:
stonith_device_register: Added 'prmStonith_helper01' to the device
list (1 active devices)
Jun  4 10:47:09 vm02 stonith-ng[2971]:   notice:
stonith_device_register: Added 'prmStonith_libvirt01' to the device
list (2 active devices)
Jun  4 10:47:09 vm02 stonith-ng[2971]:     info: cib_device_update:
Device prmStonith_helper02 has been disabled on vm02: score=-INFINITY
Jun  4 10:47:09 vm02 stonith-ng[2971]:     info: cib_device_update:
Device prmStonith_libvirt02 has been disabled on vm02: score=-INFINITY
Jun  4 10:47:09 vm02 stonith-ng[2971]:   notice:
stonith_device_register: Added 'prmStonith_helper03' to the device
list (3 active devices)
Jun  4 10:47:09 vm02 stonith-ng[2971]:   notice:
stonith_device_register: Added 'prmStonith_libvirt03' to the device
list (4 active devices)
Jun  4 10:47:09 vm02 stonith-ng[2971]:   notice:
stonith_device_register: Added 'prmStonith_helper04' to the device
list (5 active devices)
Jun  4 10:47:09 vm02 stonith-ng[2971]:   notice:
stonith_device_register: Added 'prmStonith_libvirt04' to the device
list (6 active devices)
Jun  4 10:47:09 vm02 stonith-ng[2971]:   notice:
stonith_device_register: Added 'prmStonith_helper05' to the device
list (7 active devices)
Jun  4 10:47:09 vm02 stonith-ng[2971]:   notice:
stonith_device_register: Added 'prmStonith_libvirt05' to the device
list (8 active devices)
Jun  4 10:47:09 vm02 stonith-ng[2971]:   notice:
stonith_device_register: Added 'prmStonith_helper06' to the device
list (9 active devices)
Jun  4 10:47:09 vm02 stonith-ng[2971]:   notice:
stonith_device_register: Added 'prmStonith_libvirt06' to the device
list (10 active devices)
Jun  4 10:47:09 vm02 stonith-ng[2971]:   notice:
stonith_device_register: Added 'prmStonith_helper07' to the device
list (11 active devices)
Jun  4 10:47:09 vm02 stonith-ng[2971]:   notice:
stonith_device_register: Added 'prmStonith_libvirt07' to the device
list (12 active devices)
Jun  4 10:47:09 vm02 stonith-ng[2971]:   notice:
stonith_device_register: Added 'prmStonith_helper08' to the device
list (13 active devices)
Jun  4 10:47:09 vm02 stonith-ng[2971]:   notice:
stonith_device_register: Added 'prmStonith_libvirt08' to the device
list (14 active devices)
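
The failure mode discussed in the quoted thread below is that stonith-ng
spends a long time inside a single synchronous callback, so it never
returns to the mainloop to drain the CIB diff notifications queued for
it, and the cib eventually evicts it as a slow client. A minimal GLib
sketch of that pattern (the function names are stand-ins, not the real
stonith-ng code):

#include <glib.h>
#include <stdio.h>
#include <unistd.h>

/* Stand-in for rebuilding the device list: one long, synchronous job
 * executed from a single mainloop callback. */
static gboolean
rebuild_device_list(gpointer data)
{
    printf("rebuilding device list...\n");
    sleep(15);               /* nothing else in this process runs meanwhile */
    printf("...done\n");
    return FALSE;            /* run once */
}

/* Stand-in for handling a CIB diff notification.  While the callback
 * above is blocked, this never fires. */
static gboolean
handle_cib_diff(gpointer data)
{
    printf("processed a cib diff\n");
    return TRUE;             /* keep the source */
}

int
main(void)
{
    GMainLoop *loop = g_main_loop_new(NULL, FALSE);

    g_timeout_add(100, handle_cib_diff, NULL);   /* frequent diffs */
    g_idle_add(rebuild_device_list, NULL);       /* one blocking job */

    g_main_loop_run(loop);                       /* Ctrl-C to stop */
    return 0;
}

For the first 15 seconds no diffs are processed; on a real cluster the
cib keeps queueing notifications for the stalled client until
crm_ipcs_flush_events() gives up and evicts it, which is the error shown
further down in the thread.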

2014-06-04 8:18 GMT+09:00 Andrew Beekhof <andrew at beekhof.net>:
>
> On 4 Jun 2014, at 8:11 am, Andrew Beekhof <andrew at beekhof.net> wrote:
>
>>
>> On 3 Jun 2014, at 11:26 am, Yusuke Iida <yusk.iida at gmail.com> wrote:
>>
>>> Hi, Andrew
>>>
>>> The whole device construction takes about 15 seconds.
>>> I think stonith-ng cannot receive messages from the cib during device
>>> construction, since it does not return to the mainloop.
>>
>> I'm reasonably sure this is because we do synchronous metadata calls when a device is added.
>> I'll have a patch which creates a per-agent metadata cache (instead of per device) for you to test later today.
>
> Can you try this please:
>
>    http://paste.fedoraproject.org/106995/18374851
>
>>
>>>
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info: init_cib_cache_cb:
>>> Updating device list from the cib: init
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info: stonith_level_remove:
>>> Node vm01 not found (0 active entries)
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info:
>>> stonith_level_register: Node vm01 has 1 active fencing levels
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info:
>>> stonith_level_register: Node vm01 has 2 active fencing levels
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info: stonith_level_remove:
>>> Node vm02 not found (1 active entries)
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info:
>>> stonith_level_register: Node vm02 has 1 active fencing levels
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info:
>>> stonith_level_register: Node vm02 has 2 active fencing levels
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info: stonith_level_remove:
>>> Node vm03 not found (2 active entries)
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info:
>>> stonith_level_register: Node vm03 has 1 active fencing levels
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info:
>>> stonith_level_register: Node vm03 has 2 active fencing levels
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info: stonith_level_remove:
>>> Node vm04 not found (3 active entries)
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info:
>>> stonith_level_register: Node vm04 has 1 active fencing levels
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info:
>>> stonith_level_register: Node vm04 has 2 active fencing levels
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info: stonith_level_remove:
>>> Node vm05 not found (4 active entries)
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info:
>>> stonith_level_register: Node vm05 has 1 active fencing levels
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info:
>>> stonith_level_register: Node vm05 has 2 active fencing levels
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info: stonith_level_remove:
>>> Node vm06 not found (5 active entries)
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info:
>>> stonith_level_register: Node vm06 has 1 active fencing levels
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info:
>>> stonith_level_register: Node vm06 has 2 active fencing levels
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info: stonith_level_remove:
>>> Node vm07 not found (6 active entries)
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info:
>>> stonith_level_register: Node vm07 has 1 active fencing levels
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info:
>>> stonith_level_register: Node vm07 has 2 active fencing levels
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info: stonith_level_remove:
>>> Node vm08 not found (7 active entries)
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info:
>>> stonith_level_register: Node vm08 has 1 active fencing levels
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info:
>>> stonith_level_register: Node vm08 has 2 active fencing levels
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:     info: cib_devices_update:
>>> Updating devices to version 0.4.937
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:   notice: unpack_config: On
>>> loss of CCM Quorum: Ignore
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:  warning:
>>> handle_startup_fencing: Blind faith: not fencing unseen nodes
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:  warning:
>>> handle_startup_fencing: Blind faith: not fencing unseen nodes
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:  warning:
>>> handle_startup_fencing: Blind faith: not fencing unseen nodes
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:  warning:
>>> handle_startup_fencing: Blind faith: not fencing unseen nodes
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:  warning:
>>> handle_startup_fencing: Blind faith: not fencing unseen nodes
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:  warning:
>>> handle_startup_fencing: Blind faith: not fencing unseen nodes
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:  warning:
>>> handle_startup_fencing: Blind faith: not fencing unseen nodes
>>> Jun  2 11:34:02 vm04 stonith-ng[4891]:  warning:
>>> handle_startup_fencing: Blind faith: not fencing unseen nodes
>>> Jun  2 11:34:03 vm04 stonith-ng[4891]:   notice:
>>> stonith_device_register: Added 'prmStonith_helper01' to the device
>>> list (1 active devices)
>>> Jun  2 11:34:04 vm04 stonith-ng[4891]:   notice:
>>> stonith_device_register: Added 'prmStonith_libvirt01' to the device
>>> list (2 active devices)
>>> Jun  2 11:34:05 vm04 stonith-ng[4891]:   notice:
>>> stonith_device_register: Added 'prmStonith_helper02' to the device
>>> list (3 active devices)
>>> Jun  2 11:34:06 vm04 stonith-ng[4891]:   notice:
>>> stonith_device_register: Added 'prmStonith_libvirt02' to the device
>>> list (4 active devices)
>>> Jun  2 11:34:07 vm04 stonith-ng[4891]:   notice:
>>> stonith_device_register: Added 'prmStonith_helper03' to the device
>>> list (5 active devices)
>>> Jun  2 11:34:08 vm04 stonith-ng[4891]:   notice:
>>> stonith_device_register: Added 'prmStonith_libvirt03' to the device
>>> list (6 active devices)
>>> Jun  2 11:34:08 vm04 stonith-ng[4891]:     info: cib_device_update:
>>> Device prmStonith_helper04 has been disabled on vm04: score=-INFINITY
>>> Jun  2 11:34:08 vm04 stonith-ng[4891]:     info: cib_device_update:
>>> Device prmStonith_libvirt04 has been disabled on vm04: score=-INFINITY
>>> Jun  2 11:34:09 vm04 stonith-ng[4891]:   notice:
>>> stonith_device_register: Added 'prmStonith_helper05' to the device
>>> list (7 active devices)
>>> Jun  2 11:34:10 vm04 stonith-ng[4891]:   notice:
>>> stonith_device_register: Added 'prmStonith_libvirt05' to the device
>>> list (8 active devices)
>>> Jun  2 11:34:11 vm04 stonith-ng[4891]:   notice:
>>> stonith_device_register: Added 'prmStonith_helper06' to the device
>>> list (9 active devices)
>>> Jun  2 11:34:12 vm04 stonith-ng[4891]:   notice:
>>> stonith_device_register: Added 'prmStonith_libvirt06' to the device
>>> list (10 active devices)
>>> Jun  2 11:34:13 vm04 stonith-ng[4891]:   notice:
>>> stonith_device_register: Added 'prmStonith_helper07' to the device
>>> list (11 active devices)
>>> Jun  2 11:34:14 vm04 stonith-ng[4891]:   notice:
>>> stonith_device_register: Added 'prmStonith_libvirt07' to the device
>>> list (12 active devices)
>>> Jun  2 11:34:15 vm04 stonith-ng[4891]:   notice:
>>> stonith_device_register: Added 'prmStonith_helper08' to the device
>>> list (13 active devices)
>>> Jun  2 11:34:16 vm04 stonith-ng[4891]:   notice:
>>> stonith_device_register: Added 'prmStonith_libvirt08' to the device
>>> list (14 active devices)
>>>
>>> Regards,
>>> Yusuke
>>>
>>> 2014-06-02 20:31 GMT+09:00 Andrew Beekhof <andrew at beekhof.net>:
>>>>
>>>> On 2 Jun 2014, at 3:05 pm, Yusuke Iida <yusk.iida at gmail.com> wrote:
>>>>
>>>>> Hi, Andrew
>>>>>
>>>>> I am testing with eight nodes, using the latest of the 1.1 branches.
>>>>>
>>>>> Although this problem was fixed once, the queue between the cib and
>>>>> stonithd is now overflowing again.
>>>>>
>>>>> As an example, I have pasted the log from the DC node.
>>>>> The problem occurs on all nodes.
>>>>>
>>>>> Jun  2 11:34:02 vm04 cib[3940]:    error: crm_ipcs_flush_events:
>>>>> Evicting slow client 0x250afe0[3941]: event queue reached 638 entries
>>>>> Jun  2 11:34:02 vm04 stonith-ng[3941]:    error: crm_ipc_read:
>>>>> Connection to cib_rw failed
>>>>> Jun  2 11:34:02 vm04 stonith-ng[3941]:    error:
>>>>> mainloop_gio_callback: Connection to cib_rw[0x662510] closed (I/O
>>>>> condition=17)
>>>>> Jun  2 11:34:02 vm04 stonith-ng[3941]:   notice:
>>>>> cib_connection_destroy: Connection to the CIB terminated. Shutting
>>>>> down.
>>>>> Jun  2 11:34:02 vm04 stonith-ng[3941]:     info: stonith_shutdown:
>>>>> Terminating with  2 clients
>>>>> Jun  2 11:34:02 vm04 stonith-ng[3941]:     info: qb_ipcs_us_withdraw:
>>>>> withdrawing server sockets
>>>>>
>>>>> After the resource configuration is loaded, stonithd takes a long
>>>>> time to build its device information: about 15 seconds.
>>>>
>>>> 15 seconds!! Yikes. I'll investigate tomorrow.
>>>>
>>>>> It seems that the cib diff messages accumulate in the queue in the
>>>>> meantime.
>>>>>
>>>>> Are there any plans to improve this?
>>>>>
>>>>> I have attached a report from when the problem occurs.
>>>>> https://drive.google.com/file/d/0BwMFJItoO-fVUEFEN1NlelNWRjg/edit?usp=sharing
>>>>>
>>>>> Regards,
>>>>> Yusuke
>>>>> --
>>>>> ----------------------------------------
>>>>> METRO SYSTEMS CO., LTD
>>>>>
>>>>> Yusuke Iida
>>>>> Mail: yusk.iida at gmail.com
>>>>> ----------------------------------------
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> ----------------------------------------
>>> METRO SYSTEMS CO., LTD
>>>
>>> Yusuke Iida
>>> Mail: yusk.iida at gmail.com
>>> ----------------------------------------
>>>
>>
>
>
>



-- 
----------------------------------------
METRO SYSTEMS CO., LTD

Yusuke Iida
Mail: yusk.iida at gmail.com
----------------------------------------



