[Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?
Andrew Beekhof
andrew at beekhof.net
Fri Mar 7 02:43:20 UTC 2014
On 26 Feb 2014, at 5:25 pm, yusuke iida <yusk.iida at gmail.com> wrote:
> Hi, Andrew
>
> 2014-02-21 10:47 GMT+09:00 Andrew Beekhof <andrew at beekhof.net>:
>>
>> On 20 Feb 2014, at 8:39 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>>
>>> Hi, Andrew
>>>
>>> 2014-02-20 17:28 GMT+09:00 Andrew Beekhof <andrew at beekhof.net>:
>>>> Who was pid 16243?
>>>> Doesn't look like a pacemaker daemon.
>>> pid 16243 is crm_mon.
>>
>> That means that the state displayed by crm_mon was > 500 updates behind.
>> At that point, what it's displaying is horribly out of date and evicting it seems like a pretty good idea.
>>
>>> In vm01, crm_mon was started and the state was checked.
>>>
>>> If any other information is needed for the analysis, I can collect it.
>>
>> Some idea of what crm_mon is doing would be a good start.
>> Adding a few -V options in addition to --disable-ncurses might be the best approach.
> I ran the following command to capture a log from crm_mon:
> crm_mon -VVVV --disable-ncurses >crm_mon.log 2>&1
> The log is attached.
>
> BTW,
> I also tested with the following patch of yours applied:
> https://github.com/beekhof/pacemaker/commit/4002e4ab6a50ceb44e484613f2abd33e490492a7
>
> The load on stonithd dropped and the queue stopped overflowing.
> This patch looks very effective.
>
> Is it possible to implement similar handling in crm_mon?
I don't understand... crm_mon doesn't look for changes to resources or constraints and it should already be using the new faster diff format.
[/me reads attachment]
Ah, but perhaps I do understand after all :-)
This is repeated over and over:
notice: crm_diff_update: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
notice: xml_patch_version_check: Current num_updates is too high (885 > 67)
That would certainly drive up CPU usage and cause crm_mon to get left behind.
Happily the fix for that should be: https://github.com/beekhof/pacemaker/commit/6c33820
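
The two log lines above suggest why every notification fails: a diff can only be applied when the local copy is at the version the diff starts from. Below is a minimal Python sketch of that idea, not Pacemaker's actual C code. The names `CibVersion`, `check_patch_version` and `Client` are hypothetical, the three-part version (admin_epoch, epoch, num_updates) and the fall-back to a full re-read are assumptions of the sketch, and the epoch value 10 is made up because the log only shows num_updates.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class CibVersion:
    # Compared field by field, in this order (assumption of the sketch).
    admin_epoch: int
    epoch: int
    num_updates: int


def check_patch_version(current, source):
    """Compare the local version with the version a diff expects to start from."""
    if current == source:
        return "ok"
    if current > source:
        return "current too high"
    return "current too low"


class Client:
    """Hypothetical diff consumer that counts how often it must re-read everything."""

    def __init__(self, version):
        self.version = version
        self.full_refreshes = 0

    def on_diff(self, source, target):
        if check_patch_version(self.version, source) == "ok":
            # The diff starts from our version, so applying it moves us to target.
            self.version = target
            return True
        # The diff cannot be applied; assume the client falls back to an
        # expensive full re-read instead.
        self.full_refreshes += 1
        return False


client = Client(CibVersion(0, 10, 885))
# A diff that starts from num_updates=67 is rejected: 885 > 67.
client.on_diff(CibVersion(0, 10, 67), CibVersion(0, 10, 68))
print(client.full_refreshes)
```

If a client stays out of step like this, each incoming notification costs it a full re-read rather than a cheap patch, which is consistent with the CPU usage and the growing backlog described above.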
>
> Regards,
> Yusuke
>>
>>>
>>> Regards,
>>> Yusuke
>>>>
>>>>>
>>>>> A queue overflow has occurred on vm09 between cib and stonithd.
>>>>> Feb 20 14:20:22 [15519] vm09 cib: ( ipc.c:506 ) trace: crm_ipcs_flush_events: Sent 36 events (530 remaining) for 0x105ec10[15520]: Resource temporarily unavailable (-11)
>>>>> Feb 20 14:20:22 [15519] vm09 cib: ( ipc.c:515 ) error: crm_ipcs_flush_events: Evicting slow client 0x105ec10[15520]: event queue reached 530 entries
>>>>>
>>>>> I looked at the relevant code, but I could not work out how to
>>>>> solve this.
>>>>>
>>>>> Is sending 100 messages at a time too few?
>>>>> Is there a problem with how the wait time after sending a message is calculated?
>>>>> Or is the threshold of 500 too low?
>>>>
>>>> Being 500 behind is really quite a long way.
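
The eviction discussed in this exchange can be pictured as a per-client event queue with a backlog limit. Below is a minimal Python sketch, assuming the limit of 500 queued events and the batch of up to 100 events per flush mentioned in the thread. The class `ClientQueue`, its methods and the `can_accept` parameter are hypothetical and do not mirror the code in ipc.c.

```python
from collections import deque

MAX_BACKLOG = 500  # threshold discussed in the thread
BATCH_SIZE = 100   # events attempted per flush, as mentioned in the thread


class ClientQueue:
    """Hypothetical per-client queue of pending events on the server side."""

    def __init__(self):
        self.events = deque()
        self.evicted = False

    def push(self, event):
        # Events for an already evicted client are simply dropped.
        if not self.evicted:
            self.events.append(event)

    def flush(self, can_accept):
        """Send up to BATCH_SIZE events; can_accept is how many the client takes now."""
        limit = min(BATCH_SIZE, can_accept)
        sent = 0
        while self.events and sent < limit:
            self.events.popleft()
            sent += 1
        # A client whose backlog is still above the limit is considered too slow.
        if len(self.events) > MAX_BACKLOG:
            self.evicted = True
            self.events.clear()
        return sent


queue = ClientQueue()
for n in range(566):
    queue.push(n)
sent = queue.flush(can_accept=36)
print(sent, queue.evicted)
```

The numbers mirror the quoted log: 36 events go out, 530 stay queued, and because 530 exceeds the limit the client is dropped.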
>>>
>>>
>>>
>>>
>>> --
>>> ----------------------------------------
>>> METRO SYSTEMS CO., LTD
>>>
>>> Yusuke Iida
>>> Mail: yusk.iida at gmail.com
>>> ----------------------------------------
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>
>>
>
>
>
> <crm_mon.log>