[ClusterLabs] Re: Troubleshooting Faulty Networks / Heartbeat Rings
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Wed Oct 26 15:54:17 CEST 2016
>>> Martin Schlegel <martin at nuboreto.org> wrote on 26.10.2016 at 13:55 in
message
<1875006565.5761.eadc80df-ed0f-4dcf-bc75-89991bd8c2a1.open-xchange at email.1und1.d
>:
> Hello all
>
> On one of our test clusters the network seems to be dropping messages at
> different times of the day - we know it was not a network latency issue. We
> could prove it via iperf - a local network test utility.
>
> However, I wish there were more detailed logs than the retransmit log
> messages we are seeing. Even with debug enabled in Corosync it was next to
> impossible for me to get confirmation from the logs about what is causing it
> and how it affects the heartbeat ring.
>
> How can I track the heartbeat ring in action using timestamps, first to
> understand how it operates in detail and then to tune its configuration
> parameters and troubleshoot it adequately?
>
> It seems there is little documentation on this topic (besides the source
> code). Could somebody please point me to some useful sources of information?
The best thing I ever found was corosync-blackbox ;-)
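
To see when the retransmits cluster during the day, one option is to bucket
the retransmit log messages by hour. The following is only a minimal sketch:
it assumes syslog-style lines with a "Mon DD HH:MM:SS" time stamp followed by
a "Retransmit List:" message, and the sample lines, host name and function
name are made up for illustration. Adjust the pattern to whatever your
logging setup actually writes.

```python
import re
from collections import Counter

# Assumed line shape (syslog-style time stamp, then the retransmit message).
# This is an assumption about the log format, not something Corosync guarantees.
LINE_RE = re.compile(
    r"^\w{3}\s+\d{1,2} (?P<hour>\d{2}):\d{2}:\d{2}.*Retransmit List:"
)


def retransmits_per_hour(lines):
    """Count retransmit log messages per hour of day (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group("hour")] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE = [
    "Oct 26 13:55:01 node1 corosync[1234]:  [TOTEM ] Retransmit List: 2f 30 31",
    "Oct 26 13:55:02 node1 corosync[1234]:  [TOTEM ] Retransmit List: 30 31",
    "Oct 26 14:02:10 node1 corosync[1234]:  [TOTEM ] A new membership was formed.",
    "Oct 26 15:20:44 node1 corosync[1234]:  [TOTEM ] Retransmit List: 4a",
]

if __name__ == "__main__":
    for hour, count in sorted(retransmits_per_hour(SAMPLE).items()):
        print(f"{hour}:00  {count} retransmit message(s)")
```

Comparing the busy hours with other activity on the network (backups, cron
jobs, switch events) may help narrow down what is dropping the messages.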
>
>
> Regards,
> Martin Schlegel
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org