[ClusterLabs] Re: SBD Latency Warnings

Mon Jan 11 07:38:18 UTC 2016

>>> Jorge Fábregas <jorge.fabregas at gmail.com> wrote on 30.12.2015 at 17:53
in message <56840C21.1050209 at gmail.com>:
> Hi,
> 
> We're having some issues with a particular oversubscribed hypervisor
> (cpu-wise) where we run SLES 11 SP4 guests.  I had to increase many
> timeouts on the cluster to cope with this:

Hi!

(I'm late)

As Kai pointed out, Domain-0 is scheduled like any DomU, so either never
oversubscribe the CPUs or reserve a few CPUs for Domain-0. Think of Domain-0 as
a virtual I/O server: it needs CPU cycles of its own before any guest I/O can
complete.

Regards,
Ulrich

> 
> - Corosync's token timeout (from the default of 5 secs to 30 seconds)
> - SBD's watchdog & msgwait (from 15/30 to 30/60 respectively)
> - Pacemaker's resource-monitoring timeouts
> 
> I know the consequence of doing all this will be *slow reaction times*,
> but it's all I can do in the meantime.
> 
> However, when the hypervisor is at 100% CPU utilization I still get
> these messages:
> 
> sbd: :WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd
> logd: WARN: G_CH_prepare_int: working on IPC channel took 220 ms (> 100 ms)
> sbd: WARN: Pacemaker state outdated (age: 4)
> sbd: info: Pacemaker health check: OK
> sbd: WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd
> logd: WARN: G_CH_check_int: working on IPC channel took 150 ms (> 100 ms)
> sbd: WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd
> sbd: WARN: Servant for /dev/mapper/clustersbd outdated (age: 5)
> sbd: WARN: Majority of devices lost - surviving on pacemaker
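
To see how often and by how much the threshold is exceeded, the warnings
quoted above can be tallied from the log. The following is only a minimal
sketch: the regular expression is inferred from the message format shown in
this thread, and the function name `latency_warnings` is made up for
illustration. Other sbd versions may word the message differently.

```python
import re

# Pattern inferred from the log lines quoted in this thread (an assumption,
# not taken from the sbd sources).
LATENCY_RE = re.compile(
    r"Latency: (?P<latency>\d+) exceeded threshold (?P<threshold>\d+) "
    r"on disk (?P<disk>\S+)"
)


def latency_warnings(lines):
    """Return one (disk, latency, threshold) tuple per sbd latency warning."""
    found = []
    for line in lines:
        match = LATENCY_RE.search(line)
        if match:
            found.append(
                (
                    match.group("disk"),
                    int(match.group("latency")),
                    int(match.group("threshold")),
                )
            )
    return found


# Sample taken from the messages quoted above.
SAMPLE = [
    "sbd: :WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd",
    "sbd: WARN: Pacemaker state outdated (age: 4)",
    "sbd: WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd",
    "sbd: WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd",
    "sbd: WARN: Servant for /dev/mapper/clustersbd outdated (age: 5)",
]

if __name__ == "__main__":
    for disk, latency, threshold in latency_warnings(SAMPLE):
        print(f"{disk}: latency {latency} over threshold {threshold}")
```

Feeding it a longer stretch of the system log would show whether the
latency stays just above the threshold or drifts toward the watchdog
timeout.
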
> 
> Is this latency configurable? It keeps mentioning "threshold 3". Is that
> 3 seconds? How does it relate to the following parameters?
> 
> ==Dumping header on disk /dev/mapper/clustersbd
> Header version     : 2.1
> UUID               : 54597871-2392-475f-ba2d-71bdf92c36b5
> Number of slots    : 255
> Sector size        : 512
> Timeout (watchdog) : 30
> Timeout (allocate) : 2
> Timeout (loop)     : 1
> Timeout (msgwait)  : 60
> ==Header on disk /dev/mapper/clustersbd is dumped
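
None of the four timeouts in this dump equals 3, which suggests that the
warning threshold is not stored in the on-disk header. As far as I recall,
sbd(8) documents a separate `-5` option for it with a default of 3 seconds,
but please verify that against your installed version. The sketch below only
parses the dump and checks the commonly cited rule of thumb that msgwait
should be at least twice the watchdog timeout. That rule and the function
names are assumptions for illustration, not something stated in this thread.

```python
import re

# Header dump as quoted above (without the "> " quote prefix).
DUMP = """\
==Dumping header on disk /dev/mapper/clustersbd
Header version     : 2.1
UUID               : 54597871-2392-475f-ba2d-71bdf92c36b5
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 30
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 60
==Header on disk /dev/mapper/clustersbd is dumped
"""


def parse_timeouts(dump):
    """Extract the 'Timeout (name) : value' lines into a dict of ints."""
    timeouts = {}
    for line in dump.splitlines():
        match = re.match(r"Timeout \((\w+)\)\s*:\s*(\d+)", line)
        if match:
            timeouts[match.group(1)] = int(match.group(2))
    return timeouts


def msgwait_ok(timeouts):
    """Rule of thumb (assumed here): msgwait >= 2 * watchdog."""
    return timeouts["msgwait"] >= 2 * timeouts["watchdog"]


if __name__ == "__main__":
    timeouts = parse_timeouts(DUMP)
    print(timeouts)
    print("msgwait >= 2 * watchdog:", msgwait_ok(timeouts))
```

With the values above (watchdog 30, msgwait 60) the check passes exactly at
the boundary, so raising the watchdog timeout further would also mean
raising msgwait.
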
> 
> I'm using the -P option with sbd so I know it will not fence the system
> as long as the node's health is ok (as reported by Pacemaker).  I'd
> still like to find out whether the latency mentioned is configurable or
> whether it is safe to ignore.
> 
> Thanks!
> 
> Regards,
> Jorge