[Pacemaker] change sbd watchdog timeout in a running cluster

Tue Mar 26 16:24:58 UTC 2013

On 2013-03-26T17:13:34, emmanuel segura <emi2fast at gmail.com> wrote:

> Hello Lars
> 
> We have a VM (SUSE 11) cluster running on an ESX cluster, with a clustered
> NetApp as the datastore. Last night we had a NetApp failover. There was no
> problem with the other VM servers, but all VMs in the pacemaker+sbd cluster
> were rebooted.
> 
> This is because the watchdog timeout is 5 seconds

To protect against that, you should use multiple disks. As long as the
majority of them remains within the latency limits, you will not
experience a fail-over.
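
To illustrate the majority rule above, here is a minimal sketch in
Python. It is not sbd's actual implementation; the function name, the
latency list and the limit are hypothetical and exist only for this
example.

```python
from typing import Optional, Sequence


def sbd_majority_ok(latencies_s: Sequence[Optional[float]], limit_s: float) -> bool:
    """Return True if a strict majority of disks answered within limit_s.

    latencies_s holds one measured I/O latency (in seconds) per disk;
    None stands for a disk that did not answer at all.
    Illustrative only: this is an assumption about how to model the
    rule, not code taken from sbd.
    """
    within = sum(1 for lat in latencies_s if lat is not None and lat <= limit_s)
    # Strict majority: more than half of all configured disks.
    return 2 * within > len(latencies_s)
```

With three disks, one slow or unreachable disk still leaves a majority
(2 of 3). With two disks, losing one does not (1 of 2), which is why an
odd number of disks is the useful choice in this model.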

Admittedly, 5s is on the short side for such a setup. But 90s for the
watchdog means you'll end up with 120+ seconds for msgwait, meaning all
fail-overs will be delayed accordingly. That's not going to be helpful.

And yes, you need to increase stonith-timeout to be approx. 50% larger
than msgwait, at least.
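
As a rough worked example of that rule of thumb, the sketch below
computes the smallest stonith-timeout that is 50% larger than a given
msgwait. The function name and the rounding up to whole seconds are
assumptions made for illustration; only the 50% margin comes from the
advice above.

```python
import math


def min_stonith_timeout(msgwait_s: float, margin: float = 0.5) -> int:
    """Smallest whole-second stonith-timeout that exceeds msgwait by `margin`.

    margin=0.5 encodes the "approx. 50% larger than msgwait" guideline.
    Rounding up is an assumption so the result never falls below it.
    """
    return math.ceil(msgwait_s * (1 + margin))


# msgwait of 120 seconds, as in the 90s-watchdog case discussed above
print(min_stonith_timeout(120))  # 180
```

So a 120 second msgwait would call for a stonith-timeout of at least
180 seconds, which shows how quickly a long watchdog timeout stretches
every fail-over.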

Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde