[Pacemaker] SBD Fencing daemon: explain me more clear

Aleksey Zholdak aleksey at zholdak.com
Tue Jun 15 09:32:12 EDT 2010


>> Can anybody explain the following more clearly than the official (and,
>> IMHO, outdated) page http://www.linux-ha.org/wiki/SBD_Fencing does?
>>
>> What timeouts must I specify if my multipath needs 90 to 160 secs to
>> switch away from a dead path? The timeouts below may be wrong,
>> because sometimes node1 kills node2 (or vice versa), or a node
>> commits suicide...
>>
>>> Timeout (watchdog) : 90
>>> Timeout (allocate) : 2
>>> Timeout (loop)     : 10
>>> Timeout (msgwait)  : 180
>>
>> And what is the logic behind the calculation of the timeouts above?
>
> Well, 90-160s is a very long time; that effectively could make SBD
> unusable in your environment, basically you're introducing a delay of at
> least 160s on each fail-over. (At least with the current sbd
> implementation.)
>
> You need to increase the watchdog timeout to >160s - probably 180s
> should be good in your environment, if you completely want to eliminate
> spurious self-fencing.
>
> msgwait should be larger than watchdog timeout; so probably 200s, which
> will imply a 200s latency on fail-over.
>
> You may want to make the timeouts lower, leading to a faster fail-over,
> since the work-load is paused during the MPIO downtime too I assume, so
> fail-over may actually be faster than waiting for MPIO to recover.
So now we have:

 > Timeout (watchdog) : 180
 > Timeout (allocate) : 2
 > Timeout (loop)     : 10
 > Timeout (msgwait)  : 200
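
The rule of thumb quoted above (watchdog longer than the worst-case MPIO
path switch, msgwait longer than watchdog) can be written down as a small
sketch. The function name and the two 20-second margins are my own
assumptions, picked only so that the result matches the 180/200 values
suggested in this thread; this is not an official sbd formula.

```python
def sbd_timeouts(mpio_failover_max_s, watchdog_margin_s=20, msgwait_margin_s=20):
    """Derive sbd timeouts from the worst-case multipath failover time.

    Both margins are illustrative assumptions, not values taken from sbd.
    """
    # The watchdog must outlast the longest I/O stall, otherwise a healthy
    # node self-fences while multipath is still switching paths.
    watchdog = mpio_failover_max_s + watchdog_margin_s
    # msgwait must be larger than the watchdog timeout; it is also the
    # latency added to every fail-over.
    msgwait = watchdog + msgwait_margin_s
    return {"watchdog": watchdog, "msgwait": msgwait}


if __name__ == "__main__":
    # 160 s is the worst-case MPIO switch time reported in this thread.
    print(sbd_timeouts(160))
```

With a smaller worst-case MPIO time, both timeouts (and therefore the
fail-over latency) shrink accordingly.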

But I see that node1 resets node2 (or vice versa, or they reset each other)
when a node does not update its slot for 10 seconds...

> But with a ~160s MPIO latency, I'd personally be wary to use sbd
> fencing.
Hm...

 > Why is the MPIO scenario so slow?
That question needs to be put to the mptsas developers (Novell + HP).

-- 
Aleksey



