[Pacemaker] [Openais] Pacemaker on OpenAIS, RRP, and link failure

Thu Jun 4 16:23:04 UTC 2009

On Thu, 2009-06-04 at 17:54 +0200, Lars Marowsky-Bree wrote:
> On 2009-05-26T12:50:34, Andrew Beekhof <andrew at beekhof.net> wrote:
> 
> > >> try all the time also after failure like was done before failure.
> > >
> > > Complete Totem amateur behind the keyboard, but I'd second that. Since
> > > you're constantly checking the link status while it's up, why not keep
> > > doing so after it's gone down, to see if it has recovered?
> > 
> > Perhaps even at a decreased (user configurable) interval/rate.
> 
> I think that was actually discussed on the openais list and on IRC in
> the past and never completely explained why it wouldn't work ;-)
> 

The problem with checking the link status with the current code is that
the protocol blocks I/O while it waits for a response from the failed
ring.  So the act of failing a link is expensive, and we don't want to
retest whether a failed link is valid again very often.  This could of
course be modified to behave differently: the obvious solution is to
redesign the protocol so that it does not have this constraint.  No patch
has been written, and I don't have time to do such work at present.
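
To make the "keep checking, but at a decreased, user-configurable rate"
idea from the quoted thread concrete, here is a rough sketch.  It is not
openais or Totem code, and no such patch exists: RingMonitor, probe,
up_interval and down_interval are all hypothetical names, and the sketch
assumes a probe that returns promptly instead of blocking I/O, which is
exactly the part the current protocol does not provide.

```python
import time


class RingMonitor:
    """Hypothetical per-ring health tracker (illustration only)."""

    def __init__(self, probe, up_interval=1.0, down_interval=30.0,
                 clock=time.monotonic):
        self.probe = probe                  # assumed: returns True/False promptly
        self.up_interval = up_interval      # seconds between checks while healthy
        self.down_interval = down_interval  # slower retest rate once marked faulty
        self.clock = clock
        self.faulty = False
        self.next_check = clock()           # first check is due immediately

    def poll(self):
        """Call from the main loop; probes only when a check is due.

        Returns True while the ring is considered faulty.
        """
        now = self.clock()
        if now < self.next_check:
            return self.faulty              # not due yet, keep the last verdict
        self.faulty = not self.probe()
        # A faulty ring is retested less often, because each test is costly.
        interval = self.down_interval if self.faulty else self.up_interval
        self.next_check = now + interval
        return self.faulty
```

With these assumed defaults, a ring that fails its probe is retested every
30 seconds instead of every second, and it drops back to the normal rate
as soon as one probe succeeds.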

Regards
-steve

> Regards,
>     Lars
>