[Pacemaker] [Openais] Pacemaker on OpenAIS, RRP, and link failure
Steven Dake
sdake at redhat.com
Thu Jun 4 12:32:21 EDT 2009
On Thu, 2009-06-04 at 18:30 +0200, Lars Marowsky-Bree wrote:
> On 2009-06-04T09:23:04, Steven Dake <sdake at redhat.com> wrote:
>
> > The problem with checking the link status with the current code is that
> > the protocol blocks I/O waiting for a response from the failed ring.
> > This could of course be modified to behave differently.
>
> Right, so the rechecking could possibly be a separate thread, sending an
> occasional liveness packet on the failed ring and trigger the RRP
> recovery after it has heard from other nodes on it?
Well, I prefer totem to remain nonthreaded except for encrypted xmit
operations, but in general, that is the basic idea.
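
A minimal sketch of that idea, driven by a timer in a single-threaded event loop rather than by a separate thread. All names here (FailedRingProbe, on_timer, on_reply, the interval value) are hypothetical and are not taken from the totem code. It assumes the ring is only considered for recovery once every known peer has answered a probe on it.

```python
# Hypothetical sketch; none of these names come from totem or OpenAIS.
class FailedRingProbe:
    """Timer-driven recheck of a failed ring, without a dedicated thread."""

    def __init__(self, peers, interval=5.0):
        self.peers = set(peers)      # nodes expected to answer on this ring
        self.interval = interval     # seconds between liveness probes
        self.next_probe = 0.0        # earliest time the next probe may go out
        self.heard_from = set()      # peers that answered the current probe

    def on_timer(self, now):
        """Called from the event loop. Returns True when a probe should be sent."""
        if now < self.next_probe:
            return False
        self.heard_from.clear()      # each probe round starts from scratch
        self.next_probe = now + self.interval
        return True

    def on_reply(self, peer):
        """Record a reply. Returns True once every peer has answered."""
        if peer in self.peers:
            self.heard_from.add(peer)
        return self.heard_from == self.peers
```

In this sketch the caller would trigger RRP recovery only when on_reply() returns True, so a ring on which just some of the nodes respond is left in the failed state.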
> Some smarts would be needed of course to not constantly retrigger
> partially active rings (which would fail again immediately).
>
> > So the act of failing a link is expensive and we don't want to retest
> > that it is valid very often.
>
> Does "expensive" mean that it'll actually slow down the healthy
> ring(s)?
>
At the moment it blocks until the problem counter reaches the threshold,
at which point the ring is declared failed and normal communication
continues.
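
The counter-and-threshold behaviour described above can be sketched roughly as follows. This is an illustration only: RingMonitor, on_timeout, on_response, and the threshold value are hypothetical names, and resetting the counter on a successful response is an assumption of the sketch, not something stated in this thread.

```python
# Hypothetical sketch; not the totem implementation.
class RingMonitor:
    """Counts missed responses on one ring and marks it failed at a threshold."""

    def __init__(self, threshold=10):
        self.threshold = threshold   # problems tolerated before failing the ring
        self.problem_count = 0
        self.failed = False

    def on_timeout(self):
        """An expected response did not arrive. Returns True if the ring is failed."""
        if self.failed:
            return True
        self.problem_count += 1
        if self.problem_count >= self.threshold:
            self.failed = True       # from here on, traffic skips this ring
        return self.failed

    def on_response(self):
        """A response arrived in time (assumed here to clear the counter)."""
        if not self.failed:
            self.problem_count = 0
```

Under these assumptions, the window during which communication is held up is roughly the threshold multiplied by the per-attempt wait, which is why retesting a ring that fails again immediately would be costly.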
>
> Regards,
> Lars
>