[ClusterLabs] Pacemaker/Corosync good fit for embedded product?

Wed Apr 11 04:44:17 EDT 2018

David,

> Hi,
> 
> We are planning on creating a HA product in an active/standby configuration
> whereby the standby unit needs to take over from the active unit very fast
> (<50ms including all services restored).
> 
> We are able to do very fast signaling (say 1000Hz) between the two units to
> detect failures so detecting a failure isn't really an issue.
> 
> Pacemaker looks to be a very useful piece of software for managing
> resources so rather than roll our own it would make sense to reuse
> pacemaker.
> 
> So my initial questions are:
> 
>     1. Do people think pacemaker is the right thing to use? Everything I
>     read seems to be talking about multiple seconds for failure detection etc.
>     Feature wise it looks pretty similar to what we would want.
>     2. Has anyone done anything similar to this?
>     3. Any pointers on where/how to add additional failure detection inputs
>     to pacemaker?
>     4. For a new design would you go with pacemaker+corosync,
>     pacemaker+corosync+knet or something different?
> 

I will just share my point of view about Corosync side.

Corosync uses its own mechanism for detecting failures, based on 
token rotation. The default timeout for detecting loss of the token is 1 second, 
so detecting a failure takes far longer than 50 ms. The timeout can be lowered, 
but such low values are not really tested.
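To put the numbers side by side, here is a minimal sketch of a timeout-based
failure detector in Python. It is a deliberately simplified model, not
Corosync's actual token-rotation code: the names `TokenLossDetector` and
`detection_time_ms` are hypothetical, the 1000 ms value is the default token
timeout mentioned above, and the 3 ms value is an assumed stand-in for "three
missed heartbeats" at the 1000 Hz signaling rate from the original question.
Note that it measures detection only; restoring services still has to fit in
whatever remains of the 50 ms budget.

```python
from dataclasses import dataclass


@dataclass
class TokenLossDetector:
    """Hypothetical sketch: report a failure once no token/heartbeat has
    been seen for longer than token_timeout_ms. This is a simplified model,
    not Corosync's implementation."""

    token_timeout_ms: float
    last_token_ms: float = 0.0

    def on_token(self, now_ms: float) -> None:
        # Remember when the token (or heartbeat) was last received.
        self.last_token_ms = now_ms

    def failed(self, now_ms: float) -> bool:
        # The token is considered lost once the timeout has fully elapsed.
        return now_ms - self.last_token_ms > self.token_timeout_ms


def detection_time_ms(detector: TokenLossDetector, crash_ms: float,
                      step_ms: float = 1.0) -> float:
    """Return how long after crash_ms the detector first reports a failure,
    polling every step_ms. Assumes the last token arrived at crash_ms."""
    detector.on_token(crash_ms)
    now = crash_ms
    while not detector.failed(now):
        now += step_ms
    return now - crash_ms


if __name__ == "__main__":
    BUDGET_MS = 50.0  # failover budget from the original question
    default = detection_time_ms(TokenLossDetector(token_timeout_ms=1000.0), 0.0)
    fast = detection_time_ms(TokenLossDetector(token_timeout_ms=3.0), 0.0)
    print(default, default <= BUDGET_MS)  # 1001.0 False
    print(fast, fast <= BUDGET_MS)        # 4.0 True
```

With the default timeout, detection alone overshoots the budget roughly
twentyfold, which is why a much smaller, and largely untested, token timeout
would be needed.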

That also means it's not currently possible to use a different signaling 
mechanism without significant changes to Corosync.

So I don't think Corosync can really be used for the described scenario.

Honza

> 
> Thanks
> 
> David
> 
> 
> 
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>