[ClusterLabs] Pacemaker/Corosync good fit for embedded product?

Wed Apr 11 10:56:36 EDT 2018

On 04/11/2018 10:44 AM, Jan Friesse wrote:
> David,
>
>> Hi,
>>
>> We are planning on creating an HA product in an active/standby configuration
>> whereby the standby unit needs to take over from the active unit very fast
>> (<50ms including all services restored).
>>
>> We are able to do very fast signaling (say 1000Hz) between the two units to
>> detect failures so detecting a failure isn't really an issue.
>>
>> Pacemaker looks to be a very useful piece of software for managing
>> resources so rather than roll our own it would make sense to reuse
>> pacemaker.
>>
>> So my initial questions are:
>>
>>     1. Do people think pacemaker is the right thing to use? Everything I
>>     read seems to be talking about multiple seconds for failure detection etc.
>>     Feature wise it looks pretty similar to what we would want.
>>     2. Has anyone done anything similar to this?
>>     3. Any pointers on where/how to add additional failure detection inputs
>>     to pacemaker?
>>     4.
>>     5. For a new design would you go with pacemaker+corosync,
>>     pacemaker+corosync+knet or something different?
>>
>
>
> I will just share my point of view on the Corosync side.
>
> Corosync uses its own mechanism for detecting failures, based on
> token rotation. The default timeout for detecting loss of the token is
> 1 second, so detecting a failure takes far longer than 50ms. It can be
> lowered, but that is not really tested.
>
> That means it's not currently possible to use a different signaling
> mechanism without significant changes to Corosync.
>
> So I don't think Corosync can really be used for the described scenario.
>
> Honza

On the other hand, if a fail-over is triggered by losing a node or by
anything else that corosync detects, this is probably already the fast
path in a pacemaker cluster.

Detection of other types of failures (like a resource failing on
an otherwise functional node) is probably even slower.
When corosync detects a failure, pacemaker reacts to it in an
event-driven way.
We even have to add some delay to the bare corosync detection time
mentioned by Honza, as pacemaker will have to run e.g. an election
of the designated coordinator before it can make decisions again.
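
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python. It assumes that a membership change takes roughly the token
timeout plus the consensus timeout, and that consensus defaults to 1.2
times the token timeout when it is not set explicitly (as described in
corosync.conf(5)). The additive model, the function name and the
pacemaker overhead figure are my own assumptions for illustration, not
measured values.

```python
# Rough fail-over latency budget in milliseconds.
# Assumption: detection ~= token timeout + consensus timeout, plus whatever
# pacemaker needs afterwards (DC election, transition). Not a measurement.

TARGET_MS = 50.0  # requirement from the original question


def failover_budget_ms(token_ms, consensus_ms=None, pacemaker_ms=0.0):
    """Return an estimated time until pacemaker can act on a node failure."""
    if consensus_ms is None:
        # corosync derives consensus as 1.2 * token when it is not configured
        consensus_ms = 1.2 * token_ms
    return token_ms + consensus_ms + pacemaker_ms


def fits_target(token_ms, pacemaker_ms=0.0):
    """True if the estimated budget stays within the 50 ms target."""
    return failover_budget_ms(token_ms, pacemaker_ms=pacemaker_ms) <= TARGET_MS


if __name__ == "__main__":
    # 1000 ms is the default token timeout quoted above; 100 ms of
    # pacemaker overhead is a purely hypothetical placeholder.
    for token in (1000.0, 100.0, 20.0):
        total = failover_budget_ms(token, pacemaker_ms=100.0)
        print(f"token={token:.0f} ms -> estimated {total:.0f} ms, "
              f"within target: {total <= TARGET_MS}")
```

Under these assumptions even an aggressively lowered token timeout
leaves the consensus timeout and the pacemaker overhead on top, which
is why I would not expect to reach 50ms this way.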

For other failures the basic principle is to probe a resource at a
fixed rate (usually every few seconds) to detect failures, rather
than to use an event-driven mechanism.
It might be possible, though, to use attributes to achieve an
event-driven-like reaction to certain failures. But I haven't done
anything concrete to exploit these possibilities. Others might have
more info (which I personally would be interested in as well ;-) ).
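
Just to illustrate the idea (untested, so treat it as a sketch): a fast
external detector could push its verdict into the cluster as a
transient node attribute via attrd_updater, and a location rule on that
attribute could then move resources away. The attribute name
fast-health, the function names and the dry-run default are
hypothetical; only the attrd_updater -n/-U options come from the real
tool.

```python
import subprocess


def attrd_update_cmd(name, value):
    """Build the attrd_updater call that sets a transient node attribute."""
    return ["attrd_updater", "-n", name, "-U", str(value)]


def report_health(healthy, name="fast-health", dry_run=True):
    """Publish 1 (healthy) or 0 (failed) as a node attribute.

    With dry_run=True the command is only returned, not executed, so the
    sketch can be tried on a machine without pacemaker installed.
    """
    cmd = attrd_update_cmd(name, 1 if healthy else 0)
    if not dry_run:
        # Raises CalledProcessError if attrd_updater reports a failure.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # A hypothetical detector would call this as soon as it sees a failure.
    print(" ".join(report_health(False)))
```

The attribute update only delivers the event to pacemaker faster; the
policy engine run and the resource actions that follow still take
their usual time.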

Approaches to event-driven detection of resource failures are
under investigation/development (systemd resources, IP resources
sitting on interfaces, ...), but afaik nothing is available out of
the box yet.

Having said all that, I can add some personal experience from
having implemented an embedded product based on a
pacemaker cluster myself in the past:

As reaction time based on pacemaker would be too slow for e.g.
many communication protocols (things like SIP) or realtime
streams, it seems advisable to solve these issues on the
application layer inside a service (or rather a service distributed
across the cluster).
Pacemaker and its decision engine can then be used to bring
up this distributed service in the cluster in an ordered way.
Any additional services that are less demanding regarding
switch-over time can be made available via pacemaker
directly.

Otherwise pacemaker configuration is very flexible, so you
can implement nearly anything. It might be advisable to avoid
certain approaches which are common in cases where a cluster
is operated by somebody who can be informed quickly and
has to react under certain SLAs. For example, fencing a node
by switching it off rather than rebooting it might not be desirable
for an appliance that is expected to just sit there and
work with hardly any admin effort/expense at all.
But that is of course just an example, and the configuration (incl.
the configuration concept) has to be tailored to your requirements.

Regards,
Klaus

>
>>
>> Thanks
>>
>> David
>>
>>
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>