[Pacemaker] Speeding up failover

Andrew Beekhof andrew at beekhof.net
Wed Aug 7 03:11:12 EDT 2013


On 25/07/2013, at 4:31 PM, Devdas Bhagat <devdas.bhagat at booking.com> wrote:

> We have a master-slave setup for Redis, running 6 instances of Redis on
> each physical host, and one floating IP between them.
> 
> All of the redis instances on a host are members of a single group.
> 
> When we fail over the IP in production, I'm observing this sequence of
> events:
>   1. Pacemaker brings down the floating IP
>   2. Pacemaker demotes the master redis instance
>   3. Pacemaker stops each running redis process in sequence (essentially
>      stopping the group)
>   4. Pacemaker promotes the slave
>   5. Pacemaker brings up the floating IP on the former slave
> 
> (This follows documented behaviour as I understand it, see
> http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg05344.html
> for someone else with a similar problem).
> 
> Under production traffic load, each redis process takes about 4 to 5
> seconds to sync to disk and clean up.

Can they be stopped and/or started in parallel?
If so, don't put them in a group: a group stops and starts its members
one at a time, in order. Problem solved.
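
As a rough sketch of why this helps, take the six instances and the 4 to 5
seconds per stop quoted above. The function names are made up for
illustration, and the parallel figure assumes the host can run six disk
syncs at once without each one slowing down, which may not hold under
production load.

```python
def serial_stop_time(stop_times):
    """Total time when resources stop one after another, as in a group."""
    return sum(stop_times)


def parallel_stop_time(stop_times):
    """Total time when independent resources stop at the same time.

    Assumption: the stops do not slow each other down.
    """
    return max(stop_times)


best_case = [4] * 6   # six instances, 4 seconds each
worst_case = [5] * 6  # six instances, 5 seconds each

print(serial_stop_time(best_case), serial_stop_time(worst_case))      # 24 30
print(parallel_stop_time(best_case), parallel_stop_time(worst_case))  # 4 5
```

The serial figures match the 24 to 30 seconds reported; the parallel ones
are a lower bound, not a measurement.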

> This means that a simple failover
> takes between 24 and 30 seconds, which is a bit too long for us.
> Acceptable failover times would be less than 5 seconds (the lower the
> better).
> 
> Is there a configuration option to change the failover process to *not*
> stop the group before promoting the secondary? Alternatively,
> suggestions on how to get pacemaker to manage only the state of the
> redis process, but not the process itself, are welcome. (A process
> failure can be diagnosed by monitoring the response, or lack thereof,
> from redis itself, so a dead process and a non-responding one can be
> treated alike as far as monitoring goes.)
> 
> Devdas Bhagat
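
The response-based check described above could look roughly like the
sketch below. It is not a Pacemaker resource agent, only the probe that a
monitor action might wrap. The function name `redis_responds` and the
default host, port and timeout are assumptions for illustration. It uses
the inline form of the Redis PING command, to which a healthy,
unauthenticated server replies `+PONG`.

```python
import socket


def redis_responds(host="127.0.0.1", port=6379, timeout=1.0):
    """Return True if the server answers PING with +PONG within `timeout`.

    A refused connection, a timeout, or any other reply (for example an
    authentication error) all count as "not responding", so a dead
    process and a hung one are treated alike.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(b"PING\r\n")  # inline command, no RESP framing
            reply = sock.recv(64)
    except OSError:
        # Covers connection refused, unreachable host and socket timeouts.
        return False
    return reply.startswith(b"+PONG")


if __name__ == "__main__":
    print("alive" if redis_responds() else "dead or not responding")
```

With six instances per host, such a probe would be called once per port.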
> 




