[Pacemaker] Speeding up failover

Devdas Bhagat devdas.bhagat at booking.com
Thu Jul 25 06:31:35 UTC 2013


We have a master-slave setup for Redis, running 6 instances of Redis on
each physical host, and one floating IP between them.

All of the redis instances on a host are members of a single group.

When we fail over the IP in production, I'm observing this sequence of
events:
1. Pacemaker brings down the floating IP
2. Pacemaker demotes the master redis instance
3. Pacemaker stops each running redis process in sequence (essentially
   stopping the group)
4. Pacemaker promotes the slave
5. Pacemaker brings up the floating IP on the former slave

(This follows documented behaviour as I understand it, see
http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg05344.html
for someone else with a similar problem).

Under production traffic load, each redis process takes about 4 to 5
seconds to sync to disk and clean up. Because the six processes are
stopped one after another, a simple failover takes between 24 and 30
seconds, which is too long for us. An acceptable failover time would be
under 5 seconds (the lower the better).

Is there a configuration option to change the failover process to *not*
stop the group before promoting the secondary? Alternatively,
suggestions on how to get Pacemaker to manage only the state of the
redis process, but not the process itself, are welcome. (A process
failure can be diagnosed by monitoring the response, or lack thereof,
from redis itself, so a dead process and a non-responding one can be
treated alike as far as monitoring goes.)
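
A minimal sketch of such a response-based check, to make the idea
concrete. It assumes redis is reachable over plain TCP without
authentication, and the names is_pong and redis_is_healthy, the default
host and port, and the timeout are all hypothetical choices for
illustration, not part of any existing resource agent. It sends an
inline PING and treats a refused connection, a timeout, and an
unexpected reply the same way.

```python
import socket


def is_pong(reply: bytes) -> bool:
    """Return True only for the simple-string reply redis sends to PING."""
    return reply.startswith(b"+PONG")


def redis_is_healthy(host: str = "127.0.0.1", port: int = 6379,
                     timeout: float = 1.0) -> bool:
    """Send an inline PING and report whether a PONG came back in time.

    A refused connection (dead process) and a timeout (hung process)
    both end up as False, so the caller handles them identically.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            conn.sendall(b"PING\r\n")
            # The reply is 7 bytes ("+PONG\r\n"); a single recv is assumed
            # to be enough for this sketch.
            return is_pong(conn.recv(64))
    except OSError:
        # Covers connection refused, unreachable host, and socket timeouts.
        return False
```

A monitor script could call redis_is_healthy() once per instance port
and report failure whenever it returns False. An instance that is still
loading its dataset, or that requires a password, answers with an error
reply rather than +PONG, so this sketch would count it as unhealthy too.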

Devdas Bhagat



