[Pacemaker] Adding VIP support for the MySQL RA

Sat Nov 12 20:59:39 EST 2011

On 11-11-12 02:51 PM, Florian Haas wrote:
> Hi Yves and Michael,
>
> On 2011-11-12 19:22, Yves Trudeau wrote:
>> lol... How many large databases have you managed?  Once evicted, MySQL
>> will be restarted by Pacemaker so all the caches will be cold.
> If I may say so, before you start laughing at people on the list, it may
> be a good idea to actually get your facts straight and check what
> evict_outdated_slaves does. For a too-far-behind slave it bails out of
> monitor with $OCF_ERR_INSTALLED, which Pacemaker considers a hard error.
> Thus, that instance will _not_ be restarted by Pacemaker on this node
> unless an administrator intervenes.
Yes, it will not be restarted but stopped, which in my mind is similar,
since a restarted instance with cold caches is almost as good as stopped
(too slow). Sorry for my informal language; my goal was not to offend
anyone. I was just imagining a poor DBA watching his DB server restart
(or stop).
> Still, Michael, Yves has a point that evict_outdated_slaves is not
> optimal (and I'm saying this as the guy that wrote that part of the
> agent). It's fine for a temporary problem that affects a single slave,
> but please consider this scenario:
>
> - High load on the database, across several instances.
> - Slaves start lagging behind.
> - We shut down a slave that is too far behind.
> - We now have _fewer_ instances to handle the same load.
> - Slaves fall further behind.
> - We shut down more slaves.
>
> This can turn into a cascading failure. Note, specifically, that the
> lagging slave has no real option to catch up even when the database
> isn't being hammered anymore, unless an admin has intervened and
> recovered/restarted the instance manually. And, of course, Yves' point
> about cold caches is entirely valid.
>
> In Yves' approach, we wouldn't shut down MySQL, but merely shift away
> the slave's virtual IP. So while clients can't connect to the slave via
> its virtual IP anymore, the slave can still fetch updates from the
> master -- and thus, actually has a chance to catch up. Once it's
> sufficiently caught up, it gets the VIP back and clients can talk to
> that slave again. And since we never stopped MySQL, we also don't have
> the cold cache problem.
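To make the cascade scenario and the VIP alternative concrete, here is a
toy simulation in Python. Everything in it is an assumption for
illustration: the three slaves' capacities, the client load, the
master's write rate and the `max_lag`/`resume_lag` thresholds are
made-up numbers, and the model ignores cold caches, network effects and
how Pacemaker actually relocates the VIP. Under the "evict" policy a
lagging slave is stopped for good; under the "vip" policy it only loses
its VIP and keeps replicating, with a lower threshold for getting the
VIP back.

```python
def simulate(policy, capacities, loads, write_rate, max_lag, resume_lag):
    """Toy model of read slaves under load; returns slaves serving reads per tick.

    policy "evict": a slave whose lag exceeds max_lag is stopped for good.
    policy "vip":   it only loses its VIP, keeps replicating, and gets the
                    VIP back once its lag drops to resume_lag or below.
    All parameters are hypothetical, in arbitrary units.
    """
    n = len(capacities)
    lag = [0.0] * n
    serving = [True] * n   # holds a VIP, so clients reach it
    running = [True] * n   # mysqld is up and applying replication events
    history = []
    for load in loads:
        # Client load is split evenly across the slaves that still hold a VIP.
        active = [i for i in range(n) if serving[i]]
        per_slave = load / len(active) if active else 0.0
        for i in range(n):
            if not running[i]:
                continue  # a stopped slave neither serves nor replicates
            client_load = per_slave if serving[i] else 0.0
            # Capacity left after serving clients goes to replication.
            spare = max(0.0, capacities[i] - client_load)
            lag[i] = max(0.0, lag[i] + write_rate - spare)
        for i in range(n):
            if not running[i]:
                continue
            if serving[i] and lag[i] > max_lag:
                serving[i] = False
                if policy == "evict":
                    running[i] = False
            elif policy == "vip" and not serving[i] and lag[i] <= resume_lag:
                serving[i] = True
        history.append(sum(serving))
    return history


if __name__ == "__main__":
    # Four ticks of heavy load, then the load drops.
    args = ([8, 10, 12], [24] * 4 + [6] * 6, 4, 5, 0)
    print("evict:", simulate("evict", *args))
    print("vip:  ", simulate("vip", *args))
```

With these numbers the eviction policy goes from three serving slaves
to none and never recovers once the load drops ([3, 2, 1, 0, 0, ...]),
while the VIP policy dips to one serving slave and returns to three as
soon as the load eases ([3, 2, 1, 1, 3, ...]). The separate, lower
resume threshold is a design choice to keep a slave hovering around
max_lag from repeatedly gaining and losing its VIP.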

To add some more background: there is a popular tool in the MySQL
world, MMM, that implements roughly the same logic. The tool has many
serious flaws, but in spite of them, people are still using it. Many at
Percona, myself included, believe MMM is too broken to be repaired, and
that is why I am looking for a Pacemaker-based solution that reliably
provides the behavior our customers expect.

> Yves' patches are not perfect (and they're not expected to be, that's
> what a review is for), but I think his approach is sound and shouldn't
> be shot down simply because evict_outdated_slaves is already there.
>
> Cheers,
> Florian
>
Regards,

Yves