[ClusterLabs Developers] MariaDB resource-agent - help with choosing a master

Wed Feb 15 09:12:40 CET 2017

----- Original Message -----
> On 02/14/2017 02:51 PM, Nils Carlson wrote:
> > Hi,
> > 
> > I'm working on implementing a MariaDB resource-agent based on the mysql
> > one.
> > The idea is to take advantage of new features in MariaDB, especially
> > semi-synchronous replication and GTID.
> > 
> > GTID (Global Transaction ID) means that there is a counter that applies
> > to the replicated databases, which is unique within the cluster (there
> > can be multiple replication clusters with overlapping IDs).
> > 
> > Semi-synchronous replication means that the master will replicate
> > synchronously to AT LEAST ONE slave, before actually performing the
> > transaction. In theory there can be no data-loss due to a single node
> > failure, a big improvement compared to the normal async replication in
> > MariaDB.
> > 
> > These two sets of technologies should allow for quite a straightforward
> > set of semantics in the resource-agent.
> > On master failure, the node with the highest GTID must be the one that
> > was replicating synchronously, and should be promoted to be the new
> > master. The question is how to relay the information to crmd.
> > 

So it looks like you have the same requirements as the galera resource agent.
Galera is a "virtual synchronous" replication library for MySQL, where each
node records the latest version of the cluster state (akin to GTID).

The galera resource agent is a Master/Slave resource, having the same 
bootstrapping requirement to restart a cluster from scratch:

. during the "start" operation, all nodes store their local state in the CIB 
  with crm_attribute

. once all nodes have stored their state, the next "monitor" operation run on
  any node can determine which node to bootstrap the cluster from.

. the "promote" operation takes care of starting the mysql server on the node

. symmetrically, the "demote" stops the server.

There are a bunch of edge cases specific to the way Galera works, but you
should get the idea. 
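
To make the "wait until every node has stored its state, then pick one" step
concrete, here is a rough Python sketch. It is not the galera agent's actual
code: the function name, the plain integer state and the tie-break rule are
all assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional


def pick_bootstrap_node(expected_nodes: Iterable[str],
                        reported_state: Dict[str, int]) -> Optional[str]:
    """Return the node to bootstrap from, or None while nodes are missing.

    expected_nodes: names of all nodes that should take part (assumed known).
    reported_state: node name -> last recorded state number, as each node
                    would have stored it in a node attribute during "start".
    """
    expected = sorted(expected_nodes)
    # Do not decide until every expected node has stored its state.
    if any(node not in reported_state for node in expected):
        return None
    # Highest state wins; ties are broken by node name so that every node
    # running this check reaches the same answer.
    return max(expected, key=lambda node: (reported_state[node], node))
```

Every node can run the same check in its "monitor" and reach the same answer,
which is what lets "promote" go ahead on exactly one node.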

> > My current working hypothesis is that I can place the GTID as a
> > crm-attribute both when starting the resource-agent and in a post-demote
> > notify. During the subsequent monitor operation the resource-agents can
> > then scan the crm-attributes from other nodes and simply prioritise
> > themselves in relation to others (some relative scoring?).
> 
> A bit of a tangent: you can set attributes from a resource agent using
> either crm_attribute or attrd_updater. Each has advantages and
> disadvantages.
> 
> crm_attribute can set a permanent or transient attribute, while
> attrd_updater only sets transient attributes. (A node's transient
> attributes go away when the node reboots or otherwise stops cluster
> services.)
> 
> crm_attribute can only set public attributes, while attrd_updater can
> set public or private attributes. Public attributes are recorded in the
> CIB, and when they are changed, it triggers a new transition (i.e. the
> cluster checks to see if any resources need to be
> started/stopped/moved). Private attributes are not saved to the CIB, and
> do not cause a new transition. Public attributes can be referenced in
> constraint rules, while private attributes cannot. Private attributes
> have been supported since Pacemaker 1.1.13.
> 
> attrd_updater works with Pacemaker Remote nodes only when the cluster
> nodes use the corosync 2 stack. It will silently be ignored for
> Pacemaker Remote nodes when the cluster nodes use a legacy stack
> (heartbeat/cman/corosync-plugin). crm_attribute works with remote nodes
> on legacy stacks since Pacemaker 1.1.15.
> 
> I'd prefer attrd_updater with private transient attributes if that works
> for your purposes, because it saves unnecessary recalculation of the
> cluster state plus disk I/O.
> 
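
For reference, the two ways of setting an attribute described above might be
wrapped roughly like this. It is only a sketch: the attribute name
"mariadb-gtid" and the helper names are hypothetical, and the long options
are the ones I understand the two tools to accept, so please check the man
pages for your Pacemaker version.

```python
from typing import List, Optional


def attrd_updater_cmd(name: str, value: str, private: bool = True,
                      node: Optional[str] = None) -> List[str]:
    """Build an attrd_updater call (transient attributes only)."""
    cmd = ["attrd_updater", "--name", name, "--update", value]
    if private:
        # Private attributes are not written to the CIB and do not
        # trigger a new transition (Pacemaker 1.1.13 or later).
        cmd.append("--private")
    if node is not None:
        cmd += ["--node", node]
    return cmd


def crm_attribute_cmd(name: str, value: str, node: str,
                      permanent: bool = False) -> List[str]:
    """Build a crm_attribute call (public attributes only)."""
    # "reboot" means transient, "forever" means permanent.
    lifetime = "forever" if permanent else "reboot"
    return ["crm_attribute", "--node", node, "--name", name,
            "--update", value, "--lifetime", lifetime]
```

The resulting list can be handed to subprocess.run() on a cluster node; a
shell resource agent would simply call the same command line directly.
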
> > This requires a few things though:
> > 
> > - If there is no master when the resource agent starts we need to wait
> > for all nodes to come online (i.e. the cluster is just starting) before
> > promoting any to master, so they can read GTID from the attributes.
> > - There must be a monitor step after start and demote and before the
> > promotion of any resource to master, and this must execute on all nodes
> > so they can set their priority for promotion.
> > - The post-demote notifier must complete execution before a node can
> > start the monitor operation. I THINK that it is ok for not all nodes to
> > have completed the post-demote notifier before the monitor operation
> > starts, probably this can work by creating a sparse priority
> > distribution, i.e. First node to execute monitor sets a priority of 100
> > - the next one down 90 - the next one in the middle at 95, based on the
> > number of nodes etc.
> > 
> > I hope this doesn't sound too tangled, I will try this out, but I can't
> > find any clear documentation on the ordering and completion of start,
> > notifiers, monitor and promote operations as well as master selection,
> > so all pointers are very much welcome.
> > 
> > And completely alternative suggestions also very much welcome.
> > 
> > Thanks for any and all assistance,
> > Nils
> 
> You may want to look at the ocf:heartbeat:galera agent -- I believe it
> has some similar concerns.

As Ken said :)
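
On the relative scoring idea in the quoted message (each node prioritising
itself against the GTIDs the others have published), a minimal Python sketch
could look like the following. It assumes a single replication domain, so a
GTID is one "domain-server-sequence" triple, and the base and step values are
arbitrary; a real agent would have to handle multi-domain positions and would
pass the result to something like crm_master.

```python
from typing import Dict


def gtid_sequence(gtid: str) -> int:
    """Extract the sequence number from a 'domain-server-sequence' GTID."""
    _domain, _server, seq = gtid.strip().split("-")
    return int(seq)


def promotion_score(local_node: str, gtids: Dict[str, str],
                    base: int = 100, step: int = 10) -> int:
    """Score the local node against the GTIDs visible so far.

    gtids maps node name -> published GTID and must include local_node.
    The score drops by `step` for every node that is strictly ahead, so
    the most up-to-date node (or nodes) keeps the full base score.
    """
    local_seq = gtid_sequence(gtids[local_node])
    ahead = sum(1 for g in gtids.values() if gtid_sequence(g) > local_seq)
    return max(base - step * ahead, 1)


gtids = {"node1": "0-1-100", "node2": "0-2-98", "node3": "0-1-100"}
scores = {node: promotion_score(node, gtids) for node in gtids}
```

Because each node only compares itself with whatever has been published at
that moment, it does not need every other node to have finished its
post-demote notify first; the score simply gets corrected on a later monitor.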
