[ClusterLabs Developers] MariaDB resource-agent - help with choosing a master

Ken Gaillot kgaillot at redhat.com
Tue Feb 14 22:33:51 CET 2017


On 02/14/2017 02:51 PM, Nils Carlson wrote:
> Hi,
> 
> I'm working on implementing a MariaDB resource-agent based on the mysql
> one.
> The idea is to take advantage of new features in MariaDB, especially
> semi-synchronous replication and GTID.
> 
> GTID (Global Transaction ID) means that there is a counter that applies
> to the replicated databases, which is unique within the cluster (there
> can be multiple replication clusters with overlapping IDs).
> 
> Semi-synchronous replication means that the master will replicate
> synchronously to AT LEAST ONE slave, before actually performing the
> transaction. In theory there can be no data-loss due to a single node
> failure, a big improvement compared to the normal async replication in
> MariaDB.
> 
> These two sets of technologies should allow for quite a straightforward
> set of semantics in the resource-agent.
> On master failure, the node with the highest GTID must be the one that
> was replicating synchronously, and should be promoted to be the new
> master. The question is how to relay the information to crmd.
> 
> My current working hypothesis is that I can place the GTID as a
> crm-attribute both when starting the resource-agent and in a post-demote
> notify. During the subsequent monitor operation the resource-agents can
> then scan the crm-attributes from other nodes and simply prioritise
> themselves in relation to others (some relative scoring?).
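
To make the "highest GTID wins" idea concrete, here is a minimal sketch in
Python of how an agent might compare the GTIDs it reads back from the node
attributes. It assumes each node publishes a single MariaDB GTID in the
usual domain-server-sequence form (e.g. 0-1-105) and that all nodes share
one replication domain; the function names and the node names in the usage
note are hypothetical and not part of any existing agent.

```python
from typing import Dict, Optional, Tuple


def parse_gtid(gtid: str) -> Tuple[int, int, int]:
    """Split a MariaDB GTID 'domain-server-sequence' into three integers."""
    domain, server, seq = gtid.strip().split("-")
    return int(domain), int(server), int(seq)


def best_candidate(gtids: Dict[str, str], domain: int = 0) -> Optional[str]:
    """Return the node with the highest sequence number in `domain`.

    Assumption: one GTID per node, all in the same replication domain.
    Nodes reporting a different domain are skipped. Ties go to the node
    whose name sorts first, so every node reaches the same answer.
    """
    best_node = None
    best_seq = -1
    for node in sorted(gtids):
        gtid_domain, _server, seq = parse_gtid(gtids[node])
        if gtid_domain == domain and seq > best_seq:
            best_node, best_seq = node, seq
    return best_node
```

For example, best_candidate({"node1": "0-1-100", "node2": "0-1-105"})
picks node2. A real agent would still have to turn that choice into a
promotion score and handle nodes that have not reported a GTID yet.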

A bit of a tangent: you can set attributes from a resource agent using
either crm_attribute or attrd_updater. Each has advantages and
disadvantages.

crm_attribute can set a permanent or transient attribute, while
attrd_updater only sets transient attributes. (A node's transient
attributes go away when the node reboots or otherwise stops cluster
services.)

crm_attribute can only set public attributes, while attrd_updater can
set public or private attributes. Public attributes are recorded in the
CIB, and changing one triggers a new transition (i.e. the
cluster checks to see if any resources need to be
started/stopped/moved). Private attributes are not saved to the CIB, and
do not cause a new transition. Public attributes can be referenced in
constraint rules, while private attributes cannot. Private attributes
have been supported since Pacemaker 1.1.13.

attrd_updater works with Pacemaker Remote nodes only when the cluster
nodes use the corosync 2 stack. Its updates are silently ignored on
Pacemaker Remote nodes when the cluster nodes use a legacy stack
(heartbeat/cman/corosync-plugin). crm_attribute works with remote nodes
on legacy stacks since Pacemaker 1.1.15.

I'd prefer attrd_updater with private transient attributes if that works
for your purposes, because it saves unnecessary recalculation of the
cluster state plus disk I/O.
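
As a rough illustration of the difference between the two tools, the sketch
below builds the command lines in Python without running them. The helper
names are hypothetical; the options used (--name, --update, --private for
attrd_updater, and --node, --name, --update, --lifetime for crm_attribute)
are the long options as I understand them, so check the man pages for your
Pacemaker version before relying on them.

```python
from typing import List


def attrd_updater_cmd(name: str, value: str, private: bool = True) -> List[str]:
    """Build an attrd_updater call that sets a transient node attribute.

    With private=True the attribute is not written to the CIB and does
    not trigger a new transition.
    """
    cmd = ["attrd_updater", "--name", name, "--update", value]
    if private:
        cmd.append("--private")
    return cmd


def crm_attribute_cmd(node: str, name: str, value: str,
                      transient: bool = True) -> List[str]:
    """Build a crm_attribute call that sets a public node attribute.

    A lifetime of "reboot" gives a transient attribute and "forever"
    a permanent one.
    """
    lifetime = "reboot" if transient else "forever"
    return ["crm_attribute", "--node", node, "--name", name,
            "--update", value, "--lifetime", lifetime]
```

Either list could be handed to subprocess.run(cmd, check=True) on a cluster
node. An actual resource agent would more likely call the tools directly
from shell; Python is used here only to keep the example self-contained.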

> This requires a few things though:
> 
> - If there is no master when the resource agent starts (i.e. the
> cluster is just starting), we need to wait for all nodes to come online
> before promoting any to master, so they can read GTID from the attributes.
> - There must be a monitor step after start and demote and before the
> promotion of any resource to master, and this must execute on all nodes
> so they can set their priority for promotion.
> - The post-demote notifier must complete execution before a node can
> start the monitor operation. I THINK that it is ok for not all nodes to
> have completed the post-demote notifier before the monitor operation
> starts, probably this can work by creating a sparse priority
> distribution, i.e. First node to execute monitor sets a priority of 100
> - the next one down 90 - the next one in the middle at 95, based on the
> number of nodes etc.
> 
> I hope this doesn't sound too tangled, I will try this out, but I can't
> find any clear documentation on the ordering and completion of start,
> notifiers, monitor and promote operations as well as master selection,
> so all pointers are very much welcome.
> 
> And completely alternative suggestions also very much welcome.
> 
> Thanks for any and all assistance,
> Nils

You may want to look at the ocf:heartbeat:galera agent -- I believe it
has some similar concerns.


