[Pacemaker] crm_master triggering assert section != NULL
Yves Trudeau
y.trudeau at videotron.ca
Wed Oct 12 21:09:45 UTC 2011
Hi Florian,
On 11-10-12 04:09 PM, Florian Haas wrote:
> On 2011-10-12 21:46, Yves Trudeau wrote:
>> Hi Florian,
>> sure, let me state the requirements. If those requirements can be
>> met, Pacemaker will see much wider use for managing MySQL replication.
>> Right now, although at Percona I deal with many large MySQL deployments,
>> none are using the current agent. Another tool, MMM, is currently used,
>> but it is orphaned and suffers from several fairly fundamental flaws
>> (while implementing roughly the same logic as below).
>>
>> Consider a pool of N identical MySQL servers. In that case we need:
>> - N replication resources (it could be the MySQL RA)
>> - N Reader_vip
>> - 1 Writer_vip
>>
>> Reader vips are used by the application to run queries that do not
>> modify data, usually accessed in round-robin fashion. When the
>> application needs to write something, it uses the writer_vip. That's
>> how read/write splitting is implemented in many, many places.
>>
>> So, for the agent, here are the requirements:
>>
>> - No need to manage MySQL itself
>>
>> The resource we are interested in is replication, MySQL itself is at
>> another level. If the RA is to manage MySQL, it must not interfere.
>>
>> - the writer_vip must be assigned only to the master, after it is promoted
>>
>> This is easy with a colocation constraint.
> Agreed.
>
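For reference, that constraint (plus an ordering constraint so the vip only
starts once the promotion has completed) might look like this in crm shell
syntax; ms_mysql and writer_vip are assumed resource names:

```
colocation writer_vip_with_master inf: writer_vip ms_mysql:Master
order promote_before_writer_vip inf: ms_mysql:promote writer_vip:start
```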
>> - After the promotion of a new master, all slaves should be allowed to
>> complete the application of their relay logs prior to any change master
>>
>> The current RA does not do that but it should be fairly easy to implement.
> That's a use case for a pre-promote and post-promote notification. Like
> the mysql RA currently does.
>
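For what it's worth, a minimal sketch of how such a notify handler dispatches:
the OCF_RESKEY_CRM_meta_notify_* variables are the real ones Pacemaker exports
to clone notifications, but the actions below are placeholders, not the
shipped RA's code:

```shell
#!/bin/sh
# Sketch only: dispatch on the clone notification type and operation.
mysql_notify() {
    type_op="${OCF_RESKEY_CRM_meta_notify_type}-${OCF_RESKEY_CRM_meta_notify_operation}"
    case "$type_op" in
        pre-promote)
            # slaves would finish applying their relay logs here
            echo "waiting for relay log to be applied" ;;
        post-promote)
            # slaves would CHANGE MASTER to the newly promoted node here
            echo "repointing slave to new master" ;;
        *)
            echo "ignored: $type_op" ;;
    esac
}
```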
>> - After its promotion and before allowing writes to it, a master should
>> publish its current master file and position. I am using resource
>> parameters in the CIB for these (I am wondering if transient attributes
>> could be used instead)
> They could, and you should. Like the mysql RA currently does.
>
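For example, a transient (status-section) attribute can be written and read
with crm_attribute; the attribute and variable names here are illustrative:

```
# publish on the promoted node (lifetime=reboot => transient,
# cleared when the node restarts)
crm_attribute --lifetime reboot --name master_coordinates \
    --update "${master_log_file}|${master_log_pos}"

# read it back from a slave, e.g. in a post-promote notification handler
crm_attribute --lifetime reboot --node "$new_master" \
    --name master_coordinates --query --quiet
```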
The RA I downloaded, following the wiki's instructions stating these are
the latest sources:
wget -O resource-agents.tar.bz2
http://hg.linux-ha.org/agents/archive/tip.tar.bz2
has the following code to change the master:
ocf_run $MYSQL $MYSQL_OPTIONS_LOCAL $MYSQL_OPTIONS_REPL \
    -e "CHANGE MASTER TO MASTER_HOST='$master_host', \
        MASTER_USER='$OCF_RESKEY_replication_user', \
        MASTER_PASSWORD='$OCF_RESKEY_replication_passwd'"
which does not include file and position.
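Assuming the coordinates published by the new master were available (say in
$master_log_file and $master_log_pos, names made up for the example), the
call would only need two more clauses:

```
ocf_run $MYSQL $MYSQL_OPTIONS_LOCAL $MYSQL_OPTIONS_REPL \
    -e "CHANGE MASTER TO MASTER_HOST='$master_host', \
        MASTER_USER='$OCF_RESKEY_replication_user', \
        MASTER_PASSWORD='$OCF_RESKEY_replication_passwd', \
        MASTER_LOG_FILE='$master_log_file', \
        MASTER_LOG_POS=$master_log_pos"
```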
>> - After the promotion of a new master, all slaves should be reconfigured
>> to point to the new master host with correct file and position as
>> published by the master when it was promoted
>>
>> The current RA does not set file and position.
> "The current RA" being ocf:heartbeat:mysql?
>
> A cursory grep for "CRM_ATTR" in ocf:heartbeat:mysql indicates that it
> does set those.
grep CRM_ATTR returned nothing.
yves at yves-desktop:/opt/pacemaker/Cluster-Resource-Agents-7a11934b142d/heartbeat$
grep -i CRM_ATTR mysql
yves at yves-desktop:/opt/pacemaker/Cluster-Resource-Agents-7a11934b142d/heartbeat$
and that is the latest from Mercurial...
>> Under any non-trivial
>> load this will fail. The current RA is not designed to store the
>> information. The new RA uses the information stored in the cib along
>> with post-promote notification.
> Is this point moot considering my previous statement?
>
>> - each slave and the master may have one or more reader_vip provided
>> that they are replicating correctly (no lag beyond a threshold,
>> replication of course working). If all slaves fail, all reader_vips
>> should be located on the master.
> Use a cloned IPaddr2 as a non-anonymous clone, thereby managing an IP
> range. Add a location constraint restricting the clone instance to run
> on only those nodes where a specific node attribute is set. Or
> conversely, forbid them from running on nodes where said attribute is
> not set. Manage that attribute from your RA.
That's clever, never thought about it.
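For the archives, the setup Florian describes might look like this in crm
shell; the IP, the clone counts and the attribute name "readable" are all
assumptions for the example:

```
primitive reader_vip ocf:heartbeat:IPaddr2 \
    params ip="10.0.0.100" unique_clone_address="true" \
    op monitor interval="10s"
clone cl_reader_vip reader_vip \
    meta clone-max="3" clone-node-max="3" globally-unique="true"
location reader_vip_on_replicating cl_reader_vip \
    rule -inf: not_defined readable or readable ne 1
```

The RA's monitor action then only manages the "readable" node attribute;
Pacemaker moves the vips by itself.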
>> The current RA either kills MySQL or does nothing, it doesn't care about
>> reader_vips. Killing MySQL on a busy server with a 256GB buffer pool
>> is enough for someone to lose his job... The new RA adjusts location
>> scores for the reader_vip resources dynamically.
> Like I said, that's managing one resource from another, which is a total
> nightmare. It's also not necessary, I dare say, given the approach I
> outlined above.
>
I'll explore the node attribute approach, I like it.
Is it possible to create an attribute that does not belong to a node but
is cluster wide?
>> - the RA should implement a protection against flapping in case a slave
>> hovers around the replication lag threshold
> You should get plenty of inspiration there from how the dampen parameter
> is used in ocf:pacemaker:ping.
>
ok, I'll check
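In case it helps: the damping in ping boils down to attrd_updater's delay
option, which works for a custom attribute too (the 30s value is an
arbitrary example):

```
# monitor action: update the attribute, but let attrd hold the value
# for 30s before it hits the CIB, absorbing short flaps around the
# replication lag threshold
attrd_updater --name readable --update 1 --delay 30
attrd_updater --name readable --update 0 --delay 30
```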
>> The current RA does implement that but it is not required given the
>> context. The new RA does implement flapping protection.
>>
>> - upon demote of a master, the RA _must_ attempt to kill all user
>> (non-system) connections
>>
>> The current RA does not do that but it is easy to implement
> Yeah, as I assume it would be in the other one.
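A sketch of that demote-time kill, generating KILL statements from
information_schema.processlist; the list of users to spare is an assumption:

```
# kill every thread not owned by system or replication users
$MYSQL $MYSQL_OPTIONS_LOCAL -N -e \
    "SELECT CONCAT('KILL ', id, ';') FROM information_schema.processlist \
     WHERE user NOT IN ('system user', 'root', '$OCF_RESKEY_replication_user')" \
    | $MYSQL $MYSQL_OPTIONS_LOCAL
```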
>
>> - Slaves must be read-only
>>
>> That's fine, handled by the current RA.
> Correct.
>
>> - Monitor should test MySQL and replication. If either is bad, vips
>> should be moved away. Common errors should not trigger actions.
> Like I said, should be feasible with the node attribute approach
> outlined above. No reason to muck around with the resources directly.
>
>> That's handled by the current RA for most of it. The error handling
>> could be added.
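To illustrate the kind of decision involved (a sketch, not either RA's
actual code; the threshold and field handling are assumptions), the monitor
check on the SHOW SLAVE STATUS fields could be:

```shell
#!/bin/sh
# Sketch: decide whether a slave is healthy enough to keep a reader_vip.
MAX_LAG=60   # assumed threshold, would come from an RA parameter

replication_healthy() {
    # $1=Slave_IO_Running  $2=Slave_SQL_Running  $3=Seconds_Behind_Master
    [ "$1" = "Yes" ] || return 1
    [ "$2" = "Yes" ] || return 1
    [ "$3" != "NULL" ] || return 1        # NULL: replication is broken
    [ "$3" -le "$MAX_LAG" ] || return 1   # lagging beyond the threshold
    return 0
}

if replication_healthy Yes Yes 5; then
    echo "readable"       # monitor would set the node attribute to 1 here
else
    echo "not readable"   # monitor would clear the attribute
fi
```

A common error such as a brief lag spike then only clears the attribute,
moving the vips away, instead of touching MySQL itself.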
>>
>> - Slaves should update their master score according to the state of
>> their replication.
>>
>> Handled by both RA
> Right.
>
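Concretely, the master score mechanism both agents rely on is crm_master (a
thin wrapper around crm_attribute); the score values are illustrative:

```
# monitor on a cleanly replicating slave: eligible for promotion
crm_master -l reboot -v 100

# replication broken: drop this node's promotion preference
crm_master -l reboot -D
```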
>> So, at the minimum, the RA needs to be able to store the master
>> coordinate information, either in the resource parameters or in
>> transient attributes and must be able to modify resources location
>> scores. The script _was_ working before I got the cib issue, maybe it
>> was purely accidental but it proves the concept. I was actually
>> implementing/testing the relay_log completion stuff. I chose not to use
>> the current agent because I didn't want to manage MySQL itself, just
>> replication.
>>
>> I am wide open to argue any Pacemaker or RA architecture/design part but
>> I don't want to argue the replication requirements, they are fundamental
>> in my mind.
> Yup, and I still believe that ocf:heartbeat:mysql either already
> addresses those, or they could be addressed in a much cleaner fashion
> than writing a new RA.
>
> Now, if the only remaining point is "but I want to write an agent that
> can do _less_ than an existing one" (namely, manage only replication,
> not the underlying daemon), then I guess I can't argue with that, but
> I'd still believe that would be a suboptimal approach.
Ohh... don't get me wrong, I am not the kind of guy who takes pride in
having re-invented the flat tire. I want an open-source _solution_ I can
offer to my customers. I think part of the problem here is that we are
not talking about the same ocf:heartbeat:mysql RA. What is mainstream
is what you get with "apt-get install pacemaker" on 10.04 LTS, for
example. That is 1.0.8. I also tried 1.0.11 and it is still obviously
not the same version. I got my "latest" agent version as explained on
the clusterlabs FAQ page from:
wget -O resource-agents.tar.bz2
http://hg.linux-ha.org/agents/archive/tip.tar.bz2
Where can I get the version you are using :)
Regards,
Yves
> Cheers,
> Florian
>