[Pacemaker] Validate strategy for RA on DRBD standby node

Dejan Muhamedagic dejanmm at fastmail.fm
Tue Mar 8 13:35:46 UTC 2011


Hi,

On Thu, Mar 03, 2011 at 09:58:59AM -0500, David McCurley wrote:
> Would it be appropriate to post the first result of coding here for review and recommendations?
> 
> If so, should I post the code in-line or as an attachment? If not, no problemo.

The right place to post resource agents is
linux-ha-dev at lists.linux-ha.org

Thanks,

Dejan

> ----- Original Message -----
> > From: "Dejan Muhamedagic" <dejanmm at fastmail.fm>
> > To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> > Sent: Friday, February 25, 2011 10:53:34 AM
> > Subject: Re: [Pacemaker] Validate strategy for RA on DRBD standby node
> > On Thu, Feb 24, 2011 at 10:49:27AM -0500, David McCurley wrote:
> > > Thanks for the quick reply and especially the link. It is much
> > > better and more thorough in its testing than the other shell ldap
> > > RA I found.
> > >
> > > > That would be the first python RA. BTW, a slapd RA (implemented
> > > > in shell) was recently posted, which I should review but haven't
> > > > done yet. At any rate, that RA does not support multi-state
> > > > resources, which I think would be essential. Did you plan to do
> > > > that? Either way, I'd suggest that you check that code too and
> > > > then see whether you need to do your own implementation.
> > > > The thread starts here:
> > > >
> > > > http://marc.info/?l=linux-ha-dev&m=129666245428850&w=2
> > >
> > > Great stuff and very thorough. But it doesn't look like it will work
> > > in our config because the files won't exist on the DRBD slave. I
> > > will use this as my new example.
> > >
> > > I'm not sure what you are referring to with "multi-state" resources.
> > > Is this in relation to "promote, demote, migrate_to, migrate_from"
> > > in the guide, i.e. master vs. slave, or is there more to it? I plan
> > > to support a master and a slave later on down the road -- I'm under
> > > pressure to get this rolled out now. Is there a good discussion or
> > > resource other than the dev guide? I had planned to wade through
> > > the DRBD RA scripts to figure it out.
> > 
> > migrate_to/migrate_from are for resource migration, which is
> > something else. Multi-state adds just promote (slave -> master) and
> > demote (master -> slave).
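> > 
> > As a rough sketch of what that means for the RA (the names and
> > structure here are made up, not taken from any existing agent), the
> > script mostly just grows two more entries in its action dispatch:
> > 
> >     #!/usr/bin/env python
> >     # Sketch of a multi-state RA entry point (hypothetical, minimal).
> >     import sys
> > 
> >     # OCF exit codes (per the OCF resource agent API)
> >     OCF_SUCCESS = 0
> >     OCF_ERR_UNIMPLEMENTED = 3
> > 
> >     def promote():          # slave -> master
> >         return OCF_SUCCESS  # a real agent reconfigures and verifies
> > 
> >     def demote():           # master -> slave
> >         return OCF_SUCCESS
> > 
> >     ACTIONS = {"promote": promote, "demote": demote}
> >     # ... plus start, stop, monitor, validate-all, meta-data
> > 
> >     if __name__ == "__main__":
> >         action = sys.argv[1] if len(sys.argv) > 1 else ""
> >         sys.exit(ACTIONS.get(action, lambda: OCF_ERR_UNIMPLEMENTED)())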
> > 
> > > Why python? Because shell is harder for me to read and I have to
> > > make a good clear verbose example for some others who will also be
> > > doing some RA's (in python) for our custom apps.
> > 
> > No problem with python. Actually, there was an idea to provide a
> > python class (RA or some such) which would make implementing
> > resource agents easier.
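> > 
> > Nothing of the sort exists yet, but the idea is roughly this
> > (hypothetical sketch; the class and method names are invented):
> > 
> >     class ResourceAgent(object):
> >         """Subclass and override the actions you implement."""
> >         OCF_SUCCESS = 0
> >         OCF_ERR_UNIMPLEMENTED = 3
> > 
> >         def start(self):
> >             return self.OCF_ERR_UNIMPLEMENTED
> > 
> >         def stop(self):
> >             return self.OCF_ERR_UNIMPLEMENTED
> > 
> >         def monitor(self):
> >             return self.OCF_ERR_UNIMPLEMENTED
> > 
> >         def run(self, action):
> >             # Map the action argument to a method, as the shell
> >             # agents do with a case statement.
> >             handler = getattr(self, action, None)
> >             return handler() if callable(handler) else \
> >                 self.OCF_ERR_UNIMPLEMENTED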
> > 
> > > > > The config file for OpenLDAP is stored in
> > > > > /etc/ldap/slapd.d/cn=config.ldif. This is on a DRBD
> > > > > active-passive system and the /etc/ldap directory is actually a
> > > > > symlink to the DRBD controlled share /vcoreshare/etc/ldap. The
> > > > > real config file is at
> > > > > /vcoreshare/etc/ldap/slapd.d/cn=config.ldif.
> > > >
> > > > What about the old style configuration? I assume that there are
> > > > still quite a few installations/distributions using those.
> > >
> > > Yes, I have some code for that, using the example slapd init script
> > > and the other examples I found, but no test environment for that.
> > 
> > IIRC, the old configuration style is quite easy to parse.
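> > 
> > Something like this would probably do for a first cut (a sketch,
> > assuming one "directive value" pair per line and ignoring slapd.conf
> > continuation lines, which begin with whitespace):
> > 
> >     def parse_slapd_conf(path="/etc/ldap/slapd.conf"):
> >         """Return slapd.conf directives as a dict (last one wins)."""
> >         conf = {}
> >         with open(path) as f:
> >             for line in f:
> >                 line = line.strip()
> >                 # Skip blanks and comments.
> >                 if not line or line.startswith("#"):
> >                     continue
> >                 parts = line.split(None, 1)
> >                 if len(parts) == 2:
> >                     conf[parts[0]] = parts[1]
> >         return conf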
> > 
> > > > > So I'm trying to be very judicious with every function and
> > > > > validation, checking file permissions, etc. But the problem is
> > > > > that /etc/ldap/slapd.d/cn=config.ldif is only present on the
> > > > > active DRBD node. My validate function checks that the file is
> > > > > readable by the user/group that slapd is to run as. Now, as soon
> > > > > as I start ldap in the cluster, it starts fine, but validate
> > > > > fails
> > > > > on the standby node (because the DRBD volume isn't mounted) and
> > > > > crm_mon shows a failed action:
> > > >
> > > > On probes (monitor with interval 0), the parts of validation
> > > > which concern the local node rather than the configuration should
> > > > return OCF_NOT_RUNNING instead of an error. This is exactly that
> > > > case. No worries: if the next action is start, validation is
> > > > invoked again. Probes are issued by pacemaker to establish
> > > > whether the resource is running, and normally the resource is
> > > > expected not to be running (for instance on node startup).
> > >
> > > Ah! I have a function that checks for probes but wasn't using it
> > > because I didn't quite understand the semantics.
> > 
> > OK, hope that it's clear now.
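> > 
> > For reference, in python the probe check comes down to something
> > like this (a sketch; OCF_RESKEY_CRM_meta_interval is the variable
> > pacemaker sets, a probe being a monitor with interval 0):
> > 
> >     import os
> > 
> >     OCF_SUCCESS = 0
> >     OCF_ERR_PERM = 4
> >     OCF_NOT_RUNNING = 7
> > 
> >     def is_probe(action):
> >         return action == "monitor" and \
> >             os.environ.get("OCF_RESKEY_CRM_meta_interval") == "0"
> > 
> >     def validate(action, conf="/etc/ldap/slapd.d/cn=config.ldif"):
> >         # Note: os.access() tests the caller's own uid; a full check
> >         # would test readability by the slapd user instead.
> >         if not os.access(conf, os.R_OK):
> >             # On the standby node the DRBD volume isn't mounted, so
> >             # the file is simply absent; a probe must report "not
> >             # running" rather than fail.
> >             return OCF_NOT_RUNNING if is_probe(action) else OCF_ERR_PERM
> >         return OCF_SUCCESS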
> > 
> > Thanks,
> > 
> > Dejan
> > 
> > >
> > > Very helpful stuff, thanks!
> > >
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



