[Pacemaker] Validate strategy for RA on DRBD standby node
David McCurley
mac at fabric.com
Thu Mar 3 15:58:59 CET 2011
Would it be appropriate to post my first pass at the code here for review and recommendations?
If so, should I post the code inline or as an attachment? If not, no problem.
----- Original Message -----
> From: "Dejan Muhamedagic" <dejanmm at fastmail.fm>
> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> Sent: Friday, February 25, 2011 10:53:34 AM
> Subject: Re: [Pacemaker] Validate strategy for RA on DRBD standby node
> On Thu, Feb 24, 2011 at 10:49:27AM -0500, David McCurley wrote:
> > Thanks for the quick reply and especially the link. It was much
> > better and more thorough in testing than the other shell LDAP RA
> > link I found.
> >
> > > That would be the first python RA. BTW, a slapd RA (implemented
> > > in shell) was recently posted, which I should review but haven't
> > > yet. That RA does not support multi-state resources, which I
> > > think would be essential. Did you plan to do that? At any rate,
> > > I'd suggest that you check that code too and then see if you need
> > > to do your own implementation.
> > > The thread starts here:
> > >
> > > http://marc.info/?l=linux-ha-dev&m=129666245428850&w=2
> >
> > Great stuff and very thorough. But it doesn't look like it will work
> > in our config because the files won't exist on the DRBD slave. I
> > will use this as my new example.
> >
> > I'm not sure what you are referring to with "multi-state" resources.
> > Is this in relation to "promote, demote, migrate_to, migrate_from" in
> > the guide, i.e. master vs. slave, or is there more to it? I plan to
> > support a master and slave later on down the road -- under pressure
> > to get this rolled out now. Is there a good discussion / resource
> > other than the dev guide? I had planned to try to wade through the
> > DRBD RA scripts to figure it out.
>
> migrate_to/from are for something else. It's just promote
> (slave->master) and demote (master->slave).
>
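The distinction drawn above can be sketched as a small lookup table. This is only an illustration, not code from any existing RA: ROLE_TRANSITIONS and next_role are hypothetical names, and a real agent would also have to reconfigure the service and report its state.

```python
# Hypothetical sketch: which actions change the role of a multi-state resource.
# Only promote and demote move an instance between slave and master.
ROLE_TRANSITIONS = {
    ("promote", "slave"): "master",
    ("demote", "master"): "slave",
}


def next_role(action, role):
    """Return the role after `action`; any other action leaves it unchanged."""
    return ROLE_TRANSITIONS.get((action, role), role)
```

For example, next_role("promote", "slave") gives "master", while next_role("migrate_to", "master") leaves the role as "master", since migration does not change the role.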
> > Why python? Because shell is harder for me to read and I have to
> > make a good clear verbose example for some others who will also be
> > doing some RAs (in python) for our custom apps.
>
> No problem with python. Actually, there was an idea to provide a
> python class (RA or so) which would make implementing resource
> agents easier.
>
> > > > The config file for OpenLDAP is stored in
> > > > /etc/ldap/slapd.d/cn=config.ldif. This is on a DRBD
> > > > active-passive system and the /etc/ldap directory is actually a
> > > > symlink to the DRBD controlled share /vcoreshare/etc/ldap. The
> > > > real config file is at
> > > > /vcoreshare/etc/ldap/slapd.d/cn=config.ldif.
> > >
> > > What about the old style configuration? I assume that there are
> > > still quite a few installations/distributions using those.
> >
> > Yes, I have some code for that, using the example slapd init script
> > and the other examples I found, but no test environment for that.
>
> IIRC, the old configuration style is quite easy to parse.
>
> > > > So I'm trying to be very judicious with every function and
> > > > validation, checking file permissions, etc. But the problem is
> > > > that /etc/ldap/slapd.d/cn=config.ldif is only present on the
> > > > active DRBD node. My validate function checks that the file is
> > > > readable by the user/group that slapd is to run as. Now, as soon
> > > > as I start ldap in the cluster, it starts fine, but validate
> > > > fails on the standby node (because the DRBD volume isn't mounted)
> > > > and crm_mon shows a failed action:
> > >
> > > On probes (monitor with interval 0), those parts of validation
> > > which concern the local node rather than the configuration should
> > > return OCF_NOT_RUNNING instead of an error. This is exactly that
> > > case. No worries, because if the next action is start, validation
> > > is invoked again. Probes are issued by pacemaker to establish
> > > whether the resource is running, and normally it is expected not
> > > to be running (for instance on node startup).
> >
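A minimal sketch of the probe handling described above, in python since that is the language planned for this RA. It assumes the agent receives the action name as an argument and that Pacemaker exports the interval as OCF_RESKEY_CRM_meta_interval. The function names, the use of os.access (which tests the calling user, not the slapd user), and the choice of OCF_ERR_INSTALLED for the non-probe case are illustrative assumptions, not taken from any existing agent.

```python
import os

# Standard OCF exit codes.
OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5
OCF_NOT_RUNNING = 7


def is_probe(action, env=os.environ):
    """Return True for a probe: a monitor action with interval 0."""
    return action == "monitor" and env.get("OCF_RESKEY_CRM_meta_interval") == "0"


def validate(action, config_path, env=os.environ):
    """Check that the config file is readable on this node.

    On the standby node the DRBD volume is not mounted, so the file is
    missing. During a probe that only means "not running here"; for any
    other action it is reported as an error (assumed: OCF_ERR_INSTALLED).
    """
    if os.access(config_path, os.R_OK):
        return OCF_SUCCESS
    if is_probe(action, env):
        return OCF_NOT_RUNNING
    return OCF_ERR_INSTALLED
```

With this shape, a probe on the standby node reports "not running" instead of a failed action, and the full check still applies when start is invoked on the node that has the volume mounted.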
> > Ah! I have a function that checks for probes but wasn't using it
> > because I didn't quite understand the semantics.
>
> OK, hope that it's clear now.
>
> Thanks,
>
> Dejan
>
> >
> > Very helpful stuff, thanks!
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started:
> > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs:
> > http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker