[Pacemaker] Validate strategy for RA on DRBD standby node
Serge Dubrouski
sergeyfd at gmail.com
Thu Feb 24 11:26:22 EST 2011
Ahh! I see. You need to use the ocf_is_probe function in your RA to
isolate that case.
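A Python RA cannot source ocf-shellfuncs, so the same test has to be written by hand. ocf_is_probe essentially treats a call as a probe when the action is "monitor" and OCF_RESKEY_CRM_meta_interval is "0". The "_0" in ldap_monitor_0 is that zero interval. The following is a minimal sketch. It assumes the action arrives as the first command-line argument, as the OCF convention prescribes. The function name is_probe is hypothetical.

```python
import os
import sys


def is_probe(action, env=None):
    """Return True if this RA invocation is a probe (hypothetical helper).

    Mirrors the ocf_is_probe shell helper: a probe is a "monitor" action
    whose interval (OCF_RESKEY_CRM_meta_interval, in milliseconds) is "0".
    Pacemaker runs probes on every node, including the DRBD standby.
    """
    if env is None:
        env = os.environ
    return action == "monitor" and env.get("OCF_RESKEY_CRM_meta_interval") == "0"


if __name__ == "__main__":
    # The OCF action (start, stop, monitor, ...) is the first argument.
    action = sys.argv[1] if len(sys.argv) > 1 else ""
    print("probe" if is_probe(action) else "regular call")
```

With that flag, the RA can report failures that only mean "the shared storage is not here" as "not running" during a probe, instead of failing validation.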
On Thu, Feb 24, 2011 at 9:17 AM, David McCurley <mac at fabric.com> wrote:
> I'm not trying to start it. The problem is that my validate function was failing. Here is the case:
>
> Deploy RA on both nodes (master DRBD and slave).
> Edit the crm config to add the ldap resource, colocation, etc.
> Save the config; Pacemaker attempts to start LDAP, but it also runs a check on both the master and the slave, and my validate was failing on the slave since it didn't have the filesystem resources for LDAP available.
>
> We are in the active/passive case, so the problem is with my code when PM runs the monitor/validate check on the slave. The live LDAP instance is colocated with DRBD and the filesystem, e.g. from crm configure show:
>
> node vcoresrv1 \
>     attributes standby="off"
> node vcoresrv2 \
>     attributes standby="off"
> primitive clusterip ocf:heartbeat:IPaddr2 \
>     params ip="192.168.1.4" cidr_netmask="24" nic="eth0" iflabel="cip" \
>     op monitor interval="30s"
> primitive clusteripsourcing ocf:heartbeat:IPsrcaddr \
>     params ipaddress="192.168.1.4" \
>     op monitor interval="10" timeout="20s" depth="0"
> primitive ldap ocf:fabric:openldap \
>     op monitor interval="10"
> primitive drbd_vcoreshare ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op start interval="0" timeout="240s" \
>     op stop interval="0" timeout="100s" \
>     op promote interval="0" timeout="90s" \
>     op demote interval="0" timeout="90s" \
>     op monitor interval="15s"
> primitive fs_vcoreshare ocf:heartbeat:Filesystem \
>     params device="/dev/drbd/by-res/r0" directory="/vcoreshare" fstype="ext4" \
>     op start interval="0" timeout="60s" \
>     op stop interval="0" timeout="60s"
> ms ms_drbd_vcoreshare drbd_vcoreshare \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> colocation clusterip_with_vcoreshare inf: clusterip fs_vcoreshare
> colocation ipsourcing_with_clusterip inf: clusteripsourcing clusterip
> colocation vcoreshare_on_drbd inf: fs_vcoreshare ms_drbd_vcoreshare:Master
> colocation ldap_with_vcoreshare inf: ldap fs_vcoreshare
> order clusterip_after_vcoreshare inf: fs_vcoreshare clusterip
> order ldap_after_clusterip inf: clusterip ldap
> order ipsourcing_after_clusterip inf: clusterip clusteripsourcing
> order vcoreshare_after_drbd inf: ms_drbd_vcoreshare:promote fs_vcoreshare:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="100"
>
>
> ----- Original Message -----
>> From: "Serge Dubrouski" <sergeyfd at gmail.com>
>> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
>> Sent: Thursday, February 24, 2011 11:05:56 AM
>> Subject: Re: [Pacemaker] Validate strategy for RA on DRBD standby node
>>
>> Why are you trying to start LDAP on a node where you don't have your
>> DRBD resource mounted? Having LDAP up on both nodes would make sense
>> if you were building an active/active LDAP cluster with syncrepl or
>> any other replication mechanism. In that case you'd set it up as M/S
>> or as a clone and would have to provide access to the config file
>> on both nodes. In the active/passive case you have to colocate your LDAP
>> resource with your DRBD and filesystem resources, and Pacemaker won't
>> try to start LDAP on a node that doesn't have DRBD activated and the
>> filesystem mounted.
>>
>> On Thu, Feb 24, 2011 at 6:06 AM, David McCurley <mac at fabric.com>
>> wrote:
>> > Pacemaker and list newbie here :)
>> >
>> > I'm writing a resource agent in Python for the newer release of
>> > OpenLDAP, but I need some pointers on a strategy for the validate
>> > function in a certain case. (Python, because the more advanced
>> > shell scripting hurts my head :) Here is the situation:
>> >
>> > The config file for OpenLDAP is stored in
>> > /etc/ldap/slapd.d/cn=config.ldif. This is on a DRBD
>> > active-passive system and the /etc/ldap directory is actually a
>> > symlink to the DRBD controlled share /vcoreshare/etc/ldap. The
>> > real config file is at
>> > /vcoreshare/etc/ldap/slapd.d/cn=config.ldif.
>> >
>> > So I'm trying to be very judicious with every function and
>> > validation, checking file permissions, etc. But the problem is
>> > that /etc/ldap/slapd.d/cn=config.ldif is only present on the
>> > active DRBD node. My validate function checks that the file is
>> > readable by the user/group that slapd is to run as. Now, as soon
>> > as I start ldap in the cluster, it starts fine, but validate fails
>> > on the standby node (because the DRBD volume isn't mounted) and
>> > crm_mon shows a failed action:
>> > ----------------------------------------------
>> > ============
>> > Last updated: Wed Feb 23 07:35:19 2011
>> > Stack: openais
>> > Current DC: vcoresrv1 - partition with quorum
>> > Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
>> > 2 Nodes configured, 2 expected votes
>> > 5 Resources configured.
>> > ============
>> >
>> > Online: [ vcoresrv1 vcoresrv2 ]
>> >
>> > fs_vcoreshare (ocf::heartbeat:Filesystem): Started vcoresrv1
>> >  Master/Slave Set: ms_drbd_vcoreshare
>> >      Masters: [ vcoresrv1 ]
>> >      Slaves: [ vcoresrv2 ]
>> > clusterip (ocf::heartbeat:IPaddr2): Started vcoresrv1
>> > clusteripsourcing (ocf::heartbeat:IPsrcaddr): Started vcoresrv1
>> >
>> > Failed actions:
>> >     ldap_monitor_0 (node=vcoresrv2, call=130, rc=5, status=complete): not installed
>> > ---------------------------------------------
>> >
>> > Is there a way for my RA to know whether it is being called on the
>> > active node or the passive node? Or, more generally, what
>> > would anyone recommend here? I really didn't want to write the
>> > resource agent so that it would be specific to our setup (e.g.
>> > checking to make sure the DRBD mount is readable before looking
>> > for the config files). Maybe Pacemaker passes in some extra env
>> > variable that can be used?
>> >
>> > I'm reluctant to post the code for the RA to the list because
>> > it is 450 lines. But here is the logic for the validate
>> > function:
>> >
>> > if the appropriate slapd user and group do not exist:
>> >     return OCF_ERR_INSTALLED
>> > if the ldap config file doesn't exist or isn't readable by the slapd user:
>> >     return OCF_ERR_INSTALLED
>> > if the ldap binary doesn't exist or isn't executable:
>> >     return OCF_ERR_INSTALLED
>> > return OCF_SUCCESS
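Applied to the validate logic above, a probe-aware version might look like the following sketch. Several things here are assumptions: the function names and parameters (validate, user_and_group_exist), and the choice to return OCF_NOT_RUNNING (7) instead of OCF_ERR_INSTALLED (5) when the config file is missing during a probe. Checks that do not depend on the DRBD share (user, group, binary) run first and always fail hard. Only the shared config file is treated leniently on a probe.

```python
import grp
import os
import pwd

# OCF return codes (values as defined by the OCF resource agent API).
OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5
OCF_NOT_RUNNING = 7


def is_probe(action, env):
    # Same test as the ocf_is_probe shell helper: monitor with interval 0.
    return action == "monitor" and env.get("OCF_RESKEY_CRM_meta_interval") == "0"


def user_and_group_exist(user, group):
    # Hypothetical helper: both names must resolve in the local user database.
    try:
        pwd.getpwnam(user)
        grp.getgrnam(group)
    except KeyError:
        return False
    return True


def validate(action, user, group, config_file, slapd_binary, env=None):
    """Probe-aware sketch of the validate logic (hypothetical signature)."""
    if env is None:
        env = os.environ
    # Local checks first: these do not depend on the DRBD share.
    if not user_and_group_exist(user, group):
        return OCF_ERR_INSTALLED
    if not (os.path.isfile(slapd_binary) and os.access(slapd_binary, os.X_OK)):
        return OCF_ERR_INSTALLED
    # The config file lives on the DRBD share, which is only mounted on the
    # active node. NOTE: os.access() tests the calling process (normally
    # root), not the slapd user; a per-user check is left out of this sketch.
    if not os.access(config_file, os.R_OK):
        if is_probe(action, env):
            # Share not mounted here, so slapd cannot be running on this node.
            return OCF_NOT_RUNNING
        return OCF_ERR_INSTALLED
    return OCF_SUCCESS
```

Returning OCF_NOT_RUNNING from a probe tells Pacemaker that the resource is simply stopped on that node. As a result, ldap_monitor_0 should no longer show up as a failed action on the standby. A config file that is genuinely missing on the node where LDAP is supposed to start still surfaces as "not installed".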
>> >
>> > Or maybe I'm overdoing it in my tests or have misinterpreted the
>> > "OCF Resource Agent Developer's Guide"?
>> >
>> > Any advice or guidance / clarification appreciated.
>> >
>> > Thanks,
>> >
>> > Mac
>> >
>> > _______________________________________________
>> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started:
>> > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs:
>> > http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>> >
>>
>>
>>
>> --
>> Serge Dubrouski.
>>
>>
>
>
--
Serge Dubrouski.