[Pacemaker] Howto handle opt-in clusters WAS: Re: resource monitor operations on wrong nodes

Andrew Beekhof andrew at beekhof.net
Fri Apr 16 11:52:59 EDT 2010


On Fri, Apr 16, 2010 at 3:26 PM,  <martin.braun at icw.de> wrote:
> Hi,
>
>> > I have a non-symmetric cluster (symmetric-cluster="false") with four
>> > nodes.
>> We still check _every_ node to be sure the resources aren't already
>> running there.
>
> OK, that is reasonable - but I have trouble with the logic of the
> messages: they are listed as failed actions, yet if the resource isn't
> supposed to run there in the first place, it is not a failure that the
> resource is not installed. For the admins of such a cluster I would like
> a clean status view - but you get several "false alarms" when you have
> different resources that can't run on all nodes.
> Will such a message also send an SNMP trap (or email, if configured)?

Hmmm. Yes.
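
If the notifications come from crm_mon running in daemon mode, these probe
failures will be reported through whatever channel it was started with.
Rough, untested example (assuming your crm_mon build has SNMP/ESMTP support
compiled in; the host name and address below are just placeholders):

  crm_mon --daemonize --snmp-traps snmphost
  crm_mon --daemonize --mail-to admin at example.com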

> Or do I need to install dummy scripts on these nodes to avoid such
> messages?

For now that might be the path of least resistance.
Though your expectations are entirely reasonable, it wouldn't take
much for these "errors" to be handled better.
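
In case it helps, here is an untested sketch of such a dummy: a stub agent
installed as /usr/lib/ocf/resource.d/icw/ocfAPP (or wherever your OCF root
lives) on the nodes that will never run the application, so probes come
back as "not running" instead of "not installed". The exit codes are the
standard OCF ones:

#!/bin/sh
# Stub RA for probe-only nodes: tell probes the resource is not running,
# refuse everything else.
case "$1" in
    monitor)   exit 7 ;;   # OCF_NOT_RUNNING
    meta-data)
        # minimal metadata so the shell and lrmd can parse the agent
        cat <<EOF
<?xml version="1.0"?>
<resource_agent name="ocfAPP" version="0.1">
  <version>1.0</version>
  <longdesc lang="en">Stub agent for probe-only nodes</longdesc>
  <shortdesc lang="en">Stub</shortdesc>
  <parameters/>
  <actions>
    <action name="monitor" timeout="20"/>
    <action name="meta-data" timeout="5"/>
  </actions>
</resource_agent>
EOF
        exit 0 ;;
    *)         exit 3 ;;   # OCF_ERR_UNIMPLEMENTED
esac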

If you file a bug I'll see what I can do.

> Perhaps this is something that the crm shell should handle?

It's better handled at a lower level than this.

> Sometimes resources try to start on the wrong nodes and also increase the
> failcount - is this correct behavior or a misconfiguration?

Sounds like a misconfig (or RA bug), but I'd need to see logs to be sure.
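
To rule out the config side, explicit bans plus a failcount check might
help (crm shell syntax; the constraint IDs are made up and ux-4/ux-5 are
taken from your status output):

  location grpFS-ban-ux4 grpFS -inf: ux-4
  location grpFS-ban-ux5 grpFS -inf: ux-5

  # inspect and then reset the failcount once the cause is fixed
  crm resource failcount resAPP show ux-4
  crm resource cleanup resAPP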

>
> And what about the OCF scripts - do I need to copy them to all nodes, even
> those where the application is not installed? I can only configure the
> resources in the crm shell when it is invoked on the node where the
> application is installed; otherwise I get:
>  ERROR: lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a
> reply message of metadata with funct
> ERROR: ocf:icw:ocfAPP: no such resource agent
> (I understand this error message: obviously the shell is looking for the
> OCF script on the local node.)

You can also use crm with -f (I think that's the right option) to force
it to accept the change.
Dejan is our shell expert, he might have an alternative.
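
Something along these lines should do it (untested; depending on the shell
version the force flag may be spelled -F/--force rather than -f, so check
crm --help). The file names are just placeholders:

  # load a prepared configuration snippet, forcing past the missing RA
  crm -F configure load update grpFS.crm

  # or push prepared XML from any node with cibadmin
  cibadmin -C -o resources -x resAPP.xml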

>
> I know I can use cibadmin for this case, but it would be more
> user-friendly to be able to administer the cluster centrally with the crm
> shell.
>
> I am not sure if I am going in the right direction, but I want to set up a
> 6-8 node cluster where different applications run on DRBD pairs, and I
> need a status view for "normal" admins so they can see at a glance whether
> everything is OK (or not).
>
>
> Thanks in advance,
> Martin
>
>
>
> Andrew Beekhof <andrew at beekhof.net> wrote on 09.04.2010 13:08:07:
>
>> On Fri, Apr 9, 2010 at 12:06 PM,  <martin.braun at icw.de> wrote:
>> > Hi,
>> >
>> > I have a non-symmetric cluster (symmetric-cluster="false") with four
>> > nodes.
>>
>> We still check _every_ node to be sure the resources aren't already
>> running there.
>>
>> > On two nodes I have allowed a resource group:
>> >
>> > location grpFS-pref1 grpFS 200: wdf-ux-0040
>> > location grpFS-pref2 grpFS 200: wdf-ux-0041
>> >
>> > grpFS is configured as:
>> >
>> > group grpFS resFS resVIP resAPP
>> >
>> > the other nodes are not mentioned in any location constraints for now.
>> >
>> > However I get this:
>> >
>> > <<
>> > Failed actions:
>> >    resAPP_monitor_0 (node=ux-4, call=4, rc=5, status=complete): not
>> > installed
>> >    resAPP_monitor_0 (node=ux-5, call=4, rc=5, status=complete): not
>> > installed
>> >>>
>> >
>> >
>> > My question is: why does Pacemaker try to monitor the app on the wrong
>> > nodes? I would have thought that with an opt-in cluster this should not
>> > happen.
>> > Or do I have to use explicit location constraints to avoid monitoring
>> > the resource on the other nodes?
>> >
>> > Overall Config:
>> >
>> >
>> > node ux-0 \
>> >        attributes standby="off"
>> > node ux-1 \
>> >        attributes standby="off"
>> > node ux-4
>> > node ux-5
>> > primitive resDRBD ocf:linbit:drbd \
>> >        operations $id="resDRBD-operations" \
>> >        op monitor interval="20" role="Slave" timeout="20"
>> > start-delay="1m" \
>> >        op monitor interval="10" role="Master" timeout="20"
>> > start-delay="1m" \
>> >        params drbd_resource="r0" drbdconf="/usr/local/etc/drbd.conf"
>> > primitive resFS ocf:heartbeat:Filesystem \
>> >        operations $id="resFS-operations" \
>> >        op monitor interval="20" timeout="40" start-delay="0" \
>> >        params device="/dev/drbd0" directory="/opt/icw" fstype="ext3"
>> > primitive resAPP ocf:icw:ocfapp2 \
>> >        operations $id="resapp-operations" \
>> >        op start interval="0" timeout="3m" \
>> >        op monitor interval="60s" timeout="30s" start-delay="3m" \
>> >        params [....]
>> >        meta target-role="Started" is-managed="true"
>> > primitive resVIP ocf:heartbeat:IPaddr2 \
>> >        params ip="192.168.210.91" cidr_netmask="24" nic="eth3" \
>> >        operations $id="resVIP-operations" \
>> >        op monitor interval="10s" timeout="20s" start-delay="2s" \
>> >        meta target-role="Started"
>> > group grpFS resFS resVIP resapp \
>> >        meta target-role="started"
>> > ms msDRBD resDRBD \
>> >        meta clone-max="2" notify="true" target-role="started"
>> > location grpFS-pref1 grpFS 200: wdf-ux-0040
>> > location grpFS-pref2 grpFS 200: wdf-ux-0041
>> > location master-pref1 msDRBD 200: wdf-ux-0040
>> > location master-pref2 msDRBD 200: wdf-ux-0041
>> > colocation colFSDRBD inf: grpFS msDRBD:Master
>> > order orderFSDRBD : msDRBD:promote grpFS:start
>> > property $id="cib-bootstrap-options" \
>> >        dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
>> >        cluster-infrastructure="openais" \
>> >        expected-quorum-votes="4" \
>> >        no-quorum-policy="ignore" \
>> >        stonith-enabled="false" \
>> >        last-lrm-refresh="1270053177" \
>> >        symmetric-cluster="false"
>> >
>> >
>> > Thanks in advance,
>> > Martin
>> >
>> >
>> >
>> >
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>



