[Pacemaker] RFC: What part of the XML configuration do you hate the most?
Pinto, Phil (GTS)
phil_pinto at ml.com
Fri Jun 27 15:52:45 UTC 2008
Dejan,
I agree with you on this one -
>> These situations are tricky to handle. Such a high load may also
>>be a sign that resources should indeed move elsewhere. Or it may
>>even be considered as a service disruption. Though there are most
>>probably shops which would prefer not to do a failover in such
>>cases. At any rate, this feature, if it gets implemented, would
>> have to be used with utmost care.
We run about 88 Linux guests on an LPAR under z/VM, using HA to provide high availability in case of kernel panics, application failures, etc. About 15 two-node clusters are part of this configuration. These HA clusters are sometimes affected by high CPU spikes on other Linux guests within the LPAR that are not even part of an HA cluster.
In a highly virtualized environment, an option to disable this feature would be a must if it becomes part of the CRM. My recommendation, however, is to provide the failure counts and the time dimension in the LRMD and let the RA inspect and handle them; that probably gives the most flexibility to everyone using HA.
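As a rough sketch of that suggestion: if the LRMD exposed its failure statistics to the RA (here imagined as an environment variable LRMD_FAIL_COUNT, which does not exist in Pacemaker; the health check and threshold are also placeholders), a monitor action could tolerate transient errors like this:

```shell
#!/bin/sh
# Hypothetical sketch only: LRMD_FAIL_COUNT is an invented variable,
# not a real Pacemaker/LRMD interface.

OCF_SUCCESS=0
OCF_ERR_GENERIC=1
TOLERATED_FAILS=3   # illustrative threshold

app_is_healthy() {
    # placeholder for the real health check of the managed resource
    test -f /tmp/app_up_example
}

monitor() {
    if app_is_healthy; then
        return "$OCF_SUCCESS"
    fi
    # Tolerate a few recent failures (e.g. a short CPU spike on a
    # neighbouring guest) before reporting a hard error.
    if [ "${LRMD_FAIL_COUNT:-0}" -lt "$TOLERATED_FAILS" ]; then
        return "$OCF_SUCCESS"
    fi
    return "$OCF_ERR_GENERIC"
}
```

The point is only that the decision logic lives in the RA while the bookkeeping lives in the LRMD.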
Thanks...
Phil
-----Original Message-----
From: pacemaker-bounces at clusterlabs.org [mailto:pacemaker-bounces at clusterlabs.org] On Behalf Of Dejan Muhamedagic
Sent: Friday, June 27, 2008 8:52 AM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] RFC: What part of the XML configuration do you hate the most?
Hi Keisuke-san,
On Fri, Jun 27, 2008 at 09:19:33PM +0900, Keisuke MORI wrote:
> Hi,
>
> Dejan Muhamedagic <dejanmm at fastmail.fm> writes:
> > On Tue, Jun 24, 2008 at 04:02:06PM +0200, Lars Marowsky-Bree wrote:
> >> On 2008-06-24T15:48:12, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> >>
> >> > > But precisely we have two scenarios to configure to:
> >> > > a) monitor NG -> stop -> start on the same node
> >> > > -> monitor NG (Nth time) -> stop -> failover to another node
> >> > > b) monitor NG -> monitor NG (Nth times) -> stop -> failover to another node
> >> > >
> >> > > The current pacemaker behaves as a), I think, but b) is also
> >> > > useful when you want to ignore a transient error.
> >> >
> >> > The b) part has already been discussed on the list and it's
> >> > supposed to be implemented in lrmd. I still don't have the API
> >> > defined, but thought about something like
> >> >
> >> > max-total-failures (how many times a monitor may fail)
> >> > max-consecutive-failures (how many times in a row a monitor may fail)
> >> >
>
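For illustration, the two attributes proposed above could be attached to a monitor operation roughly like this. The syntax is purely hypothetical; no such attributes exist in the CIB schema:

```xml
<op id="res_mon" name="monitor" interval="10s">
  <!-- "max-total-failures" and "max-consecutive-failures" are the
       names proposed in this thread; the placement shown here is
       illustrative only, not an implemented Pacemaker feature -->
  <instance_attributes id="res_mon_attrs">
    <attributes>
      <nvpair id="res_mon_mtf" name="max-total-failures" value="5"/>
      <nvpair id="res_mon_mcf" name="max-consecutive-failures" value="3"/>
    </attributes>
  </instance_attributes>
</op>
```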
> I also thought that it should be implemented in lrmd at first,
> but now I think it would be better to handle it in crm.
>
> If we implemented it in lrmd, there would be two kinds of
> fail-counts in different modules (cib and lrmd), and users would
> have to understand and use both the cib and lrmd tools depending
> on the kind of failure, even though they serve very similar
> purposes. I think that's confusing for users.
The fail-counts in lrmd will probably be available for
inspection. And they would probably also expire after some time.
What I suggested in the previous messages is actually missing
the time dimension: there should be a limit on the number of
failures within a given period.
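A minimal sketch of such a "maximum failures within a period" counter, assuming an invented state file and illustrative window and threshold values:

```shell
#!/bin/sh
# Hypothetical sketch: record failure timestamps, drop those older
# than WINDOW seconds, and only escalate once the count inside the
# window reaches MAX. File location and names are illustrative.

FAIL_LOG=/tmp/monitor_failures_example
WINDOW=600   # seconds
MAX=3

record_failure() {
    date +%s >> "$FAIL_LOG"
}

failures_in_window() {
    now=$(date +%s)
    cutoff=$((now - WINDOW))
    [ -f "$FAIL_LOG" ] || { echo 0; return; }
    # keep only timestamps still inside the window (expiry)
    awk -v c="$cutoff" '$1 >= c' "$FAIL_LOG" > "$FAIL_LOG.tmp"
    mv "$FAIL_LOG.tmp" "$FAIL_LOG"
    wc -l < "$FAIL_LOG" | tr -d ' '
}

should_escalate() {
    [ "$(failures_in_window)" -ge "$MAX" ]
}
```

Old failures age out automatically, so an isolated blip long ago never pushes a resource over the threshold.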
> So I think that lrmd should always report failures like now,
> and crm/cib should hold all the failed status and make a decision.
Of course, it could be done like that as well, though that could
make processing in crm much more complex.
> >> > These should probably be attributes defined on the monitor
> >> > operation level.
> >>
> >> The "ignore failure reports" clashes a bit with the "react to failures
> >> ASAP" requirement.
> >>
> >> It is my belief that this should be handled by the RA, not in the LRM
> >> nor the CRM. The monitor op implementation is the place to handle this.
>
>
> Yes, it can be implemented in RAs, and that's what we've done actually.
>
> But in that case, such RAs would have a similar retry loop in
> each script and their own retry parameters for each RA type.
>
> I think it's worth having a common way to handle this.
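Such a common way, if it existed, might be a shared shell helper that individual RAs source instead of each re-implementing its own loop. The function name and calling convention here are purely illustrative:

```shell
#!/bin/sh
# Hypothetical shared helper for resource agents; nothing like this
# ships with Pacemaker or the OCF resource agents.

retry_op() {
    # usage: retry_op <attempts> <delay_seconds> <command...>
    attempts=$1; delay=$2; shift 2
    n=1
    while :; do
        "$@" && return 0                       # success: stop retrying
        [ "$n" -ge "$attempts" ] && return 1   # out of attempts
        n=$((n + 1))
        sleep "$delay"
    done
}
```

An RA's monitor action could then be a one-liner such as `retry_op 3 2 my_status_check`, with the attempts and delay coming from RA parameters.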
Yes, I also think that having this handled in one place would be
beneficial. The resource agents, though they should know best the
resources they manage, may not always take into account all the
peculiarities of the environment. It is then up to the user to
decide whether they want to allow a monitor for the resource to
fail now and then.
> >> Beyond that, I strongly feel that "transient errors" are a bad
> >> foundation to build clusters on.
> >
> > Of course, all that is right. However, there are some situations
> > where we could bend the rules. I'm not sure what Keisuke-san had
> > in mind, but for example one could be more forgiving when
> > monitoring certain stonith resources.
> >
>
> One situation in my mind is when a sudden high load occurs for a
> very short time. The application may fail to respond to the
> monitor op by the RA while the load is very high, but if such a
> 'spike of load' ceases shortly, we don't want to rush into a failover.
These situations are tricky to handle. Such a high load may also
be a sign that resources should indeed move elsewhere. Or it may
even be considered as a service disruption. Though there are most
probably shops which would prefer not to do a failover in such
cases. At any rate, this feature, if it gets implemented, would
have to be used with utmost care.
> Another case we've met was when we wrote an RA to check some hardware.
> The status check against the hardware rarely failed, and only at a
> very specific timing; retrying the check was just fine.
That's what I often observed with some stonith devices.
Cheers,
Dejan
>
> Thanks,
> --
> Keisuke MORI
> NTT DATA Intellilink Corporation
>
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker at clusterlabs.org
> http://list.clusterlabs.org/mailman/listinfo/pacemaker
_______________________________________________
Pacemaker mailing list
Pacemaker at clusterlabs.org
http://list.clusterlabs.org/mailman/listinfo/pacemaker