[Pacemaker] New System Health feature
Mark Hamzy
hamzy at us.ibm.com
Mon Apr 27 20:25:02 UTC 2009
beekhof at gmail.com wrote on 04/24/2009 11:00:01 AM:
>
> On Thu, Apr 23, 2009 at 17:49, Mark Hamzy <hamzy at us.ibm.com> wrote:
> >
> > Health   Attribute-value   Meaning
> > green    1000              server is happy, capable of running any resource
> > yellow   0                 server is marginal - it is desirable to schedule
> >                            resources somewhere else if you can
> > red      -INFINITY         server is unreliable (but still up) and should
> >                            not be used
> >
> > Note that all of the values given would be configuration-specific. These
> > attributes would be set via attrd_updater.
>
> Agreed.
> What I'm not yet clear on though, is why you can't just use these
> attributes with the existing rsc_location constraints.
>
> (And even if there is a need to expose it differently to users, it
> should definitely be using the rsc_location logic internally)
A machine that is part of a cluster now has three health states. When some
process detects that a failure is imminent, we want to notify the manager
to move resources off of that node.

If daemons asynchronously update variables with values, then system
administrators need to modify their setups for each known #health-x
variable. Each of the M constraints needs N variables as input to the
overall calculations.
For example, this simple constraint:
<constraints>
  <rsc_location id="rsc_location_apache_1" rsc="apache_1">
    <rule id="prefered_location_apache_1" score="100">
      <expression attribute="#uname" id="prefered_location_apache_1_expr"
                  operation="eq" value="hs21c"/>
    </rule>
  </rsc_location>
</constraints>
becomes:
<constraints>
  <rsc_location id="rsc_location_apache_1" rsc="apache_1">
    <rule id="prefered_location_apache_1" score="100">
      <expression attribute="#uname" id="apache_1_uname_expr"
                  operation="eq" value="hs21c"/>
    </rule>
  </rsc_location>
  <rsc_location id="rsc_location_apache_2" rsc="apache_1">
    <rule id="health_location_1_apache_1" score-attribute="#health-ipmi">
      <expression attribute="#health-ipmi" id="apache_1_ipmi_expr"
                  operation="defined"/>
    </rule>
  </rsc_location>
  <rsc_location id="rsc_location_apache_3" rsc="apache_1">
    <rule id="health_location_2_apache_1" score-attribute="#health-smart">
      <expression attribute="#health-smart" id="apache_1_smart_expr"
                  operation="defined"/>
    </rule>
  </rsc_location>
</constraints>
Not only does this have to be done for every resource, but the
administrators must also know about each new health metric as it appears.
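To put a number on that growth, here is a rough Python sketch that generates the
extra health constraints for every (resource, health attribute) pair. The
helper name, the generated ids, and the extra names mysql_1 and #health-cpu are
made up for illustration; only apache_1, #health-ipmi and #health-smart come
from the example above.

```python
import xml.etree.ElementTree as ET


def health_constraints(resources, health_attrs):
    """Build one rsc_location per (resource, health attribute) pair.

    Hypothetical helper: the id naming scheme is an assumption, chosen
    only so that every generated id is unique.
    """
    constraints = ET.Element("constraints")
    for rsc in resources:
        for attr in health_attrs:
            short = attr.lstrip("#")  # "#health-ipmi" -> "health-ipmi"
            loc = ET.SubElement(
                constraints, "rsc_location",
                {"id": f"{short}_location_{rsc}", "rsc": rsc},
            )
            rule = ET.SubElement(
                loc, "rule",
                {"id": f"{short}_rule_{rsc}", "score-attribute": attr},
            )
            ET.SubElement(
                rule, "expression",
                {"id": f"{short}_expr_{rsc}", "attribute": attr,
                 "operation": "defined"},
            )
    return constraints


if __name__ == "__main__":
    resources = ["apache_1", "mysql_1"]                           # M = 2
    health_attrs = ["#health-ipmi", "#health-smart", "#health-cpu"]  # N = 3
    tree = health_constraints(resources, health_attrs)
    # One rsc_location per pair, so the count is M * N.
    print(len(tree.findall("rsc_location")))
    print(ET.tostring(tree, encoding="unicode"))
```

Every new resource adds N constraints and every new health metric adds M, and
all of them have to be kept in sync by hand.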
This is a logistical nightmare. What I am proposing is that Pacemaker add
health scores to nodes. Currently, nodes with no rules applied to them start
at zero.

We want the constraints left alone:
<constraints>
  <rsc_location id="rsc_location_apache_1" rsc="apache_1">
    <rule id="prefered_location_apache_1" score="100">
      <expression attribute="#uname" id="prefered_location_apache_1_expr"
                  operation="eq" value="hs21c"/>
    </rule>
  </rsc_location>
</constraints>
How were you proposing that this should be done under the current
rsc_location?
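As a sketch of what a per-node health score could look like, the Python below
folds every #health-* attribute on a node into a single starting score. Summing
the values is my assumption here, not something settled in this thread, and the
function names are hypothetical. The green/yellow/red values are the ones from
the table quoted above, and 1000000 is the value Pacemaker uses for INFINITY.

```python
INFINITY = 1000000  # Pacemaker's numeric value for INFINITY

# Values from the table above; they are meant to be configuration-specific.
HEALTH_SCORES = {"green": 1000, "yellow": 0, "red": -INFINITY}


def add_scores(a, b):
    """Add two scores so that -INFINITY always wins and results saturate."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


def node_health_score(node_attrs):
    """Fold all #health-* attributes of one node into a base score.

    Attributes that do not start with "#health-" are ignored, so new
    health daemons are picked up without touching any constraint.
    """
    total = 0  # nodes with no rules applied start at zero
    for name, value in node_attrs.items():
        if name.startswith("#health-"):
            total = add_scores(total, HEALTH_SCORES[value])
    return total


if __name__ == "__main__":
    print(node_health_score({"#health-ipmi": "green", "#health-smart": "yellow"}))
    print(node_health_score({"#health-ipmi": "green", "#health-smart": "red"}))
```

The point of the sketch is that a single red attribute takes the node out of
consideration for every resource, while the existing rsc_location rules stay
exactly as they are.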
> > There should be an API for health monitoring agents.
>
> More information?
>
> > This would be similar to cluster-wide default set by symmetric-cluster true
> > (0) or false (-INFINITY).
>
> You lost me here.
When I moved the conversation here, I incorporated two of the comments from
the other mailing list. The full comments were as follows:
misch at multinet.de wrote:
> There also should be a clean documented API to write your own health-...
> agents monitoring the system itself.
lmb at suse.de wrote:
> Basically your mechanism modifies the "base score" for a node, somewhat
> similar to the cluster-wide default set by symmetric-cluster true (0) or
> false (-INFINITY).
>
> Sure, but I'd go even beyond this and just add the mechanism for setting
> the base score; translating health scores into these can be done outside
> the core system.
Mark