[Pacemaker] New System Health feature

Andrew Beekhof andrew at beekhof.net
Thu May 7 11:06:23 EDT 2009


On Wed, May 6, 2009 at 11:32 PM, Mark Hamzy <hamzy at us.ibm.com> wrote:
> beekhof at gmail.com wrote on 04/28/2009 10:31:43 AM:
>> Actually, it would still work if the entity responsible for updating
>> the node health combined the readings from the different sources into
>> a single value.
>> However, then you start to require a daemon and some way to configure
>> it (in order to specify how the sources should be combined).
>> And of course eventually people will want more detail than the
>> combined score... "why is the health red?".
>
> Yes. There is really only one overall health status of a system.

No, there's not; there are several. You're asking for one attribute per device.

Re-read what I wrote:
  "if the entity responsible for updating the node health combined the
readings from the different sources into a single value"

That's the complete opposite of what you're talking about.

> It
> can be summed from multiple, independent reporting mechanisms (ipmi,
> smart, mcelog, etc).

This is where the disconnect is.
You seem convinced that everyone will want to sum them up the same way
you do, for every resource in the cluster.
I'm not so sure.

A local disk failure would surely result in RED for that health
attribute, but it doesn't need to stop the node from hosting resources
that don't need that disk.
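
To make that concrete, here is a rough sketch of per-resource evaluation.
It is only an illustration, not anything Pacemaker implements: the function
can_host, the attribute names (ipmi, smart_sda, mcelog) and the idea of a
per-resource dependency list are all made up for the example.

```python
def can_host(resource_deps, node_health):
    """Return True if none of the health sources this resource
    depends on is red. Unknown sources are assumed green."""
    return all(node_health.get(src, "green") != "red" for src in resource_deps)


# Hypothetical readings for one node: one attribute per source/device.
node_health = {"ipmi": "green", "smart_sda": "red", "mcelog": "yellow"}

# Hypothetical resources and the health sources each one cares about.
needs_disk = ["smart_sda", "ipmi"]   # e.g. a filesystem on that disk
no_disk = ["ipmi", "mcelog"]         # e.g. a floating IP address

print(can_host(needs_disk, node_health))  # False: the disk it needs is red
print(can_host(no_disk, node_health))     # True: the red disk is irrelevant
```

With a single combined value, both resources would be pushed off the node;
with one attribute per source, only the one that needs the disk is.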

> I don't think that complex rules need to exist in order to combine
> scores. Any "red" health forces -INF. All "yellow" healths
> keep weighting the resource more and more to a different node. -INF
> plus anything equals -INF. So a simple merge_weights would work.

Agreed, assuming that's how everyone wanted it to work.
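
For reference, the merge rule described in the quote could look roughly
like the sketch below. It assumes Pacemaker's usual convention that
INFINITY is 1000000 and that -INFINITY plus anything stays -INFINITY. The
yellow penalty of -10, the HEALTH_SCORE table and the function names are
placeholders chosen for the example, not existing code.

```python
INFINITY = 1_000_000  # magnitude at which a score is treated as infinite


def add_scores(a, b):
    """Saturating addition: -INF wins over everything, then +INF."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


# Placeholder mapping from health color to score (yellow value is arbitrary).
HEALTH_SCORE = {"green": 0, "yellow": -10, "red": -INFINITY}


def merge_health(readings):
    """Fold the per-source colors into one node score."""
    total = 0
    for color in readings.values():
        total = add_scores(total, HEALTH_SCORE[color])
    return total


print(merge_health({"ipmi": "green", "smart": "yellow", "mcelog": "yellow"}))  # -20
print(merge_health({"ipmi": "red", "smart": "yellow", "mcelog": "green"}))     # -1000000
```

Each yellow reading pushes the score further down, and a single red
reading pins it at -INFINITY no matter what the other sources report.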



