[Pacemaker] clear failcount when monitor is successful?
Lars Marowsky-Bree
lmb at suse.com
Wed Apr 24 11:24:51 UTC 2013
On 2013-04-24T10:37:24, Johan Huysmans <johan.huysmans at inuits.be> wrote:
> --> start situation
> * scope=status name=fail-count-d_tomcat value=0
> * depending resource group running on node
> * crm_mon shows everything ok
>
> --> a failure occurs
> * scope=status name=fail-count-d_tomcat value=1
> * depending resource group stopping on node
> * crm_mon shows failure
>
> --> After 30s (= failure-timeout)
> * scope=status name=fail-count-d_tomcat value=1
> * depending resource group not running on node
> * crm_mon shows NO failure !!!!!
This, by itself, is not necessarily surprising. The property
"cluster-recheck-interval" defines how often the PE (policy engine)
gets re-run, and defaults to 15 minutes.
This is not dynamically adjusted based on failure-timeouts, and if this
feature becomes more widely used, there probably should be a "better"
way to handle/trigger these while still avoiding swamping the cluster
with empty transitions etc.
In short: right now, if you want a failure-timeout of 30s to be
meaningful, you need to set cluster-recheck-interval to something
shorter.
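To put rough numbers on that, here is a small sketch of the timing. It
is a deliberately simplified model, not how the PE is implemented: it
assumes the PE runs only at fixed multiples of the recheck interval and
that nothing else triggers a transition in between. The function name
and parameters are made up for illustration.

```python
import math


def time_expiry_is_noticed(failure_at, failure_timeout, recheck_interval):
    """Hypothetical helper: when does the cluster act on an expired failure?

    Simplified assumption: the policy engine runs only at
    t = k * recheck_interval (all times in seconds since t=0).
    """
    expires_at = failure_at + failure_timeout
    # The first policy-engine run at or after the expiry time picks it up.
    runs_needed = math.ceil(expires_at / recheck_interval)
    return runs_needed * recheck_interval


# A failure at t=10s with failure-timeout=30s expires at t=40s.
expiry = 10 + 30
with_default = time_expiry_is_noticed(10, 30, 15 * 60)  # 15-minute default
with_short = time_expiry_is_noticed(10, 30, 20)         # 20s recheck interval

print("extra wait with default interval:", with_default - expiry, "s")
print("extra wait with 20s interval:", with_short - expiry, "s")
```

Under these assumptions the expired failure can sit unnoticed for most
of the recheck interval, which is why a 30s failure-timeout only makes
sense together with a recheck interval of 30s or less.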
> --> After something changes in the cluster or the recheck interval
> * scope=status name=fail-count-d_tomcat value=0
> * depending resource group can run on node
> * crm_mon shows no failure
> * BUT my resource is still monitored and failing!
I'm not sure I perfectly get what you're saying here with the last
sentence. Did the cluster try to restart it, and it failed again, yet
the failure was ignored this time around?
> I find it disturbing that a resource with a failing monitor has a 0
> failcount, shows ok in crm_mon and allows to run the depending
> resources.
Yes, if I got that right, that would be a problem - please create an
hb_report or crm_report and open a bug.
Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde