[Pacemaker] failcount always resets at 15m mark regardless of cluster-recheck-interval

David Nguyen d_k_nguyen at yahoo.com
Thu May 22 03:36:35 EDT 2014


Hi all,

I'm running into a problem with failcount expiry.  I have the following
settings for testing purposes:

migration-threshold=1
failure-timeout=15s
cluster-recheck-interval=30s

and I verified that they are in the running config via cibadmin --query.
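
A minimal sketch of how that check could be scripted, assuming the settings
appear as nvpair elements in the CIB XML. The sample fragment, its id
attributes, and the helper name find_settings are hypothetical and only for
illustration; on a real cluster the input would be the output of
cibadmin --query.

```python
import xml.etree.ElementTree as ET

# Hypothetical CIB fragment for illustration only; a real CIB is much larger
# and the id values will differ.
SAMPLE_CIB = """
<cib>
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="opt-recheck" name="cluster-recheck-interval" value="30s"/>
      </cluster_property_set>
    </crm_config>
    <rsc_defaults>
      <meta_attributes id="rsc-options">
        <nvpair id="opt-threshold" name="migration-threshold" value="1"/>
        <nvpair id="opt-timeout" name="failure-timeout" value="15s"/>
      </meta_attributes>
    </rsc_defaults>
  </configuration>
</cib>
"""

WANTED = {"migration-threshold", "failure-timeout", "cluster-recheck-interval"}


def find_settings(cib_xml, names):
    """Return {name: value} for every nvpair whose name is in `names`.

    If a name occurs more than once in the CIB, the last occurrence wins.
    """
    root = ET.fromstring(cib_xml)
    return {
        nv.get("name"): nv.get("value")
        for nv in root.iter("nvpair")
        if nv.get("name") in names
    }


if __name__ == "__main__":
    for name, value in sorted(find_settings(SAMPLE_CIB, WANTED).items()):
        print(f"{name}={value}")
```

Any name missing from the returned dictionary is not set anywhere in the
XML that was passed in.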

The issue is that even with failure-timeout and cluster-recheck-interval
set, failcount only resets after the default recheck interval of 15 minutes.

The way I tested this was to force a resource failure on both nodes (two-node
cluster) and then watch syslog. Sure enough, the service rights itself
at the 15-minute mark.

May 22 00:09:22 sac-prod1-ops-web-09 crmd[16843]:   notice:
do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [
input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

May 22 00:24:22 sac-prod1-ops-web-09 crmd[16843]:   notice:
do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [
input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
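
As a quick check on the interval, the gap between the two log entries above
can be computed with a short Python sketch. The helper name gap_seconds is
hypothetical, and the year 2014 is taken from the date of this post because
syslog timestamps do not carry one. Month-name parsing with %b assumes an
English/C locale.

```python
from datetime import datetime


def gap_seconds(first, second, year=2014):
    """Return the seconds between two syslog-style timestamps.

    Both timestamps are assumed to fall in the same `year`.
    """
    fmt = "%Y %b %d %H:%M:%S"
    t1 = datetime.strptime(f"{year} {first}", fmt)
    t2 = datetime.strptime(f"{year} {second}", fmt)
    return (t2 - t1).total_seconds()


if __name__ == "__main__":
    idle_at = "May 22 00:09:22"    # S_TRANSITION_ENGINE -> S_IDLE
    recalc_at = "May 22 00:24:22"  # S_IDLE -> S_POLICY_ENGINE (timer popped)
    print(gap_seconds(idle_at, recalc_at) / 60)  # prints 15.0
```

The result is exactly 15 minutes, not the 30s configured for
cluster-recheck-interval.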


Any ideas what I'm doing wrong here?  I would like failcount to reset much
faster.


My setup:

Two-node CentOS 6.5
pacemaker-1.1.10-14.el6_5.3.x86_64
corosync-1.4.1-17.el6_5.1.x86_64