<div dir="ltr">Thanks for the reply. I figured it out. I was setting those options as resource defaults, like this:<div><br></div><div>pcs resource defaults cluster-recheck-interval=15s<br></div><div><br></div><div>But that wasn&#39;t getting applied to existing resources. Setting it explicitly on my pre-existing resource, like this, fixed the problem:</div>

<div><br></div><div>pcs resource update my_resource cluster-recheck-interval=15s</div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, May 22, 2014 at 3:27 AM, Andrew Beekhof <span dir="ltr">&lt;<a href="mailto:andrew@beekhof.net" target="_blank">andrew@beekhof.net</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class=""><br>

On 22 May 2014, at 5:36 pm, David Nguyen &lt;<a href="mailto:d_k_nguyen@yahoo.com">d_k_nguyen@yahoo.com</a>&gt; wrote:<br>

<br>

&gt; Hi all,<br>

&gt;<br>

&gt; I&#39;m having the following problem.  I have the following settings for testing purposes:<br>

&gt;<br>

&gt; migration-threshold=1<br>

&gt; failure-timeout=15s<br>

&gt; cluster-recheck-interval=30s<br>

&gt;<br>

&gt; and verified those are in the running config via cibadmin --query<br>

<br>

</div>Can we see that output?<br>

<div class=""><br>

&gt;<br>

&gt; The issue is that even with failure-timeout and cluster-recheck-interval set, I&#39;ve noticed that the failcount only resets after the default interval of 15 minutes.<br>

&gt;<br>

&gt; The way I tested this was to force a resource failure on both nodes (2-node cluster), then watch syslog; sure enough, the service rights itself at the 15-minute mark.<br>

&gt;<br>

&gt; May 22 00:09:22 sac-prod1-ops-web-09 crmd[16843]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -&gt; S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]<br>

&gt;<br>

&gt; May 22 00:24:22 sac-prod1-ops-web-09 crmd[16843]:   notice: do_state_transition: State transition S_IDLE -&gt; S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]<br>

&gt;<br>

&gt;<br>

&gt; Any ideas what I&#39;m doing wrong here?  I would like failcount to reset much faster.<br>

&gt;<br>

&gt;<br>

&gt; My setup:<br>

&gt;<br>

&gt; 2-node CentOS 6.5<br>

&gt; pacemaker-1.1.10-14.el6_5.3.x86_64<br>

&gt; corosync-1.4.1-17.el6_5.1.x86_64<br>

&gt;<br>

&gt;<br>

</div>&gt; _______________________________________________<br>

&gt; Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>

&gt; <a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>

&gt;<br>

&gt; Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>

&gt; Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>

&gt; Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>

<br>


<br></blockquote></div><br></div>
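<div>For anyone reading this thread later: the two crmd log lines quoted above are exactly 15 minutes apart, which matches Pacemaker&#39;s default cluster-recheck-interval rather than the 30s in the test settings. Below is a minimal Python sketch of that arithmetic. The year 2014 is an assumption taken from the date of this thread (syslog timestamps carry no year), and the names ASSUMED_YEAR, ENTERED_IDLE, TIMER_POPPED, syslog_time, and gap_seconds are made up for illustration.</div>
<pre>
```python
from datetime import datetime

# Timestamps of the two crmd do_state_transition lines quoted in this thread.
# Syslog omits the year; 2014 is assumed from the date of the thread.
ASSUMED_YEAR = 2014
ENTERED_IDLE = "May 22 00:09:22"   # S_TRANSITION_ENGINE -> S_IDLE
TIMER_POPPED = "May 22 00:24:22"   # S_IDLE -> S_POLICY_ENGINE, cause=C_TIMER_POPPED


def syslog_time(stamp: str) -> datetime:
    """Parse a 'Mon DD HH:MM:SS' syslog timestamp using the assumed year."""
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def gap_seconds(first: str, second: str) -> float:
    """Seconds elapsed between two syslog timestamps."""
    return (syslog_time(second) - syslog_time(first)).total_seconds()


gap = gap_seconds(ENTERED_IDLE, TIMER_POPPED)
print(f"{gap:.0f} seconds ({gap / 60:.0f} minutes) between idle and the timer pop")
```
</pre>
<div>A gap of 900 seconds here suggests the recheck timer was still running at its default, so comparing this gap before and after a configuration change is a quick way to see whether the new interval took effect.</div>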