[Pacemaker] Reset failcount for resources

Mon Nov 17 06:40:12 UTC 2014

> On 13 Nov 2014, at 10:08 pm, Arjun Pandey <apandepublic at gmail.com> wrote:
> 
> Hi 
> 
> I am running a two-node cluster with this configuration:
> 
> Master/Slave Set: foo-master [foo]
> Masters: [ bharat ]
> Slaves: [ ram ]
> AC_FLT (ocf::pw:IPaddr): Started bharat
> CR_CP_FLT (ocf::pw:IPaddr): Started bharat
> CR_UP_FLT (ocf::pw:IPaddr): Started bharat
> Mgmt_FLT (ocf::pw:IPaddr): Started bharat
> 
> where the IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a
> colocation constraint so that the IP addresses are colocated with the master.
> I have set migration-threshold to 2 for the VIPs and failure-timeout to 15s.
> 
> 
> Initially I bring down the interface on bharat to force a switch-over to ram. After this I fail the interfaces on bharat again. Now I bring the interface up again on ram. However, the virtual IPs are now in the stopped state.
> 
> I can't get out of this state unless I use crm_resource -C to reset the state of the resources.
> However, if I check the failcount of the resources after this, it is still set to INFINITY.

Older versions of crm_resource didn't always reset the failcount on cleanup. I'd encourage you to upgrade your Pacemaker packages.

> Based on the documentation, the failcount on a node should have expired after the failure-timeout. That doesn't happen. Also, why isn't the count reset by the crm_resource -C command? Is there any other command to actually reset the failcount?

There should be a 'crm_failcount' command that will do this.
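
A rough sketch of how that might look, using the resource and node names from your status output. The option letters (-G to query, -D to delete, -r for the resource, -N for the node) are what I'd expect from the crm_failcount wrapper, but older releases may spell the node option differently, so treat them as assumptions and check crm_failcount --help on your version first. The run() helper and DRY_RUN variable are only illustrative: by default the script prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: print (or run) the commands that query and then clear the
# fail count of a resource on a given node.
# DRY_RUN=1 (the default here) just echoes each command; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

reset_failcount() {
    resource=$1
    node=$2
    # Assumed options: -G query, -D delete, -r resource, -N node.
    run crm_failcount -G -r "$resource" -N "$node"
    run crm_failcount -D -r "$resource" -N "$node"
}

# The four virtual IPs from the status output, on the node where they failed.
for rsc in AC_FLT CR_CP_FLT CR_UP_FLT Mgmt_FLT; do
    reset_failcount "$rsc" bharat
done
```

If the count reads 0 afterwards but the resource still shows as stopped, a crm_resource -C for that resource should clear the remaining failed-operation history.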

> 
> Thanks in advance
> 
> Regards
> Arjun