[Pacemaker] Reset failcount for resources

Thu Nov 13 11:08:13 UTC 2014

Hi

I am running a two-node cluster with the following resources:

Master/Slave Set: foo-master [foo]
    Masters: [ bharat ]
    Slaves: [ ram ]
AC_FLT (ocf::pw:IPaddr): Started bharat
CR_CP_FLT (ocf::pw:IPaddr): Started bharat
CR_UP_FLT (ocf::pw:IPaddr): Started bharat
Mgmt_FLT (ocf::pw:IPaddr): Started bharat

where the IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a
colocation constraint so that the IP addresses are colocated with the master.
I have set migration-threshold to 2 for the VIPs and failure-timeout to 15s.
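
For reference, the setup described above could be written roughly as follows in crm shell syntax. This is only a sketch: it assumes crmsh is the configuration tool, that the modified RA keeps the `ip` parameter of IPaddr2, that the colocation score is INFINITY, and it uses a placeholder address and a made-up constraint id. Only one of the four VIPs is shown.

```shell
#!/bin/sh
# Write a crm shell fragment for one VIP to a file (nothing is applied here).
# Assumptions: ip=192.0.2.10 is a placeholder address, "AC_FLT-with-master"
# is a hypothetical constraint id, and the "inf:" score is a guess.
cat > vip-sketch.crm <<'EOF'
primitive AC_FLT ocf:pw:IPaddr \
    params ip=192.0.2.10 \
    meta migration-threshold=2 failure-timeout=15s
colocation AC_FLT-with-master inf: AC_FLT foo-master:Master
EOF

# On a cluster node the fragment could then be loaded with:
#   crm configure load update vip-sketch.crm
```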

First, I bring down the interface on bharat to force a switch-over to ram.
After this, I fail the interfaces on bharat again. Then I bring the interface
up again on ram. However, the virtual IPs are now in the Stopped state.

I cannot get out of this state unless I use crm_resource -C to reset the state
of the resources. However, if I check the failcount of the resources after
this, it is still set to INFINITY.

According to the documentation, the failcount on a node should have expired
after the failure-timeout, but that doesn't happen. Also, why isn't the count
reset by the crm_resource -C command? Is there any other command that
actually resets the failcount?
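
The checks described above can be sketched as the commands below, using AC_FLT on bharat as the example. crm_failcount is the standard Pacemaker tool for querying or deleting a resource's failcount on a node. Whether deleting it clears the INFINITY value in this particular situation is not confirmed here. The helper function names and the dry-run `RUN` variable are made up for illustration.

```shell
#!/bin/sh
# Dry-run by default: each command is printed rather than executed.
# On a cluster node, run with RUN= (empty) to execute for real.
RUN="${RUN-echo}"

RSC="AC_FLT"     # one of the VIP resources from the status output
NODE="bharat"    # node on which the failures were recorded

# Query the current failcount of a resource on a node.
query_failcount() { $RUN crm_failcount -G -r "$1" -N "$2"; }

# Delete the failcount of a resource on a node.
delete_failcount() { $RUN crm_failcount -D -r "$1" -N "$2"; }

# Clean up the resource state on a node (the crm_resource -C step).
cleanup_resource() { $RUN crm_resource -C -r "$1" -N "$2"; }

query_failcount "$RSC" "$NODE"     # failcount before the cleanup
cleanup_resource "$RSC" "$NODE"
query_failcount "$RSC" "$NODE"     # failcount after the cleanup
delete_failcount "$RSC" "$NODE"    # explicit failcount removal
```

`crm_mon -1 -f` additionally prints the failcounts for all resources in a single pass, which makes it easy to compare both nodes.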

Thanks in advance

Regards
Arjun