[Pacemaker] restart of resource is not attempted

Mon Mar 30 05:51:38 UTC 2009

Juha Heinanen wrote:
> i moved all my resources to the standby node.  on this node, mysql
> resource had a problem that prevented it from starting.  i fixed the
> problem and assumed that pacemaker would now automatically start mysql,
> but it does not even try.  it gave up after the first error even though
> i have in my crm config:

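By default, a start failure is considered fatal: the failcount is set to
1000000 (Pacemaker's INFINITY), which means the resource cannot run on
that node again. The 1000000 in your log therefore reflects a single
failed start, not a million restarts.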
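
To make the arithmetic concrete, here is a minimal Python model. It is an
illustrative simplification, not Pacemaker's policy-engine code, and it
assumes only what is described above: a fatal start failure sets the
failcount straight to INFINITY (1000000), and a node with that failcount
is ineligible for the resource. The names INFINITY, record_start_failure
and can_run_on are hypothetical.

```python
# Simplified, hypothetical model of Pacemaker failcount handling.
# Not Pacemaker's real code; all names are illustrative only.

INFINITY = 1000000  # Pacemaker's "infinite" score value


def record_start_failure(failcount: int, start_failure_is_fatal: bool = True) -> int:
    """Return the new failcount after one failed start."""
    if start_failure_is_fatal:
        # A single fatal start failure jumps straight to INFINITY.
        return INFINITY
    # Otherwise the failure counts once, capped at INFINITY.
    return min(failcount + 1, INFINITY)


def can_run_on(failcount: int) -> bool:
    """In this model, a node with an INFINITY failcount is banned."""
    return failcount < INFINITY


count = record_start_failure(0)
print(count, can_run_on(count))  # one failed start -> 1000000 False
```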

You'll have to reset the failcount for that resource on that node
before it will run there again.
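
As a sketch, assuming a crm shell that provides the
"resource failcount <rsc> delete <node>" subcommand (check the help of
your crm shell version), the reset for mysql-server on lenny2, the node
named in the log below, could be scripted like this. The helper names
failcount_delete_cmd and reset_failcount are hypothetical, and the
default is a dry run that only prints the command.

```python
import shlex
import subprocess


def failcount_delete_cmd(resource: str, node: str) -> list[str]:
    """Build the crm shell command that clears one resource's failcount on one node.

    Assumes the crm shell's "resource failcount <rsc> delete <node>" syntax.
    """
    return ["crm", "resource", "failcount", resource, "delete", node]


def reset_failcount(resource: str, node: str, dry_run: bool = True) -> list[str]:
    """Print the command (dry run) or execute it, failing loudly on error."""
    cmd = failcount_delete_cmd(resource, node)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


reset_failcount("mysql-server", "lenny2")
```

Running it prints the command so it can be reviewed before passing
dry_run=False on a cluster node.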

You can change this behaviour by setting "start-failure-is-fatal" (in
crm_config) to false. The cluster will then keep retrying the start
until the resource's failcount reaches its migration-threshold; each
start failure only increments the failcount by one.
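
With start-failure-is-fatal set to false, the retry loop on one node can
be sketched as a small simulation. It is a hypothetical model that
assumes each failed start adds one to the failcount and that the cluster
stops trying on that node once the failcount reaches a positive
migration-threshold; attempts_before_giving_up and its parameters are
illustrative, not Pacemaker APIs.

```python
def attempts_before_giving_up(start_results: list[bool], migration_threshold: int) -> tuple[int, bool]:
    """Simulate start attempts on one node with start-failure-is-fatal=false.

    start_results[i] is True if attempt i succeeds. migration_threshold is
    assumed to be positive. Returns (attempts made, whether the resource
    ended up running on this node).
    """
    failcount = 0
    attempts = 0
    for succeeded in start_results:
        attempts += 1
        if succeeded:
            return attempts, True
        failcount += 1  # a start failure only means failcount+1
        if failcount >= migration_threshold:
            # Threshold reached: this node is no longer eligible.
            return attempts, False
    # Ran out of simulated attempts without a successful start.
    return attempts, False


# Two failures, then the underlying problem is fixed.
print(attempts_before_giving_up([False, False, True], migration_threshold=3))  # (3, True)
# A threshold of 2 is hit before the fix takes effect.
print(attempts_before_giving_up([False, False, True], migration_threshold=2))  # (2, False)
```

The point of the comparison: a non-fatal setting gives a fixed problem
(like the one in the original question) a chance to be picked up, as
long as the threshold has not already been reached.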

Regards
Dominik

> primitive mysql-server lsb:mysql op monitor interval=20s start-delay=30s
> property failure-timeout=10s
> property cluster-recheck-interval=1m
> 
> in ha-log i see
> 
> Mar 28 18:16:27 lenny1 pengine: [19745]: info: unpack_status: Node lenny1 is in standby-mode
> Mar 28 18:16:27 lenny1 pengine: [19745]: info: determine_online_status: Node lenny1 is standby
> Mar 28 18:16:27 lenny1 pengine: [19745]: notice: group_print: Resource Group: mysql-server-group
> Mar 28 18:16:27 lenny1 pengine: [19745]: notice: native_print:     fs0	(ocf::heartbeat:Filesystem):	Started lenny2
> Mar 28 18:16:27 lenny1 pengine: [19745]: notice: native_print:     virtual-ip	(ocf::heartbeat:IPaddr2):	Started lenny2
> Mar 28 18:16:27 lenny1 pengine: [19745]: notice: native_print:     mysql-server(lsb:mysql):	Stopped 
> Mar 28 18:16:27 lenny1 pengine: [19745]: notice: clone_print: Master/Slave Set: ms-drbd0
> Mar 28 18:16:27 lenny1 pengine: [19745]: notice: native_print:     drbd0:0	(ocf::heartbeat:drbd):	Master lenny2
> Mar 28 18:16:27 lenny1 pengine: [19745]: notice: native_print:     drbd0:1	(ocf::heartbeat:drbd):	Stopped 
> Mar 28 18:16:27 lenny1 pengine: [19745]: info: get_failcount: mysql-server has failed 1000000 times on lenny2
> 
> it is hard for me to believe that the failcount would be so large,
> because i have not seen any trace of these restarts in ha-log or in
> syslog.
> 
> what is it that i'm missing here?
> 
> complete config was in the previous message.
> 
> -- juha
> 
> _______________________________________________
> Pacemaker mailing list
> Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>