[Pacemaker] starting resources with failed stonith resource
Frank Van Damme
frank.vandamme at gmail.com
Tue Jan 7 15:41:18 UTC 2014
Hi list,
I recently had some trouble with a two-node MySQL cluster that runs in
master-slave mode with the Percona resource agent. While analyzing what
happened, I found the following in syslog. There was network trouble and
the cluster lost disk/iSCSI access on both nodes; this excerpt is from
the former master trying to start up again after connectivity was
restored:
Jan 6 07:26:49 infante pengine: [3839]: notice: get_failcount: Failcount for MasterSlave_mysql on infante has expired (limit was 60s)
Jan 6 07:26:49 infante pengine: [3839]: notice: get_failcount: Failcount for MasterSlave_mysql on infante has expired (limit was 60s)
Jan 6 07:26:49 infante pengine: [3839]: WARN: common_apply_stickiness: Forcing p-stonith-ingstad away from infante after 1000000 failures (max=1000000)
Jan 6 07:26:49 infante pengine: [3839]: notice: LogActions: Start prim_mysql:0#011(infante)
Jan 6 07:26:49 infante pengine: [3839]: notice: LogActions: Start prim_mysql:1#011(ingstad)
I don't understand this: if it means that the stonith devices have
failed a million times, why is Pacemaker trying to start the mysql
resource? It's against Pacemaker policy to start resources in a cluster
without working stonith devices, isn't it?
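To make the two pengine messages in the log above easier to read side by side, here is a deliberately simplified model of what they report. It is a sketch, not Pacemaker's actual code: `FailureRecord`, `effective_failcount` and `banned_from_node` are hypothetical names. The only Pacemaker facts it relies on are that 1000000 is Pacemaker's INFINITY, that failures older than `failure-timeout` are ignored ("has expired (limit was 60s)"), and that a resource is forced off a node once its failcount reaches `migration-threshold` ("after 1000000 failures (max=1000000)"). It does not model `stonith-enabled` or how the policy engine decides whether to start other resources. It also assumes, for illustration, that the stonith resource has no `failure-timeout`.

```python
from dataclasses import dataclass
from typing import Optional

# Pacemaker uses 1000000 to represent INFINITY for scores and failcounts.
INFINITY = 1_000_000


@dataclass
class FailureRecord:
    """Hypothetical per-(resource, node) failure record, for illustration only."""
    failcount: int
    last_failure: float  # time of the most recent failure, in seconds


def effective_failcount(rec: FailureRecord, now: float,
                        failure_timeout: Optional[float]) -> int:
    # Failures older than failure-timeout are ignored; this is what
    # "Failcount ... has expired (limit was 60s)" reports.
    if failure_timeout and now - rec.last_failure >= failure_timeout:
        return 0
    return rec.failcount


def banned_from_node(rec: FailureRecord, now: float, migration_threshold: int,
                     failure_timeout: Optional[float] = None) -> bool:
    # A resource is forced away from a node once its unexpired failcount
    # reaches migration-threshold, as in "after N failures (max=M)".
    # Assumption of this sketch: a threshold of 0 means "never ban".
    if migration_threshold <= 0:
        return False
    return effective_failcount(rec, now, failure_timeout) >= migration_threshold


if __name__ == "__main__":
    now = 1000.0
    # mysql failed two minutes ago with a 60s failure-timeout: expired.
    mysql = FailureRecord(failcount=1, last_failure=now - 120)
    print(effective_failcount(mysql, now, failure_timeout=60))  # 0
    # A failcount of INFINITY, as logged for p-stonith-ingstad, no timeout.
    stonith = FailureRecord(failcount=INFINITY, last_failure=now - 120)
    print(banned_from_node(stonith, now, INFINITY))  # True
```

In this model the two checks are independent: the mysql failures expire and stop counting, while the stonith resource stays banned from the node, which matches what the log shows happening at the same timestamp.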
--
Frank Van Damme
Make everything as simple as possible, but not simpler. - Albert Einstein