[Pacemaker] Help with N+1 configuration
Phil Frost
phil at macprofessionals.com
Fri Jul 27 15:56:06 UTC 2012
On 07/27/2012 11:48 AM, Cal Heldenbrand wrote:
> Why wouldn't my mem3 failover happen if it timed out stopping the
> cluster IP?
If a stop action fails, pacemaker can't know whether the resource is
running, not running, or in some other broken state. The cluster is in
an unknown state, and there is nothing safe pacemaker can do on its own:
starting the resource on another node could leave it running in two
places at once. Since pacemaker thinks the node is broken (it failed to
stop a resource, as requested) but isn't sure, the solution is to
transition to a known state by powering the node off, resetting it, or
otherwise fencing it. Configure a STONITH resource to do this. Without
STONITH, your only option is to manually address the cause of the
failure (high load, in this case), then issue "crm resource cleanup ..."
on any failed resources to tell pacemaker that it is safe to try again.
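
As a rough sketch only, a STONITH setup in the crm shell might look
something like the following. I'm assuming IPMI-capable hardware and the
external/ipmi plugin here; the resource names, address, and credentials
are placeholders I made up, so substitute whatever fencing device your
nodes actually have:

    # Hypothetical fencing device for node mem3 (all values are placeholders)
    crm configure primitive st-mem3 stonith:external/ipmi \
        params hostname=mem3 ipaddr=192.0.2.13 userid=admin \
               passwd=secret interface=lan \
        op monitor interval=60s
    # Keep the fencing resource off the node it is meant to fence
    crm configure location l-st-mem3 st-mem3 -inf: mem3
    # Tell the cluster that fencing is available
    crm configure property stonith-enabled=true

With fencing enabled, a failed stop should get the node fenced, after
which its resources can be recovered on another node. You would want one
such fencing resource per node.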