[Pacemaker] Help with N+1 configuration

Phil Frost phil at macprofessionals.com
Fri Jul 27 15:56:06 UTC 2012


On 07/27/2012 11:48 AM, Cal Heldenbrand wrote:
> Why wouldn't my mem3 failover happen if it timed out stopping the 
> cluster IP?

If a stop action fails, pacemaker can't know whether the resource is
running, not running, or in some other broken state. The cluster is in
an unknown state, and pacemaker can't safely start the resource on
another node, because it might then be active in two places at once.
Since pacemaker thinks a node is broken (it failed to stop a resource,
as requested) but isn't sure, the solution is to transition to a known
state by powering the node off, resetting it, or otherwise fencing it.
Configure a STONITH resource to do this. Without STONITH, your only
option is to manually address the cause of the failure (high load, in
this case), then issue "crm resource cleanup ..." on any failed
resources to tell pacemaker that it is safe to try again.
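
As a rough sketch of both options in crm shell syntax: I don't know
what fencing hardware you have, so this assumes an IPMI-capable BMC and
the external/ipmi STONITH plugin. The resource names (fence-mem3,
cluster-ip), the BMC address, and the credentials are placeholders, not
values from your configuration.

```shell
# Option 1: configure fencing so pacemaker can recover on its own.
# Assumes mem3 has an IPMI BMC reachable at 192.0.2.13 (placeholder).
crm configure primitive fence-mem3 stonith:external/ipmi \
    params hostname=mem3 ipaddr=192.0.2.13 userid=admin passwd=secret \
           interface=lan \
    op monitor interval=60s

# Keep the fencing resource off the node it is meant to fence.
crm configure location fence-mem3-placement fence-mem3 -inf: mem3

# Fencing is only used when this property is true.
crm configure property stonith-enabled=true

# Option 2 (no STONITH): after fixing the cause of the failed stop,
# clear the failure by hand. "cluster-ip" is a placeholder for your
# cluster IP resource's id.
crm resource cleanup cluster-ip

# One-shot status check to confirm the failed action is gone.
crm_mon -1
```

You would need one fencing resource like this per node (or a single
device that can fence all of them), with the plugin and parameters
that match your hardware.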




