Devin,<br>Thanks for your support.<br><br>From my testing, the shutdown order is not the problem. With a regular shutdown everything works fine until I pull the power cable. Once ILO communication is lost, the status of the online node changes to &quot;online UNCLEAN&quot;. The other node, which is turned off and without any power, shows &quot;offline UNCLEAN&quot;. In that situation you can&#39;t manage the resources anymore.<br>I don&#39;t think that is how a cluster system should behave: if I power off the complete second rack, the resources are lost.<br><br>Thanks<br>Hannes<br><br><br>Devin wrote:<br><br>&gt; You mean it will work fine with corosync? I am using heartbeat instead.<br><br>I suspect that it&#39;s a similar situation with heartbeat. The problem is<br>pacemaker losing communication before the node cleanly disconnects.<br><br>The behavior I saw on my own clusters is that because the init script<br>sequence values were bad, the node&#39;s network interfaces would be brought down<br>before the node had cleanly left the cluster. Since the second node<br>didn&#39;t see a clean disconnect and couldn&#39;t contact the first node, it<br>would stonith the first node sometime after the first node&#39;s network<br>was down but before it was halted (which is pretty rude and can be<br>hard on filesystem integrity).<br><br>&gt; The resource wouldn&#39;t be started by the other node, because it can&#39;t fence the missing node without power on ILO.<br><br>The point that I was trying to make is that your nodes shouldn&#39;t be<br>trying to fence each other unless a node is _unexpectedly_ unreachable.<br>During maintenance, which you presumably do with a controlled shutdown,<br>there should be no fencing at all because the node-going-into-maintenance<br>should first disconnect cleanly.
(Because of the bad sequencing where<br>corosync/pacemaker was shut down after the networks went down, a clean<br>disconnect wouldn&#39;t happen, and then the node would get fenced.)<br><br>For a clean shutdown, the cluster should move all resources _before_<br>the node disconnects, thus not requiring fencing in order to run them<br>on the other node.<br><br>Fencing should be an action of last resort, not the normal mode of operation.<br><br>In the case of a true hardware fault, and assuming that you&#39;re using<br>redundant power supplies fed by independent power sources, you wouldn&#39;t<br>see this behavior either unless you were dealing with multiple failures<br>(which is problematic in various ways).<br><br>So whether you&#39;re using heartbeat or corosync, I&#39;d look at your startup<br>and shutdown sequence and ensure that during controlled operations no<br>fencing is being triggered.<br><br>(You can still test your fencing by pulling your non-ILO network<br>cables instead of pulling the power cord.)<br><br>If you&#39;re still concerned about the choice of stonith device and have<br>only one power supply, you can look at something like an APC switched PDU,<br>but I suspect that you&#39;re better off (in cost, complexity,<br>and redundancy) using dual power supplies and the ILO.<br><br>Devin<br>-- <br>If it&#39;s sinful, it&#39;s more fun.<br>
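A minimal sketch of the shutdown-sequence check suggested above, assuming SysV-style rc directories in which `K##name` links are run in ascending numeric order at halt and reboot. The service names and directory paths are assumptions, not taken from the thread; they differ between distributions, and dependency-based or systemd boot setups do not use this numbering at all.

```python
import re
from pathlib import Path

# Assumed service names; adjust to the init scripts on your distribution.
CLUSTER_SERVICES = ("heartbeat", "corosync", "pacemaker")
NETWORK_SERVICES = ("network", "networking")

# Kill links look like K05heartbeat: "K", a priority number, the script name.
KILL_LINK = re.compile(r"^K(\d+)(.+)$")


def kill_order(rc_dir):
    """Map service name -> kill priority for the K* links in one rc directory."""
    order = {}
    for entry in Path(rc_dir).iterdir():
        match = KILL_LINK.match(entry.name)
        if match:
            order[match.group(2)] = int(match.group(1))
    return order


def misordered(order, cluster=CLUSTER_SERVICES, network=NETWORK_SERVICES):
    """Return (cluster, network) pairs where the cluster stack is not stopped first."""
    return [
        (c, n)
        for c in cluster if c in order
        for n in network if n in order
        # A lower number is stopped earlier, so the cluster service needs
        # a strictly lower number than the network service.
        if order[c] >= order[n]
    ]


if __name__ == "__main__":
    # Assumed runlevel directories for halt (0) and reboot (6).
    for rc_dir in ("/etc/rc0.d", "/etc/rc6.d"):
        if not Path(rc_dir).is_dir():
            continue
        for c, n in misordered(kill_order(rc_dir)):
            print(f"{rc_dir}: {c} is stopped after (or together with) {n}")
```

No output means every cluster service found is stopped before the network in both runlevels. Any line printed points at an ordering where the network could go down before the node has left the cluster, which is the case that leads to fencing during a controlled shutdown.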