[Pacemaker] Cluster goes to (unmanaged) Failed state when both nodes are rebooted together

Tue Oct 25 17:33:57 UTC 2011

On 25 Oct 2011, at 05:46, Dejan Muhamedagic wrote:

> I'm afraid that quorum ignore and no stonith combination is a
> lousy premise for a HA cluster. Since this is a two-node cluster,
> ignoring quorum is necessary. But stonith is absolutely necessary
> then to retain sanity. Otherwise, you'll be running into problems
> whenever nodes reboot or lose connectivity.

Is there any logical way around this situation?  Something like startup-fencing?  Or simply manual intervention when both nodes fail at the same time?  Or, since these are Dell servers with a DRAC6, maybe I could/should enable DRAC-based stonith on my 2-node cluster.  My cluster is simple and deliberately active/passive (many drbds, with node location constraints splitting the primaries roughly down the middle), so I think I have much less risk of data corruption from split-brain.  But I suppose that if both hosts think they are The Cluster without being able to talk to each other, they'll both try to grab the drbd primary role.

Trying to picture the scenario.  Testing needed :)
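
To help picture it, here is a toy sketch in Python.  It is not Pacemaker's actual decision logic, just a simplified model under stated assumptions: quorum is ignored, both nodes have lost sight of each other, and (when stonith is enabled) exactly one node wins the fencing race.  The names Node, decide and simulate are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    sees_peer: bool
    drbd_role: str = "Secondary"


def decide(node, stonith_enabled, fence_succeeds):
    """Toy rule for one node when quorum is ignored (assumed model)."""
    if node.sees_peer:
        # Peer is visible: normal operation, nothing to take over.
        return
    if stonith_enabled and not fence_succeeds:
        # Cannot confirm the peer is dead, so do not promote.
        return
    # Either there is no stonith at all, or the peer was fenced.
    node.drbd_role = "Primary"


def simulate(stonith_enabled):
    """Both nodes are up but cannot talk to each other."""
    a = Node("node-a", sees_peer=False)
    b = Node("node-b", sees_peer=False)
    # Assumption: with stonith, node-a wins the fencing race and node-b loses.
    decide(a, stonith_enabled, fence_succeeds=True)
    decide(b, stonith_enabled, fence_succeeds=not stonith_enabled)
    return [n.name for n in (a, b) if n.drbd_role == "Primary"]


if __name__ == "__main__":
    print("no stonith:", simulate(stonith_enabled=False))
    print("stonith:   ", simulate(stonith_enabled=True))
```

In this model, without stonith both nodes end up Primary (the split-brain case), while with stonith only the node that wins the fencing race promotes.  A real cluster has more moving parts, so this only illustrates the shape of the problem.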

n