[Pacemaker] configuration variants for 2 node cluster

Mon Jun 23 15:11:15 CEST 2014

Hi guys,

I want to gather all possible configuration variants for a 2-node cluster,
because it has a lot of pitfalls and there is not much information about it
on the internet. I also have some questions about these configurations and
their specific problems.

VARIANT 1:
-----------------
We can use the "two_node" and "wait_for_all" options from Corosync's
votequorum, and set up fencing agents with a delay on one of them.
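
For reference, here is a minimal sketch of the quorum section this variant
relies on, rendered by a small Python helper. The option names (provider,
two_node, wait_for_all) are the votequorum ones; the helper name
quorum_stanza is only an illustration. The fencing delay is not part of
this section, so it is not shown here.

```python
def quorum_stanza(two_node=True, wait_for_all=True):
    """Render a corosync.conf quorum section as text (illustrative helper)."""
    options = {
        "provider": "corosync_votequorum",
        "two_node": int(two_node),          # 1 = special two-node quorum rules
        "wait_for_all": int(wait_for_all),  # 1 = wait until all nodes were seen
    }
    body = "\n".join(f"    {key}: {value}" for key, value in options.items())
    return "quorum {\n" + body + "\n}"


print(quorum_stanza())
```

The printed text is meant to be compared against the quorum section of
corosync.conf, not to replace the whole file.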

Here is the workflow of this configuration:

1. Node start.
2. Cluster start (Corosync and Pacemaker) at the boot time.
3. Wait for all nodes. All nodes joined?
    No. Go to step 3.
    Yes. Go to step 4.
4. Start resources.
5. Split-brain situation (something is wrong with the connection between
the nodes).
6. The fencing agent on one of the nodes reboots the other node (there is a
configured delay on one of the fencing agents).
7. The rebooted node goes to step 1.

There are two (or more?) important things in this configuration:
1. The rebooted node keeps waiting until all nodes are visible (the
connection has to be restored).
2. Suppose the connection problem still exists and the node that rebooted
the other one also has to be rebooted (for some reason). After the reboot
it is also stuck on step 3 because of the connection problem.
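
To make the two points above concrete, here is a rough in-memory model of
the workflow. It does not talk to Corosync or Pacemaker; the class Node,
the function fence_race and the fence_delay values are made up for the
example, and the delay is simplified to "seconds a node waits before it
fences its peer".

```python
class Node:
    def __init__(self, name, fence_delay=0.0):
        self.name = name
        self.fence_delay = fence_delay  # simplified: wait before fencing peer
        self.seen_peer = False          # forgotten on every boot

    def boot(self, link_up):
        """Boot and wait for all: the peer must be seen once after boot."""
        self.seen_peer = link_up
        return self.can_run_resources()

    def can_run_resources(self):
        """Resources may start only after the peer was seen since boot."""
        return self.seen_peer


def fence_race(a, b):
    """Split brain: the node with the shorter delay shoots first.

    Ties are not modelled; the delay is assumed to differ between nodes.
    Returns (winner, loser).
    """
    return (a, b) if a.fence_delay < b.fence_delay else (b, a)


node1 = Node("node1", fence_delay=0.0)
node2 = Node("node2", fence_delay=5.0)
node1.boot(link_up=True)
node2.boot(link_up=True)

winner, loser = fence_race(node1, node2)
print(winner.name, "fences", loser.name)
print("loser may run resources after reboot:", loser.boot(link_up=False))
print("winner may run resources after reboot:", winner.boot(link_up=False))
```

With the link still down, both boot() calls return False: the fenced node
waits (point 1), and the winner waits too if it ever has to reboot
(point 2), so nobody runs the resources.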

QUESTION:
-----------------
Is it possible to somehow assign a status like "primary" to the node that
won the reboot race (rebooted the other one), so that it does not have to
wait for all nodes after a reboot, and to drop this status once the other
node has joined it?

Right now that's the only configuration I know of for a 2-node cluster.
Other variants are very much appreciated =)

VARIANT 2 (not implemented, just a suggestion):
-----------------
I've been thinking about using an external SSD (or another external
drive). For example, the fencing agent could reserve the SSD with a SCSI
command and then reboot the other node.

The main idea is that the first node, as soon as the cluster starts on it,
reserves the SSD until the other node joins the cluster; after that the
SCSI reservation is removed.

1. Node start.
2. Cluster start (Corosync and Pacemaker) at the boot time.
3. Reserve the SSD. Did it manage to reserve?
    No. Don't start resources (wait for all).
    Yes. Go to step 4.
4. Start resources.
5. Remove the SCSI reservation when the other node has joined.
6. Split-brain situation (something is wrong with the connection between
the nodes).
7. The fencing agent tries to reserve the SSD. Did it manage to reserve?
    No. Maybe put the node in standby mode ...
    Yes. Reboot the other node.
8. Optional: a single node can keep the SSD reservation as long as it is
alone in the cluster or until it shuts down.
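
The arbitration idea behind this variant can be sketched as a small
in-memory model. No real SCSI commands are issued here: SharedDisk stands
in for the external drive, and all class and function names are invented
for the example.

```python
class SharedDisk:
    """Stand-in for the external drive: at most one holder at a time."""

    def __init__(self):
        self.holder = None

    def reserve(self, node):
        """Succeed if the disk is free or already held by this node."""
        if self.holder in (None, node):
            self.holder = node
            return True
        return False

    def release(self, node):
        """Only the current holder can drop the reservation."""
        if self.holder == node:
            self.holder = None


def on_cluster_start(node, disk):
    """Reserve step: start resources only if the reservation succeeds."""
    return "start resources" if disk.reserve(node) else "wait for all"


def on_peer_joined(node, disk):
    """Drop the reservation once the other node has joined."""
    disk.release(node)


def on_split_brain(node, disk):
    """Fencing step: only the node that gets the reservation may shoot."""
    return "reboot peer" if disk.reserve(node) else "standby"


disk = SharedDisk()
print("node1:", on_cluster_start("node1", disk))
print("node2:", on_cluster_start("node2", disk))
on_peer_joined("node1", disk)   # both nodes are in, reservation removed
print("node2:", on_split_brain("node2", disk))
print("node1:", on_split_brain("node1", disk))
```

In this run node2 happens to reach the disk first during the split brain,
so it reboots node1 while node1 goes to standby. Whoever holds the
reservation afterwards can also start resources alone after its own
reboot, which is the point of the optional last step.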

I am really looking forward to finding the best solution (or a couple of
them =)).
I hope I am not the only person who is interested in this topic.

Thank you,
Kostya