[Pacemaker] configuration variants for 2 node cluster
Christine Caulfield
ccaulfie at redhat.com
Tue Jun 24 08:44:44 UTC 2014
On 24/06/14 09:36, Kostiantyn Ponomarenko wrote:
> Hi Chrissie,
>
> But wait_for_all doesn't help when there is no connection between the
> nodes, because if I need to reboot the remaining working node I won't
> get a working cluster afterwards - both nodes will be waiting for the
> connection between them.
> That's why I am looking for a solution which could help me get one
> node working in this situation (after the reboot).
> I've been thinking about some kind of marker which could help a node
> determine the state of the other node, like an external disk and a
> SCSI reservation command. Maybe you could suggest another kind of
> marker?
> I am not sure whether we can use the presence of a file on an external
> SSD as the marker. Something like: if the file is there, the other
> node is alive; if not, the node is dead.
>
More seriously, that solution is harder than it might seem - which is
one reason qdiskd was as complex as it became, and why votequorum is as
conservative as it is when it comes to declaring a workable cluster. If
someone is there to manually reboot nodes then it might be as well for a
human decision to be made about which one is capable of running services.
Chrissie
> Digimer,
>
> Thanks for the links and information.
> Anyway, if I go this way, I will write my own daemon to determine the
> state of the other node.
> Also, the information about fence loops is new to me, thanks =)
>
> Thank you,
> Kostya
>
>
> On Tue, Jun 24, 2014 at 10:55 AM, Christine Caulfield
> <ccaulfie at redhat.com> wrote:
> On 23/06/14 15:49, Digimer wrote:
>
> Hi Kostya,
>
> I'm having a little trouble understanding your question, sorry.
>
> On boot, the node will not start anything, so after booting it, you
> log in, check that it can talk to the peer node (a simple ping is
> generally enough), then start the cluster. It will join the peer's
> existing cluster (even if it's a cluster on just itself).
>
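> With pcs, for example, that check-then-start can be as simple as the
> line below ("peer-node" is just a placeholder for the other node's
> hostname):
>
>     ping -c 3 peer-node && pcs cluster start
>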
> If you booted both nodes, say after a power outage, you will check
> the connection (again, a simple ping is fine) and then start the
> cluster on both nodes at the same time.
>
>
>
> wait_for_all helps with most of these situations. If a node goes
> down then it won't start services until it's seen the non-failed
> node because wait_for_all prevents a newly rebooted node from doing
> anything on its own. This also takes care of the case where both
> nodes are rebooted together of course, because that's the same as a
> new start.
>
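> On a running node you can confirm those votequorum flags with
> corosync-quorumtool; the output below is abridged and the values are
> only illustrative:
>
>     # corosync-quorumtool -s
>     ...
>     Expected votes:   2
>     Total votes:      2
>     Quorum:           1
>     Flags:            2Node Quorate WaitForAll
>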
> Chrissie
>
>
> If one of the nodes needs to be shut down, say for repairs or
> upgrades, you migrate the services off of it and over to the peer
> node, then you stop the cluster (which tells the peer that the node
> is leaving the cluster). After that, the remaining node operates by
> itself. When you turn it back on, you rejoin the cluster and migrate
> the services back.
>
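> A rough sketch of that maintenance sequence with pcs (node names are
> placeholders):
>
>     pcs cluster standby node1      # resources migrate to the peer
>     pcs cluster stop node1         # node1 leaves the cluster cleanly
>     # ... maintenance / reboot ...
>     pcs cluster start node1
>     pcs cluster unstandby node1    # resources may move back
>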
> I think, maybe, you are making this more complicated than it needs
> to be. Pacemaker and corosync will handle most of this for you, once
> set up properly. What operating system do you plan to use, and what
> cluster stack? I suspect it will be corosync + pacemaker, which
> should work fine.
>
> digimer
>
> On 23/06/14 10:36 AM, Kostiantyn Ponomarenko wrote:
>
> Hi Digimer,
>
> Suppose I disable the cluster on start-up, but what about the
> remaining node if I need to reboot it?
> So even in the case of a lost connection between these two nodes I
> need to have one node working and providing resources.
> How did you solve this situation?
> Should it be a separate daemon which somehow checks the connection
> between the two nodes and decides whether to run corosync and
> pacemaker or to keep them down?
>
> Thank you,
> Kostya
>
>
> On Mon, Jun 23, 2014 at 4:34 PM, Digimer <lists at alteeve.ca> wrote:
> On 23/06/14 09:11 AM, Kostiantyn Ponomarenko wrote:
>
> Hi guys,
> I want to gather all possible configuration variants for a 2-node
> cluster, because it has a lot of pitfalls and there is not much
> information about it on the internet. I also have some questions
> about the configurations and their specific problems.
>
> VARIANT 1:
> -----------------
> We can use the "two_node" and "wait_for_all" options from Corosync's
> votequorum (sketched below), and set up fencing agents with a delay
> on one of them.
> Here is a workflow (diagram) of this configuration:
> 1. Node starts.
> 2. Cluster (Corosync and Pacemaker) starts at boot time.
> 3. Wait for all nodes. Have all nodes joined?
>    No: go to step 3.
>    Yes: go to step 4.
> 4. Start resources.
> 5. Split-brain situation (something wrong with the connection
>    between the nodes).
> 6. The fencing agent on one of the nodes reboots the other node
>    (there is a configured delay on one of the fencing agents).
> 7. The rebooted node goes back to step 1.
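> A minimal corosync.conf quorum section for this variant could look
> roughly like the sketch below (per votequorum(5), two_node: 1 already
> enables wait_for_all by default):
>
>     quorum {
>         provider: corosync_votequorum
>         expected_votes: 2
>         two_node: 1
>         wait_for_all: 1
>     }
>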
> There are two (or more?) important things in this configuration:
> 1. The rebooted node remains waiting for all nodes to be visible
>    (the connection has to be restored first).
> 2. Suppose the connection problem still exists and the node which
>    rebooted the other one has to be rebooted as well (for some
>    reason). After the reboot it is also stuck at step 3 because of
>    the connection problem.
>
> QUESTION:
> -----------------
> Is it possible to somehow assign the node that won the reboot race
> (i.e. rebooted the other one) a status like "primary", allow it not
> to wait for all nodes after a reboot, and drop this status once the
> other node has joined?
> So is it possible?
> Right now that's the only configuration I know of for a 2-node
> cluster. Other variants are very much appreciated =)
> VARIANT 2 (not implemented, just a suggestion):
> -----------------
> I've been thinking about using an external SSD drive (or another
> external drive). For example, a fencing agent could reserve the SSD
> using a SCSI command (roughly as sketched below) and after that
> reboot the other node.
> The main idea is that the first node, as soon as the cluster starts
> on it, reserves the SSD until the other node joins the cluster;
> after that the SCSI reservation is removed.
> 1. Node starts.
> 2. Cluster (Corosync and Pacemaker) starts at boot time.
> 3. Reserve the SSD. Did it manage to reserve it?
>    No: don't start resources (wait for all).
>    Yes: go to step 4.
> 4. Start resources.
> 5. Remove the SCSI reservation when the other node has joined.
> 6. Split-brain situation (something wrong with the connection
>    between the nodes).
> 7. The fencing agent tries to reserve the SSD. Did it manage to
>    reserve it?
>    No: maybe put the node in standby mode ...
>    Yes: reboot the other node.
> 8. Optional: a single node can keep the SSD reservation for as long
>    as it is alone in the cluster, or until it shuts down.
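> For what it's worth, this is essentially what SCSI-3 persistent
> reservations provide (the fence_scsi agent is built around them). A
> rough manual sketch with sg_persist from sg3_utils, where /dev/sdX
> and the reservation keys are placeholders:
>
>     # register our key, then try to take a "write exclusive,
>     # registrants only" reservation (prout-type 5)
>     sg_persist --out --register --param-sark=0x1 /dev/sdX
>     sg_persist --out --reserve --param-rk=0x1 --prout-type=5 /dev/sdX
>     # see which key currently holds the reservation
>     sg_persist --in --read-reservation /dev/sdX
>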
> I am really looking forward to finding the best solution (or a
> couple of them =)).
> I hope I am not the only person who is interested in this topic.
>
>
> Thank you,
> Kostya
>
>
> Hi Kostya,
>
> I only build 2-node clusters, and I've not had problems with this
> going back to 2009, over dozens of clusters. The tricks I found are
> (a rough pcs sketch follows the list):
>
> * Disable quorum (of course)
> * Set up good fencing, and add a delay to the node you prefer (or
>   pick one at random, if they are of equal value) to avoid
>   dual-fences
> * Disable the cluster on start-up, to prevent fence loops
>
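> A rough sketch of those three tricks with pcs (the fence agent,
> addresses, credentials and the 15-second delay are all placeholders):
>
>     pcs property set no-quorum-policy=ignore
>     pcs stonith create fence-node1 fence_ipmilan pcmk_host_list=node1 \
>         ipaddr=10.0.0.1 login=admin passwd=secret delay=15
>     pcs stonith create fence-node2 fence_ipmilan pcmk_host_list=node2 \
>         ipaddr=10.0.0.2 login=admin passwd=secret
>     pcs cluster disable --all      # don't autostart the cluster at boot
>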
> That's it. With this, your 2-node cluster will be just fine.
>
> As for your question: once a node is fenced successfully, the
> resource manager (pacemaker) will take over any services lost on
> the fenced node, if that is how you configured it. A node that
> either gracefully leaves or dies/is fenced should not interfere
> with the remaining node.
>
> The problem is when a node vanishes and fencing fails. Then, not
> knowing what the other node might be doing, the only safe option is
> to block, otherwise you risk a split-brain. This is why fencing is
> so important.
>
> Cheers
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a
> person
> without access to education?
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org