[Pacemaker] configuration variants for 2 node cluster
Christine Caulfield
ccaulfie at redhat.com
Tue Jun 24 08:44:44 UTC 2014
On 24/06/14 09:36, Kostiantyn Ponomarenko wrote:
> Hi Chrissie,
>
> But wait_for_all doesn't help when there is no connection between the
> nodes, because if I need to reboot the remaining working node I won't
> get a working cluster afterwards - both nodes will be waiting for the
> connection between them.
> That's why I am looking for a solution which could help me get one
> node working in this situation (after the reboot).
> I've been thinking about some kind of marker which could help a node
> determine the state of the other node, like an external disk and a
> SCSI reservation command. Maybe you could suggest another kind of
> marker?
> I am not sure whether we can use the presence of a file on an external
> SSD as the marker. Something like: if the file is there, the other
> node is alive; if not, the node is dead.
>
More seriously, that solution is harder than it might seem - which is
one reason qdiskd was as complex as it became, and why votequorum is as
conservative as it is when it comes to declaring a workable cluster. If
someone is there to manually reboot nodes then it might be as well for a
human decision to be made about which one is capable of running services.
Chrissie
> Digimer,
>
> Thanks for the links and information.
> Anyway, if I go this way, I will write my own daemon to determine the
> state of the other node.
> Also, the information about fence loops is new to me, thanks =)
>
> Thank you,
> Kostya
>
>
> On Tue, Jun 24, 2014 at 10:55 AM, Christine Caulfield
> <ccaulfie at redhat.com> wrote:
> On 23/06/14 15:49, Digimer wrote:
>
> Hi Kostya,
>
> I'm having a little trouble understanding your question, sorry.
>
> On boot, the node will not start anything, so after booting it, you
> log in, check that it can talk to the peer node (a simple ping is
> generally enough), then start the cluster. It will join the peer's
> existing cluster (even if it's a cluster on just itself).
>
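> With pcs, for example, that check-then-start can be as simple as the
> line below ("peer-node" is just a placeholder for the other node's
> hostname):
>
>     ping -c 3 peer-node && pcs cluster start
>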
> If you booted both nodes, say after a power outage, you will check
> the connection (again, a simple ping is fine) and then start the
> cluster on both nodes at the same time.
>
>
>
> wait_for_all helps with most of these situations. If a node goes
> down then it won't start services until it's seen the non-failed
> node because wait_for_all prevents a newly rebooted node from doing
> anything on its own. This also takes care of the case where both
> nodes are rebooted together of course, because that's the same as a
> new start.
>
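> On a running node you can confirm those votequorum flags with
> corosync-quorumtool; the output below is abridged and the values are
> only illustrative:
>
>     # corosync-quorumtool -s
>     ...
>     Expected votes:   2
>     Total votes:      2
>     Quorum:           1
>     Flags:            2Node Quorate WaitForAll
>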
> Chrissie
>
>
> If one of the nodes needs to be shut down, say for repairs or
> upgrades, you migrate the services off of it and over to the peer
> node, then you stop the cluster (which tells the peer that the node
> is leaving the cluster). After that, the remaining node operates by
> itself. When you turn it back on, you rejoin the cluster and migrate
> the services back.
>
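> A rough sketch of that maintenance sequence with pcs (node names are
> placeholders):
>
>     pcs cluster standby node1      # resources migrate to the peer
>     pcs cluster stop node1         # node1 leaves the cluster cleanly
>     # ... maintenance / reboot ...
>     pcs cluster start node1
>     pcs cluster unstandby node1    # resources may move back
>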
> I think, maybe, you are making this more complicated than it needs
> to be. Pacemaker and corosync will handle most of this for you, once
> set up properly. What operating system do you plan to use, and what
> cluster stack? I suspect it will be corosync + pacemaker, which
> should work fine.
>
> digimer
>
> On 23/06/14 10:36 AM, Kostiantyn Ponomarenko wrote:
>
> Hi Digimer,
>
> Suppose I disable the cluster on start-up, but what about the
> remaining node if I need to reboot it?
> So even in the case of a lost connection between these two nodes I
> need to have one node working and providing resources.
> How did you solve this situation?
> Should it be a separate daemon which somehow checks the connection
> between the two nodes and decides whether to run corosync and
> pacemaker or to keep them down?
>
> Thank you,
> Kostya
>
>
> On Mon, Jun 23, 2014 at 4:34 PM, Digimer <lists at alteeve.ca> wrote:
> On 23/06/14 09:11 AM, Kostiantyn Ponomarenko wrote:
>
> Hi guys,
> I want to gather all possible configuration variants for a 2-node
> cluster, because it has a lot of pitfalls and there is not much
> information about it on the internet. I also have some questions
> about the configurations and their specific problems.
>
> VARIANT 1:
> -----------------
> We can use the "two_node" and "wait_for_all" options from Corosync's
> votequorum (sketched below), and set up fencing agents with a delay
> on one of them.
> Here is a workflow (diagram) of this configuration:
> 1. Node starts.
> 2. Cluster (Corosync and Pacemaker) starts at boot time.
> 3. Wait for all nodes. Have all nodes joined?
>    No: go to step 3.
>    Yes: go to step 4.
> 4. Start resources.
> 5. Split-brain situation (something wrong with the connection
>    between the nodes).
> 6. The fencing agent on one of the nodes reboots the other node
>    (there is a configured delay on one of the fencing agents).
> 7. The rebooted node goes back to step 1.
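> A minimal corosync.conf quorum section for this variant could look
> roughly like the sketch below (per votequorum(5), two_node: 1 already
> enables wait_for_all by default):
>
>     quorum {
>         provider: corosync_votequorum
>         expected_votes: 2
>         two_node: 1
>         wait_for_all: 1
>     }
>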
> There are two (or more?) important things in this configuration:
> 1. The rebooted node remains waiting for all nodes to be visible
>    (the connection has to be restored first).
> 2. Suppose the connection problem still exists and the node which
>    rebooted the other one has to be rebooted as well (for some
>    reason). After the reboot it is also stuck at step 3 because of
>    the connection problem.
>
> QUESTION:
> -----------------
> Is it possible to somehow assign the node that won the reboot race
> (i.e. rebooted the other one) a status like "primary", allow it not
> to wait for all nodes after a reboot, and drop this status once the
> other node has joined?
> So is it possible?
> Right now that's the only configuration I know of for a 2-node
> cluster. Other variants are very much appreciated =)
> VARIANT 2 (not implemented, just a suggestion):
> -----------------
> I've been thinking about using an external SSD drive (or another
> external drive). For example, a fencing agent could reserve the SSD
> using a SCSI command (roughly as sketched below) and after that
> reboot the other node.
> The main idea is that the first node, as soon as the cluster starts
> on it, reserves the SSD until the other node joins the cluster;
> after that the SCSI reservation is removed.
> 1. Node starts.
> 2. Cluster (Corosync and Pacemaker) starts at boot time.
> 3. Reserve the SSD. Did it manage to reserve it?
>    No: don't start resources (wait for all).
>    Yes: go to step 4.
> 4. Start resources.
> 5. Remove the SCSI reservation when the other node has joined.
> 6. Split-brain situation (something wrong with the connection
>    between the nodes).
> 7. The fencing agent tries to reserve the SSD. Did it manage to
>    reserve it?
>    No: maybe put the node in standby mode ...
>    Yes: reboot the other node.
> 8. Optional: a single node can keep the SSD reservation for as long
>    as it is alone in the cluster, or until it shuts down.
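> For what it's worth, this is essentially what SCSI-3 persistent
> reservations provide (the fence_scsi agent is built around them). A
> rough manual sketch with sg_persist from sg3_utils, where /dev/sdX
> and the reservation keys are placeholders:
>
>     # register our key, then try to take a "write exclusive,
>     # registrants only" reservation (prout-type 5)
>     sg_persist --out --register --param-sark=0x1 /dev/sdX
>     sg_persist --out --reserve --param-rk=0x1 --prout-type=5 /dev/sdX
>     # see which key currently holds the reservation
>     sg_persist --in --read-reservation /dev/sdX
>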
> I am really looking forward to finding the best solution (or a
> couple of them =)).
> I hope I am not the only person who is interested in this topic.
>
>
> Thank you,
> Kostya
>
>
> Hi Kostya,
>
> I only build 2-node clusters, and I've not had problems with this
> going back to 2009, over dozens of clusters. The tricks I found are
> (a rough pcs sketch follows the list):
>
> * Disable quorum (of course)
> * Set up good fencing, and add a delay to the node you prefer (or
>   pick one at random, if they are of equal value) to avoid
>   dual-fences
> * Disable the cluster on start-up, to prevent fence loops
>
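> A rough sketch of those three tricks with pcs (the fence agent,
> addresses, credentials and the 15-second delay are all placeholders):
>
>     pcs property set no-quorum-policy=ignore
>     pcs stonith create fence-node1 fence_ipmilan pcmk_host_list=node1 \
>         ipaddr=10.0.0.1 login=admin passwd=secret delay=15
>     pcs stonith create fence-node2 fence_ipmilan pcmk_host_list=node2 \
>         ipaddr=10.0.0.2 login=admin passwd=secret
>     pcs cluster disable --all      # don't autostart the cluster at boot
>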
> That's it. With this, your 2-node cluster will be just fine.
>
> As for your question: once a node is fenced successfully, the
> resource manager (pacemaker) will take over any services lost on
> the fenced node, if that is how you configured it. A node that
> either gracefully leaves or dies/is fenced should not interfere
> with the remaining node.
>
> The problem is when a node vanishes and fencing fails. Then, not
> knowing what the other node might be doing, the only safe option is
> to block, otherwise you risk a split-brain. This is why fencing is
> so important.
>
> Cheers
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a
> person
> without access to education?
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org