[Pacemaker] configuration variants for 2 node cluster
Christine Caulfield
ccaulfie at redhat.com
Tue Jun 24 07:55:37 UTC 2014
On 23/06/14 15:49, Digimer wrote:
> Hi Kostya,
>
> I'm having a little trouble understanding your question, sorry.
>
> On boot, the node will not start anything, so after booting it, you
> log in, check that it can talk to the peer node (a simple ping is
> generally enough), then start the cluster. It will join the peer's
> existing cluster (even if it's a cluster on just itself).
>
> If you booted both nodes, say after a power outage, you will check
> the connection (again, a simple ping is fine) and then start the cluster
> on both nodes at the same time.
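>
> For illustration, with a pcs-managed corosync+pacemaker stack that
> check-then-start is roughly this ("peer1" is a placeholder for the
> other node's hostname):
>
>     ping -c 3 peer1 && pcs cluster start   # peer reachable -> join its cluster
>     pcs cluster start --all                # after a full outage, start both together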
wait_for_all helps with most of these situations. If a node goes down
then, once it reboots, it won't start services until it has seen the
non-failed node, because wait_for_all prevents a newly rebooted node
from doing anything on its own. This also takes care of the case where
both nodes are rebooted together, of course, because that's the same as
a new start.
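
For reference, a minimal corosync.conf quorum section for this kind of
two-node setup looks roughly like this (see votequorum(5); two_node
enables wait_for_all by default anyway, but being explicit does no harm):

    quorum {
        provider: corosync_votequorum
        two_node: 1
        wait_for_all: 1
    }
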
Chrissie
> If one of the nodes needs to be shut down, say for repairs or
> upgrades, you migrate the services off of it and over to the peer node,
> then you stop the cluster (which tells the peer that the node is leaving
> the cluster). After that, the remaining node operates by itself. When
> you turn it back on, you rejoin the cluster and migrate the services back.
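>
> With pcs, for example, that maintenance sequence is roughly (resource
> and node names are placeholders; commands vary a bit by pcs version):
>
>     pcs resource move my_service peer1   # push the service to the peer
>     pcs cluster stop                     # leave the cluster cleanly
>     # ... do the repairs/upgrade, reboot ...
>     pcs cluster start                    # rejoin the cluster
>     pcs resource clear my_service        # drop the move constraint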
>
> I think, maybe, you are making this more complicated than it needs to
> be. Pacemaker and corosync will handle most of this for you, once set
> up properly. What operating system do you plan to use, and what
> cluster stack? I suspect it will be corosync + pacemaker, which should
> work fine.
>
> digimer
>
> On 23/06/14 10:36 AM, Kostiantyn Ponomarenko wrote:
>> Hi Digimer,
>>
>> Suppose I disabled the cluster on start up, but what about the
>> remaining node, if I need to reboot it?
>> So, even in case of a lost connection between these two nodes, I need
>> to have one node working and providing resources.
>> How did you solve this situation?
>> Should it be a separate daemon which somehow checks the connection
>> between the two nodes and decides whether to run corosync and
>> pacemaker or to keep them down?
>>
>> Thank you,
>> Kostya
>>
>>
>> On Mon, Jun 23, 2014 at 4:34 PM, Digimer <lists at alteeve.ca> wrote:
>>
>> On 23/06/14 09:11 AM, Kostiantyn Ponomarenko wrote:
>>
>> Hi guys,
>> I want to gather all possible configuration variants for a 2-node
>> cluster, because it has a lot of pitfalls and there is not a lot of
>> information across the internet about it. I also have some questions
>> about the configurations and their specific problems.
>> VARIANT 1:
>> -----------------
>> We can use the "two_node" and "wait_for_all" options from Corosync's
>> votequorum, and set up fencing agents with a delay on one of them.
>> Here is a workflow (diagram) of this configuration:
>> 1. Node starts.
>> 2. Cluster starts (Corosync and Pacemaker) at boot time.
>> 3. Wait for all nodes. All nodes joined?
>> No. Go to step 3.
>> Yes. Go to step 4.
>> 4. Start resources.
>> 5. Split-brain situation (something wrong with the connection between
>> the nodes).
>> 6. The fencing agent on one of the nodes reboots the other node
>> (there is a configured delay on one of the fencing agents).
>> 7. The rebooted node goes to step 1.
>> There are two (or more?) important things in this configuration:
>> 1. The rebooted node remains waiting for all nodes to be visible (the
>> connection has to be restored first).
>> 2. Suppose the connection problem still exists and the node which
>> rebooted the other one has to be rebooted as well (for some reason).
>> After the reboot it is also stuck on step 3 because of the connection
>> problem.
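>>
>> A rough sketch of the fencing-with-delay part of this variant
>> (assuming pcs and fence_ipmilan; the agent, addresses, credentials
>> and the delay value are placeholders, and parameter names vary per
>> fence agent):
>>
>>     # the delay on the device that fences node-a means node-a wins a
>>     # simultaneous fencing race (node-b pauses before shooting)
>>     pcs stonith create fence_node_a fence_ipmilan pcmk_host_list=node-a \
>>         ipaddr=10.0.0.1 login=admin passwd=secret delay=15
>>     pcs stonith create fence_node_b fence_ipmilan pcmk_host_list=node-b \
>>         ipaddr=10.0.0.2 login=admin passwd=secret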
>> QUESTION:
>> -----------------
>> Is it possible to somehow assign to the node that won the reboot race
>> (i.e. rebooted the other one) a status like "primary", and allow it
>> not to wait for all nodes after a reboot? And to drop this status
>> after the other node has joined it again?
>> So is it possible?
>> Right now that's the only configuration I know for a 2-node cluster.
>> Other variants are very much appreciated =)
>> VARIANT 2 (not implemented, just a suggestion):
>> -----------------
>> I've been thinking about using an external SSD drive (or other
>> external drive). So, for example, a fencing agent can reserve the SSD
>> using a SCSI command and after that reboot the other node.
>> The main idea is that the first node, as soon as the cluster starts
>> on it, reserves the SSD until the other node joins the cluster; after
>> that the SCSI reservation is removed.
>> 1. Node starts.
>> 2. Cluster starts (Corosync and Pacemaker) at boot time.
>> 3. Reserve the SSD. Did it manage to reserve it?
>> No. Don't start resources (wait for all).
>> Yes. Go to step 4.
>> 4. Start resources.
>> 5. Remove the SCSI reservation when the other node has joined.
>> 6. Split-brain situation (something wrong with the connection between
>> the nodes).
>> 7. The fencing agent tries to reserve the SSD. Did it manage to
>> reserve it?
>> No. Maybe put the node in standby mode ...
>> Yes. Reboot the other node.
>> 8. Optional: a single node can keep the SSD reservation as long as it
>> is alone in the cluster, or until it shuts down.
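>>
>> For what it's worth, fencing via SCSI persistent reservations on a
>> shared disk already exists in the form of the fence_scsi agent; a
>> rough pcs sketch (device path and node names are placeholders):
>>
>>     pcs stonith create scsi-shooter fence_scsi \
>>         devices=/dev/disk/by-id/wwn-0x5000c5000000abcd \
>>         pcmk_host_list="node-a node-b" meta provides=unfencing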
>> I am really looking forward to finding the best solution (or a
>> couple of them =)).
>> Hope I am not the only person who is interested in this topic.
>>
>>
>> Thank you,
>> Kostya
>>
>>
>> Hi Kostya,
>>
>> I only build 2-node clusters, and I've not had problems with this
>> going back to 2009 over dozens of clusters. The tricks I found are:
>>
>> * Disable quorum (of course)
>> * Set up good fencing, and add a delay to the fence device of the
>> node you prefer (or pick one at random, if they're of equal value)
>> to avoid dual fences
>> * Disable the cluster on start up, to prevent fence loops.
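>>
>> In pcs terms, those tricks are roughly (exact commands and property
>> names depend on your distro and pcs version):
>>
>>     pcs property set no-quorum-policy=ignore   # quorum is meaningless with 2 nodes
>>     pcs property set stonith-enabled=true      # good fencing is mandatory
>>     pcs cluster disable --all                  # don't auto-start the cluster at boot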
>>
>> That's it. With this, your 2-node cluster will be just fine.
>>
>> As for your question: once a node is fenced successfully, the
>> resource manager (pacemaker) will take over any services lost on the
>> fenced node, if that is how you configured it. A node that either
>> gracefully leaves or dies/is fenced should not interfere with the
>> remaining node.
>>
>> The problem is when a node vanishes and fencing fails. Then, not
>> knowing what the other node might be doing, the only safe option is
>> to block, otherwise you risk a split-brain. This is why fencing is
>> so important.
>>
>> Cheers
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person
>> without access to education?
>>