[Pacemaker] Why should pacemaker be started simultaneously?

Andrei Borzenkov arvidjaar at gmail.com
Sat Oct 18 06:18:47 CEST 2014


On Mon, 06 Oct 2014 10:27:49 -0400
Digimer <lists at alteeve.ca> wrote:

> On 06/10/14 02:11 AM, Andrei Borzenkov wrote:
> > On Mon, Oct 6, 2014 at 9:03 AM, Digimer <lists at alteeve.ca> wrote:
> >> If stonith was configured, after the time out, the first node would fence
> >> the second node ("unable to reach" != "off").
> >>
> >> Alternatively, you can set corosync to 'wait_for_all' and have the first
> >> node do nothing until it sees the peer.
> >>
> >
> > Am I right that wait_for_all is available only in corosync 2.x and not in 1.x?
> 
> You are correct, yes.
> 
> >> To do otherwise would be to risk a split-brain. Each node needs to know the
> >> state of the peer in order to run services safely. By having both start at
> >> the same time, then they know what the other is doing. By disabling quorum,
> >> you allow one node to continue to operate when the other leaves, but it
> >> needs that initial connection to know for sure what it's doing.
> >>
> >
> > Does that apply to both corosync 1.x and 2.x, or only to 2.x with
> > wait_for_all? I ask because I was also confused about the precise
> > meaning of disabling quorum in pacemaker (setting
> > no-quorum-policy=ignore). So if I have a two-node cluster with
> > pacemaker 1.x and corosync 1.x, with no-quorum-policy=ignore and no
> > fencing, what happens when a single node starts?
> 
> Quorum tells the cluster that if a peer leaves (gracefully or was 
> fenced), the remaining node is allowed to continue providing services.
> 
> Stonith is needed to put a node that is in an unknown state into a known 
> state, be it because it couldn't reach the node when starting or because 
> the node stopped responding.
> 
> So quorum and stonith play rather different roles.
> 
> Without stonith, regardless of quorum, you risk split-brains and/or data 
> corruption. Operating a cluster without stonith is to operate a cluster 
> in an undetermined state and should never be done.
> 

OK, let me rephrase. Is it possible to achieve the same effect as
wait_for_all in corosync 2.x with a combination of pacemaker 1.1.x and
corosync 1.x? That is, to ensure that the cluster does not come up *on the
first startup* until all nodes are present, so that cluster nodes simply
wait for the others to join instead of trying to stonith them?
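
For reference, this is the corosync 2.x behaviour I would like to
reproduce. A minimal sketch of the quorum section of corosync.conf,
assuming a two-node cluster that uses votequorum (option names as in
votequorum(5); the values are only illustrative, not taken from a real
setup):

```
quorum {
    # votequorum is the quorum provider in corosync 2.x
    provider: corosync_votequorum

    # assumption: exactly two nodes, one vote each
    expected_votes: 2
    two_node: 1

    # the cluster becomes quorate for the first time only after all
    # nodes have been seen at the same time; two_node: 1 already
    # implies this, it is spelled out here for clarity
    wait_for_all: 1
}
```

As far as I can tell there is no equivalent option in corosync 1.x,
which is the reason for the question.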
