[Pacemaker] setup advice

Andrew Beekhof andrew at beekhof.net
Tue Jul 2 19:08:50 EDT 2013


I wouldn't be doing anything without corosync2 and its option that requires all nodes to be online before quorum is granted.
Otherwise I can imagine ways that the old master might try to promote itself.
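
For reference, the option meant here is presumably votequorum's wait_for_all; a minimal sketch of the corresponding quorum section in corosync.conf (assuming corosync 2 with the votequorum provider) could look like this:

quorum {
    provider: corosync_votequorum
    # do not grant quorum after a cold start until all nodes have been seen at least once
    wait_for_all: 1
}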

On 02/07/2013, at 7:18 PM, Michael Schwartzkopff <misch at clusterbau.com> wrote:

> On Tuesday, 2 July 2013, at 09:47:31, Stefano Sasso wrote:
> > Hello folks,
> >   I have the following setup in mind, but I need some advice and one hint
> > on how to realize a particular function.
> > 
> > I have an N-node cluster (N >= 2), with data stored in postgresql.
> > I would like to manage postgres master-slave replication in this way: one
> > node is the "master", one is the "slave", and the others are "standby"
> > nodes.
> > If the master fails, the slave becomes the master, and one of the standbys
> > becomes the slave.
> > If the slave fails, one of the standbys becomes the new slave.
> > If one of the standbys fails, no problem :)
> > I can correctly manage this configuration with an ms resource and a custom
> > script (using ocf:pacemaker:Stateful as an example). If the cluster is
> > already operational, the failover works fine.
> > 
> > My problem is cluster start-up: only the previously running master and slave
> > hold the most up-to-date data, so I would like the new master to be the
> > "old master" (or, failing that, the old slave), and the new slave to be the
> > "old slave" (though the latter is not mandatory). The important thing is
> > that the new master has up-to-date data.
> > This should happen even if the servers are booted with several minutes of
> > delay between them (users are very stupid sometimes).
> > 
> > My idea is the following:
> > the MS resource is not started when the cluster comes up; on startup there is
> > only one "arbitrator" resource (started on a single node).
> > This resource reads from somewhere which node was the previous master and
> > which was the previous slave, and it waits up to 5 minutes to see if one of
> > them comes up. If one of them does, it forces the MS master resource onto
> > that node (and starts it); if the wait timer expires instead, it starts the
> > master resource on a random node.
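> > 
> > (Just to illustrate the "reads from somewhere" part: the marker could be
> > kept in a permanent node attribute, e.g. with crm_attribute; the attribute
> > name below is only a placeholder.)
> > 
> > # on the node currently acting as master:
> > crm_attribute --type nodes --node $(crm_node -n) --name pgsql-last-master --update 1 --lifetime forever
> > # later, from the arbitrator:
> > crm_attribute --type nodes --node <old master> --name pgsql-last-master --query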
>  
> hi,
>  
> Another possible way to achieve your goal is to add resource-level fencing to your custom postgresql resource agent.
>  
> If a node leaves the cluster and that node is NOT the running master or slave of the postgresql instance, the surviving nodes add a location constraint to the CIB that prevents the postgresql instance from running on the lost node:
>  
> loc ResFence_postgresql msPostgresql -inf: <lost node>
>  
> When the node comes back online and is visible in the cluster again, your resource agent should remove that constraint.
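>  
> For example, with the crm shell that could look like this (the node name is only a placeholder):
>  
> # node3 left the cluster and was neither master nor slave: ban postgresql there
> crm configure location ResFence_postgresql msPostgresql -inf: node3
> # once node3 is back and visible again, lift the ban
> crm configure delete ResFence_postgresql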
>  
> This constraint can also be set when the resource stops on a node, so you can use the notify action to achieve this from another node. Only the nodes where postgresql is running have NO location constraint. Whenever something changes (a node leaves), the notify action checks whether such a location constraint needs to be added or removed.
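>  
> A very rough sketch of the add-on-stop half of that notify action (assuming the agent sources the usual OCF shell functions and shells out to the crm shell; the per-node constraint id is only a placeholder):
>  
> pgsql_notify() {
>     # fired on every clone notification; here we only react to post-stop events
>     case "${OCF_RESKEY_CRM_meta_notify_type}-${OCF_RESKEY_CRM_meta_notify_operation}" in
>         post-stop)
>             # ban postgresql from every node the instance just stopped on
>             for node in $OCF_RESKEY_CRM_meta_notify_stop_uname; do
>                 crm configure location "ResFence_postgresql_$node" msPostgresql -inf: "$node"
>             done
>             ;;
>     esac
>     # the removal side (lifting the ban once a node is back and trustworthy)
>     # would have to be triggered elsewhere, e.g. from the monitor action of a
>     # surviving instance; see the caveat below
>     return $OCF_SUCCESS
> }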
>  
> This is not fully worked out, though. You may have to think the problem through a bit further.
>  
> Please see the resource-level fencing of Linbit's drbd agent and the drbd configuration:
> http://www.drbd.org/users-guide/s-pacemaker-fencing.html, section 8.3.2
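>  
> From memory, the drbd side of that setup looks roughly like this (the two handler scripts ship with drbd):
>  
> resource <resource> {
>   disk {
>     fencing resource-only;
>   }
>   handlers {
>     fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>     after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>   }
> }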
>  
> -- 
> Dr. Michael Schwartzkopff
> Guardinistr. 63
> 81375 München
>  
> Tel: (0163) 172 50 98
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org




