[Pacemaker] A/P Corosync, PGSQL and Split Brains questions

Stephan-Frank Henry Frank.Henry at gmx.net
Fri Feb 11 08:47:50 UTC 2011


> On: Fri, 11 Feb 2011 01:54:04 +0100, Lars Ellenberg wrote:
> On Wed, Feb 09, 2011 at 02:48:52PM +0100, Stephan-Frank Henry wrote:
> > My scenario:
> > Active/Passive 2-node cluster (serverA & serverB) with Corosync, DRBD & PGSQL.
> > The resources are configured as Master/Slave and so far it is fine.
> > 
> > Since bullet points speak more than words: ;)
> > Test:
> >  1) Pull the plug on the master (serverA)
> >  2) Then Reattach
> > Expected results:
> >  1) serverB becomes Master
> >  2) serverB remains Master, serverA syncs with serverB
> > Actual results:
> >  1) serverB becomes Master
> >  2) serverA becomes Master, data written on serverB is lost.
> 
> Without logs, it does not make much sense to guess what may have
> happened, and why.
> 
> >   net {
> >     cram-hmac-alg sha1;
> >     after-sb-0pri discard-zero-changes;
> >     after-sb-1pri discard-secondary;
> 
> 
> This is configuring data loss.
> Just because during some connection handshake, after a split brain,
> one node is currently secondary, does not necessarily mean that is the
> data set you want to throw away.

Yes, that does make sense.
From the description of the above config, it had seemed like what I needed.
 
> >     after-sb-2pri disconnect; 
> >   }
> >   on serverA {
> >     device /dev/drbd0;
> >     disk /dev/sda5;
> >     meta-disk internal;
> >     address 150.158.183.22:7788;
> >   }
> >   on serverB {
> >     device /dev/drbd0;
> >     disk /dev/sda5;
> >     meta-disk internal;
> >     address 150.158.183.23:7788;
> >   }
> > }
> > 
> > ############### /etc/ha.d/ha.cf 
> > 
> > udpport 694
> > ucast eth0 150.158.183.23
> 
> You absolutely want redundant communication links.

Yes, this is becoming more apparent.
The problem I have now is that I only have one NIC for the cluster communication; the other is used for something else.
So I will push to get this done.

> > 
> > autojoin none
> > debug 1
> > logfile /var/log/ha-log
> > use_logd false
> > logfacility daemon
> > keepalive 2 # 2 second(s)
> > deadtime 10
> > # warntime 10
> > initdead 80
> > 
> > # list all shared ip addresses we want to ping
> > ping 150.158.183.30
> 
> Ping directive in ha.cf is heartbeat haresources mode stuff.
> Discard that.

Will do.

> > # list all node names
> > node serverB serverA
> > crm yes
> > respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s
> 
> I don't think that makes much sense nowadays anymore.
> Use the pacemaker/ping resource agent instead.
> 
> Besides, you are using corosync.
> So why include the _heartbeat_ configuration here?
> It is completely irrelevant.

Okidoki.
Not being an expert, and having had to grow into the HA stuff very quickly, I am basically happy that it is working at all.

> > ############### /etc/corosync/corosync.conf
> > 
> > totem {
> > 	version: 2
> > 	token: 1000
> > 	hold: 180
> > 	token_retransmits_before_loss_const: 20
> > 	join: 60
> > 	configuration (ms)
> > 	consensus: 4800
> > 	vsftype: none
> > 	max_messages: 20
> > 	clear_node_high_bit: yes
> > 	secauth: off
> > 	threads: 0
> > 	rrp_mode: none
> > 	interface {
> > 		ringnumber: 0
> > 		bindnetaddr: 150.158.183.0
> > 		mcastaddr: 226.94.1.22
> > 		mcastport: 5427
> > 	}
> 
> I said it earlier, you want to have redundant communication channels.
> I'm not up to date on how corosync's redundant ring behaves nowadays,
> but it had some quirks in the past.
> 
> > <cib have_quorum="true" generated="true" ignore_dtd="false" epoch="14"
> >   num_updates="0" admin_epoch="0" validate-with="transitional-0.6"
> >   cib-last-written="Wed Feb  9 14:03:30 2011" crm_feature_set="3.0.1" have-quorum="0"
> >   dc-uuid="serverA">
> 
> validate-with transitional 0.6? Really?
> Is this an upgrade from something?
> Or a copy'n'paste?
>
> Where does this cib come from?

Actually, I wrote it by hand, with a lot of help from Google.
Not too much copy&paste, but a lot of inspiration.
Also, after a cib had been verified, it would sometimes contain additional changes. If I removed them, the verifier would complain, so I left them in.

And yes, the file does have some history: it is about 1.5 years old and has been used for different use cases. So some stuff might have crept in.

> >         <primitive class="ocf" type="drbd" provider="heartbeat" id="drbddisk_rep">
> 
> please use the linbit drbd resource agent.
> 
> >       <group id="rg_drbd" ordered="true">
> 
> >         <primitive id="ip_resource" class="ocf" type="IPaddr2" provider="heartbeat">
> 
> >         <primitive class="ocf" provider="heartbeat" type="Filesystem" id="fs0">
> 
> >         <primitive id="pgsql" class="ocf" type="pgsql" provider="heartbeat">
> 
> 
> >         <rule id="drbd0-master-on-1" role="master" score="100">
> >           <expression id="exp-1" attribute="#uname" operation="eq" value="serverA"/>
> 
> Get rid of that rule.
> Seriously.
> 
> Combined with your use of the heartbeat/drbd (instead of the
> linbit/drbd) agent, and the "after-sb-1pri discard-secondary;" in your
> drbd.conf, it is most likely the root cause of your trouble.

Ok, thanks for that.
I had already started looking into the discard-secondary stuff.
I hardly see any examples using it, and the exact usage is not too clear to me.
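
A toy Python sketch of how the pieces Lars lists could combine. It is not DRBD code: the `Node` class, `resolve_after_sb_1pri`, `simulate` and the write labels `w1`..`w3` are made up for illustration, only two of the policies are modelled, and the order of events in step 2 is an assumption, since no logs were posted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    role: str = "secondary"  # "primary" or "secondary"
    writes: List[str] = field(default_factory=list)


def resolve_after_sb_1pri(a: Node, b: Node, policy: str) -> Optional[List[str]]:
    """Simplified after-sb-1pri decision at reconnect time.

    Returns the writes that get thrown away, or None when the nodes
    stay disconnected and nothing is discarded automatically.
    """
    primaries = [n for n in (a, b) if n.role == "primary"]
    if len(primaries) != 1:
        raise ValueError("after-sb-1pri only applies with exactly one primary")
    if policy == "disconnect":
        return None
    if policy == "discard-secondary":
        survivor = primaries[0]
        victim = b if survivor is a else a
        lost = [w for w in victim.writes if w not in survivor.writes]
        victim.writes = list(survivor.writes)  # victim resyncs from survivor
        return lost
    raise ValueError(f"policy not modelled: {policy}")


def simulate(policy: str) -> Optional[List[str]]:
    a = Node("serverA", "primary", ["w1"])
    b = Node("serverB", "secondary", ["w1"])
    # 1) Plug pulled on serverA: serverB is promoted and keeps writing.
    b.role = "primary"
    b.writes += ["w2", "w3"]
    # 2) serverA returns. Assumption: the location rule preferring serverA
    #    moves the master role back before DRBD has reconnected.
    b.role, a.role = "secondary", "primary"
    # 3) DRBD handshake detects the split brain and applies the policy.
    return resolve_after_sb_1pri(a, b, policy)


print(simulate("discard-secondary"))  # ['w2', 'w3']
print(simulate("disconnect"))         # None
```

In this model the node holding the newer data is the one that happens to be secondary at handshake time, so `discard-secondary` drops exactly the writes made on serverB, while `disconnect` leaves both data sets alone for a manual decision.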

> >         </rule>
> >       </rsc_location>
> >       <rsc_order id="mount_after_drbd" from="rg_drbd" action="start" to="ms_drbd0" to_action="promote"/>
> >       <rsc_colocation id="mount_on_drbd" to="ms_drbd0" to_role="master" from="rg_drbd" score="INFINITY"/>
> 
> And please start using the crm shell.

You are totally correct.
For the task I am working on, we basically need one config which is then used on multiple setups where only the installation-specific values are changed (hostnames, IPs, and some other things).
So for us there is either a single-node setup or a dual-node setup (the two differ only in the config specifics).
It is required that the system be set up in one go from one DVD, and after that only 'reconfiguration' is allowed.
Setup-wise it was a lot easier for me to just create the config files with placeholders and then do a copy|sed with the needed data.
And for that, text files are much easier.
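
The placeholder approach can be sketched in Python as an alternative to copy|sed. This is only an illustration: the placeholder names `hostname` and `ip`, the constant `DRBD_HOST_TEMPLATE` and the function `render_host` are hypothetical, and the fragment reuses the `on` section from the drbd.conf quoted above.

```python
from string import Template

# Hypothetical placeholder names; real templates may use other markers.
DRBD_HOST_TEMPLATE = Template(
    "  on ${hostname} {\n"
    "    device /dev/drbd0;\n"
    "    disk /dev/sda5;\n"
    "    meta-disk internal;\n"
    "    address ${ip}:7788;\n"
    "  }\n"
)


def render_host(hostname: str, ip: str) -> str:
    # substitute() raises KeyError for a placeholder without a value,
    # so a half-filled config is never produced silently.
    return DRBD_HOST_TEMPLATE.substitute(hostname=hostname, ip=ip)


print(render_host("serverA", "150.158.183.22"), end="")
print(render_host("serverB", "150.158.183.23"), end="")
```

Compared with sed, a missing value fails loudly instead of leaving a stray placeholder in the installed file.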

Thanks for all your help.

Frank