[Pacemaker] Cluster split brain on vmware VSphere

Wed Jun 9 12:22:37 UTC 2010

Hi,

On Wed, Jun 09, 2010 at 12:11:09PM +0200, Torresani, Roberto wrote:
> Well... it seems to be SOLVED!!!
> Thank you Dejan.
> In the next few days I will load the cluster and then see how it behaves.
> 
> I simply raised the token value to 10000 msec and left all the others at the defaults.

You should also raise the consensus value to 12000 (1.2 * token).
With token at 10000 and consensus still at 4800, corosync would
even refuse to start.
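
For reference, a rough sketch of the two totem lines involved,
assuming the rest of the posted corosync.conf stays as it is. The
1.2 factor is from corosync.conf(5) as I remember it: consensus must
be at least 1.2 * token, and it is calculated automatically when the
option is left out. Please check the man page for your version.

```
totem {
        # token timeout in msec, raised from the 1000 in the posted config
        token:          10000
        # consensus timeout in msec, at least 1.2 * token = 12000
        consensus:      12000
        # remaining totem options and the interface section are
        # assumed unchanged from the posted config
}
```

Dropping the consensus line altogether should give the same result,
since it would then be derived from token.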

Thanks,

Dejan

> 
> Thank you again.
> Regards,
> Roberto
> 
>  
> 
> > -----Original Message-----
> > From: Dejan Muhamedagic [mailto:dejanmm at fastmail.fm] 
> > Sent: Tuesday, June 08, 2010 6:42 PM
> > To: The Pacemaker cluster resource manager
> > Subject: Re: [Pacemaker] Cluster split brain on vmware VSphere
> > 
> > Hi,
> > 
> > On Mon, Jun 07, 2010 at 02:57:57PM +0200, Torresani, Roberto wrote:
> > > Sorry for having chosen the wrong ml...
> > 
> > That's no problem. There's just a better chance of getting help
> > on the other list.
> > 
> > > Here is the corosync.conf used by one cluster; the other one is
> > > just the same, as provided by the EPEL repository packages.
> > > 
> > > I will try to raise the token value to 10000 as you suggest. Is
> > > there a theoretical basis or a best practice for setting this value?
> > 
> > No, but 5000 should be OK for most. Ultimately, it depends on
> > your network. I forget exactly what the case was here, but it
> > seems like you had some heavy processing (backup?) which used
> > most of the resources. That may be really hard to predict. You
> > can use sar or similar to monitor the load.
> > 
> > Thanks,
> > 
> > Dejan
> > 
> > > I will keep you informed as it goes, and open a thread on the
> > > corosync ml if necessary.
> > > 
> > > Thank you.
> > > 
> > > 
> > > # Please read the corosync.conf.5 manual page
> > > compatibility: whitetank
> > > 
> > > totem {
> > >         version: 2
> > >         secauth: off
> > >         threads: 0
> > >         token:          1000
> > >         hold: 180
> > >         token_retransmits_before_loss_const: 20
> > >         join:           60
> > >         consensus:      4800
> > >         vsftype:        none
> > >         max_messages:   20
> > >         interface {
> > >                 ringnumber: 0
> > >                 bindnetaddr: 192.168.206.0
> > >                 mcastaddr: 226.94.1.1
> > >                 mcastport: 5405
> > >         }
> > > }
> > > 
> > > logging {
> > >         fileline: off
> > >         to_stderr: yes
> > >         to_logfile: yes
> > >         to_syslog: yes
> > >         logfile: /tmp/corosync.log
> > >         debug: off
> > >         timestamp: on
> > >         logger_subsys {
> > >                 subsys: AMF
> > >                 debug: off
> > >         }
> > > }
> > > 
> > > amf {
> > >         mode: disabled
> > > }
> > > 
> > > aisexec {
> > >     user:  root
> > >     group: root
> > > }
> > > 
> > > service {
> > >     name: pacemaker
> > >     ver: 0
> > > }
> > > _______________________________________________
> > > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > > 
> > > Project Home: http://www.clusterlabs.org
> > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker