[Pacemaker] Cluster with DRBD : split brain

Hugo Deprez hugo.deprez at gmail.com
Wed Jul 20 16:24:02 CET 2011


Hello Andrew,

In fact, DRBD was in StandAlone mode, but the cluster kept working.

Here is the syslog of the DRBD split brain:

Jul 15 08:45:34 node1 kernel: [1536023.052245] block drbd0: Handshake successful: Agreed network protocol version 91
Jul 15 08:45:34 node1 kernel: [1536023.052267] block drbd0: conn( WFConnection -> WFReportParams )
Jul 15 08:45:34 node1 kernel: [1536023.066677] block drbd0: Starting asender thread (from drbd0_receiver [23281])
Jul 15 08:45:34 node1 kernel: [1536023.066863] block drbd0: data-integrity-alg: <not-used>
Jul 15 08:45:34 node1 kernel: [1536023.079182] block drbd0: drbd_sync_handshake:
Jul 15 08:45:34 node1 kernel: [1536023.079190] block drbd0: self BBA9B794EDB65CDF:9E8FB52F896EF383:C5FE44742558F9E1:1F9E06135B8E296F bits:75338 flags:0
Jul 15 08:45:34 node1 kernel: [1536023.079196] block drbd0: peer 8343B5F30B2BF674:9E8FB52F896EF382:C5FE44742558F9E0:1F9E06135B8E296F bits:769 flags:0
Jul 15 08:45:34 node1 kernel: [1536023.079200] block drbd0: uuid_compare()=100 by rule 90
Jul 15 08:45:34 node1 kernel: [1536023.079203] block drbd0: Split-Brain detected, dropping connection!
Jul 15 08:45:34 node1 kernel: [1536023.079439] block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Jul 15 08:45:34 node1 kernel: [1536023.083955] block drbd0: meta connection shut down by peer.
Jul 15 08:45:34 node1 kernel: [1536023.084163] block drbd0: conn( WFReportParams -> NetworkFailure )
Jul 15 08:45:34 node1 kernel: [1536023.084173] block drbd0: asender terminated
Jul 15 08:45:34 node1 kernel: [1536023.084176] block drbd0: Terminating asender thread
Jul 15 08:45:34 node1 kernel: [1536023.084406] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Jul 15 08:45:34 node1 kernel: [1536023.084420] block drbd0: conn( NetworkFailure -> Disconnecting )
Jul 15 08:45:34 node1 kernel: [1536023.084430] block drbd0: error receiving ReportState, l: 4!
Jul 15 08:45:34 node1 kernel: [1536023.084789] block drbd0: Connection closed
Jul 15 08:45:34 node1 kernel: [1536023.084813] block drbd0: conn( Disconnecting -> StandAlone )
Jul 15 08:45:34 node1 kernel: [1536023.086345] block drbd0: receiver terminated
Jul 15 08:45:34 node1 kernel: [1536023.086349] block drbd0: Terminating receiver thread
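To make the handshake lines above easier to read, a small Python sketch can pull out the connection-state path and the self/peer UUID sets. It assumes, based on my understanding of DRBD 8.3, that the four hex fields are the current, bitmap and two history UUIDs, and that the lowest bit of each UUID is a role flag that is ignored when comparing. `parse_uuids`, `conn_path` and `bitmap_uuids_match` are hypothetical helpers; the last one is only a simplified reading of "rule 90", not DRBD's real uuid_compare() code.

```python
import re

# Entries from the syslog above, with the date/host/timestamp prefix trimmed.
LOG = """\
block drbd0: conn( WFConnection -> WFReportParams )
block drbd0: self BBA9B794EDB65CDF:9E8FB52F896EF383:C5FE44742558F9E1:1F9E06135B8E296F bits:75338 flags:0
block drbd0: peer 8343B5F30B2BF674:9E8FB52F896EF382:C5FE44742558F9E0:1F9E06135B8E296F bits:769 flags:0
block drbd0: uuid_compare()=100 by rule 90
block drbd0: Split-Brain detected, dropping connection!
block drbd0: conn( WFReportParams -> NetworkFailure )
block drbd0: conn( NetworkFailure -> Disconnecting )
block drbd0: conn( Disconnecting -> StandAlone )
"""

# "self"/"peer" followed by four 16-digit hex UUIDs and the out-of-sync bit count.
UUID_RE = re.compile(r"\b(self|peer) ((?:[0-9A-F]{16}:){3}[0-9A-F]{16}) bits:(\d+)")
# Connection state changes, e.g. "conn( WFConnection -> WFReportParams )".
CONN_RE = re.compile(r"conn\( (\w+) -> (\w+) \)")


def parse_uuids(log):
    """Map 'self'/'peer' to ([current, bitmap, history1, history2], bits)."""
    result = {}
    for who, uuids, bits in UUID_RE.findall(log):
        result[who] = ([int(u, 16) for u in uuids.split(":")], int(bits))
    return result


def conn_path(log):
    """Return the ordered list of connection states seen in the log."""
    steps = CONN_RE.findall(log)
    if not steps:
        return []
    return [steps[0][0]] + [new for _old, new in steps]


def bitmap_uuids_match(self_uuids, peer_uuids):
    """Simplified 'rule 90' reading (assumption): bitmap UUIDs are equal
    and non-zero once the lowest (role) bit is cleared."""
    # Python ints are unbounded, so "& ~1" clears only bit 0.
    s = self_uuids[1] & ~1
    p = peer_uuids[1] & ~1
    return s == p and s != 0


if __name__ == "__main__":
    info = parse_uuids(LOG)
    (s_uuids, s_bits), (p_uuids, p_bits) = info["self"], info["peer"]
    print("states:", " -> ".join(conn_path(LOG)))
    print("current UUIDs differ:", s_uuids[0] != p_uuids[0])
    print("bitmap UUIDs match:", bitmap_uuids_match(s_uuids, p_uuids))
    print("out-of-sync bits (self, peer):", s_bits, p_bits)
```

On this excerpt the state path ends in StandAlone, the current UUIDs differ, and the bitmap UUIDs become equal once the low bit is masked. That is consistent with both nodes having written data since they were last connected, and both sides also report non-zero out-of-sync bit counts (75338 and 769).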


On 19 July 2011 02:30, Andrew Beekhof <andrew at beekhof.net> wrote:

> On Fri, Jul 15, 2011 at 7:58 PM, Hugo Deprez <hugo.deprez at gmail.com> wrote:
> > Dear community,
> >
> > I am running a cluster with corosync on Debian Lenny. I have:
> >
> > one DRBD partition and 4 resources:
> >
> > fs-data    (ocf::heartbeat:Filesystem):
> > mda-ip     (ocf::heartbeat:IPaddr2):
> > postfix    (ocf::heartbeat:postfix):
> > apache     (ocf::heartbeat:apache):
> >
> > Last night something happened and DRBD went into 'split brain'. I think the
> > split brain came from
> >
> > The resources were still running on node 1.
> >
> > I checked the corosync logs and it seems that something went wrong. I would
> > like to understand what happened, in order to improve my cluster
> > configuration.
> >
> > Please find the log file attached.
>
> I see no evidence of a split-brain. Both nodes appear to be able to
> talk to each other.
> What exactly is the problem you encountered?
>
> >
> > It seems that the cluster tried to migrate the resources to the other node
> > but did not succeed?
> >
> > Any help appreciated.
> >
> > Regards,
> >
> > Hugo
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> >
> >

