[Pacemaker] Managing DRBD Dual Primary with Pacemaker always initial Split Brains
Felix Zachlod
fz.lists at sis-gmbh.info
Thu Oct 2 08:44:37 CEST 2014
On 01.10.2014 20:46, Digimer wrote:
>
> At some point along the way, both nodes were Primary while not
> connected, even if for just a moment. Your log snippet above shows the
> results of this break, they do not appear to speak to the break itself.
The problem is even easier to reproduce when I stop a DRBD resource and
restart it later; this always leads to a split brain.
This is the log from one side:
Oct 2 08:11:46 storage-test-d kernel: [44936.343453] drbd testdata2:
asender terminated
Oct 2 08:11:46 storage-test-d kernel: [44936.343457] drbd testdata2:
Terminating drbd_a_testdata
Oct 2 08:11:46 storage-test-d kernel: [44936.362103] drbd testdata2:
conn( TearDown -> Disconnecting )
Oct 2 08:11:47 storage-test-d kernel: [44936.450052] drbd testdata2:
Connection closed
Oct 2 08:11:47 storage-test-d kernel: [44936.450070] drbd testdata2:
conn( Disconnecting -> StandAlone )
Oct 2 08:11:47 storage-test-d kernel: [44936.450074] drbd testdata2:
receiver terminated
Oct 2 08:11:47 storage-test-d kernel: [44936.450081] drbd testdata2:
Terminating drbd_r_testdata
Oct 2 08:11:47 storage-test-d kernel: [44936.450104] block drbd11:
disk( UpToDate -> Failed )
Oct 2 08:11:47 storage-test-d kernel: [44936.514071] block drbd11:
bitmap WRITE of 0 pages took 0 jiffies
Oct 2 08:11:47 storage-test-d kernel: [44936.514078] block drbd11: 0 KB
(0 bits) marked out-of-sync by on disk bit-map.
Oct 2 08:11:47 storage-test-d kernel: [44936.514088] block drbd11:
disk( Failed -> Diskless )
Oct 2 08:11:47 storage-test-d kernel: [44936.514793] block drbd11:
drbd_bm_resize called with capacity == 0
Oct 2 08:11:47 storage-test-d kernel: [44936.515461] drbd testdata2:
Terminating drbd_w_testdata
Oct 2 08:12:16 storage-test-d rsyslogd-2177: imuxsock lost 124 messages
from pid 2748 due to rate-limiting
Oct 2 08:13:06 storage-test-d kernel: [45016.120378] drbd testdata2:
Starting worker thread (from drbdsetup-84 [10353])
Oct 2 08:13:06 storage-test-d kernel: [45016.121012] block drbd11:
disk( Diskless -> Attaching )
Oct 2 08:13:06 storage-test-d kernel: [45016.121812] drbd testdata2:
Method to ensure write ordering: drain
Oct 2 08:13:06 storage-test-d kernel: [45016.121817] block drbd11: max
BIO size = 1048576
Oct 2 08:13:06 storage-test-d kernel: [45016.121825] block drbd11:
drbd_bm_resize called with capacity == 838835128
Oct 2 08:13:06 storage-test-d kernel: [45016.127192] block drbd11:
resync bitmap: bits=104854391 words=1638350 pages=3200
Oct 2 08:13:06 storage-test-d kernel: [45016.127199] block drbd11: size
= 400 GB (419417564 KB)
Oct 2 08:13:06 storage-test-d kernel: [45016.321361] block drbd11:
recounting of set bits took additional 2 jiffies
Oct 2 08:13:06 storage-test-d kernel: [45016.321369] block drbd11: 0 KB
(0 bits) marked out-of-sync by on disk bit-map.
Oct 2 08:13:06 storage-test-d kernel: [45016.321382] block drbd11:
disk( Attaching -> UpToDate )
Oct 2 08:13:06 storage-test-d kernel: [45016.321388] block drbd11:
attached to UUIDs
28A688FAC06E2662:0000000000000000:0EABC2724124755C:0EAAC2724124755C
Oct 2 08:13:06 storage-test-d kernel: [45016.376555] drbd testdata2:
conn( StandAlone -> Unconnected )
Oct 2 08:13:06 storage-test-d kernel: [45016.376634] drbd testdata2:
Starting receiver thread (from drbd_w_testdata [10355])
Oct 2 08:13:06 storage-test-d kernel: [45016.376876] drbd testdata2:
receiver (re)started
Oct 2 08:13:06 storage-test-d kernel: [45016.376897] drbd testdata2:
conn( Unconnected -> WFConnection )
Oct 2 08:13:07 storage-test-d rsyslogd-2177: imuxsock begins to drop
messages from pid 2748 due to rate-limiting
Oct 2 08:13:07 storage-test-d kernel: [45016.707045] block drbd11:
role( Secondary -> Primary )
Oct 2 08:13:07 storage-test-d kernel: [45016.729180] block drbd11: new
current UUID
C58090DF57933525:28A688FAC06E2662:0EABC2724124755C:0EAAC2724124755C
Oct 2 08:13:07 storage-test-d kernel: [45016.876920] drbd testdata2:
Handshake successful: Agreed network protocol version 101
Oct 2 08:13:07 storage-test-d kernel: [45016.876926] drbd testdata2:
Agreed to support TRIM on protocol level
Oct 2 08:13:07 storage-test-d kernel: [45016.876999] drbd testdata2:
conn( WFConnection -> WFReportParams )
Oct 2 08:13:07 storage-test-d kernel: [45016.877013] drbd testdata2:
Starting asender thread (from drbd_r_testdata [10376])
Oct 2 08:13:07 storage-test-d kernel: [45017.015220] block drbd11:
drbd_sync_handshake:
Oct 2 08:13:07 storage-test-d kernel: [45017.015228] block drbd11: self
C58090DF57933525:28A688FAC06E2662:0EABC2724124755C:0EAAC2724124755C
bits:0 flags:0
Oct 2 08:13:07 storage-test-d kernel: [45017.015234] block drbd11: peer
7F282664519D49A1:28A688FAC06E2662:0EABC2724124755C:0EAAC2724124755C
bits:0 flags:0
Oct 2 08:13:07 storage-test-d kernel: [45017.015239] block drbd11:
uuid_compare()=100 by rule 90
Oct 2 08:13:07 storage-test-d kernel: [45017.015247] block drbd11:
helper command: /sbin/drbdadm initial-split-brain minor-11
Oct 2 08:13:07 storage-test-d kernel: [45017.018192] block drbd11:
helper command: /sbin/drbdadm initial-split-brain minor-11 exit code 0 (0x0)
Oct 2 08:13:07 storage-test-d kernel: [45017.018218] block drbd11:
Split-Brain detected but unresolved, dropping connection!
Oct 2 08:13:07 storage-test-d kernel: [45017.018244] block drbd11:
helper command: /sbin/drbdadm split-brain minor-11
Oct 2 08:13:07 storage-test-d kernel: [45017.021201] block drbd11:
helper command: /sbin/drbdadm split-brain minor-11 exit code 0 (0x0)
Oct 2 08:13:07 storage-test-d kernel: [45017.021242] drbd testdata2:
conn( WFReportParams -> Disconnecting )
Oct 2 08:13:07 storage-test-d kernel: [45017.021248] drbd testdata2:
error receiving ReportState, e: -5 l: 0!
Oct 2 08:13:07 storage-test-d kernel: [45017.021277] drbd testdata2:
asender terminated
Oct 2 08:13:07 storage-test-d kernel: [45017.021280] drbd testdata2:
Terminating drbd_a_testdata
Oct 2 08:13:07 storage-test-d kernel: [45017.143260] drbd testdata2:
Connection closed
Oct 2 08:13:07 storage-test-d kernel: [45017.143440] drbd testdata2:
conn( Disconnecting -> StandAlone )
Oct 2 08:13:07 storage-test-d kernel: [45017.143446] drbd testdata2:
receiver terminated
Oct 2 08:13:07 storage-test-d kernel: [45017.143449] drbd testdata2:
Terminating drbd_r_testdata
This is the log from the other side. As you can see, both sides promote
even before the connection is ready.
Oct 2 08:11:46 storage-test-c kernel: [75654.680647] block drbd11:
role( Primary -> Secondary )
Oct 2 08:11:46 storage-test-c kernel: [75654.680764] block drbd11:
bitmap WRITE of 0 pages took 0 jiffies
Oct 2 08:11:46 storage-test-c kernel: [75654.680769] block drbd11: 0 KB
(0 bits) marked out-of-sync by on disk bit-map.
Oct 2 08:11:46 storage-test-c kernel: [75654.725842] block drbd11:
peer( Primary -> Secondary )
Oct 2 08:11:46 storage-test-c rsyslogd-2177: imuxsock begins to drop
messages from pid 2472 due to rate-limiting
Oct 2 08:11:46 storage-test-c kernel: [75654.947972] drbd testdata2:
peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk(
UpToDate -> DUnknown )
Oct 2 08:11:46 storage-test-c kernel: [75654.948085] drbd testdata2:
asender terminated
Oct 2 08:11:46 storage-test-c kernel: [75654.948089] drbd testdata2:
Terminating drbd_a_testdata
Oct 2 08:11:47 storage-test-c kernel: [75655.052087] drbd testdata2:
Connection closed
Oct 2 08:11:47 storage-test-c kernel: [75655.052096] drbd testdata2:
conn( Disconnecting -> StandAlone )
Oct 2 08:11:47 storage-test-c kernel: [75655.052099] drbd testdata2:
receiver terminated
Oct 2 08:11:47 storage-test-c kernel: [75655.052102] drbd testdata2:
Terminating drbd_r_testdata
Oct 2 08:11:47 storage-test-c kernel: [75655.052119] block drbd11:
disk( UpToDate -> Failed )
Oct 2 08:11:47 storage-test-c kernel: [75655.120108] block drbd11:
bitmap WRITE of 0 pages took 0 jiffies
Oct 2 08:11:47 storage-test-c kernel: [75655.120112] block drbd11: 0 KB
(0 bits) marked out-of-sync by on disk bit-map.
Oct 2 08:11:47 storage-test-c kernel: [75655.120117] block drbd11:
disk( Failed -> Diskless )
Oct 2 08:11:47 storage-test-c kernel: [75655.120419] block drbd11:
drbd_bm_resize called with capacity == 0
Oct 2 08:11:47 storage-test-c kernel: [75655.120900] drbd testdata2:
Terminating drbd_w_testdata
Oct 2 08:12:16 storage-test-c rsyslogd-2177: imuxsock lost 125 messages
from pid 2472 due to rate-limiting
Oct 2 08:13:06 storage-test-c kernel: [75734.646310] drbd testdata2:
Starting worker thread (from drbdsetup-84 [7942])
Oct 2 08:13:06 storage-test-c kernel: [75734.646588] block drbd11:
disk( Diskless -> Attaching )
Oct 2 08:13:06 storage-test-c kernel: [75734.647120] drbd testdata2:
Method to ensure write ordering: drain
Oct 2 08:13:06 storage-test-c kernel: [75734.647126] block drbd11: max
BIO size = 1048576
Oct 2 08:13:06 storage-test-c kernel: [75734.647133] block drbd11:
drbd_bm_resize called with capacity == 838835128
Oct 2 08:13:06 storage-test-c kernel: [75734.652037] block drbd11:
resync bitmap: bits=104854391 words=1638350 pages=3200
Oct 2 08:13:06 storage-test-c kernel: [75734.652050] block drbd11: size
= 400 GB (419417564 KB)
Oct 2 08:13:06 storage-test-c kernel: [75734.796781] block drbd11:
recounting of set bits took additional 3 jiffies
Oct 2 08:13:06 storage-test-c kernel: [75734.796786] block drbd11: 0 KB
(0 bits) marked out-of-sync by on disk bit-map.
Oct 2 08:13:06 storage-test-c kernel: [75734.796800] block drbd11:
disk( Attaching -> UpToDate )
Oct 2 08:13:06 storage-test-c kernel: [75734.796804] block drbd11:
attached to UUIDs
28A688FAC06E2662:0000000000000000:0EABC2724124755C:0EAAC2724124755C
Oct 2 08:13:06 storage-test-c kernel: [75734.821025] drbd testdata2:
conn( StandAlone -> Unconnected )
Oct 2 08:13:06 storage-test-c kernel: [75734.821051] drbd testdata2:
Starting receiver thread (from drbd_w_testdata [7944])
Oct 2 08:13:06 storage-test-c kernel: [75734.821108] drbd testdata2:
receiver (re)started
Oct 2 08:13:06 storage-test-c kernel: [75734.821127] drbd testdata2:
conn( Unconnected -> WFConnection )
Oct 2 08:13:07 storage-test-c rsyslogd-2177: imuxsock begins to drop
messages from pid 2472 due to rate-limiting
Oct 2 08:13:07 storage-test-c kernel: [75735.288894] block drbd11:
role( Secondary -> Primary )
Oct 2 08:13:07 storage-test-c kernel: [75735.289121] block drbd11: new
current UUID
7F282664519D49A1:28A688FAC06E2662:0EABC2724124755C:0EAAC2724124755C
Oct 2 08:13:07 storage-test-c kernel: [75735.480079] drbd testdata2:
Handshake successful: Agreed network protocol version 101
Oct 2 08:13:07 storage-test-c kernel: [75735.480083] drbd testdata2:
Agreed to support TRIM on protocol level
Oct 2 08:13:07 storage-test-c kernel: [75735.480158] drbd testdata2:
conn( WFConnection -> WFReportParams )
Oct 2 08:13:07 storage-test-c kernel: [75735.480164] drbd testdata2:
Starting asender thread (from drbd_r_testdata [7965])
Oct 2 08:13:07 storage-test-c kernel: [75735.568077] block drbd11:
drbd_sync_handshake:
Oct 2 08:13:07 storage-test-c kernel: [75735.568082] block drbd11: self
7F282664519D49A1:28A688FAC06E2662:0EABC2724124755C:0EAAC2724124755C
bits:0 flags:0
Oct 2 08:13:07 storage-test-c kernel: [75735.568087] block drbd11: peer
C58090DF57933525:28A688FAC06E2662:0EABC2724124755C:0EAAC2724124755C
bits:0 flags:0
Oct 2 08:13:07 storage-test-c kernel: [75735.568091] block drbd11:
uuid_compare()=100 by rule 90
Oct 2 08:13:07 storage-test-c kernel: [75735.568096] block drbd11:
helper command: /sbin/drbdadm initial-split-brain minor-11
Oct 2 08:13:07 storage-test-c kernel: [75735.570098] block drbd11:
helper command: /sbin/drbdadm initial-split-brain minor-11 exit code 0 (0x0)
Oct 2 08:13:07 storage-test-c kernel: [75735.570111] block drbd11:
Split-Brain detected but unresolved, dropping connection!
Oct 2 08:13:07 storage-test-c kernel: [75735.570168] block drbd11:
helper command: /sbin/drbdadm split-brain minor-11
Oct 2 08:13:07 storage-test-c kernel: [75735.571935] block drbd11:
helper command: /sbin/drbdadm split-brain minor-11 exit code 0 (0x0)
Oct 2 08:13:07 storage-test-c kernel: [75735.571960] drbd testdata2:
conn( WFReportParams -> Disconnecting )
Oct 2 08:13:07 storage-test-c kernel: [75735.571964] drbd testdata2:
error receiving ReportState, e: -5 l: 0!
Oct 2 08:13:07 storage-test-c kernel: [75735.572015] drbd testdata2:
asender terminated
Oct 2 08:13:07 storage-test-c kernel: [75735.572018] drbd testdata2:
Terminating drbd_a_testdata
Oct 2 08:13:07 storage-test-c kernel: [75735.672070] drbd testdata2:
Connection closed
Oct 2 08:13:07 storage-test-c kernel: [75735.672124] drbd testdata2:
conn( Disconnecting -> StandAlone )
Oct 2 08:13:07 storage-test-c kernel: [75735.672127] drbd testdata2:
receiver terminated
Oct 2 08:13:07 storage-test-c kernel: [75735.672130] drbd testdata2:
Terminating drbd_r_testdata
And yes, I can easily configure stonith/fencing - stonith is in fact already
working, I would just have to add the resource-level side in DRBD. But in
my opinion this adds unnecessary complexity to the problem at THIS point of
the investigation. I definitely reach a split-brain situation here where I
don't expect one. Stopping a resource and restarting it in a normal fashion
should not cause a split brain. And if the split brain is there, the fencing
would kick in and I would have a node turning off, possibly in a loop, since
on every restart the resource would split-brain again and shoot a node. I
think it would be best to first get this cluster running in a way that does
NOT cause split brains under normal circumstances, and afterwards get it
working for the odd circumstances where fencing is needed. I will definitely
try them all.
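For reference, the resource-level side I mean is the DRBD fencing policy plus
the fence-peer handlers shipped with DRBD 8.4 - roughly like this (just a
sketch, the resource name is an example):

resource testdata2 {
  disk {
    # invoke the fence-peer handler when the peer's disk state becomes unknown
    fencing resource-only;
  }
  handlers {
    # the stock scripts add/remove a constraint in the CIB so the stale peer
    # cannot be promoted, instead of shooting the node
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}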
And again: I still believe that Pacemaker is causing the problem here by
promoting the nodes too fast.
I tried the second part manually:
1. stop the resource in Pacemaker via crm resource stop $resource
2. set the cluster into maintenance mode
3. start the DRBD resource via the shell
Both sides come up as Secondary, sync, and are ready to be promoted (a rough
sketch of the commands follows below).
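Roughly, that manual sequence is (the ms resource name is a placeholder;
testdata2 is the DRBD resource from the logs above):

# 1. stop the DRBD master/slave resource in Pacemaker
crm resource stop ms_drbd_testdata2
# 2. put the cluster into maintenance mode so Pacemaker stops managing it
crm configure property maintenance-mode=true
# 3. bring the DRBD resource up by hand on both nodes
drbdadm up testdata2
# both sides should now show Secondary/Secondary and Connected
cat /proc/drbd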
If you do this via Pacemaker as above, you can see that it promotes the
resource to Primary although it is not connected at this point
(WFConnection), and naturally this means a split brain afterwards.
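For context, the master/slave resource is defined along the lines of the
usual crm dual-primary example - this is a sketch with placeholder names,
not my exact configuration:

primitive p_drbd_testdata2 ocf:linbit:drbd \
    params drbd_resource="testdata2" \
    op monitor interval="29s" role="Master" \
    op monitor interval="31s" role="Slave"
ms ms_drbd_testdata2 p_drbd_testdata2 \
    meta master-max="2" clone-max="2" notify="true" interleave="true"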
> I'd take an errant fence over a split-brain any day.
Not if your good node with the good data is being fenced and the one with
the (consistent but old) data comes up. This leads to a time warp in the FS
of any VM running on this storage and will most probably cause them all to
crash, possibly damaging their file systems too. And finally you end up in
a situation where you don't know whether both nodes contain relevant data.
> That said, that you
> can recover without invalidating one of the nodes is very odd.
As I saw from the logs, it isn't too odd; it seems it recovers
automatically. I have configured:
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
after-sb-2pri disconnect;
The after-sb-1pri policy lets the secondary sync from the primary.
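Those lines live in the net section of the resource, roughly like this
(sketch only, resource name as in the logs):

resource testdata2 {
  net {
    allow-two-primaries yes;
    # automatic split-brain recovery policies
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
}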
> What
> version of DRBD are you using?
I am currently running 8.4.5 on top of Debian Wheezy with Pacemaker 1.1.7.
> If you stop DRBD on a node, the other should stay in WFConnection state.
> Otherwise, it won't hear the connection request when you restart the peer.
Yes, this is true, and this is what happens... it's just that the other
node seems to come up as Primary, which is unexpected, and
after-sb-2pri disconnect;
disconnects both hosts at that point.
> You are right though, if a node goes Primary *before* connecting, you
> have a split-brain. The start order should be attach -> connect ->
> primary. Why it's not doing that is to be determined, but it would
> certainly be a problem if you're seeing attach -> primary -> connect.
regards, Felix