[Pacemaker] Two master colocation question

Tue Aug 26 00:42:43 EDT 2008

On Tue, Aug 26, 2008 at 12:57:45AM +0200, Robert Heinzmann (ml) wrote:
> We probably mean the same thing (I hope). Practically speaking, an
> ACID-compliant database system does the following (abstract actions)
> when committing a transaction:
>
> 1) writing to the log
> 2) flush the log (and wait for the write to finish)
> 3) do other things
> 4) before deleting or overwriting the log (or whenever there is time
> and nothing else to do), write to the datafile
> 5) remove the transaction from the log
>
> So write order consistency means: For applications expecting local disk
> semantics (like the ACID scenario described above), protocol C behaves
> like a local disk.
>
> This means it is safe to place database log files and data files on  
> different drbd devices as long as the protocol is C.

... and neither replication stream is disrupted.
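
For illustration, the commit sequence quoted above can be sketched in Python. This is a toy under stated assumptions, not how any particular database implements it: the names commit_transaction and demo, the one-record-per-line file layout, and truncating the whole log in step 5 are all made up for the example. The only real calls are os.fsync, tempfile.TemporaryDirectory and ordinary file I/O.

```python
import os
import tempfile


def commit_transaction(log_path, data_path, record):
    """Toy write-ahead commit following steps 1-5 (hypothetical helper)."""
    # 1) write to the log
    with open(log_path, "ab") as log:
        log.write(record + b"\n")
        # 2) flush the log and wait for the write to finish
        log.flush()
        os.fsync(log.fileno())

    # 3) do other things (omitted)

    # 4) write to the datafile before the log entry is dropped
    with open(data_path, "ab") as data:
        data.write(record + b"\n")
        data.flush()
        os.fsync(data.fileno())

    # 5) remove the transaction from the log
    #    (assumption: the log holds only this one transaction)
    with open(log_path, "r+b") as log:
        log.truncate(0)
        os.fsync(log.fileno())


def demo():
    """Run one commit in a temporary directory; return (log, data) contents."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "txn.log")
        data_path = os.path.join(tmp, "table.dat")
        commit_transaction(log_path, data_path, b"txn1")
        with open(log_path, "rb") as f:
            log_bytes = f.read()
        with open(data_path, "rb") as f:
            data_bytes = f.read()
    return log_bytes, data_bytes
```

The point is the ordering: step 4 relies on step 2 having reached stable storage, and step 5 relies on step 4. If log and data sit on different devices, that ordering has to hold across devices, and on the peer as well.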

the problem is:
if the replication breaks "in interesting ways",
one device will be ahead of the other on the peer.

which means, that as long as each drbd uses its own
replication stream (consisting of two tcp connections),
it can be dangerous to spread logically tightly coupled data
on more than one drbd.
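
A small Python model may make the failure mode concrete. It is only a sketch under assumptions that are not from DRBD itself: the function replicate, the stream names, and the three-write transaction are hypothetical, and the model assumes the primary keeps writing after a stream breaks. It compares log and data on two devices with independent streams against both sharing a single stream.

```python
def replicate(writes, stream_of, cut_stream, cut_at):
    """Return what the peer holds per device after one stream is cut.

    writes     -- list of (device, payload) in the order the primary issued them
    stream_of  -- maps each device to the name of its replication stream
    cut_stream -- the stream that breaks
    cut_at     -- index of the first write issued after the break
    """
    peer = {device: [] for device in stream_of}
    for index, (device, payload) in enumerate(writes):
        broken = stream_of[device] == cut_stream and index >= cut_at
        if not broken:
            peer[device].append(payload)
    return peer


# One transaction, in the order the primary writes it.
WRITES = [
    ("log", "txn1 logged"),
    ("data", "txn1 pages"),
    ("log", "txn1 removed from log"),
]

# Two drbd devices, each with its own stream; only the data stream breaks.
separate = replicate(WRITES, {"log": "stream-log", "data": "stream-data"},
                     cut_stream="stream-data", cut_at=1)

# Log and data behind one stream (modelled as two names sharing it);
# the same break now stops both at the same point.
shared = replicate(WRITES, {"log": "stream-0", "data": "stream-0"},
                   cut_stream="stream-0", cut_at=1)

print(separate)
print(shared)
```

With separate streams the peer's log says txn1 is done and removed, while the data pages never arrived. With a shared stream the peer is merely behind: the log still holds txn1, so it can be replayed.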

thinking about probabilities, it may still be ok to use DRBD protocol C
with one file system per drbd.
still, I'd recommend keeping database logs and data files on the same drbd.
yes, I know every database consultant tells you to spread those across
separate file systems [on separate _physical_ devices; some forget about
this part!], but for consistency reasons, they should use the same
replication link => the same drbd => typically the same filesystem.

as a bad example,
I have seen installations building LVM2 VGs from multiple drbd as PVs
(even using drbd protocol A), then striping the LVs.
maybe even using different physical links
for "load balancing" the replication traffic.
if you think about what your file systems look like when
one of the replication streams is disrupted independently,
you should see why this is BAD PRACTICE.
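
To picture the striped case, here is a rough Python sketch. It is a simplification built on assumptions: peer_stripe_state is a hypothetical helper, each chunk is written exactly once and in order, and chunks are placed round-robin over the PVs, with one PV's replication stream breaking partway through.

```python
def peer_stripe_state(num_chunks, num_pvs, lagging_pv, cut_at):
    """Classify each chunk of a striped LV on the peer as 'current' or 'stale'.

    Chunks are written once each, in order 0..num_chunks-1, and striped
    round-robin over the PVs. The stream for `lagging_pv` breaks just
    before chunk number `cut_at` is written.
    """
    state = []
    for chunk in range(num_chunks):
        pv = chunk % num_pvs  # round-robin placement of this chunk
        missed = pv == lagging_pv and chunk >= cut_at
        state.append("stale" if missed else "current")
    return state


layout = peer_stripe_state(num_chunks=8, num_pvs=2, lagging_pv=1, cut_at=2)
print(layout)
```

After the break, every second chunk of the LV on the peer is stale, so the file system on top is a patchwork of old and new blocks rather than a consistent older snapshot.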

the problem can probably only be avoided if
we make "logical groups" of devices use
the same replication data stream.
which will eventually happen.
someday.

-- 
: Lars Ellenberg                
: LINBIT HA-Solutions GmbH
: DRBD®/HA support and consulting    http://www.linbit.com

DRBD® and LINBIT® are registered trademarks
of LINBIT Information Technologies GmbH