[Pacemaker] Managing DRBD Dual Primary with Pacemaker always initial Split Brains
Felix Zachlod
fz.lists at sis-gmbh.info
Thu Oct 2 21:07:31 CEST 2014
On 02.10.2014 18:02, Digimer wrote:
> On 02/10/14 02:44 AM, Felix Zachlod wrote:
>> I am currently running 8.4.5 on top of Debian Wheezy with Pacemaker 1.1.7
>
> Please upgrade to 1.1.10+!
>
Are you referring to a specific bug or code change? I normally don't like
building all this from source instead of using the packages unless there
is a very good reason for it. I have been running some Pacemaker clusters
on the Debian 1.1.7 packages for a long time without any issue, and this
version seems to run very stably. As long as I am not facing a specific
problem with this version, I'd prefer sticking to it rather than putting
brand-new stuff together from source, which might cause other
compatibility issues later on.
I am fairly sure that I have found a hint about the problem:
adjust_master_score (string, [5 10 1000 10000]): master score adjustments
    Space separated list of four master score adjustments for different
    scenarios:
     - only access to 'consistent' data
     - only remote access to 'uptodate' data
     - currently Secondary, local access to 'uptodate' data, but remote
       is unknown
This is from the drbd resource agent's meta data.
As you can see, the RA reports a master score of 1000 if it is
Secondary and (thinks it) has up-to-date data. According to the logs, it
is indeed reporting 1000. So I set a location rule with a score of -1001
for the Master role, and now Pacemaker finally waits to promote the
nodes to Master until a later monitor action notices that the nodes are
connected and synced and reports a master score of 10000. What is
interesting to me is:
a) Why do both DRBD nodes think they have up-to-date data when coming
back online? At least one should know that it was disconnected while the
other node was still up, and should consider that the data might have
changed in the meantime. And when I reboot a single node, it can be
almost sure that it has only "consistent" data, because the other side
was still Primary when this one shut down.
b) Why does apparently nobody else face this problem, given that any
primary/primary cluster should behave like this?
But I think I will try passing this on to the DRBD mailing list too.
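
To make the score arithmetic above explicit, here is a minimal Python
sketch. It is only an illustration, not how the RA or Pacemaker are
implemented: the scenario labels and function names are made up, the
fourth scenario label is taken from my description above (connected and
synced), and the rule "only a positive total allows promotion" is a
simplifying assumption.

```python
# Default adjust_master_score value, as quoted from the RA metadata.
DEFAULT_ADJUST = "5 10 1000 10000"

# Hypothetical labels for the four scenarios, in the order of the defaults.
# The fourth label follows the post's description, not the RA metadata text.
SCENARIOS = (
    "consistent_only",
    "remote_uptodate_only",
    "secondary_local_uptodate_peer_unknown",
    "connected_and_synced",
)

# Score of the location rule placed on the Master role.
LOCATION_SCORE = -1001


def effective_scores(adjust=DEFAULT_ADJUST, location_score=LOCATION_SCORE):
    """Add the location rule score to each master score adjustment."""
    values = [int(v) for v in adjust.split()]
    if len(values) != len(SCENARIOS):
        raise ValueError("expected four space separated scores")
    return {name: v + location_score for name, v in zip(SCENARIOS, values)}


def promotable(total):
    """Simplifying assumption: only a positive total allows promotion."""
    return total > 0


if __name__ == "__main__":
    for name, total in effective_scores().items():
        print(f"{name}: total={total} promotable={promotable(total)}")
```

With the defaults, 1000 - 1001 gives -1, so a freshly started Secondary
stays unpromoted, while 10000 - 1001 gives 8999 once the nodes are
connected and synced.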
regards, Felix