[Pacemaker] DRBD primary/primary + Pacemaker goes into split brain after crm node standby/online
Andrew Beekhof
andrew at beekhof.net
Wed Jun 11 23:55:14 UTC 2014
On 12 Jun 2014, at 12:13 am, Alexis de BRUYN <alexis.mailinglist at de-bruyn.fr> wrote:
> On 10.06.2014 01:44, Andrew Beekhof wrote:
>>
>> On 10 Jun 2014, at 4:07 am, Alexis de BRUYN <alexis.mailinglist at de-bruyn.fr> wrote:
>>
>>> Hi Everybody,
>>>
>>> I have an issue with a 2-node Debian Wheezy primary/primary DRBD
>>> Pacemaker/Corosync configuration.
>>>
>>> After a 'crm node standby' then a 'crm node online', the DRBD volume
>>> stays in a 'split brain state' (cs:StandAlone ro:Primary/Unknown).
>>>
>>> A soft or hard reboot of one node gets rid of the split brain and/or
>>> doesn't create one.
>>>
>>> I have followed http://www.drbd.org/users-guide-8.3/ and kept my tests
>>> as simple as possible (no activity and no filesystem on the DRBD volume).
>>>
>>> I don't see what I am doing wrong. Could anybody help me with this, please?
>>
>> There could be a pacemaker bug.
>> Master/slave resources are quite complex internally and have received many improvements in the years since 1.1.7.
>> So simply upgrading pacemaker could be the answer.
>
> Hi Andrew,
>
> I have followed your advice and updated Pacemaker/Corosync by installing
> a fresh Debian Sid but I still have the issue with the following packages:
I don't know exactly what went into those packages, and there have been more fixes (aren't there always :-/) since 1.1.10, but it is certainly recent enough to deserve a closer look.
Could you run crm_report for the period covered by your test? There is no need to reproduce the problem: just tell crm_report when you ran the test and it will create a tarball for you to attach here.
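
As a rough sketch of what that could look like: crm_report takes a start time with -f and an optional end time with -t, followed by a base name for the tarball. The timestamps and the report name below are placeholders I made up, not values from this thread; substitute the window in which the standby/online test was actually run. The helper function report_cmd is hypothetical too and only prints the command so it can be checked before running it on a cluster node.

```shell
#!/bin/sh
# Hypothetical helper: print a crm_report invocation for a given time window.
# $1 = start of the test window ("YYYY-MM-DD HH:MM:SS")
# $2 = end of the test window (same format)
# $3 = base name of the report tarball
report_cmd() {
    printf 'crm_report -f "%s" -t "%s" %s\n' "$1" "$2" "$3"
}

# Placeholder values (assumptions, not taken from the original test).
FROM="2014-06-11 10:00:00"
TO="2014-06-11 11:00:00"
DEST="drbd-splitbrain"

# Print the command; copy it to a cluster node and run it there as root.
report_cmd "$FROM" "$TO" "$DEST"
```

The resulting tarball is what should be attached to the reply or to a bug report.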
>
> # uname -a
> Linux testvm1 3.13-1-amd64 #1 SMP Debian 3.13.10-1 (2014-04-15) x86_64
> GNU/Linux
>
> # cat /etc/issue && dpkg -l | egrep "corosync|pacemaker|drbd"
> Debian GNU/Linux jessie/sid \n \l
>
> ii corosync 1.4.6-1 amd64
> Standards-based cluster framework (daemon and modules)
> ii crmsh 1.2.6+git+e77add-1.2 amd64
> CRM shell for the pacemaker cluster manager
> ii drbd8-utils 2:8.4.4-1 amd64
> RAID 1 over TCP/IP for Linux (user utilities)
> ii pacemaker 1.1.10+git20130802-4 amd64
> HA cluster resource manager
> ii pacemaker-cli-utils 1.1.10+git20130802-4 amd64
> Command line interface utilities for Pacemaker
>
> With the "experimental" packages, I also cannot connect to the cluster
> via crmsh:
>
> # cat /etc/issue && dpkg -l | egrep "corosync|pacemaker|drbd"
> Debian GNU/Linux jessie/sid \n \l
>
> ii corosync 2.3.3-1 amd64
> Standards-based cluster framework (daemon and modules)
> ii crmsh 1.2.6+git+e77add-1.2 amd64
> CRM shell for the pacemaker cluster manager
> ii drbd8-utils 2:8.4.4-1 amd64
> RAID 1 over TCP/IP for Linux (user utilities)
> ii libcorosync-common4 2.3.3-1 amd64
> Standards-based cluster framework, common library
> ii pacemaker 1.1.11-1 amd64
> HA cluster resource manager
> ii pacemaker-cli-utils 1.1.11-1 amd64
> Command line interface utilities for Pacemaker
>
> I will try to build the latest versions of Pacemaker/Corosync on Debian
> Wheezy before reporting my issue via Bugzilla.
>
> Thanks for your help.
>
>
> --
> Alexis de BRUYN
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org