[Pacemaker] Help with DRBD resources on Pacemaker

Thu Aug 23 18:42:05 EDT 2012

Hi Matthias.

Thank you for the quick reply.

I would just try removing the last 2 lines in the "net" section of the DRBD 
config. I had the same behavior with the option "ping-timeout" in the "net" 
section.
Starting DRBD via the init script worked, but not when managed via Pacemaker.
Just try it!
I am still investigating the reason for that.

I tried removing the lines you suggested, but the results are the same :-(

Additionally, I would remove the line with "become-primary-on both" in the 
DRBD config, since Pacemaker will promote the DRBD resources itself to 
Master/Primary.

I need this statement because the cluster will be running an OCFS2 filesystem
and both hosts will be using those directories for HTTP/HTTPS services.
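
As far as I understand the DRBD 8.3 documentation, dual-primary operation is
permitted by "allow-two-primaries" in the "net" section, while
"become-primary-on" only tells the init script which node to promote at
startup. A minimal sketch for checking which of the two a resource file
mentions follows. The function name and the /etc/drbd.d path in the usage
line are assumptions for illustration, and this is a plain text search, not
a parser of the DRBD configuration syntax.

```shell
# Report whether a DRBD resource file mentions the options relevant to
# dual-primary operation. "#" comments are stripped first, but the search
# does not verify which section an option appears in.
drbd_dual_primary_opts() {
    for opt in allow-two-primaries become-primary-on; do
        if sed 's/#.*//' "$1" | grep -q -- "$opt"; then
            echo "$opt: present"
        else
            echo "$opt: absent"
        fi
    done
}
```

Usage, assuming the resource files live under /etc/drbd.d:
drbd_dual_primary_opts /etc/drbd.d/backup.res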

Regards

Matthias

Regards,
Carlos Xavier.

Carlos Xavier <cbastos at connection.com.br> wrote:

Hi.

I configured one cluster using one DRBD resource and it works fine,
although I had to downgrade the kernel to version 2.6.37.6 to get it
stable.
Now I'm configuring another cluster, but it requires two DRBD resources.
They were created, and if I use the system init script to start them, both
come up:

apolo:~ # cat /proc/drbd
version: 8.3.9 (api:88/proto:86-95)
srcversion: A67EB2D25C5AFBFF3D8B788
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:672 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:680 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

When I try to start them using the cluster stack, I get some random results
like this one:

apolo:~ # cat /proc/drbd
version: 8.3.9 (api:88/proto:86-95)
srcversion: A67EB2D25C5AFBFF3D8B788
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:672 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:0 dw:0 dr:680 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

crm(live)resource# status
 Master/Slave Set: msDRBD_0 [resDRBD_0]
     Masters: [ apolo diana ]
 Master/Slave Set: msDRBD_1 [resDRBD_1]
     Masters: [ apolo ]
     Stopped: [ resDRBD_1:1 ]

This was solved with a cleanup of the resource resDRBD_1:1:

crm(live)resource# cleanup resDRBD_1:1
Cleaning up resDRBD_1:1 on apolo
Cleaning up resDRBD_1:1 on diana
Waiting for 3 replies from the CRMd... OK
crm(live)resource# status
 Master/Slave Set: msDRBD_0 [resDRBD_0]
     Masters: [ apolo diana ]
 Master/Slave Set: msDRBD_1 [resDRBD_1]
     Masters: [ apolo diana ]

apolo:~ # cat /proc/drbd
version: 8.3.9 (api:88/proto:86-95)
srcversion: A67EB2D25C5AFBFF3D8B788
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:672 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:680 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

To test the configuration a little more, I took down the resources and
tried to bring them up again:

crm(live)resource# stop resDRBD_1
crm(live)resource# stop resDRBD_0
crm(live)resource# status
 Master/Slave Set: msDRBD_0 [resDRBD_0]
     Stopped: [ resDRBD_0:0 resDRBD_0:1 ]
 Master/Slave Set: msDRBD_1 [resDRBD_1]
     Stopped: [ resDRBD_1:0 resDRBD_1:1 ]
crm(live)resource# start resDRBD_1
crm(live)resource# start resDRBD_0
crm(live)resource# status
 Master/Slave Set: msDRBD_0 [resDRBD_0]
     Masters: [ apolo diana ]
 Master/Slave Set: msDRBD_1 [resDRBD_1]
     Masters: [ apolo diana ]

The crm says it's all OK, but on the command line what you see is a
split brain:

apolo:~ # cat /proc/drbd
version: 8.3.9 (api:88/proto:86-95)
srcversion: A67EB2D25C5AFBFF3D8B788
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:0 dr:672 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:0 dr:680 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
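
Because "crm resource status" can report both Masters while the devices are
actually StandAlone, it may help to check the connection state on each node
directly. The sketch below parses text in the /proc/drbd format shown above
and returns non-zero if any device is not in cs:Connected. The function name
and the optional file argument are assumptions added so it can be tried
against a saved copy of the output.

```shell
# Print minor, connection state, roles and disk states for each DRBD device.
# Returns non-zero if any device is not in cs:Connected.
# Reads /proc/drbd unless another file is given as the first argument.
drbd_check() {
    awk '
        /^ *[0-9]+: cs:/ {
            minor = $1; sub(/:$/, "", minor)
            cs = $2; sub(/^cs:/, "", cs)
            ro = $3; sub(/^ro:/, "", ro)
            ds = $4; sub(/^ds:/, "", ds)
            print minor, cs, ro, ds
            if (cs != "Connected") bad = 1
        }
        END { exit bad }
    ' "${1:-/proc/drbd}"
}
```

Usage on a node: drbd_check || echo "at least one DRBD device is not Connected"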

These are my DRBD resource configurations and the cluster configuration:
https://dl.dropbox.com/u/96446079/backup.res
https://dl.dropbox.com/u/96446079/export.res
https://dl.dropbox.com/u/96446079/crm_resources.txt

Can you help me fix this issue?

Best regards,
Carlos Xavier.

_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
