[Pacemaker] 3rd node just for quorum
Klaus Darilion
klaus.mailinglists at pernau.at
Thu Jun 9 14:18:40 UTC 2011
On 09.06.2011 01:05, Anton Altaparmakov wrote:
> Hi Klaus,
>
> On 8 Jun 2011, at 22:21, Klaus Darilion wrote:
>> Hi!
>>
>> Currently I have a 2 node cluster and I want to add a 3rd node to use
>> quorum to avoid split brain.
>>
>> The service (DRBD+DB) should only run on either node1 or node2. Node3
>> cannot provide the service - it should just help the other nodes
>> determine whether their own network is broken or the other node's network is broken.
>>
>> Is my idea useful?
>
> Yes. That is what we do for all our Pacemaker-based setups.
>
>> How do I add such a "simple" 3rd node - just by using location
>> constraints for the service to be run only on node1 or node2?
>
> Here is an example:
>
> [...]
Hi Anton!
Thanks for the config snippet. I am trying to add things to my config
one at a time, and I am already stuck even before adding the 3rd node.
Currently I have only configured the DRBD resource and the filesystem
resource:
node db1-bh
node db2-bh
primitive drbd0 ocf:linbit:drbd \
params drbd_resource="r0" \
op monitor interval="15s"
primitive drbd0_fs ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/mnt" fstype="ext4"
group grp_database drbd0_fs
ms ms_drbd0 drbd0 \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation database_on_drbd0 inf: grp_database ms_drbd0:Master
property $id="cib-bootstrap-options" \
dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
pe-error-series-max="100" \
pe-warn-series-max="100" \
pe-input-series-max="100"
rsc_defaults $id="rsc-options" \
resource-stickiness="5"
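For the 3rd quorum-only node I plan to add later, I assume it would come
down to letting the node join the cluster, raising expected-quorum-votes
to 3, changing no-quorum-policy from "ignore" to e.g. "stop", and adding
location constraints so the resources never run on the quorum node.
Untested sketch - the node name db3-bh and the constraint IDs are only
placeholders:

node db3-bh
location loc_db_not_on_db3 grp_database -inf: db3-bh
location loc_drbd_not_on_db3 ms_drbd0 -inf: db3-bh

Since clone-max="2" is already set on ms_drbd0, DRBD itself should not be
cloned onto the third node anyway.

But I am already stuck one step before that: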
I start node 1 (node 2 is down). The problem is that the filesystem
cannot be started; crm_mon shows:
============
Last updated: Thu Jun 9 16:12:35 2011
Stack: openais
Current DC: db1-bh - partition WITHOUT quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ db1-bh ]
OFFLINE: [ db2-bh ]
 Master/Slave Set: ms_drbd0
     Masters: [ db1-bh ]
     Stopped: [ drbd0:1 ]
Failed actions:
drbd0_fs_start_0 (node=db1-bh, call=7, rc=1, status=complete): unknown error
Analysing the logfile, it seems that the filesystem primitive is started
before ms_drbd0 is promoted to Primary:
Jun 9 15:56:49 db1-bh pengine: [8667]: notice: clone_print: Master/Slave Set: ms_drbd0
Jun 9 15:56:49 db1-bh pengine: [8667]: notice: short_print: Slaves: [ db1-bh ]
Jun 9 15:56:49 db1-bh pengine: [8667]: notice: short_print: Stopped: [ drbd0:1 ]
Jun 9 15:56:49 db1-bh pengine: [8667]: info: native_color: Resource drbd0:1 cannot run anywhere
Jun 9 15:56:49 db1-bh pengine: [8667]: info: master_color: Promoting drbd0:0 (Slave db1-bh)
Jun 9 15:56:49 db1-bh pengine: [8667]: info: master_color: ms_drbd0: Promoted 1 instances of a possible 1 to master
Jun 9 15:56:49 db1-bh pengine: [8667]: info: master_color: Promoting drbd0:0 (Slave db1-bh)
Jun 9 15:56:49 db1-bh pengine: [8667]: info: master_color: ms_drbd0: Promoted 1 instances of a possible 1 to master
...
Jun 9 15:56:49 db1-bh Filesystem[8865]: INFO: Running start for /dev/drbd0 on /mnt
Jun 9 15:56:49 db1-bh lrmd: [8665]: info: RA output: (drbd0_fs:start:stderr) FATAL: Module scsi_hostadapter not found.
...
Jun 9 15:56:49 db1-bh Filesystem[8865]: ERROR: Couldn't sucessfully fsck filesystem for /dev/drbd0
...
Jun 9 15:56:50 db1-bh kernel: [21875.203353] block drbd0: role( Secondary -> Primary )
I suspect that Pacemaker tells DRBD to promote the Secondary to Primary
and immediately starts the Filesystem primitive - before DRBD has
promoted the resource to Primary.
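If that is indeed the cause, I guess what is missing is an explicit order
constraint so that the filesystem is only started after ms_drbd0 has been
promoted, something like this (untested sketch; the constraint ID is made
up, the resource names are from my configuration above):

order ord_fs_after_drbd inf: ms_drbd0:promote grp_database:start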
Any ideas how to solve this?
Thanks
Klaus