[Pacemaker] 3rd node just for quorum
Klaus Darilion
klaus.mailinglists at pernau.at
Wed Jun 22 07:05:04 UTC 2011
Just for the record: I had forgotten to set up an "order" constraint so
that the filesystem is started only after the master has been promoted.
order drbd_before_grp_database inf: ms_drbd0:promote grp_database:start
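
A sketch of how this fits together with the colocation constraint from the
configuration quoted below, assuming everything else there stays unchanged.
The colocation only decides where grp_database may run (on the node where
ms_drbd0 is Master); it says nothing about when. The order constraint adds
the sequencing: promote ms_drbd0 first, then start grp_database.

    colocation database_on_drbd0 inf: grp_database ms_drbd0:Master
    order drbd_before_grp_database inf: ms_drbd0:promote grp_database:start

Without the order constraint, Pacemaker may schedule the promote and the
Filesystem start without waiting for the promote to complete, which is
consistent with the log excerpt below.
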
regards
Klaus
On 09.06.2011 16:18, Klaus Darilion wrote:
>
>
> On 09.06.2011 01:05, Anton Altaparmakov wrote:
>> Hi Klaus,
>>
>> On 8 Jun 2011, at 22:21, Klaus Darilion wrote:
>>> Hi!
>>>
>>> Currently I have a 2 node cluster and I want to add a 3rd node to use
>>> quorum to avoid split brain.
>>>
>>> The service (DRBD+DB) should only run either on node1 or node2. Node3
>>> can not provide the service - it should just help the other nodes to
>>> find out if their network is broken or the other node's network is broken.
>>>
>>> Is my idea useful?
>>
>> Yes. That is what we do for all our Pacemaker-based setups.
>>
>>> How do I add such a "simple" 3rd node - just by using location
>>> constraints for the service to be run only on node1 or node2?
>>
>> Here is an example:
>>
>> [...]
>
> Hi Anton!
>
> Thanks for the config snippet. I am trying to add one thing at a time to
> my config, and I am already stuck before even adding the 3rd node.
>
> Currently I just have configured the DRBD resource and the filesystem
> resource:
>
> node db1-bh
> node db2-bh
> primitive drbd0 ocf:linbit:drbd \
> params drbd_resource="r0" \
> op monitor interval="15s"
> primitive drbd0_fs ocf:heartbeat:Filesystem \
> params device="/dev/drbd0" directory="/mnt" fstype="ext4"
> group grp_database drbd0_fs
> ms ms_drbd0 drbd0 \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> colocation database_on_drbd0 inf: grp_database ms_drbd0:Master
> property $id="cib-bootstrap-options" \
> dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> no-quorum-policy="ignore" \
> pe-error-series-max="100" \
> pe-warn-series-max="100" \
> pe-input-series-max="100"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="5"
>
>
> I start node 1 (node 2 is down). The problem is that the filesystem
> cannot be started; crm_mon shows:
>
> ============
> Last updated: Thu Jun 9 16:12:35 2011
> Stack: openais
> Current DC: db1-bh - partition WITHOUT quorum
> Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
>
> Online: [ db1-bh ]
> OFFLINE: [ db2-bh ]
>
> Master/Slave Set: ms_drbd0
> Masters: [ db1-bh ]
> Stopped: [ drbd0:1 ]
>
> Failed actions:
> drbd0_fs_start_0 (node=db1-bh, call=7, rc=1, status=complete): unknown error
>
>
>
>
> Analysing the logfile, it seems that the filesystem primitive is started
> before ms_drbd0 is promoted to Primary:
>
>
>
> Jun 9 15:56:49 db1-bh pengine: [8667]: notice: clone_print: Master/Slave Set: ms_drbd0
> Jun 9 15:56:49 db1-bh pengine: [8667]: notice: short_print: Slaves: [ db1-bh ]
> Jun 9 15:56:49 db1-bh pengine: [8667]: notice: short_print: Stopped: [ drbd0:1 ]
> Jun 9 15:56:49 db1-bh pengine: [8667]: info: native_color: Resource drbd0:1 cannot run anywhere
> Jun 9 15:56:49 db1-bh pengine: [8667]: info: master_color: Promoting drbd0:0 (Slave db1-bh)
> Jun 9 15:56:49 db1-bh pengine: [8667]: info: master_color: ms_drbd0: Promoted 1 instances of a possible 1 to master
> Jun 9 15:56:49 db1-bh pengine: [8667]: info: master_color: Promoting drbd0:0 (Slave db1-bh)
> Jun 9 15:56:49 db1-bh pengine: [8667]: info: master_color: ms_drbd0: Promoted 1 instances of a possible 1 to master
>
> ...
>
> Jun 9 15:56:49 db1-bh Filesystem[8865]: INFO: Running start for /dev/drbd0 on /mnt
> Jun 9 15:56:49 db1-bh lrmd: [8665]: info: RA output: (drbd0_fs:start:stderr) FATAL: Module scsi_hostadapter not found.
> ...
> Jun 9 15:56:49 db1-bh Filesystem[8865]: ERROR: Couldn't sucessfully fsck filesystem for /dev/drbd0
>
> ...
>
> Jun 9 15:56:50 db1-bh kernel: [21875.203353] block drbd0: role( Secondary -> Primary )
>
>
> I suspect that Pacemaker tells DRBD to promote the Secondary to Primary
> and immediately starts the Filesystem primitive - before DRBD has
> promoted the resource to Primary.
>
>
> Any ideas how to solve this?
>
> Thanks
> Klaus
>