[Pacemaker] 3rd node just for quorum

Wed Jun 22 07:05:04 UTC 2011

Just for the record: I had forgotten to set up an "order" constraint to
start the filesystem only after the promotion of the master.

order drbd_before_grp_database inf: ms_drbd0:promote grp_database:start
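
For anyone reading this in the archive: the colocation constraint from my
configuration (quoted below) only says *where* grp_database may run; it
says nothing about *when*. The order constraint adds the "when". As a
sketch using only the resource names from this thread, the pair looks
like this:

colocation database_on_drbd0 inf: grp_database ms_drbd0:Master
order drbd_before_grp_database inf: ms_drbd0:promote grp_database:start

With both in place, the Filesystem start is scheduled only after the
promote action on ms_drbd0 has completed on the same node.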

regards
Klaus

Am 09.06.2011 16:18, schrieb Klaus Darilion:
> 
> 
> Am 09.06.2011 01:05, schrieb Anton Altaparmakov:
>> Hi Klaus,
>>
>> On 8 Jun 2011, at 22:21, Klaus Darilion wrote:
>>> Hi!
>>>
>>> Currently I have a 2 node cluster and I want to add a 3rd node to use
>>> quorum to avoid split brain.
>>>
>>> The service (DRBD+DB) should only run either on node1 or node2. Node3
>>> can not provide the service - it should just help the other nodes to
>>> find out if their network is broken or the other node's network is broken.
>>>
>>> Is my idea useful?
>>
>> Yes.  That is what we do for all our Pacemaker-based setups.
>>
>>> How do I add such a "simple" 3rd node - just by using location
>>> constraints for the service to be run only on node1 or node2?
>>
>> Here is an example:
>>
>> [...]
> 
> Hi Anton!
> 
> Thanks for the config snippet. I am trying to add one thing after the
> other to my config, and I am already stuck before adding the 3rd node.
> 
> Currently I just have configured the DRBD resource and the filesystem
> resource:
> 
> node db1-bh
> node db2-bh
> primitive drbd0 ocf:linbit:drbd \
>         params drbd_resource="r0" \
>         op monitor interval="15s"
> primitive drbd0_fs ocf:heartbeat:Filesystem \
>         params device="/dev/drbd0" directory="/mnt" fstype="ext4"
> group grp_database drbd0_fs
> ms ms_drbd0 drbd0 \
>         meta master-max="1" master-node-max="1" clone-max="2" \
>         clone-node-max="1" notify="true"
> colocation database_on_drbd0 inf: grp_database ms_drbd0:Master
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         pe-error-series-max="100" \
>         pe-warn-series-max="100" \
>         pe-input-series-max="100"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="5"
> 
> 
> I start node 1 (node 2 is down). The problem is that the filesystem
> cannot be started; crm_mon shows:
> 
> ============
> Last updated: Thu Jun  9 16:12:35 2011
> Stack: openais
> Current DC: db1-bh - partition WITHOUT quorum
> Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
> 
> Online: [ db1-bh ]
> OFFLINE: [ db2-bh ]
> 
>  Master/Slave Set: ms_drbd0
>      Masters: [ db1-bh ]
>      Stopped: [ drbd0:1 ]
> 
> Failed actions:
>     drbd0_fs_start_0 (node=db1-bh, call=7, rc=1, status=complete): unknown error
> 
> 
> 
> 
> Analysing the logfile it seems that the filesystem primitive is started
> before ms_drbd0 is promoted to Primary:
> 
> 
> 
> Jun  9 15:56:49 db1-bh pengine: [8667]: notice: clone_print: Master/Slave Set: ms_drbd0
> Jun  9 15:56:49 db1-bh pengine: [8667]: notice: short_print: Slaves: [ db1-bh ]
> Jun  9 15:56:49 db1-bh pengine: [8667]: notice: short_print: Stopped: [ drbd0:1 ]
> Jun  9 15:56:49 db1-bh pengine: [8667]: info: native_color: Resource drbd0:1 cannot run anywhere
> Jun  9 15:56:49 db1-bh pengine: [8667]: info: master_color: Promoting drbd0:0 (Slave db1-bh)
> Jun  9 15:56:49 db1-bh pengine: [8667]: info: master_color: ms_drbd0: Promoted 1 instances of a possible 1 to master
> Jun  9 15:56:49 db1-bh pengine: [8667]: info: master_color: Promoting drbd0:0 (Slave db1-bh)
> Jun  9 15:56:49 db1-bh pengine: [8667]: info: master_color: ms_drbd0: Promoted 1 instances of a possible 1 to master
> 
> ...
> 
> Jun  9 15:56:49 db1-bh Filesystem[8865]: INFO: Running start for /dev/drbd0 on /mnt
> Jun  9 15:56:49 db1-bh lrmd: [8665]: info: RA output: (drbd0_fs:start:stderr) FATAL: Module scsi_hostadapter not found.
> ...
> Jun  9 15:56:49 db1-bh Filesystem[8865]: ERROR: Couldn't sucessfully fsck filesystem for /dev/drbd0
> 
> ...
> 
> Jun  9 15:56:50 db1-bh kernel: [21875.203353] block drbd0: role( Secondary -> Primary )
> 
> 
> I suspect that Pacemaker tells DRBD to promote the Secondary to Primary
> and immediately starts the Filesystem primitive - before DRBD has
> promoted the resource to Primary.
> 
> 
> Any ideas how to solve this?
> 
> Thanks
> Klaus
> 