[Pacemaker] Resource ordering/colocating question (DRBD + LVM + FS)
Heikki Manninen
hma at iki.fi
Tue Sep 10 08:51:32 EDT 2013
Not sure whether I'm doing this the right way, but here goes:
With resources started on node #1:
# crm_simulate -L -s -d pgdbsrv01.cl1.local
Current cluster status:
Online: [ pgdbsrv01.cl1.local pgdbsrv02.cl1.local ]
Master/Slave Set: DRBD_ms_data01 [DRBD_data01]
Masters: [ pgdbsrv01.cl1.local ]
Slaves: [ pgdbsrv02.cl1.local ]
Master/Slave Set: DRBD_ms_data02 [DRBD_data02]
Masters: [ pgdbsrv01.cl1.local ]
Slaves: [ pgdbsrv02.cl1.local ]
Resource Group: GRP_data01
LVM_vgdata01 (ocf::heartbeat:LVM): Started pgdbsrv01.cl1.local
FS_data01 (ocf::heartbeat:Filesystem): Started pgdbsrv01.cl1.local
Resource Group: GRP_data02
LVM_vgdata02 (ocf::heartbeat:LVM): Started pgdbsrv01.cl1.local
FS_data02 (ocf::heartbeat:Filesystem): Started pgdbsrv01.cl1.local
fusion-fencing (stonith:fence_fusion): Started pgdbsrv02.cl1.local
Performing requested modifications
+ Taking node pgdbsrv01.cl1.local offline
Allocation scores:
clone_color: DRBD_ms_data01 allocation score on pgdbsrv01.cl1.local: 0
clone_color: DRBD_ms_data01 allocation score on pgdbsrv02.cl1.local: 0
clone_color: DRBD_data01:0 allocation score on pgdbsrv01.cl1.local: 0
clone_color: DRBD_data01:0 allocation score on pgdbsrv02.cl1.local: 10000
clone_color: DRBD_data01:1 allocation score on pgdbsrv01.cl1.local: 0
clone_color: DRBD_data01:1 allocation score on pgdbsrv02.cl1.local: 10000
native_color: DRBD_data01:0 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: DRBD_data01:0 allocation score on pgdbsrv02.cl1.local: 10000
native_color: DRBD_data01:1 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: DRBD_data01:1 allocation score on pgdbsrv02.cl1.local: -INFINITY
DRBD_data01:0 promotion score on pgdbsrv02.cl1.local: 10000
DRBD_data01:1 promotion score on none: 0
clone_color: DRBD_ms_data02 allocation score on pgdbsrv01.cl1.local: 0
clone_color: DRBD_ms_data02 allocation score on pgdbsrv02.cl1.local: 0
clone_color: DRBD_data02:0 allocation score on pgdbsrv01.cl1.local: 0
clone_color: DRBD_data02:0 allocation score on pgdbsrv02.cl1.local: 10000
clone_color: DRBD_data02:1 allocation score on pgdbsrv01.cl1.local: 0
clone_color: DRBD_data02:1 allocation score on pgdbsrv02.cl1.local: 10000
native_color: DRBD_data02:0 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: DRBD_data02:0 allocation score on pgdbsrv02.cl1.local: 10000
native_color: DRBD_data02:1 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: DRBD_data02:1 allocation score on pgdbsrv02.cl1.local: -INFINITY
DRBD_data02:0 promotion score on pgdbsrv02.cl1.local: 10000
DRBD_data02:1 promotion score on none: 0
group_color: GRP_data01 allocation score on pgdbsrv01.cl1.local: 0
group_color: GRP_data01 allocation score on pgdbsrv02.cl1.local: 0
group_color: LVM_vgdata01 allocation score on pgdbsrv01.cl1.local: 0
group_color: LVM_vgdata01 allocation score on pgdbsrv02.cl1.local: 0
group_color: FS_data01 allocation score on pgdbsrv01.cl1.local: 0
group_color: FS_data01 allocation score on pgdbsrv02.cl1.local: 0
native_color: LVM_vgdata01 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: LVM_vgdata01 allocation score on pgdbsrv02.cl1.local: 10000
native_color: FS_data01 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: FS_data01 allocation score on pgdbsrv02.cl1.local: 0
group_color: GRP_data02 allocation score on pgdbsrv01.cl1.local: 0
group_color: GRP_data02 allocation score on pgdbsrv02.cl1.local: 0
group_color: LVM_vgdata02 allocation score on pgdbsrv01.cl1.local: 0
group_color: LVM_vgdata02 allocation score on pgdbsrv02.cl1.local: 0
group_color: FS_data02 allocation score on pgdbsrv01.cl1.local: 0
group_color: FS_data02 allocation score on pgdbsrv02.cl1.local: 0
native_color: LVM_vgdata02 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: LVM_vgdata02 allocation score on pgdbsrv02.cl1.local: 10000
native_color: FS_data02 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: FS_data02 allocation score on pgdbsrv02.cl1.local: 0
native_color: fusion-fencing allocation score on pgdbsrv01.cl1.local: 0
native_color: fusion-fencing allocation score on pgdbsrv02.cl1.local: 0
Transition Summary:
* Promote DRBD_data01:0 (Slave -> Master pgdbsrv02.cl1.local)
* Demote DRBD_data01:1 (Master -> Stopped pgdbsrv01.cl1.local)
* Promote DRBD_data02:0 (Slave -> Master pgdbsrv02.cl1.local)
* Demote DRBD_data02:1 (Master -> Stopped pgdbsrv01.cl1.local)
* Move LVM_vgdata01 (Started pgdbsrv01.cl1.local -> pgdbsrv02.cl1.local)
* Move FS_data01 (Started pgdbsrv01.cl1.local -> pgdbsrv02.cl1.local)
* Move LVM_vgdata02 (Started pgdbsrv01.cl1.local -> pgdbsrv02.cl1.local)
* Move FS_data02 (Started pgdbsrv01.cl1.local -> pgdbsrv02.cl1.local)
Then, taking node #1 offline (standby) for real, with resources running on node #2:
# crm_simulate -L -s -u pgdbsrv01.cl1.local
Current cluster status:
Node pgdbsrv01.cl1.local: standby
Online: [ pgdbsrv02.cl1.local ]
Master/Slave Set: DRBD_ms_data01 [DRBD_data01]
Masters: [ pgdbsrv02.cl1.local ]
Stopped: [ DRBD_data01:1 ]
Master/Slave Set: DRBD_ms_data02 [DRBD_data02]
Masters: [ pgdbsrv02.cl1.local ]
Stopped: [ DRBD_data02:1 ]
Resource Group: GRP_data01
LVM_vgdata01 (ocf::heartbeat:LVM): Started pgdbsrv02.cl1.local
FS_data01 (ocf::heartbeat:Filesystem): Started pgdbsrv02.cl1.local
Resource Group: GRP_data02
LVM_vgdata02 (ocf::heartbeat:LVM): Started pgdbsrv02.cl1.local
FS_data02 (ocf::heartbeat:Filesystem): Started pgdbsrv02.cl1.local
fusion-fencing (stonith:fence_fusion): Started pgdbsrv02.cl1.local
Performing requested modifications
+ Bringing node pgdbsrv01.cl1.local online
Allocation scores:
clone_color: DRBD_ms_data01 allocation score on pgdbsrv01.cl1.local: 0
clone_color: DRBD_ms_data01 allocation score on pgdbsrv02.cl1.local: 0
clone_color: DRBD_data01:0 allocation score on pgdbsrv01.cl1.local: 0
clone_color: DRBD_data01:0 allocation score on pgdbsrv02.cl1.local: 10000
clone_color: DRBD_data01:1 allocation score on pgdbsrv01.cl1.local: 0
clone_color: DRBD_data01:1 allocation score on pgdbsrv02.cl1.local: 0
native_color: DRBD_data01:0 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: DRBD_data01:0 allocation score on pgdbsrv02.cl1.local: 10000
native_color: DRBD_data01:1 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: DRBD_data01:1 allocation score on pgdbsrv02.cl1.local: -INFINITY
DRBD_data01:0 promotion score on pgdbsrv02.cl1.local: 10000
DRBD_data01:1 promotion score on none: 0
clone_color: DRBD_ms_data02 allocation score on pgdbsrv01.cl1.local: 0
clone_color: DRBD_ms_data02 allocation score on pgdbsrv02.cl1.local: 0
clone_color: DRBD_data02:0 allocation score on pgdbsrv01.cl1.local: 0
clone_color: DRBD_data02:0 allocation score on pgdbsrv02.cl1.local: 10000
clone_color: DRBD_data02:1 allocation score on pgdbsrv01.cl1.local: 0
clone_color: DRBD_data02:1 allocation score on pgdbsrv02.cl1.local: 0
native_color: DRBD_data02:0 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: DRBD_data02:0 allocation score on pgdbsrv02.cl1.local: 10000
native_color: DRBD_data02:1 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: DRBD_data02:1 allocation score on pgdbsrv02.cl1.local: -INFINITY
DRBD_data02:0 promotion score on pgdbsrv02.cl1.local: 10000
DRBD_data02:1 promotion score on none: 0
group_color: GRP_data01 allocation score on pgdbsrv01.cl1.local: 0
group_color: GRP_data01 allocation score on pgdbsrv02.cl1.local: 0
group_color: LVM_vgdata01 allocation score on pgdbsrv01.cl1.local: 0
group_color: LVM_vgdata01 allocation score on pgdbsrv02.cl1.local: 0
group_color: FS_data01 allocation score on pgdbsrv01.cl1.local: 0
group_color: FS_data01 allocation score on pgdbsrv02.cl1.local: 0
native_color: LVM_vgdata01 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: LVM_vgdata01 allocation score on pgdbsrv02.cl1.local: 10000
native_color: FS_data01 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: FS_data01 allocation score on pgdbsrv02.cl1.local: 0
group_color: GRP_data02 allocation score on pgdbsrv01.cl1.local: 0
group_color: GRP_data02 allocation score on pgdbsrv02.cl1.local: 0
group_color: LVM_vgdata02 allocation score on pgdbsrv01.cl1.local: 0
group_color: LVM_vgdata02 allocation score on pgdbsrv02.cl1.local: 0
group_color: FS_data02 allocation score on pgdbsrv01.cl1.local: 0
group_color: FS_data02 allocation score on pgdbsrv02.cl1.local: 0
native_color: LVM_vgdata02 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: LVM_vgdata02 allocation score on pgdbsrv02.cl1.local: 10000
native_color: FS_data02 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: FS_data02 allocation score on pgdbsrv02.cl1.local: 0
native_color: fusion-fencing allocation score on pgdbsrv01.cl1.local: 0
native_color: fusion-fencing allocation score on pgdbsrv02.cl1.local: 0
Transition Summary:
And that's it. Once I take node #1 out of standby and rerun the same simulation:
# crm_simulate -L -s -u pgdbsrv01.cl1.local
Current cluster status:
Online: [ pgdbsrv01.cl1.local pgdbsrv02.cl1.local ]
Master/Slave Set: DRBD_ms_data01 [DRBD_data01]
Masters: [ pgdbsrv02.cl1.local ]
Slaves: [ pgdbsrv01.cl1.local ]
Master/Slave Set: DRBD_ms_data02 [DRBD_data02]
Masters: [ pgdbsrv02.cl1.local ]
Slaves: [ pgdbsrv01.cl1.local ]
Resource Group: GRP_data01
LVM_vgdata01 (ocf::heartbeat:LVM): Stopped
FS_data01 (ocf::heartbeat:Filesystem): Stopped
Resource Group: GRP_data02
LVM_vgdata02 (ocf::heartbeat:LVM): Stopped
FS_data02 (ocf::heartbeat:Filesystem): Stopped
fusion-fencing (stonith:fence_fusion): Started pgdbsrv01.cl1.local
Performing requested modifications
+ Bringing node pgdbsrv01.cl1.local online
Allocation scores:
clone_color: DRBD_ms_data01 allocation score on pgdbsrv01.cl1.local: 0
clone_color: DRBD_ms_data01 allocation score on pgdbsrv02.cl1.local: 0
clone_color: DRBD_data01:0 allocation score on pgdbsrv01.cl1.local: 10000
clone_color: DRBD_data01:0 allocation score on pgdbsrv02.cl1.local: 10000
clone_color: DRBD_data01:1 allocation score on pgdbsrv01.cl1.local: 10000
clone_color: DRBD_data01:1 allocation score on pgdbsrv02.cl1.local: 10000
native_color: DRBD_data01:1 allocation score on pgdbsrv01.cl1.local: 10000
native_color: DRBD_data01:1 allocation score on pgdbsrv02.cl1.local: 10000
native_color: DRBD_data01:0 allocation score on pgdbsrv01.cl1.local: 10000
native_color: DRBD_data01:0 allocation score on pgdbsrv02.cl1.local: -INFINITY
DRBD_data01:1 promotion score on pgdbsrv02.cl1.local: 10000
DRBD_data01:0 promotion score on pgdbsrv01.cl1.local: 10000
clone_color: DRBD_ms_data02 allocation score on pgdbsrv01.cl1.local: 0
clone_color: DRBD_ms_data02 allocation score on pgdbsrv02.cl1.local: 0
clone_color: DRBD_data02:0 allocation score on pgdbsrv01.cl1.local: 10000
clone_color: DRBD_data02:0 allocation score on pgdbsrv02.cl1.local: 10000
clone_color: DRBD_data02:1 allocation score on pgdbsrv01.cl1.local: 10000
clone_color: DRBD_data02:1 allocation score on pgdbsrv02.cl1.local: 10000
native_color: DRBD_data02:1 allocation score on pgdbsrv01.cl1.local: 10000
native_color: DRBD_data02:1 allocation score on pgdbsrv02.cl1.local: 10000
native_color: DRBD_data02:0 allocation score on pgdbsrv01.cl1.local: 10000
native_color: DRBD_data02:0 allocation score on pgdbsrv02.cl1.local: -INFINITY
DRBD_data02:1 promotion score on pgdbsrv02.cl1.local: 10000
DRBD_data02:0 promotion score on pgdbsrv01.cl1.local: 10000
group_color: GRP_data01 allocation score on pgdbsrv01.cl1.local: 0
group_color: GRP_data01 allocation score on pgdbsrv02.cl1.local: 0
group_color: LVM_vgdata01 allocation score on pgdbsrv01.cl1.local: 0
group_color: LVM_vgdata01 allocation score on pgdbsrv02.cl1.local: 0
group_color: FS_data01 allocation score on pgdbsrv01.cl1.local: 0
group_color: FS_data01 allocation score on pgdbsrv02.cl1.local: 0
native_color: LVM_vgdata01 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: LVM_vgdata01 allocation score on pgdbsrv02.cl1.local: 10000
native_color: FS_data01 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: FS_data01 allocation score on pgdbsrv02.cl1.local: 0
group_color: GRP_data02 allocation score on pgdbsrv01.cl1.local: 0
group_color: GRP_data02 allocation score on pgdbsrv02.cl1.local: 0
group_color: LVM_vgdata02 allocation score on pgdbsrv01.cl1.local: 0
group_color: LVM_vgdata02 allocation score on pgdbsrv02.cl1.local: 0
group_color: FS_data02 allocation score on pgdbsrv01.cl1.local: 0
group_color: FS_data02 allocation score on pgdbsrv02.cl1.local: 0
native_color: LVM_vgdata02 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: LVM_vgdata02 allocation score on pgdbsrv02.cl1.local: 10000
native_color: FS_data02 allocation score on pgdbsrv01.cl1.local: -INFINITY
native_color: FS_data02 allocation score on pgdbsrv02.cl1.local: 0
native_color: fusion-fencing allocation score on pgdbsrv01.cl1.local: 0
native_color: fusion-fencing allocation score on pgdbsrv02.cl1.local: 0
Transition Summary:
* Start LVM_vgdata01 (pgdbsrv02.cl1.local - blocked)
* Start FS_data01 (pgdbsrv02.cl1.local - blocked)
* Start LVM_vgdata02 (pgdbsrv02.cl1.local - blocked)
* Start FS_data02 (pgdbsrv02.cl1.local - blocked)
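
One thing I'm starting to wonder about my own config: the ordering constraints reference the DRBD primitives (DRBD_data01, DRBD_data02) rather than the master/slave resources (DRBD_ms_data01, DRBD_ms_data02), while the colocations do use the ms resources. If the orderings should also be against the ms resources, they would presumably need to look something like this (untested sketch, reusing the names from my config):

```
order order-DRBD_data01-GRP_data01-mandatory inf: DRBD_ms_data01:promote GRP_data01:start
order order-DRBD_data02-GRP_data02-mandatory inf: DRBD_ms_data02:promote GRP_data02:start
```

Does anyone know whether that distinction matters here?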
--
Heikki M
On 9.9.2013, at 12.02, Andreas Mock <andreas.mock at web.de> wrote:
> Hi Heikki,
>
> it has to be crm_simulate -L -s. Sorry for the wrong command line
> parameters.
>
> Best regards
> Andreas
>
>
> -----Original Message-----
> From: Heikki Manninen [mailto:hma at iki.fi]
> Sent: Monday, 9 September 2013 10:46
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Resource ordering/colocating question (DRBD + LVM +
> FS)
>
> Hello Andreas, thanks for your input, much appreciated.
>
> On 5.9.2013, at 16.39, "Andreas Mock" <andreas.mock at web.de> wrote:
>
>> 1) The second output of crm_mon shows a resource IP_database
>> which is not shown in the initial crm_mon output and also
>> not in the config. => Reduce your problem/config to the
>> minimum that reproduces the issue.
>
> True. I edited that resource out of the e-mail because it has nothing to do
> with the problem as such (it works fine all the time); I just forgot to
> remove it from the second copy-paste as well. And yes, there is no longer an
> IP resource in the configuration.
>
>> 2) Enable logging and look at which node is the DC.
>> In the logs you will find lots of information showing
>> what is going on. Hint: open a terminal session with
>> tail -f logfile running. Watch it while issuing commands.
>> You'll get used to it.
>
> Seems that node #2 was the DC (also visible in the pcs status output). I
> have been looking at the logs the whole time, I'm just not yet too familiar
> with the contents of Pacemaker logging. Here's the thing that keeps
> repeating every time those LVM and FS resources stay in the stopped state:
>
> Sep 3 20:01:23 pgdbsrv02 pengine[1667]: notice: LogActions: Start
> LVM_vgdata01#011(pgdbsrv01.cl1.local - blocked)
> Sep 3 20:01:23 pgdbsrv02 pengine[1667]: notice: LogActions: Start
> FS_data01#011(pgdbsrv01.cl1.local - blocked)
> Sep 3 20:01:23 pgdbsrv02 pengine[1667]: notice: LogActions: Start
> LVM_vgdata02#011(pgdbsrv01.cl1.local - blocked)
> Sep 3 20:01:23 pgdbsrv02 pengine[1667]: notice: LogActions: Start
> FS_data02#011(pgdbsrv01.cl1.local - blocked)
>
> So what does blocked mean here? Is it that node #1 in this case is in need
> of fencing/stonithing and is therefore blocked, or something else? (I have a
> background in the RHCS/HACMP/LifeKeeper etc. world.) no-quorum-policy is
> set to ignore.
>
>> 3) The status shown for a drbd resource (crm_mon) doesn't show
>> you all the information about the drbd devices. Have a look at
>> drbd-overview on both nodes (e.g. the sync status).
>
> True, DRBD is working fine on these occasions: Connected, synced, etc.
>
>> 4) This setup CRIES for stonith. Even in a test environment.
>> When stonith happens (and you see it immediately), you
>> know something went wrong. This is a good indicator for
>> errors in agents or in the config. Believe me, as tedious as
>> stonith is, it is equally valuable for getting hints about a bad
>> cluster state. On virtual machines stonith is not as painful as
>> on real servers.
>
> Very much true. I have implemented some custom fencing/stonith agents
> before on physical and virtual cluster environments. The problem here is
> that I'm not aware of a reasonably simple way to implement stonith with
> VMware Fusion, which I'm bound to use for this test setup. I'll have to dig
> more into this, though. So fencing from the cman cluster.conf is chained to
> Pacemaker fencing, Pacemaker stonith is disabled, and no-quorum-policy is
> ignore.
>
>> 5) Is the drbd fencing script enabled? If yes, in certain circumstances
>> -INF rules are inserted to deny promotion on the "wrong" nodes.
>> You should grep for them: 'cibadmin -Q | grep <resname>'
>
> No, DRBD fencing is not enabled and split-brain recovery is done manually.
>
>> 6) crm_simulate -L -v gives you an output of the scores of
>> the resources on each node. I really don't know how to read it
>> exactly (is there documentation for that anywhere?), but it
>> gives you a hint about where to look when resources don't start.
>> Especially the aggregation of stickiness values in groups is
>> sometimes misleading.
>
> It could be that I have a different version, because -v is an unknown
> option, and:
>
> # crm_simulate -L -V
>
> Current cluster status:
> Online: [ pgdbsrv01.cl1.local pgdbsrv02.cl1.local ]
>
> Master/Slave Set: DRBD_ms_data01 [DRBD_data01]
> Masters: [ pgdbsrv01.cl1.local ]
> Slaves: [ pgdbsrv02.cl1.local ]
> Master/Slave Set: DRBD_ms_data02 [DRBD_data02]
> Masters: [ pgdbsrv01.cl1.local ]
> Slaves: [ pgdbsrv02.cl1.local ]
> Resource Group: GRP_data01
> LVM_vgdata01 (ocf::heartbeat:LVM): Stopped
> FS_data01 (ocf::heartbeat:Filesystem): Stopped
> Resource Group: GRP_data02
> LVM_vgdata02 (ocf::heartbeat:LVM): Stopped
> FS_data02 (ocf::heartbeat:Filesystem): Stopped
>
>
> Only shows that much.
>
> Original problem description left quoted below.
>
>
> --
> Heikki M
>
>
>> -----Original Message-----
>> From: Heikki Manninen [mailto:hma at iki.fi]
>> Sent: Thursday, 5 September 2013 14:08
>> To: pacemaker at oss.clusterlabs.org
>> Subject: [Pacemaker] Resource ordering/colocating question (DRBD + LVM +
>> FS)
>>
>> Hello,
>>
>> I'm having a bit of a problem understanding what's going on with my simple
>> two-node demo cluster here. My resources come up correctly after restarting
>> the whole cluster, but the LVM and Filesystem resources fail to start after a
>> single node restart or standby/unstandby (after the node comes back online;
>> why do they even stop/start after the second node comes back?).
>>
>> OS: CentOS 6.4 (cman stack)
>> Pacemaker: pacemaker-1.1.8-7.el6.x86_64
>> DRBD: drbd84-utils-8.4.3-1.el6.elrepo.x86_64
>>
>> Everything is configured using: pcs-0.9.26-10.el6_4.1.noarch
>>
>> Two DRBD resources configured and working: data01 & data02
>> Two nodes: pgdbsrv01.cl1.local & pgdbsrv02.cl1.local
>>
>> Configuration:
>>
>> node pgdbsrv01.cl1.local
>> node pgdbsrv02.cl1.local
>> primitive DRBD_data01 ocf:linbit:drbd \
>> params drbd_resource="data01" \
>> op monitor interval="30s"
>> primitive DRBD_data02 ocf:linbit:drbd \
>> params drbd_resource="data02" \
>> op monitor interval="30s"
>> primitive FS_data01 ocf:heartbeat:Filesystem \
>> params device="/dev/mapper/vgdata01-lvdata01" directory="/data01"
>> fstype="ext4" \
>> op monitor interval="30s"
>> primitive FS_data02 ocf:heartbeat:Filesystem \
>> params device="/dev/mapper/vgdata02-lvdata02" directory="/data02"
>> fstype="ext4" \
>> op monitor interval="30s"
>> primitive LVM_vgdata01 ocf:heartbeat:LVM \
>> params volgrpname="vgdata01" exclusive="true" \
>> op monitor interval="30s"
>> primitive LVM_vgdata02 ocf:heartbeat:LVM \
>> params volgrpname="vgdata02" exclusive="true" \
>> op monitor interval="30s"
>> group GRP_data01 LVM_vgdata01 FS_data01
>> group GRP_data02 LVM_vgdata02 FS_data02
>> ms DRBD_ms_data01 DRBD_data01 \
>> meta master-max="1" master-node-max="1" clone-max="2"
>> clone-node-max="1" notify="true"
>> ms DRBD_ms_data02 DRBD_data02 \
>> meta master-max="1" master-node-max="1" clone-max="2"
>> clone-node-max="1" notify="true"
>> colocation colocation-GRP_data01-DRBD_ms_data01-INFINITY inf: GRP_data01
>> DRBD_ms_data01:Master
>> colocation colocation-GRP_data02-DRBD_ms_data02-INFINITY inf: GRP_data02
>> DRBD_ms_data02:Master
>> order order-DRBD_data01-GRP_data01-mandatory : DRBD_data01:promote
>> GRP_data01:start
>> order order-DRBD_data02-GRP_data02-mandatory : DRBD_data02:promote
>> GRP_data02:start
>> property $id="cib-bootstrap-options" \
>> dc-version="1.1.8-7.el6-394e906" \
>> cluster-infrastructure="cman" \
>> stonith-enabled="false" \
>> no-quorum-policy="ignore" \
>> migration-threshold="1"
>> rsc_defaults $id="rsc_defaults-options" \
>> resource-stickiness="100"
>>
>>
>> 1) After starting the cluster, everything runs happily:
>>
>> Last updated: Tue Sep 3 00:11:13 2013
>> Last change: Tue Sep 3 00:05:15 2013 via cibadmin on pgdbsrv01.cl1.local
>> Stack: cman
>> Current DC: pgdbsrv02.cl1.local - partition with quorum
>> Version: 1.1.8-7.el6-394e906
>> 2 Nodes configured, unknown expected votes
>> 9 Resources configured.
>>
>> Online: [ pgdbsrv01.cl1.local pgdbsrv02.cl1.local ]
>>
>> Full list of resources:
>>
>> Master/Slave Set: DRBD_ms_data01 [DRBD_data01]
>> Masters: [ pgdbsrv01.cl1.local ]
>> Slaves: [ pgdbsrv02.cl1.local ]
>> Master/Slave Set: DRBD_ms_data02 [DRBD_data02]
>> Masters: [ pgdbsrv01.cl1.local ]
>> Slaves: [ pgdbsrv02.cl1.local ]
>> Resource Group: GRP_data01
>> LVM_vgdata01 (ocf::heartbeat:LVM): Started pgdbsrv01.cl1.local
>> FS_data01 (ocf::heartbeat:Filesystem): Started pgdbsrv01.cl1.local
>> Resource Group: GRP_data02
>> LVM_vgdata02 (ocf::heartbeat:LVM): Started pgdbsrv01.cl1.local
>> FS_data02 (ocf::heartbeat:Filesystem): Started pgdbsrv01.cl1.local
>>
>> 2) Putting node #1 into standby mode, after which everything runs happily
>> on node pgdbsrv02.cl1.local:
>>
>> # pcs cluster standby pgdbsrv01.cl1.local
>> # pcs status
>> Last updated: Tue Sep 3 00:16:01 2013
>> Last change: Tue Sep 3 00:15:55 2013 via crm_attribute on
>> pgdbsrv02.cl1.local
>> Stack: cman
>> Current DC: pgdbsrv02.cl1.local - partition with quorum
>> Version: 1.1.8-7.el6-394e906
>> 2 Nodes configured, unknown expected votes
>> 9 Resources configured.
>>
>>
>> Node pgdbsrv01.cl1.local: standby
>> Online: [ pgdbsrv02.cl1.local ]
>>
>> Full list of resources:
>>
>> IP_database (ocf::heartbeat:IPaddr2): Started pgdbsrv02.cl1.local
>> Master/Slave Set: DRBD_ms_data01 [DRBD_data01]
>> Masters: [ pgdbsrv02.cl1.local ]
>> Stopped: [ DRBD_data01:1 ]
>> Master/Slave Set: DRBD_ms_data02 [DRBD_data02]
>> Masters: [ pgdbsrv02.cl1.local ]
>> Stopped: [ DRBD_data02:1 ]
>> Resource Group: GRP_data01
>> LVM_vgdata01 (ocf::heartbeat:LVM): Started pgdbsrv02.cl1.local
>> FS_data01 (ocf::heartbeat:Filesystem): Started
>> pgdbsrv02.cl1.local
>> Resource Group: GRP_data02
>> LVM_vgdata02 (ocf::heartbeat:LVM): Started pgdbsrv02.cl1.local
>> FS_data02 (ocf::heartbeat:Filesystem): Started
>> pgdbsrv02.cl1.local
>>
>> 3) Putting node #1 back online: it seems that all the resources stop (?),
>> and then DRBD gets promoted successfully on node #2, but the LVM and FS
>> resources never start.
>>
>> # pcs cluster unstandby pgdbsrv01.cl1.local
>> # pcs status
>> Last updated: Tue Sep 3 00:17:00 2013
>> Last change: Tue Sep 3 00:16:56 2013 via crm_attribute on
>> pgdbsrv02.cl1.local
>> Stack: cman
>> Current DC: pgdbsrv02.cl1.local - partition with quorum
>> Version: 1.1.8-7.el6-394e906
>> 2 Nodes configured, unknown expected votes
>> 9 Resources configured.
>>
>>
>> Online: [ pgdbsrv01.cl1.local pgdbsrv02.cl1.local ]
>>
>> Full list of resources:
>>
>> IP_database (ocf::heartbeat:IPaddr2): Started pgdbsrv02.cl1.local
>> Master/Slave Set: DRBD_ms_data01 [DRBD_data01]
>> Masters: [ pgdbsrv02.cl1.local ]
>> Slaves: [ pgdbsrv01.cl1.local ]
>> Master/Slave Set: DRBD_ms_data02 [DRBD_data02]
>> Masters: [ pgdbsrv02.cl1.local ]
>> Slaves: [ pgdbsrv01.cl1.local ]
>> Resource Group: GRP_data01
>> LVM_vgdata01 (ocf::heartbeat:LVM): Stopped
>> FS_data01 (ocf::heartbeat:Filesystem): Stopped
>> Resource Group: GRP_data02
>> LVM_vgdata02 (ocf::heartbeat:LVM): Stopped
>> FS_data02 (ocf::heartbeat:Filesystem): Stopped
>>
>>
>>
>> Any ideas why this is happening/what could be wrong in the resource
>> configuration? The same thing happens when testing the situation with the
>> resources located vice-versa in the beginning. Also, if I stop & start one
>> of the nodes, same thing happens once the node gets back online.
>>
>>
>> --
>> Heikki Manninen <hma at iki.fi>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>>
>
>