[Pacemaker] problem with colocating and ordering sets of resources

Andreas Kurz andreas at hastexo.com
Wed Nov 23 17:12:17 EST 2011


On 11/22/2011 05:12 PM, Ronney Meier - Rorotec Informatik GmbH wrote:
>>>>> Hi all
>>>>>
>>>>> I have a 2 node setup with lvm on top of two drbd devices and two
>>>>> xen virtual machines running.
>>>>> The pacemaker version used is 1.0.11 .
>>>>> All of the resources should always be running on the same node.
>>>>> This usually works, but last week after a crash pacemaker made one
>>>>> of the drbd devices master on one node, and the other drbd device
>>>>> master on the other node. So it couldn't bring up the lvm anymore,
>>>>> and therefore none of the other resources were able to run.
>>>>> My colocation and order constraints look like this:
>>>>>
>>>>> colocation xen_on_drbd inf: ( xen-windows_server_sb
>>>>> xen-windows_server_standard ) fs-xendata lv-drbd_data (
>>>>> ms-drbd_r0:Master ms-drbd_r1:Master )
>>>>>
>>>>> order ord-xen_after_drbd inf: ( ms-drbd_r0:promote
>>>>> ms-drbd_r1:promote ) lv-drbd_data:start fs-xendata:start
>>>>> ( xen-windows_server_sb:start xen-windows_server_standard:start )
>>>>>
>>>>> So it should first promote the two drbd devices, then bring up the
>>>>> lvm (=lv-drbd_data), then mount the filesystem with the xen images
>>>>> (=fs-xendata) and then start the virtual machines.
>>>>> Resources inside parentheses in the colocation constraint should be
>>>>> independent of each other.
>>>>>
>>>>> Unless I completely misunderstood how the colocation of sets works,
>>>>> it should be impossible for pacemaker to promote the two drbd
>>>>> devices on different nodes.
>>>>> Or did I make an error with these constraints?
>>>>
>>>> Yes, you made an error ;-) ... if you put parentheses around
>>>> resources you set the sequential attribute to false. For an order
>>>> resource set that means: unordered. For a colocation set it means:
>>>> not colocated ... and that is what you saw in your setup.
>>>>
>>>>
>>>> Remove the parentheses around the DRBD masters in your colocation and
>>>> everything should be fine.
>>>>
>>>> Are these two DRBD devices acting as two PVs for one VG? This is not
>>>> a recommended setup because ... as you already saw ... the two DRBD
>>>> masters could end up on different nodes or be disconnected
>>>> individually, leaving them with different data generations in case of
>>>> a failover, and then you have serious problems with your VG. This can
>>>> be handled properly in DRBD 8.4.x, which supports several volumes per
>>>> connection.
>>>>
>>>> Regards,
>>>> Andreas
>>>
>>> Hi Andreas
>>>
>>> Thanks a lot for your answer.
>>>
>>> I was aware of what the sequential attribute means for an order
>>> resource...
>>> From the documentation I understood that for the colocation attribute
>>> it means that they are collocated but don't depend on each other, so
>>> if one of them doesn't run, pacemaker will not also shut down the
>>> other ones (especially important for the virtual machines).
>>
>> Setting the sequential attribute to false makes them unrelated to each other
>> -- no colocation (or order) at all ... not unrelated to other resources within
>> the set.
> 
> I just took another look at
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-resource-sets-collocation.html
> 
> There it states that: 
> "
> This notation can also be used in this context to tell the cluster that a set of resources must all be located with a common peer, but have no dependencies on each other. In this scenario, unlike the previous one, B would be allowed to remain active even if A or C (or both) were inactive.
> "
> And below it has an example where it uses sequential="false".
> Actually the explanations and examples on this page don't really fit 100% together and are quite difficult to interpret, but I still have the feeling that it states that setting the sequential attribute to false will still force them to be collocated, only without any dependencies on each other.
> At least I always thought that the colocation notation with parentheses in the crm is a mapping to the xml configuration explained on the above page.

Example:

colocation one-set inf: A B C D

... this is _one_ resource set with A being the important resource .. B
follows A, C follows B ... like a group

colocation two-sets inf: A B ( C D )

... creates _two_ resource sets. Colocation between sets is like simple
colocation constraints:

set(A B) follows set(C D) ... B follows A _but_ C and D are not
colocated. If, for example, A or both A and B are not started, C and D
can run on different nodes.

Example of how they can be independent of each other but still have to
start on the same node:

colocation three-sets inf: A B (C D) E

... where E can also be a Dummy resource ... now C and D need to follow
E but they don't depend on each other .. so if one of them cannot start
on any node it does not affect the other, but of course it does affect
the dependent set(A B).
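
If it helps to see the grouping spelled out, here is a minimal Python
sketch of how the resource list of a crm constraint splits into resource
sets under the reading described above: a run of bare resources becomes
one set with sequential=true, and each parenthesised group becomes its
own set with sequential=false. This is not part of crm or Pacemaker; the
function name parse_crm_sets is made up for illustration, and it only
models the grouping, not how the policy engine places resources.

```python
import re


def parse_crm_sets(spec):
    """Split the resource part of a crm constraint into resource sets.

    Returns a list of (resources, sequential) tuples. Hypothetical helper
    for illustration only; it is not an API of crm or Pacemaker.
    """
    # Tokens are "(", ")" or a resource name (optionally with :role/:action).
    tokens = re.findall(r"\(|\)|[^\s()]+", spec)
    sets = []
    run = []      # bare resources collected since the last parenthesis
    group = None  # resources inside the currently open parenthesis
    for tok in tokens:
        if tok == "(":
            if group is not None:
                raise ValueError("nested parentheses are not handled here")
            if run:
                sets.append((run, True))   # close the sequential run
                run = []
            group = []
        elif tok == ")":
            if group is None:
                raise ValueError("unbalanced ')'")
            sets.append((group, False))    # parentheses: sequential=false
            group = None
        elif group is not None:
            group.append(tok)
        else:
            run.append(tok)
    if group is not None:
        raise ValueError("unbalanced '('")
    if run:
        sets.append((run, True))
    return sets


original = ("( xen-windows_server_sb xen-windows_server_standard ) "
            "fs-xendata lv-drbd_data ( ms-drbd_r0:Master ms-drbd_r1:Master )")

for spec in ["A B C D", "A B ( C D )", "A B (C D) E", original]:
    print(spec)
    for resources, sequential in parse_crm_sets(spec):
        print("   set:", resources, "sequential =", sequential)
```

For the original constraint the two DRBD masters land in a set of their
own with sequential=false, which is the set that allowed them to be
promoted on different nodes.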

> 
>>> Ok, so I will take away the parentheses, but just for further understanding:
>>> If the parentheses are around the drbd resources, it means they don't
>>> need to be collocated with each other. But don't they still need to be
>>> colocated with the lv-drbd_data resource? And since both of them need
>>> to be collocated with this resource they implicitly also need to be
>>> collocated with each other? Or does pacemaker not resolve this kind of
>>> implicit dependency? (or another misunderstanding on my side?)
>>>
>>
>> It is the other way round: all resources depend on the drbd:Master resources
>> ... the order within colocation resource sets is like in groups ... the order
>> between sets is like simple colocation constraints, not like in groups ....
> 
> Ok. If I understand you correctly, then the lv-drbd_data resource needs to be collocated with the drbd:master resources.
> But if the sequential attribute is set to false, does it need to be collocated with both drbd:master resources or is one sufficient?
> Because if it needs to be collocated with both, the only possible solution to this dependency for pacemaker should be to also collocate both drbd:master resources.
> Sorry if I'm making a mess here but I'm just trying to understand :-).

colocation xen_on_drbd inf: ( xen-windows_server_sb
xen-windows_server_standard ) fs-xendata lv-drbd_data
ms-drbd_r0:Master ms-drbd_r1:Master

is fine because: xen follows fs follows lv follows drbd-r0 follows drbd-r1

In fact I don't think sequential=false for colocation makes any
sense here as the resources must run together anyway ... ok ... for the
xen VMs, to allow them to be independent of each other, it is fine.

It makes much more sense for ordering, as it allows parallel start and
stop of resources.
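
To make the parallel start point a bit more concrete, here is a rough
Python sketch. It takes the three sets of the order constraint quoted
earlier in this thread and lists which actions have no ordering between
them. The function name start_stages and the idea of "stages" are only
an illustration, assuming a parenthesised set is unordered internally
and each set waits for the previous one; it does not reproduce what the
policy engine really computes.

```python
def start_stages(sets):
    """Turn ordered resource sets into stages of actions.

    sets: list of (actions, sequential) tuples in constraint order.
    Actions in the same stage have no ordering between them, so they may
    run in parallel. Hypothetical helper, not a Pacemaker API.
    """
    stages = []
    for actions, sequential in sets:
        if sequential:
            # One stage per action: each waits for the one before it.
            stages.extend([action] for action in actions)
        else:
            # sequential=false: the whole set forms a single stage.
            stages.append(list(actions))
    return stages


# The order constraint from this thread, already split into its sets.
order_sets = [
    (["ms-drbd_r0:promote", "ms-drbd_r1:promote"], False),
    (["lv-drbd_data:start", "fs-xendata:start"], True),
    (["xen-windows_server_sb:start",
      "xen-windows_server_standard:start"], False),
]

for number, stage in enumerate(start_stages(order_sets), 1):
    print(number, " | ".join(stage))
```

In this simplified model both promotes share the first stage and both
VM starts share the last one, while the LV and the filesystem each get
a stage of their own in between.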

> 
>>> Yes, these two drbd devices act as two PVs for one VG. We didn't have
>>> enough HDD space left, so we had to add more...
>>> At least pacemaker should take care of the problem of ending up on
>>> different nodes (if properly configured, ahem), but if they get
>>> disconnected individually we'd have quite a big mess, and I'm not even
>>> sure the lvm driver would notice that. Thanks a lot, I didn't think
>>> about that. I will see how I get drbd 8.4 running on Debian...
>>
>> Why not extend the lower-level device of the DRBD resource and then
>> resize DRBD?
> 
> You mean by creating an lvm-drbd-lvm configuration (so to speak: drbd on top of an LV, and the drbd itself as a PV for another VG)?
> I would need that, since we have several LVs. Or is there an obvious solution I just don't see? If at all possible I want to avoid creating a drbd device for each LV.
> 

At least use an LV as the lower-level device for DRBD to make it
resizable online. Then you can extend the VG with more PVs and resize
the LVs ... I like the one LV per DRBD per VM setup because it adds a
lot of flexibility when balancing the resources.


> Thanks for your patience :-)

No problem, you are welcome!

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> ronney
> 



