[Pacemaker] Multi-site support in pacemaker (tokens, deadman, CTR)

Thu Apr 28 19:33:00 UTC 2011

Hi Lars,
Thanks for the explanation.

On 04/28/11 02:55, Lars Marowsky-Bree wrote:
> On 2011-04-26T23:34:16, Yan Gao <ygao at novell.com> wrote:
> 
> Perhaps choosing the name "token" for the cluster-wide attributes was not
> a wise move, as it does invoke the "token" association from
> corosync/totem.
> 
> What do you all think about switching this word to "ticket"? And have
> the Cluster Ticket Registry manage them? Less confusion later on, I
> think.
> 
> I'll try the word "ticket" for the rest of the mail and we can see how
> that works out ;-)
> 
> (I think the word works - you can own a ticket, grant a ticket, cancel,
> and revoke tickets ...)
Sounds fine to me:-)

>>> "Tokens" are, essentially, cluster-wide attributes (similar to node
>>> attributes, just for the whole partition).
>> Specifically, a "<tokens>" section with an attribute set
>> ("<token_set>" or something) under "/cib/configuration"?
> 
> Yes; a ticket section, just like that.
All right. How about the schema:
    <element name="configuration">
      <interleave>
...
        <element name="tickets">
          <zeroOrMore>
            <element name="ticket_set">
              <externalRef href="nvset.rng"/>
            </element>
          </zeroOrMore>
        </element>
...

>> - A completely new type of constraint:
>>   <rsc_token id="rscX-with-tokenA" rsc="rscX" token="tokenA"
>> kind="Deadman"/>
> 
> Personally, I lean towards this. (Andrew has expressed a wish to do
> without the "rsc_" prefix, so let's drop this ;-)
Well then, how about "ticket_dep" or "ticket_req"?

> 
> Not sure the kind="Deadman" is actually required, but it probably makes
> sense to be able to switch off the big hammer for debugging purposes.
> ;-)
I was thinking of it as a switch for the "fence immediately once the
dependency is no longer satisfied" behavior.

> 
> I don't see why any resource would depend on several tickets; but I can
> see a use case for wanting to depend on _not_ owning a ticket, similar
> to the node attributes. And the resource would need a role, obviously.
OK. Here is the schema I have in mind:

  <define name="element-ticket_dep">
    <element name="ticket_dep">
      <attribute name="id"><data type="ID"/></attribute>
      <choice>
        <oneOrMore>
          <ref name="element-resource-set"/>
        </oneOrMore>
        <group>
          <attribute name="rsc"><data type="IDREF"/></attribute>
          <optional>
            <attribute name="rsc-role">
              <ref name="attribute-roles"/>
            </attribute>
          </optional>
        </group>
      </choice>
      <attribute name="ticket"><text/></attribute>
    </element>
  </define>
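
To make the <choice> in the schema above easier to see, here is a rough
Python sketch that checks a single ticket_dep element against the same
shape: "id" and "ticket" are required, and the element carries either
resource sets or a "rsc" attribute (with an optional "rsc-role"). The
function name check_ticket_dep is made up for this example, and I am
assuming that "element-resource-set" defines an element called
resource_set. It does not validate role values or IDREFs, so it is no
substitute for real RELAX NG validation.

```python
import xml.etree.ElementTree as ET


def check_ticket_dep(xml_text):
    """Return a list of problems; an empty list means the shape is fine."""
    elem = ET.fromstring(xml_text)
    problems = []

    if elem.tag != "ticket_dep":
        problems.append("root element must be ticket_dep")

    # "id" and "ticket" are required in the proposed schema.
    for required in ("id", "ticket"):
        if required not in elem.attrib:
            problems.append("missing attribute: " + required)

    # Assumption: "element-resource-set" defines an element named resource_set.
    has_sets = elem.find("resource_set") is not None
    has_rsc = "rsc" in elem.attrib

    # The <choice> allows exactly one of the two forms.
    if has_sets == has_rsc:
        problems.append("need either resource_set children or a rsc attribute")

    # rsc-role lives in the same <group> as rsc, so it cannot appear alone.
    if "rsc-role" in elem.attrib and not has_rsc:
        problems.append("rsc-role is only allowed together with rsc")

    return problems
```

For example, check_ticket_dep('<ticket_dep id="d1" rsc="rscX"
ticket="ticketA"/>') returns an empty list, while the same element
without "rsc" and without any resource_set child reports one problem.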

> 
> Andrew, Yan - do you think we should allow _values_ for tickets, or
> should they be strictly defined/undefined/set/unset?
I think allowing values would help to distinguish between different demands.

>> If so, isn't it supposed to be revoked manually by default? So the
>> short-circuited fail-over needs an admin to participate?
> 
> No to both; it can be revoked manually, yes, but it isn't going to be
> always the case. I'm also not quite sure I understand where this
> question is headed; how does it matter here whether the ticket is
> revoked manually or not?
I was just thinking -- before we have the CTR, we rely on the admin
quite a lot.

> 
>> Does it mean an option for users to choose whether they want
>> immediate fencing or a normal stop of the resources? Is it global,
>> or particular to a specific token, or even just to a specific
>> dependency?
> 
> Good question. This came up above already briefly ...
> 
> I _think_ there should be a special value that a ticket can be set to
> that doesn't fence, but stops everything cleanly.
> 
> However, while the ticket is in this state, the site _still_ owns it (no
> other site can get it yet, and were it to lose the ticket due to
> expiration, it'd still need to fence all remaining nodes so that the
> services can be started elsewhere). 
> 
> Perhaps the CTR doesn't even need to know about this - it's a special
> setting of the ticket at a given site. Perhaps it makes sense to
> distinguish between owning the ticket (as granted on request via the CTR
> or manually), and its value (which is set locally)? perhaps:
> 
> Ownership is a true/false flag. Value is a non-negative integer (that
> is, 0 is allowed).
> 
> A site that "owns" a ticket of value 0 will stop resources cleanly, and
> afterwards relinquish the ticket itself.
> 
> A site that "owns" a ticket of any value and loses it will perform the
> deadman dance.
> 
> A site that does not own a ticket but has a non-zero value for it
> defined will request the ticket from the CTR; the CTR will grant it to
> the site with the highest bid (but not to a site with 0)
Suppose the ticket is revoked from the site with the highest "bid".
Should that site clear its "bid" as well? Otherwise it would get the
ticket again soon after.

> (if these are
> equal, to the site with the highest node count, if these again are
> equal, to the site with the lowest nodeid).
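
The grant order described above (highest bid, then highest node count,
then lowest nodeid, and never a site whose bid is 0) could be sketched
roughly like this. SiteBid and choose_grantee are names invented for the
example, not anything the CTR defines, and I am assuming each site is
represented by a single nodeid for the final tie-break.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SiteBid:
    name: str        # hypothetical site label
    bid: int         # ticket value set locally at the site; 0 = never grant
    node_count: int  # number of nodes at the site
    nodeid: int      # assumption: one representative nodeid per site


def choose_grantee(bids: Sequence[SiteBid]) -> Optional[SiteBid]:
    """Pick the site that would be granted the ticket, or None."""
    # A site with bid 0 is never granted the ticket.
    candidates = [b for b in bids if b.bid > 0]
    if not candidates:
        return None
    # Highest bid first, then highest node count, then lowest nodeid.
    return min(candidates, key=lambda b: (-b.bid, -b.node_count, b.nodeid))
```

With two sites bidding 10, one with 3 nodes and one with 5, the 5-node
site wins; if the node counts are equal as well, the site with the lower
nodeid wins.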

> 
> (Tangent - ownership appears to belong to the status section; the value
> seems to belong to the cib->ticket section(?).)
Perhaps. Although there's no appropriate place to set a cluster-wide
attribute in the status section so far.

Another option is that a "ticket" is not an nvpair. Instead it is either:

- An object with "ownership" and "bid" attributes, or
- An nvpair-set that includes the "ownership" and "bid" nvpairs.
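
As a rough illustration of the first form (an object with "ownership"
and "bid" attributes), here is a small Python sketch of how a site might
decide what to do with such a ticket, following the rules quoted earlier
in this mail. The Ticket class, the site_action function and the action
strings are all made up for the example; "lost_ownership" stands for the
ticket being revoked or expiring, as opposed to the site giving it up
itself after a clean stop.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    name: str
    owned: bool  # ownership flag (granted via the CTR or manually)
    bid: int     # locally set value, a non-negative integer

    def __post_init__(self):
        if self.bid < 0:
            raise ValueError("bid must be a non-negative integer")


def site_action(ticket: Ticket, lost_ownership: bool = False) -> str:
    """Return what the local site should do for this ticket."""
    if lost_ownership:
        # Losing a ticket the site owned means the deadman dance,
        # whatever the value was.
        return "fence"
    if ticket.owned:
        # Value 0: stop resources cleanly, then relinquish the ticket.
        return "stop-and-relinquish" if ticket.bid == 0 else "run"
    # Not owned: a non-zero value means the site asks the CTR for it.
    return "request" if ticket.bid > 0 else "idle"
```

The nvpair-set form would carry the same two pieces of information, just
as two nvpairs instead of two attributes.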

> 
> The value can be set manually - in that case, it allows the admin to
> define a primary site for a given set of resources. (It might also be
> modified automatically at a later stage based on whatever metric.)
> 
> If a site owns a ticket, but doesn't have the highest value, it would
> either fail-back automatically - or require manual intervention, 
OK, that seems to answer my previous question. It should be
configurable on the CTR server side.

> which
> I'd assume to be quite common. (Again, this builds a very simplistic
> active/passive overlay.)
> 
> Does that make sense, or am I creating more confusion than answers? ;-)
Definitely makes a lot of sense:-)

Regards,
  Yan
-- 
Gao,Yan <ygao at novell.com>
Software Engineer
China Server Team, OPS Engineering, Novell, Inc.