[Pacemaker] can we update an attribute with cmpxchg "atomic compare and exchange" semantics?
Lars Ellenberg
lars.ellenberg at linbit.com
Mon Sep 29 20:22:06 UTC 2014
On Wed, Sep 10, 2014 at 11:50:58AM +0200, Lars Ellenberg wrote:
>
> Hi Andrew (and others).
>
> For a certain use case (yes, I'm talking about DRBD "peer-fencing" on
> loss of replication link), it would be nice to be able to say:
>
> update some_attribute=some_attribute+1 where some_attribute >= 0
>
> delete some_attribute where some_attribute=0
>
> Ok, that's not the classic cmpxchg(), more of an atomic_add();
> or similar enough. With hopefully just a single cib roundrip.
>
>
> Let me rephrase:
> Update attribute "this_is_pink" (for node-X with ID attr-ID):
>
> fail if said attr-ID exists elsewhere (not as the intended attribute
> at the intended place in the xml tree)
> (this comes for free already, I think)
>
> if it does not exist at all, assume it was present with current value 0
>
> if the current (or assumed current) value is >= 0, add 1
>
> if the current value is < 0, fail
>
> (optionally: return new value? old value?)
Did anyone read this?
> My intended use case scenario is this:
>
> Two DRBD nodes, several DRBD resources,
> at least a few of them in "dual-primary".
>
> Replication link breaks.
>
> Fence-peer handlers are triggered individually for each resource on
> both nodes, and try to concurrently modify the cib (place fencing
> constraints).
>
> With the current implementation of crm-fence-peer.sh, it is likely that
> some DRBD resources "win" on one node, some "win" on the other node.
> The respective losers will have their IO blocked.
>
> Which means that most likely on both nodes some DRBD will stay blocked,
> some monitor operation will soon fail, some stop operation (to recover
> from the monitor fail) will soon fail, and the recovery of that will be
> node-level fencing of the affected node.
>
> In short: both nodes will be hard-reset
> because of a replication link failure.
>
>
>
> If I would instead use a single attribute (with a pre-determined ID) for all
> instances of the fence-peer handler, the first to come would "chose" the
> victim node, all others would just add their count.
> There will be only one loser, and more importantly: one survivor.
>
> Once the replication link is re-established,
> DRBD resynchronization will bring the former loser up-to-date,
> and the respective after-resync handlers will decrease that "breakage
> count". Once the breakage count hits zero, it can and should be deleted.
>
> Presence of the "breakage count" attribute with value > 0 would mean
> "this node must not be promoted", which would be a static constraint
> to be added to all DRBD resources.
>
> Does that make sense?
>
> (I have more insane proposals, in case we have multiple (more than 2)
> Primaries during normal operation, but I'm not yet able to write them
> down without being seriously confused by myself...)
>
>
> I could open-code it with shell and cibadmin, btw.
> I did a proof-of-concept once that does
> a. cibadmin -Q
> b. some calculations,
> then prepares the update statement xml based on cib content seen,
> *including* the cib generation counters
> c. cibadmin -R (or -C, -M, -D, as appropriate)
> this will fail if the cib was modified in a relevant way since a,
> because of the included generation counters
> d. repeat as necessary
>
>
> But that is beyond ugly.
> And probably fragile.
> And would often fail for all the wrong reasons, just because some status
> code has changed and bumped the cib generation counters.
>
> What would be needed to add such functionality?
> Where would it go?
> cibadmin? cib? crm_attribute? possibly also attrd?
>
> Thanks,
> Lars
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
More information about the Pacemaker
mailing list