[Pacemaker] Help with N+1 configuration

Thu Jul 26 14:35:09 EDT 2012

On 07/26/2012 02:16 PM, Cal Heldenbrand wrote:
> That seems very handy -- and I don't need to specify 3 clones?   Once 
> my memcached OCF script reports a downed service, one of them will 
> automatically transition to the current failover node?

There are clone options that control how many instances of the cloned 
resource to create (clone-max for the whole cluster, clone-node-max per 
node), and clone-max defaults to the number of nodes in the 
cluster. See: 
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch10s02s02.html
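As a rough illustration, here is a small Python helper that builds a crm shell clone line. It is a sketch, not a crm tool. The resource names memcached and memcached_clone are placeholders, and clone_config() is a hypothetical helper. Leaving an option unset means Pacemaker uses its default: one instance per node, up to the number of nodes.

```python
from typing import Optional


def clone_config(clone_id: str, primitive_id: str,
                 clone_max: Optional[int] = None,
                 clone_node_max: Optional[int] = None) -> str:
    """Build a crm shell 'clone' line (hypothetical helper).

    Options left as None are omitted, so Pacemaker applies its defaults:
    clone-max = number of nodes in the cluster, clone-node-max = 1.
    """
    meta = []
    if clone_max is not None:
        meta.append(f"clone-max={clone_max}")
    if clone_node_max is not None:
        meta.append(f"clone-node-max={clone_node_max}")
    line = f"clone {clone_id} {primitive_id}"
    if meta:
        line += " meta " + " ".join(meta)
    return line


# Defaults: no meta attributes needed to get one instance per node.
print(clone_config("memcached_clone", "memcached"))
# clone memcached_clone memcached
print(clone_config("memcached_clone", "memcached", clone_max=3))
# clone memcached_clone memcached meta clone-max=3
```

In the N+1 case you normally would not need to set clone-max at all. The default already gives one copy on every node.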

> Is there any reason you specified just a single memcache_clone, 
> instead of both the memcache primitive and memcached_clone?  I might 
> not be understanding exactly how a clone works.  Is it like... maybe a 
> "symbolic link" to a primitive, with the ability to specify different 
> metadata and parameters?

Once you make a clone, the underlying primitive isn't referenced 
anywhere else (that I can think of). If you want to stop memcache, you 
don't stop the primitive; you add a location constraint forbidding the 
clone from running on the node where you want to stop memcache ("crm 
resource migrate" is easiest). I can't find the relevant documentation, 
but this is just how they work. The same is true for groups -- the 
member primitives are never referenced except by the group. I believe in 
most cases if you try to reference the primitive, you will get an error.
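To make that concrete, here is a sketch that writes the kind of location constraint described above as a crm shell line. It also shows a simplified model of which nodes the clone can still use afterwards. Everything here is an assumption: the "ban-<clone>-<node>" constraint id is a made-up naming convention, the node names are placeholders, and the model treats any -INFINITY score as a hard ban. It does not reproduce Pacemaker's full placement logic.

```python
# Pacemaker's INFINITY score; "-inf" in crm shell stands for -INFINITY.
INFINITY = 1_000_000


def ban_constraint(clone_id: str, node: str) -> str:
    """Return a crm shell line forbidding `clone_id` from running on `node`.

    The constraint id format is a hypothetical naming convention.
    """
    constraint_id = f"ban-{clone_id}-{node}"
    return f"location {constraint_id} {clone_id} -inf: {node}"


def allowed_nodes(nodes, location_scores):
    """Nodes where the clone may still run (simplified model).

    A node scored -INFINITY is forbidden; every other node is allowed.
    """
    return [n for n in nodes if location_scores.get(n, 0) > -INFINITY]


print(ban_constraint("memcached_clone", "node2"))
# location ban-memcached_clone-node2 memcached_clone -inf: node2
print(allowed_nodes(["node1", "node2", "node3"], {"node2": -INFINITY}))
# ['node1', 'node3']
```

To let memcache start on that node again, delete the constraint rather than touching the primitive, for example with "crm configure delete" followed by the constraint id.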

> Despite the advertisement of consistent hashing with memcache clients, 
> I've found that they still have long timeouts waiting on connecting to 
> an IP.  So, keeping the clustered IPs up at all times is more 
> important than having a seasoned cache behind them.

I don't know a whole lot about memcache, but it sounds like you might 
even want to reduce the colocation score of the IPs with memcache to a 
large number, but not infinity. That way, if memcache is 
broken everywhere, the IPs are still permitted to run. This might also 
cover you in the case that a bug in your resource agent thinks memcache 
has failed everywhere, but actually it's still running fine. The 
decision depends on which failure the memcache clients handle better: the 
IP being down, or the IP being up but not having a working memcache 
server behind it.
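Here is a toy Python model of the trade-off between an infinite and a finite colocation score. It is not Pacemaker's real scoring algorithm. It assumes a single IP (the hypothetical ip1), no other constraints, and that a node scores the colocation value when memcache is running there and 0 otherwise. Under those assumptions, INFINITY acts as a hard requirement, while a finite score is only a preference. In crm shell, the finite version might look like "colocation ip1-with-memcached 1000: ip1 memcached_clone", where the id, the score, and ip1 are all placeholders.

```python
INFINITY = 1_000_000  # Pacemaker's INFINITY score


def place_ip(nodes, memcache_running, score):
    """Pick a node for an IP colocated with memcache (toy model).

    Nodes running memcache score `score`. If `score` is INFINITY, nodes
    without memcache are forbidden; otherwise they score 0 and remain
    usable. Returns None when no node is allowed (the IP stays stopped).
    """
    best_node, best_score = None, None
    for node in nodes:
        if node in memcache_running:
            node_score = score
        elif score >= INFINITY:
            continue  # mandatory colocation: cannot run without memcache
        else:
            node_score = 0
        if best_score is None or node_score > best_score:
            best_node, best_score = node, node_score
    return best_node


nodes = ["node1", "node2", "node3"]
print(place_ip(nodes, {"node3"}, INFINITY))  # node3
print(place_ip(nodes, set(), INFINITY))      # None: memcache down everywhere, IP stopped
print(place_ip(nodes, set(), 1000))          # node1: IP still runs without memcache
```

The last two calls show the choice described above. With INFINITY, the clients see the IP disappear. With a finite score, they reach an IP that has no working memcache behind it.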