[Pacemaker] Call cib_query failed (-41): Remote node did not respond

Tue Jul 3 22:26:44 CEST 2012

----- Original Message -----
> From: "Brian J. Murrell" <brian at interlinx.bc.ca>
> To: pacemaker at clusterlabs.org
> Sent: Tuesday, July 3, 2012 2:15:09 PM
> Subject: Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond
> 
> On 12-06-27 11:30 PM, Andrew Beekhof wrote:
> > 
> > The updates from you aren't the problem.  It's the number of resource
> > operations (that need to be stored in the CIB) that result from your
> > changes that might be causing the problem.
> 
> Just to follow this up for anyone currently following or anyone finding
> this thread in the future...
> 
> It turns out that the problem is simply the size of the HA cluster that
> I want to create.  The details are in the bug I filed at
> http://bugs.clusterlabs.org/show_bug.cgi?id=5076 but the short story is
> that I can add the number of resources and constraints I want to add
> (i.e. 32-34 of each, as previously described in this thread),
> concurrently even, so long as there are not more than 4 nodes per
> corosync/pacemaker cluster.
> 
> Even adding 4 passive nodes, i.e. nodes that do no CIB operations of
> their own, made pacemaker crumble.  (I only tried a total of 8 nodes,
> not values between 4 and 8, so the tipping point might be somewhere in
> between.)
> 
> So the summary seems to be that pacemaker cannot scale to more than a
> handful of nodes, even when the nodes are big: 12-core Xeon nodes with
> gobs of memory.

This is not definite.  You may be hitting this limit with the pacemaker version you are running and the torture test of all those parallel commands, but I wouldn't go as far as to say pacemaker cannot scale to more than a handful of nodes.  It depends entirely on the situation: 16 nodes with 32 resources might work, while 3 nodes with 100 resources might not.  There is a limit to how far deployments can scale, but it is not easy to quantify values that hold true across all deployments.  I'm sure you know this; I just wanted to be explicit so that nobody takes your example as a concrete metric.

> 
> I can only guess that everybody is using pacemaker in "pair" (or not
> much bigger) type configurations currently.  Is that accurate?
>