[Pacemaker] Call cib_query failed (-41): Remote node did not respond
David Vossel
dvossel at redhat.com
Tue Jul 3 22:26:44 CEST 2012
----- Original Message -----
> From: "Brian J. Murrell" <brian at interlinx.bc.ca>
> To: pacemaker at clusterlabs.org
> Sent: Tuesday, July 3, 2012 2:15:09 PM
> Subject: Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond
>
> On 12-06-27 11:30 PM, Andrew Beekhof wrote:
> >
> > The updates from you aren't the problem. It's the number of resource
> > operations (that need to be stored in the CIB) that result from your
> > changes that might be causing the problem.
>
> Just to follow this up for anyone currently following or anyone finding
> this thread in the future...
>
> It turns out that the problem is simply the size of the HA cluster that
> I want to create. The details are in the bug I filed at
> http://bugs.clusterlabs.org/show_bug.cgi?id=5076 but the short story is
> that I can add the number of resources and constraints I want to add
> (i.e. 32-34 of each, as previously described in this thread),
> concurrently even, so long as there are no more than 4 nodes per
> corosync/pacemaker cluster.
>
> Even adding 4 passive nodes, nodes that do no CIB operations of their
> own, made pacemaker crumble. (I only tried a total of 8 nodes, not
> values between 4 and 8, so the tipping point might be somewhere in
> between.)
>
>
> So the summary seems to be that pacemaker cannot scale to more than a
> handful of nodes, even when the nodes are big: 12 core Xeon nodes with
> gobs of memory.
This is not definitive. You may be seeing this with the pacemaker version you are running and the torture test you are running with all those parallel commands, but I wouldn't go as far as to say pacemaker cannot scale to more than a handful of nodes. It depends entirely on the situation: 16 nodes with 32 resources might work, while 3 nodes with 100 resources might not. There is a limit to how far deployments can scale, but it is not easy to quantify values that hold true across all deployments. I'm sure you know this; I just wanted to be explicit so that nobody is misled into using your example as a concrete metric.
>
> I can only guess that everybody is using pacemaker in "pair" (or not
> much bigger) type configurations currently. Is that accurate?
>