[Pacemaker] Call cib_query failed (-41): Remote node did not respond

Andrew Beekhof andrew at beekhof.net
Wed Jul 4 02:12:26 EDT 2012


On Wed, Jul 4, 2012 at 10:06 AM, Brian J. Murrell <brian at interlinx.bc.ca> wrote:
> On 12-07-03 04:26 PM, David Vossel wrote:
>>
>> This is not a definite.  Perhaps you are experiencing this given the pacemaker version you are running
>
> Yes, that is absolutely possible and it certainly has been under
> consideration throughout this process.  I did also recognize, however,
> that I am running the latest stable (1.1.6) release, and while I might
> be able to experiment with a development branch in the lab, I could
> not use it in production.  So while it would be an interesting
> experiment, my primary goal had to be getting 1.1.6 to run stably.
>
>> and the torture test you are running with all those parallel commands,
>
> It is worth keeping in mind that all of those parallel commands are just
> as parallel with the 4 node cluster as they are with the 8 (4 nodes
> actively modifying the CIB + 4 completely idle nodes) and 16 node
> clusters -- both of which failed.
>
> Just because I reduced the number of nodes doesn't mean that I reduced
> the parallelism any.

Yes. You did.  You reduced the number of "check what state the
resource is in on every node" probes.
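
As a rough sketch of why (purely a back-of-envelope model, and the
resource count below is made up, not taken from your configuration):

    # Every node gets probed for every resource, and every probe result
    # is a status update that has to be replicated to the other nodes.
    resources = 30                          # hypothetical resource count
    for nodes in (4, 8, 16):
        probes = resources * nodes          # one probe per resource per node
        sync_msgs = probes * (nodes - 1)    # each result synced to the peers
        print(nodes, probes, sync_msgs)
    # 4 nodes:  120 probes,  360 replicated updates
    # 8 nodes:  240 probes, 1680
    # 16 nodes: 480 probes, 7200

So even nodes that never touch the CIB still get probed, and still have
to receive every status update the active nodes generate.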

> The commands being run on each node are not
> serialized and are all launched in parallel on the 4 node cluster as
> much as they were with the 16 node cluster.
>
> So strictly speaking, it doesn't seem that parallelism in the CIB
> modifications is as much of a factor as simply the number of nodes in
> the cluster, even when some (i.e. in the 8 node test I did) of the nodes
> are entirely passive and not modifying the CIB at all.

Now I'm getting annoyed.
I keep explaining this is not true, yet you keep repeating the above assertion.

Please go back and re-read my previous answers (both here and
off-list). Properly.  I will be happy to clarify anything that is
still unclear.

>
>> but I wouldn't go as far as to say pacemaker cannot scale to more than a handful of nodes.
>
> I'd totally welcome being shown the error of my ways.
>
>> I'm sure you know this, I just wanted to be explicit about this so there is no confusion caused by people who may use your example as a concrete metric.
>
> But of course.  In my experiments, it was clear that the cib process
> could max out a single core on my 12 core Xeons with just 4 nodes in the
> cluster at times.
>
> Therefore it is also clear that, some time down the road, a faster CPU
> core or a multithreaded cib would allow for better scaling, assuming
> CPU is the limiting factor here.  But my point was simply that at the
> current time, and again assuming CPU is the limiting factor (since I
> don't know for sure what the limiting factor really is), somewhere
> between 4-8 nodes is the limit with more or less default tunings.
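
(As an aside, one way to confirm that kind of single-core saturation is
to sample the cib process's CPU usage over time.  A sketch using
psutil, assuming the daemon shows up in the process table as "cib":)

    import psutil

    # Locate the CIB daemon by name (assumed to be "cib" here).
    cib = next(p for p in psutil.process_iter() if p.name() == 'cib')
    for _ in range(30):
        # On a multi-core machine cpu_percent() can exceed 100; a value
        # that sits pinned near 100 suggests a single thread is the
        # bottleneck rather than the machine as a whole.
        print(cib.cpu_percent(interval=1.0))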
>
>> From the deployments I've seen on the mailing list and bug reports, the most common clusters appear to be around the 2-6 node mark.
>
> Which seems consistent.
>
>> The messaging involved with keeping all the local resource operations in the CIB synced across that many nodes is pretty insane.
>
> Indeed, and I most certainly had considered that.  What really threw a
> curve in that train of thought for me though was that even idle,
> non-CIB-modifying nodes (i.e. turning a working 4 node cluster into a
> non-working 8 node cluster by adding 4 nodes that do nothing with the
> CIB) can tip a working configuration over into non-working.
>
> I could most certainly see how the contention of 8 nodes all trying to
> jam stuff into the CIB might be taxing with all of the locking that
> needs to go on, etc., but for those 4 added idle nodes to add enough
> complexity to make a working 4 node cluster not work is puzzling.
> Puzzling enough (granted, to somebody who knows zilch about the
> messaging that goes on with CIB operations) to make it smell more like a
> bug than simple contention.
>
>> If you are set on using pacemaker,
>
> Well, I am not necessarily married to it.  It did just seem like the
> tool with the critical mass behind it.  As sketchy as it might seem to
> ask (and I only do so since you seem to be hinting that there might be
> a better tool for the job), is there a tool more suited to the job?
>
>> the best approach for scaling for your situation would probably be to try and figure out how to break nodes into smaller clusters that are easier to manage.
>
> Indeed, that is what I ended up doing.  Now my 16 node cluster is four
> 4 node clusters.  The problem with that, though, is that when a node in
> a cluster fails, it has only 3 other nodes to spread its resources
> around onto, and if 2 should fail, 2 nodes are trying to service twice
> their normal load.  The benefit of larger clusters is clear: giving
> pacemaker more nodes to distribute resources across evenly means the
> load on the other nodes is impacted minimally when one or more nodes of
> the cluster do fail.
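
(For what it's worth, the arithmetic behind that trade-off, assuming a
failed node's resources are spread evenly over the survivors:)

    # Relative load on each surviving node after k failures in an
    # n-node cluster, with a healthy node's load normalised to 1.0.
    def load_per_survivor(n, k):
        return n / float(n - k)

    for n in (4, 8, 16):
        print(n, [round(load_per_survivor(n, k), 2) for k in (1, 2)])
    # 4-node cluster:  1.33x after one failure, 2.0x after two
    # 16-node cluster: 1.07x and 1.14x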
>
>> I have not heard of a single deployment as large as you are thinking of.
>
> Heh.  Not atypical of me to push the envelope I'm afraid.  :-/
>
> Cheers, and many thanks for your input.  It is valuable to this discussion.
>
> b.
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



