[Pacemaker] High CIB load on DC election
Cédric Dufour - Idiap Research Institute
cedric.dufour at idiap.ch
Mon Sep 22 13:22:38 UTC 2014
Hello again,
My Pacemaker 1.1.12 cluster is quite large: 22 nodes, ~300 resources.
When I gracefully shut down the current DC (i.e. move its resources elsewhere, put the node in standby, stop pacemaker, stop corosync), the CIB load increases (to close to 100% on the slowest nodes) until the new DC is elected.
What explains this phenomenon?
(What could I do to limit or circumvent it?)
In parallel, when this happens, on those nodes that log the "throttle_mode: High CIB load detected" message, my "ping" (network connectivity) RA times out without obvious explanation. The RA timeout is conservative enough, relative to the ping timeout and number of attempts, that it should never kick in. Looking at the code of ".../resource.d/pacemaker/ping", I suspect, though I may be wrong, that the culprit is "attrd_updater".
Hypothesis: because of the high CIB load, "attrd_updater" does not return immediately, as it is supposed to.
Does this hypothesis make sense?
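One way to check this hypothesis would be to measure how long an "attrd_updater" call takes on an affected node while the election is in progress. Below is a minimal sketch in Python, not something taken from the ping RA: "time_command" is a hypothetical helper, and the command to time is deliberately left to the caller and passed as command-line arguments. Which "attrd_updater" invocation is safe to run on a production node is not assumed here.

```python
import subprocess
import sys
import time


def time_command(cmd, timeout_s=10.0):
    """Run cmd and return (elapsed_seconds, return_code).

    return_code is None if the command did not finish within timeout_s.
    """
    start = time.monotonic()
    try:
        rc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        ).returncode
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run() before this is raised.
        rc = None
    return time.monotonic() - start, rc


if __name__ == "__main__":
    # The command to time is taken from the command line; nothing is
    # hard-coded because the right invocation depends on the cluster.
    cmd = sys.argv[1:]
    if not cmd:
        print("usage: time_cmd.py COMMAND [ARG...]")
    else:
        elapsed, rc = time_command(cmd)
        status = "timed out" if rc is None else "exit code %d" % rc
        print("%.3f s (%s)" % (elapsed, status))
```

If the measured time stays small while the ping RA still times out, the delay is more likely elsewhere in the RA than in "attrd_updater".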
(PS: this issue shows up on my production cluster, so it is very difficult for me to reproduce or debug it without risking havoc on my services.)
Thank you very much for your response(s).
Best,
Cédric
--
Cédric Dufour @ Idiap Research Institute