[Pacemaker] high cib load on config change
Andreas Kurz
andreas at hastexo.com
Tue Oct 9 15:34:17 CEST 2012
On 10/09/2012 01:42 PM, James Harper wrote:
> As per my previous post, I'm seeing very high cib load whenever I make a configuration change, enough that operations time out seemingly instantly. I thought this was happening well before the configured timeout, but now I'm not so sure; maybe the timeouts are actually working correctly and it just seems instant. If they are, then the cib is keeping the CPU at 100% for over 30 seconds to the exclusion of any monitoring checks (or maybe locking the cib so the checks can't run?).
>
> When I make a change I see messages like those below in the logs (see the data after this email). I thought this might be solved by https://github.com/ClusterLabs/pacemaker/commit/10e9e579ab032bde3938d7f3e13c414e297ba3e9 but I just checked the 1.1.7 source that the Debian packages are built from, and that patch is already in 1.1.7.
>
> Are the messages below actually an indication of a problem? If I understand it correctly it's failing to apply the configuration diff and is instead forcing a full resync of the configuration across some or all nodes, which is causing the high load.
>
> I ran crm_report, but it includes a lot of information I would need to remove, so I'm reluctant to submit it in full unless all of it really is required to resolve the problem.
>
Have you already done some tuning, such as increasing batch-limit in your
cluster properties and increasing the corosync timings? It is hard to say
more without further information. If your configuration details are too
sensitive to post on a public mailing list, you can of course hire
someone and share that information under NDA.
Regards,
Andreas
--
Need help with Pacemaker?
http://www.hastexo.com/now
> Thanks
>
> James
>
> Oct 9 21:35:30 bitvs2 cib: [6185]: info: apply_xml_diff: Digest mis-match: expected e7f7aaa1eb10c7a633e94da57dfda2ac, calculated 445109490690d53e024c333fac6ab4c9
> Oct 9 21:35:30 bitvs2 cib: [6185]: notice: cib_process_diff: Diff 0.1354.85 -> 0.1354.86 not applied to 0.1354.85: Failed application of an update diff
> Oct 9 21:35:30 bitvs2 cib: [6185]: info: cib_server_process_diff: Requesting re-sync from peer
> Oct 9 21:35:30 bitvs2 cib: [6185]: notice: cib_server_process_diff: Not applying diff 0.1354.85 -> 0.1354.86 (sync in progress)
> Oct 9 21:35:30 bitvs2 cib: [6185]: notice: cib_server_process_diff: Not applying diff 0.1354.86 -> 0.1354.87 (sync in progress)
> Oct 9 21:35:30 bitvs2 cib: [6185]: notice: cib_server_process_diff: Not applying diff 0.1354.86 -> 0.1354.87 (sync in progress)
> Oct 9 21:35:30 bitvs2 cib: [6185]: notice: cib_server_process_diff: Not applying diff 0.1354.86 -> 0.1354.87 (sync in progress)
> Oct 9 21:35:30 bitvs2 cib: [6185]: notice: cib_server_process_diff: Not applying diff 0.1354.87 -> 0.1355.1 (sync in progress)
> Oct 9 21:35:30 bitvs2 cib: [6185]: info: cib_process_diff: Diff 0.1355.1 -> 0.1355.2 not applied to 0.1354.85: current "epoch" is less than required
> Oct 9 21:35:30 bitvs2 cib: [6185]: info: cib_server_process_diff: Requesting re-sync from peer
> Oct 9 21:35:33 bitvs2 cib: [6185]: info: apply_xml_diff: Digest mis-match: expected b77fae3dc1e835e0d6a3d1a305d262cb, calculated 120fcac6996ff9f5148f69712fc54689
> Oct 9 21:35:33 bitvs2 cib: [6185]: notice: cib_process_diff: Diff 0.1357.7 -> 0.1357.8 not applied to 0.1357.7: Failed application of an update diff
> Oct 9 21:35:33 bitvs2 cib: [6185]: info: cib_server_process_diff: Requesting re-sync from peer
> Oct 9 21:35:33 bitvs2 cib: [6185]: notice: cib_server_process_diff: Not applying diff 0.1357.7 -> 0.1357.8 (sync in progress)
> Oct 9 21:35:33 bitvs2 cib: [6185]: notice: cib_server_process_diff: Not applying diff 0.1357.8 -> 0.1358.1 (sync in progress)
> Oct 9 21:35:33 bitvs2 cib: [6185]: notice: cib_server_process_diff: Not applying diff 0.1358.1 -> 0.1358.2 (sync in progress)
> Oct 9 21:35:33 bitvs2 cib: [6185]: notice: cib_server_process_diff: Not applying diff 0.1358.2 -> 0.1358.3 (sync in progress)
> Oct 9 21:35:33 bitvs2 cib: [6185]: notice: cib_server_process_diff: Not applying diff 0.1358.3 -> 0.1359.1 (sync in progress)
> Oct 9 21:35:33 bitvs2 cib: [6185]: info: cib_process_diff: Diff 0.1359.1 -> 0.1359.2 not applied to 0.1357.7: current "epoch" is less than required
> Oct 9 21:35:33 bitvs2 cib: [6185]: info: cib_server_process_diff: Requesting re-sync from peer
> Oct 9 21:35:33 bitvs2 cib: [6185]: notice: cib_server_process_diff: Not applying diff 0.1359.2 -> 0.1359.3 (sync in progress)
> Oct 9 21:35:33 bitvs2 cib: [6185]: notice: cib_server_process_diff: Not applying diff 0.1359.3 -> 0.1359.4 (sync in progress)
> Oct 9 21:35:33 bitvs2 cib: [6185]: notice: cib_server_process_diff: Not applying diff 0.1359.4 -> 0.1359.5 (sync in progress)
> Oct 9 21:35:33 bitvs2 cib: [6185]: notice: cib_server_process_diff: Not applying diff 0.1359.5 -> 0.1359.6 (sync in progress)
> Oct 9 21:35:33 bitvs2 cib: [6185]: notice: cib_server_process_diff: Not applying diff 0.1359.6 -> 0.1359.7 (sync in progress)
> Oct 9 21:35:33 bitvs2 cib: [6185]: info: cib_process_diff: Diff 0.1359.7 -> 0.1359.8 not applied to 0.1357.7: current "epoch" is less than required
> Oct 9 21:35:33 bitvs2 cib: [6185]: info: cib_server_process_diff: Requesting re-sync from peer
> Oct 9 21:35:33 bitvs2 cib: [6185]: notice: cib_server_process_diff: Not applying diff 0.1359.8 -> 0.1359.9 (sync in progress)
> Oct 9 21:35:33 bitvs2 cib: [6185]: notice: cib_server_process_diff: Not applying diff 0.1359.9 -> 0.1359.10 (sync in progress)
> Oct 9 21:35:33 bitvs2 cib: [6185]: notice: cib_server_process_diff: Not applying diff 0.1359.10 -> 0.1359.11 (sync in progress)
> Oct 9 21:35:33 bitvs2 cib: [6185]: notice: cib_server_process_diff: Not applying diff 0.1359.11 -> 0.1359.12 (sync in progress)
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
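
For anyone wanting to quantify how often the cib falls back to a re-sync
without posting a full crm_report, here is a minimal Python sketch that
tallies the message types seen in the log excerpt above. The function
names (summarize, epoch_gap) are made up for illustration and are not part
of Pacemaker; the patterns are copied from the quoted log lines only and
may not match other Pacemaker versions. It assumes the version triple in
the messages reads admin_epoch.epoch.num_updates.

```python
import re
from collections import Counter

# Patterns copied from the message texts in the quoted log excerpt.
# They are assumptions about the log format, not a Pacemaker API.
PATTERNS = {
    "digest_mismatch": re.compile(r"apply_xml_diff: Digest mis-match"),
    "diff_failed": re.compile(r"cib_process_diff: Diff \S+ -> \S+ not applied"),
    "resync_requested": re.compile(r"Requesting re-sync from peer"),
    "skipped_during_sync": re.compile(r"Not applying diff .*\(sync in progress\)"),
}

# Matches "Diff A.B.C -> X.Y.Z not applied to P.Q.R".
VERSION_RE = re.compile(
    r"Diff (\d+)\.(\d+)\.(\d+) -> \S+ not applied to (\d+)\.(\d+)\.(\d+)"
)


def summarize(lines):
    """Count how many lines fall into each message category."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                break  # one category per line
    return counts


def epoch_gap(line):
    """Return (diff source epoch - local epoch) for a failed diff, else None.

    A gap of 0 with a failure points at a digest mismatch; a positive gap
    means the local copy is behind the diff it was asked to apply.
    """
    match = VERSION_RE.search(line)
    if match is None:
        return None
    return int(match.group(2)) - int(match.group(5))


if __name__ == "__main__":
    # Two lines taken from the excerpt above, used as a small demo.
    sample = [
        "Oct 9 21:35:30 bitvs2 cib: [6185]: info: cib_server_process_diff: "
        "Requesting re-sync from peer",
        "Oct 9 21:35:30 bitvs2 cib: [6185]: info: cib_process_diff: "
        "Diff 0.1355.1 -> 0.1355.2 not applied to 0.1354.85: "
        'current "epoch" is less than required',
    ]
    print(dict(summarize(sample)))
    print([epoch_gap(line) for line in sample])
```

Feeding it the node's syslog around the time of a configuration change
should show whether re-sync requests cluster around each change, which is
easier to share on the list than the full report.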