[Pacemaker] Pacemaker 1.1.12 cib testing, crm_mon doesn't work

Johan Huysmans johan.huysmans at inuits.be
Fri Jun 13 10:25:36 CEST 2014


Hi,

I performed some extra testing.

I erased my complete CIB (cibadmin -E), after which crm_mon showed
information again.

I then gradually added resources while monitoring the cib.xml size
(cibadmin -Ql > cib.xml; ll -h cib.xml).
It grew to about 430K. After I added another batch of resources,
crm_mon stopped responding.

I checked the files in /dev/shm:
-rw------- 1 hacluster root 516K jun 13 08:14 qb-cib_ro-event-58122-10738-13-data
-rw------- 1 hacluster root 8,1K jun 13 08:14 qb-cib_ro-event-58122-10738-13-header
-rw------- 1 hacluster root 516K jun 13 08:14 qb-cib_ro-request-58122-10738-13-data
-rw------- 1 hacluster root 8,1K jun 13 08:14 qb-cib_ro-request-58122-10738-13-header
-rw------- 1 hacluster root 516K jun 13 08:14 qb-cib_ro-response-58122-10738-13-data
-rw------- 1 hacluster root 8,1K jun 13 08:14 qb-cib_ro-response-58122-10738-13-header
-rw------- 1 hacluster root  96M jun 13 07:58 qb-cib_rw-event-58122-58130-12-data
-rw------- 1 hacluster root 8,1K jun 13 07:58 qb-cib_rw-event-58122-58130-12-header
-rw------- 1 hacluster root  96M jun 13 08:13 qb-cib_rw-request-58122-58130-12-data
-rw------- 1 hacluster root 8,1K jun 13 07:58 qb-cib_rw-request-58122-58130-12-header
-rw------- 1 hacluster root  96M jun 13 07:58 qb-cib_rw-response-58122-58130-12-data
-rw------- 1 hacluster root 8,1K jun 13 07:58 qb-cib_rw-response-58122-58130-12-header

Apparently the cib_rw *-data files are 96M each, but the cib_ro ones are
only 516K.
If such a file has to hold the complete CIB, 516K is too small for ours,
which could explain why crm_mon isn't working while write actions cause
no problems.
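
To check this without reading the listing by eye, something like the
sketch below could be used. It is only a rough sketch: check_rings is a
name I made up (it is not part of Pacemaker or libqb), and it assumes
the ring buffer files match qb-cib_*-data as in the listing above. It
marks every data file that is smaller than a given message size.

```shell
# Hypothetical helper (not part of Pacemaker or libqb): list the
# qb-cib_*-data files in a directory and mark those that are smaller
# than a given message size in bytes.
check_rings() {
    dir=$1
    need=$2
    for f in "$dir"/qb-cib_*-data; do
        [ -e "$f" ] || continue            # glob matched nothing
        size=$(wc -c < "$f" | tr -d ' ')   # file size in bytes
        if [ "$size" -lt "$need" ]; then
            echo "TOO_SMALL $size $(basename "$f")"
        else
            echo "OK $size $(basename "$f")"
        fi
    done
}
```

For example, "check_rings /dev/shm 1091618" takes the 1091618 bytes of
the failed response from my first mail as the size to compare against.
It has to run as root or hacluster, because the files are only readable
by their owner. With the files listed above I would expect the 516K
cib_ro files to be marked TOO_SMALL and the 96M cib_rw files OK.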

gr.
Johan
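
PS: about the negative suggested values in my previous mail (quoted
below): each suggestion is exactly double the previous one, and
2 * 2023860224 = 4047720448 no longer fits in a signed 32-bit integer,
where it wraps around to -247246848. So my guess is that only the
printed suggestion overflows; I have not checked the Pacemaker source
to confirm this. The sketch below reproduces the wrap-around with shell
arithmetic. wrap32 is a made-up helper, and it assumes a shell that
calculates with 64-bit integers.

```shell
# Hypothetical helper: interpret an integer as a signed 32-bit value,
# the way a C int stores it on common platforms.
# Assumes the shell itself calculates with 64-bit integers.
wrap32() {
    v=$(( $1 & 4294967295 ))          # keep the low 32 bits
    if [ "$v" -ge 2147483648 ]; then  # sign bit set: value is negative
        v=$(( v - 4294967296 ))
    fi
    echo "$v"
}

wrap32 $(( 2 * 2023860224 ))   # prints -247246848
wrap32 $(( 2 * -247246848 ))   # prints -494493696
```

Both results match the values in the log, which is why I suspect an
integer overflow rather than a real request for a negative buffer size.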

On 13-06-14 09:35, Johan Huysmans wrote:
> Hi,
>
> The PCMK_ipc_buffer was already set to 10000000 (10M).
>
> For testing I increased the buffer to 100000000 (100M), without results.
>
> I then decreased the buffer to the default 20480 (20K), as the log then
> shows the suggested value.
> After leaving it running for a couple of minutes I got these suggested
> values:
> Jun 13 06:33:53 [62206] SRV-5-1        cib:    error: crm_ipc_prepare:     Could not compress the message into less than the configured ipc limit (20480 bytes).Set PCMK_ipc_buffer to a higher value (3854984 bytes suggested)
> Jun 13 06:33:54 [62206] SRV-5-1        cib:    error: crm_ipc_prepare:     Could not compress the message into less than the configured ipc limit (20480 bytes).Set PCMK_ipc_buffer to a higher value (7709968 bytes suggested)
> Jun 13 06:33:58 [62206] SRV-5-1        cib:    error: crm_ipc_prepare:     Could not compress the message into less than the configured ipc limit (20480 bytes).Set PCMK_ipc_buffer to a higher value (15419936 bytes suggested)
> Jun 13 06:35:55 [62206] SRV-5-1        cib:    error: crm_ipc_prepare:     Could not compress the message into less than the configured ipc limit (20480 bytes).Set PCMK_ipc_buffer to a higher value (30839872 bytes suggested)
> Jun 13 06:37:56 [62206] SRV-5-1        cib:    error: crm_ipc_prepare:     Could not compress the message into less than the configured ipc limit (20480 bytes).Set PCMK_ipc_buffer to a higher value (61679744 bytes suggested)
> Jun 13 07:16:47 [44112] SRV-5-1        cib:    error: crm_ipc_prepare:     Could not compress the message into less than the configured ipc limit (20480 bytes).Set PCMK_ipc_buffer to a higher value (3952852 bytes suggested)
> Jun 13 07:16:47 [44112] SRV-5-1        cib:    error: crm_ipc_prepare:     Could not compress the message into less than the configured ipc limit (20480 bytes).Set PCMK_ipc_buffer to a higher value (7905704 bytes suggested)
> Jun 13 07:18:03 [44112] SRV-5-1        cib:    error: crm_ipc_prepare:     Could not compress the message into less than the configured ipc limit (20480 bytes).Set PCMK_ipc_buffer to a higher value (15811408 bytes suggested)
> Jun 13 07:18:43 [44112] SRV-5-1        cib:    error: crm_ipc_prepare:     Could not compress the message into less than the configured ipc limit (20480 bytes).Set PCMK_ipc_buffer to a higher value (31622816 bytes suggested)
> Jun 13 07:18:48 [44112] SRV-5-1        cib:    error: crm_ipc_prepare:     Could not compress the message into less than the configured ipc limit (20480 bytes).Set PCMK_ipc_buffer to a higher value (63245632 bytes suggested)
> Jun 13 07:20:49 [44112] SRV-5-1        cib:    error: crm_ipc_prepare:     Could not compress the message into less than the configured ipc limit (20480 bytes).Set PCMK_ipc_buffer to a higher value (126491264 bytes suggested)
> Jun 13 07:22:50 [44112] SRV-5-1        cib:    error: crm_ipc_prepare:     Could not compress the message into less than the configured ipc limit (20480 bytes).Set PCMK_ipc_buffer to a higher value (252982528 bytes suggested)
> Jun 13 07:22:52 [44112] SRV-5-1        cib:    error: crm_ipc_prepare:     Could not compress the message into less than the configured ipc limit (20480 bytes).Set PCMK_ipc_buffer to a higher value (505965056 bytes suggested)
> Jun 13 07:23:57 [44112] SRV-5-1        cib:    error: crm_ipc_prepare:     Could not compress the message into less than the configured ipc limit (20480 bytes).Set PCMK_ipc_buffer to a higher value (1011930112 bytes suggested)
> Jun 13 07:24:51 [44112] SRV-5-1        cib:    error: crm_ipc_prepare:     Could not compress the message into less than the configured ipc limit (20480 bytes).Set PCMK_ipc_buffer to a higher value (2023860224 bytes suggested)
> Jun 13 07:26:52 [44112] SRV-5-1        cib:    error: crm_ipc_prepare:     Could not compress the message into less than the configured ipc limit (20480 bytes).Set PCMK_ipc_buffer to a higher value (-247246848 bytes suggested)
> Jun 13 07:27:22 [44112] SRV-5-1        cib:    error: crm_ipc_prepare:     Could not compress the message into less than the configured ipc limit (20480 bytes).Set PCMK_ipc_buffer to a higher value (-494493696 bytes suggested)
> Jun 13 07:29:22 [44112] SRV-5-1        cib:    error: crm_ipc_prepare:     Could not compress the message into less than the configured ipc limit (20480 bytes).Set PCMK_ipc_buffer to a higher value (-988987392 bytes suggested)
>
> There is definitely something wrong. Is it only the printing of the
> suggested value, or is it something else?
>
> If I check the cib.xml files in /var/lib/pacemaker/cib/, all files are
> a bit smaller than 300K.
>
> Changing these buffer sizes did not solve my problem of not getting
> results from crm_mon.
>
> Gr.
> Johan
>
>
> On 13-06-14 01:13, Andrew Beekhof wrote:
>> On 12 Jun 2014, at 10:53 pm, Johan Huysmans <johan.huysmans at inuits.be> wrote:
>>
>>> Hi All,
>>>
>>> I deployed Pacemaker 1.1.12-rc2 on our platform to test the cib changes.
>>> This was needed on our setup as it contains 6 nodes, 150 resources and the cib process was using lots of cpu.
>>>
>>> With a limited set of resources (6 nodes, 30 resources) everything worked as expected, including crm_mon.
>>> When loading the complete set of resources we lost the crm_mon functionality on all nodes.
>>> The cluster is running as expected (running all resources) however we don't have any visibility.
>>>
>>> I noticed that operations that perform changes (such as crm resource stop <resourcename>) did actually work,
>>> but crm resource status didn't (using crmsh-2.0+git46-1.1.x86_64).
>>>
>>> I noticed that /dev/shm/qb-cib_ro* files are created, and lsof shows that they are both opened by crm_mon and cib.
>>>
>>>
>>> When executing "crm_mon -1" I get the following messages in /var/log/messages (and /var/log/pacemaker.log):
>>> Jun 12 12:47:38 [8062] SRV-5-1        cib:   notice: crm_ipcs_sendv:     Response 2 to 0x1810370[17836] (1091618 bytes) failed: Resource temporarily unavailable (-11)
>>> Jun 12 12:47:38 [8062] SRV-5-1        cib:  warning: do_local_notify:     Sync reply to crm_mon failed: No message of desired type
>>>
>>>
>>> Restarting the pacemaker and cman services on one node didn't solve it.
>>>
>>>
>>> What is causing this problem and how can I resolve it ?
>> Almost certainly you're hitting IPC limits associated with large clusters.
>>
>> You should be able to tune:
>>
>> # PCMK_ipc_buffer=20480
>>
>> In /etc/sysconfig/pacemaker and then restart the cluster.
>>
>> Note also:
>>
>> # For non-systemd based systems, prefix 'export' to each enabled line
>>
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list:Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home:http://www.clusterlabs.org
>> Getting started:http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs:http://bugs.clusterlabs.org
