[Pacemaker] pacemaker shutdown under high load
Andrew Beekhof
andrew at beekhof.net
Wed Oct 30 22:26:54 UTC 2013
On 17 Oct 2013, at 1:37 am, Alessandro Bono <alessandro.bono at gmail.com> wrote:
> On 16/10/2013 00:11, Andrew Beekhof wrote:
>> On 09/10/2013, at 10:53 PM, Alessandro Bono <alessandro.bono at gmail.com>
>> wrote:
>>
>>
>>> Hi
>>>
>>>
>>> This weekend Pacemaker shut down on the primary node during a machine backup.
>>> I have attached the compressed log of the primary node; the secondary node's log is too big, but I can provide it as an external link if needed.
>>> Inspecting the logs, I found these errors:
>>>
>> Looks like corosync went away from underneath pacemaker, hence "Corosync connection lost! Exiting."
> Is there a way to debug this problem?
Enable more logging in corosync? Look for a core file too.
The corosync list might have more practical advice.
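When reading a log like the excerpt quoted further down, it can help to see which daemon reported a failure first. Below is a minimal Python sketch of that idea, not a Pacemaker or corosync tool. The line layout is assumed from the excerpt in this thread, the function name `first_errors` is hypothetical, and the year defaults to 2013 only because the log lines carry no year.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt in this thread, e.g.:
# "Oct 05 22:26:46 [31341] ga1-ext attrd: error: pcmk_cpg_dispatch: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<pid>\d+)\] (?P<host>\S+) +(?P<daemon>[\w-]+): +"
    r"(?P<level>\w+): +(?P<func>\w+): +(?P<msg>.*)$"
)

# Severities treated as failures in this sketch.
SEVERE = {"error", "crit"}


def first_errors(lines, year=2013):
    """Return the earliest error/crit entry per daemon, sorted by time.

    Each result is a tuple (timestamp, daemon, function, message).
    Lines that do not match the assumed layout are skipped.
    """
    first = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m["level"] not in SEVERE:
            continue
        # The log omits the year, so one is supplied for parsing.
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        daemon = m["daemon"]
        # Keep only the earliest severe entry seen for each daemon.
        if daemon not in first or ts < first[daemon][0]:
            first[daemon] = (ts, daemon, m["func"], m["msg"])
    return sorted(first.values(), key=lambda entry: (entry[0], entry[1]))


if __name__ == "__main__":
    import sys

    # Usage: python first_errors.py < corosync.log
    for ts, daemon, func, msg in first_errors(sys.stdin):
        print(f"{ts:%b %d %H:%M:%S} {daemon}: {func}: {msg}")
```

If every daemon reports the CPG failure within the same second, as in the excerpt, that is consistent with corosync itself going away rather than one Pacemaker daemon failing on its own; the corosync-side log or a core file is then the place to look.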
> The nodes are regular CentOS 6.4 64-bit machines with this corosync version:
>
> corosync-1.4.1-15.el6_4.1.x86_64
> corosynclib-1.4.1-15.el6_4.1.x86_64
>
> Should I package the latest 1.4.x version and try it?
Wouldn't hurt.
> As a workaround I put the cluster in maintenance mode prior to the backup, but that's not a solution.
>
>>> Oct 05 22:26:46 [31338] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=ga2-ext/crmd/17, version=0.155.87)
>>> Oct 05 22:26:46 [31341] ga1-ext attrd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
>>> Oct 05 22:26:46 [31343] ga1-ext crmd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
>>> Oct 05 22:26:46 [31343] ga1-ext crmd: error: crmd_cs_destroy: connection terminated
>>> Oct 05 22:26:46 [31343] ga1-ext crmd: debug: qb_ipcs_unref: qb_ipcs_unref() - destroying
>>> Oct 05 22:26:47 [31343] ga1-ext crmd: info: qb_ipcs_us_withdraw: withdrawing server sockets
>>> Oct 05 22:26:47 [31343] ga1-ext crmd: debug: qb_ipcc_disconnect: qb_ipcc_disconnect()
>>> Oct 05 22:26:47 [31343] ga1-ext crmd: debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-attrd-request-31341-31343-9-header
>>> Oct 05 22:26:46 [31332] ga1-ext pacemakerd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
>>> Oct 05 22:26:46 [31339] ga1-ext stonith-ng: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
>>> Oct 05 22:26:46 [31338] ga1-ext cib: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
>>> Oct 05 22:26:47 [31343] ga1-ext crmd: debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-attrd-response-31341-31343-9-header
>>> Oct 05 22:26:47 [31332] ga1-ext pacemakerd: error: mcp_cpg_destroy: Connection destroyed
>>> Oct 05 22:26:47 [31339] ga1-ext stonith-ng: error: stonith_peer_cs_destroy: Corosync connection terminated
>>> Oct 05 22:26:47 [31339] ga1-ext stonith-ng: info: stonith_shutdown: Terminating with 1 clients
>>> Oct 05 22:26:47 [31339] ga1-ext stonith-ng: debug: cib_native_signoff: Signing out of the CIB Service
>>> Oct 05 22:26:47 [31339] ga1-ext stonith-ng: debug: qb_ipcc_disconnect: qb_ipcc_disconnect()
>>> Oct 05 22:26:47 [31343] ga1-ext crmd: debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-attrd-event-31341-31343-9-header
>>> Oct 05 22:26:47 [31341] ga1-ext attrd: crit: attrd_cs_destroy: Lost connection to Corosync service!
>>> Oct 05 22:26:47 [31341] ga1-ext attrd: notice: main: Exiting...
>>> Oct 05 22:26:47 [31341] ga1-ext attrd: notice: main: Disconnecting client 0x1b03990, pid=31343...
>>> Oct 05 22:26:47 [31341] ga1-ext attrd: debug: qb_ipcs_disconnect: qb_ipcs_disconnect(31341-31343-9) state:2
>>> Oct 05 22:26:47 [31341] ga1-ext attrd: info: crm_client_destroy: Destroying 0 events
>>> Oct 05 22:26:47 [31338] ga1-ext cib: error: cib_cs_destroy: Corosync connection lost! Exiting.
>>>
>>> P.S. This is a resend to open a new thread; sorry for the double mail.
>>>
>>> --
>>> Cordiali Saluti
>>> Alessandro Bono
>>>
>>> <ga1-ext.corosync.log-20131006.gz>
>>> _______________________________________________
>>> Pacemaker mailing list:
>>> Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>>
>>> Project Home:
>>> http://www.clusterlabs.org
>>>
>>> Getting started:
>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>
>>> Bugs:
>>> http://bugs.clusterlabs.org
>
>
> --
> Cordiali Saluti
> Alessandro Bono