[Pacemaker] Exiting corosync-notifyd results in shutting down of pacemakerd
Grüninger, Andreas (LGL Extern)
Andreas.Grueninger at lgl.bwl.de
Thu Oct 4 07:57:00 UTC 2012
>> Is this an error or the desired result?
>Based on the logs, pacemaker thinks corosync died. Did that happen?
>If so there is not much pacemaker can do :-(
That would be perfectly fine if corosync had died.
But corosync does not die; it stays healthy.
What exits is corosync-notifyd, which is started in addition to corosync as a separate process. It is stopped with kill when it runs as a daemon, or with Ctrl-C when it runs in the foreground.
The job of corosync-notifyd is to send SNMP traps.
For pacemaker, the same functionality is provided by crm_mon -C .. -S ...
So either corosync-notifyd sends the wrong signal or pacemaker overreacts.
Pacemaker should simply ignore this closed connection.
Can this be changed in pacemaker, or should it rather be solved in corosync/corosync-notifyd?
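
For anyone triaging a similar log, here is a rough sketch in Python (not part of pacemaker or corosync) that lists the first error/crit entry of each daemon in order of appearance, to show which process noticed the lost connection first. The regular expression assumes the line layout of the log quoted below. The names parse_line, first_errors and SAMPLE are only made up for this illustration.

```python
import re

# Assumed layout, taken from the quoted log:
#   Oct 02 18:42:19 [27126] pacemakerd: error: cfg_connection_destroy: Connection destroyed
# An optional leading "> " (mail quoting) is tolerated.
LINE_RE = re.compile(
    r"^(?:>\s*)?(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<pid>\d+)\] (?P<daemon>[\w-]+): +(?P<level>\w+): +"
    r"(?P<func>\w+): (?P<msg>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def first_errors(lines):
    """Return the first error/crit entry per daemon, in order of appearance."""
    seen = {}
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry["level"] not in ("error", "crit"):
            continue
        # Keep only the first problem reported by each daemon.
        seen.setdefault(entry["daemon"], entry)
    return list(seen.values())


# A few lines copied from the quoted log, used as sample input.
SAMPLE = [
    "> Oct 02 18:42:19 [27126] pacemakerd: error: cfg_connection_destroy: Connection destroyed",
    "> Oct 02 18:42:19 [27126] pacemakerd: notice: pcmk_shutdown_worker: Shuting down Pacemaker",
    "> Oct 02 18:42:19 [27128] stonith-ng: error: pcmk_cpg_dispatch: Connection to the CPG API failed: 2",
    "> Oct 02 18:42:19 [27130] attrd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: 2",
    "> Oct 02 18:42:19 [27130] attrd: crit: attrd_ais_destroy: Lost connection to Corosync service!",
]

if __name__ == "__main__":
    for e in first_errors(SAMPLE):
        print(f"{e['ts']} {e['daemon']} ({e['level']}): {e['func']}: {e['msg']}")
```

In the quoted log the first such entry is cfg_connection_destroy in pacemakerd, i.e. pacemakerd sees its own connection to corosync closed before any child process reports a problem.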
Andreas
-----Original Message-----
From: Andrew Beekhof [mailto:andrew at beekhof.net]
Sent: Wednesday, 3 October 2012 01:09
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Exiting corosync-notifyd results in shutting down of pacemakerd
On Wed, Oct 3, 2012 at 2:51 AM, Grüninger, Andreas (LGL Extern) <Andreas.Grueninger at lgl.bwl.de> wrote:
> I am currently investigating the monitoring of corosync/pacemaker with SNMP.
> crm_mon used with the OCF resource ClusterMon works as it should.
>
> But corosync-notifyd can't be used in our case.
> I start corosync-notifyd in the foreground as follows:
>   corosync-notifyd -f -l -s -m 10.50.235.1
>
> When I stop the running corosync-notifyd with CTRL-C, pacemaker shuts down with the following entries in the logfile.
> Is this an error or the desired result?
Based on the logs, pacemaker thinks corosync died. Did that happen?
If so there is not much pacemaker can do :-(
>
> ....
> Oct 02 18:42:19 [27126] pacemakerd: error: cfg_connection_destroy: Connection destroyed
> Oct 02 18:42:19 [27126] pacemakerd: notice: pcmk_shutdown_worker: Shuting down Pacemaker
> Oct 02 18:42:19 [27126] pacemakerd: notice: stop_child: Stopping crmd: Sent -15 to process 27177
> Oct 02 18:42:19 [27126] pacemakerd: error: cpg_connection_destroy: Connection destroyed
> Oct 02 18:42:19 [27177] crmd: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> Oct 02 18:42:19 [27177] crmd: notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
> Oct 02 18:42:19 [27128] stonith-ng: error: pcmk_cpg_dispatch: Connection to the CPG API failed: 2
> Oct 02 18:42:19 [27177] crmd: info: do_shutdown_req: Sending shutdown request to zd-sol-s1-v61
> Oct 02 18:42:19 [27128] stonith-ng: error: stonith_peer_ais_destroy: AIS connection terminated
> Oct 02 18:42:19 [27128] stonith-ng: info: stonith_shutdown: Terminating with 1 clients
> Oct 02 18:42:19 [27130] attrd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: 2
> Oct 02 18:42:19 [27130] attrd: crit: attrd_ais_destroy: Lost connection to Corosync service!
> Oct 02 18:42:19 [27130] attrd: notice: main: Exiting...
> Oct 02 18:42:19 [27130] attrd: notice: main: Disconnecting client 81ffc38, pid=27177...
> Oct 02 18:42:19 [27128] stonith-ng: info: qb_ipcs_us_withdraw: withdrawing server sockets
> Oct 02 18:42:19 [27128] stonith-ng: info: crm_xml_cleanup: Cleaning up memory from libxml2
> Oct 02 18:42:19 [27130] attrd: error: attrd_cib_connection_destroy: Connection to the CIB terminated...
> Oct 02 18:42:19 [27127] cib: error: pcmk_cpg_dispatch: Connection to the CPG API failed: 2
> Oct 02 18:42:19 [27127] cib: error: cib_ais_destroy: Corosync connection lost! Exiting.
> Oct 02 18:42:19 [27129] lrmd: info: lrmd_ipc_destroy: LRMD client disconnecting 807e768 - name: crmd id: 1d659f61-d6e2-4ef3-f674-b9a8ba8029e8
> Oct 02 18:42:19 [27127] cib: info: terminate_cib: cib_ais_destroy: Exiting fast...
> Oct 02 18:42:19 [27127] cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
> Oct 02 18:42:19 [27127] cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
> Oct 02 18:42:19 [27127] cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
> Oct 02 18:42:19 [27126] pacemakerd: error: pcmk_child_exit: Child process attrd exited (pid=27130, rc=1)
> Oct 02 18:42:19 [27126] pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> Oct 02 18:42:19 [27126] pacemakerd: error: pcmk_child_exit: Child process cib exited (pid=27127, rc=64)
> Oct 02 18:42:19 [27126] pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> Oct 02 18:42:19 [27126] pacemakerd: notice: pcmk_child_exit: Child process crmd terminated with signal 13 (pid=27177, core=0)
> Oct 02 18:42:19 [27126] pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> Oct 02 18:42:19 [27126] pacemakerd: notice: stop_child: Stopping pengine: Sent -15 to process 27131
> Oct 02 18:42:19 [27126] pacemakerd: info: pcmk_child_exit: Child process pengine exited (pid=27131, rc=0)
> Oct 02 18:42:19 [27126] pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> Oct 02 18:42:19 [27126] pacemakerd: notice: stop_child: Stopping lrmd: Sent -15 to process 27129
> Oct 02 18:42:19 [27129] lrmd: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> Oct 02 18:42:19 [27129] lrmd: info: lrmd_shutdown: Terminating with 0 clients
> Oct 02 18:42:19 [27129] lrmd: info: qb_ipcs_us_withdraw: withdrawing server sockets
> Oct 02 18:42:19 [27126] pacemakerd: info: pcmk_child_exit: Child process lrmd exited (pid=27129, rc=0)
> Oct 02 18:42:19 [27126] pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> Oct 02 18:42:19 [27126] pacemakerd: notice: stop_child: Stopping stonith-ng: Sent -15 to process 27128
> Oct 02 18:42:19 [27126] pacemakerd: notice: pcmk_child_exit: Child process stonith-ng terminated with signal 11 (pid=27128, core=128)
> Oct 02 18:42:19 [27126] pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> Oct 02 18:42:19 [27126] pacemakerd: notice: pcmk_shutdown_worker: Shutdown complete
> Oct 02 18:42:19 [27126] pacemakerd: info: qb_ipcs_us_withdraw: withdrawing server sockets
> Oct 02 18:42:19 [27126] pacemakerd: info: main: Exiting pacemakerd
>
> Andreas
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org