[Pacemaker] Corosync over DHCP IP
Dan Frincu
df.cluster at gmail.com
Mon Feb 11 09:59:23 UTC 2013
Hi,
On Sun, Feb 10, 2013 at 2:24 PM, Viacheslav Biriukov
<v.v.biriukov at gmail.com> wrote:
> Hi guys,
>
> Got a tricky issue with Corosync and Pacemaker over a DHCP IP address using
> unicast. Corosync crashes periodically.
>
> Packages are from the CentOS 6 repos:
> corosync-1.4.1-7.el6_3.1.x86_64
> corosynclib-1.4.1-7.el6_3.1.x86_64
> pacemaker-cluster-libs-1.1.7-6.el6.x86_64
> pacemaker-libs-1.1.7-6.el6.x86_64
> pacemaker-cli-1.1.7-6.el6.x86_64
> pacemaker-1.1.7-6.el6.x86_64
>
>
> Logs
>
> Feb 09 23:24:33 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> Feb 10 00:24:39 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> Feb 10 01:24:44 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> Feb 10 02:24:48 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> Feb 10 03:24:51 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> Feb 10 04:24:52 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> Feb 10 05:24:54 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> Feb 10 06:25:00 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> Feb 10 07:25:06 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor
> Feb 10 07:56:22 corosync [TOTEM ] A processor failed, forming new
> configuration.
> Feb 10 07:56:22 corosync [TOTEM ] The network interface is down.
This ^^^ is your problem. Corosync does not cope well with its network
interface going down while it is running, see
https://github.com/corosync/corosync/wiki/Corosync-and-ifdown-on-active-network-interface
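If you want to check how often the interface drops, a small script can
pull the relevant TOTEM messages out of the log. The following is only a
rough sketch in Python: the function name interface_events and the regex
are my own (hypothetical, not part of Corosync), and it assumes the
messages look exactly like the two in your paste, each on a single line
as they appear in the real log file rather than wrapped as in the mail.

```python
import re

# Assumed line format, taken from the log excerpt in this thread:
#   Feb 10 07:56:22 corosync [TOTEM ] The network interface is down.
#   Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104] is now up.
TOTEM_IFACE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) corosync \[TOTEM\s*\] "
    r"The network interface (?P<rest>.*)$"
)


def interface_events(lines):
    """Yield (timestamp, state) for each TOTEM interface up/down message.

    state is the string "down" or "up"; all other lines are skipped.
    """
    for line in lines:
        match = TOTEM_IFACE.match(line.strip())
        if not match:
            continue
        rest = match.group("rest")
        if rest.startswith("is down"):
            yield match.group("ts"), "down"
        elif "now up" in rest:
            yield match.group("ts"), "up"


# Sample input copied from the log in this thread (wrapped lines rejoined).
sample = [
    "Feb 10 07:25:06 host1 lrmd: [5248]: info: rsc:P_SESSION_IP:25: monitor",
    "Feb 10 07:56:22 corosync [TOTEM ] A processor failed, forming new configuration.",
    "Feb 10 07:56:22 corosync [TOTEM ] The network interface is down.",
    "Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104] is now up.",
]
events = list(interface_events(sample))
print(events)
```

To run it against the real log, pass an open file object for the logfile
named in your corosync.conf (/var/log/cluster/corosync.log) instead of
the sample list. Comparing the "down" timestamps with the DHCP client's
lease renewals should show whether the two coincide.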
Normally DHCP shouldn't take the interface down. Also, since changing
the network configuration in corosync means restarting it, why not go
with static IPs?
HTH,
Dan
> Feb 10 07:56:24 corosync [TOTEM ] The network interface [172.17.0.104] is
> now up.
> Feb 10 07:56:25 [5242] host1 pacemakerd: error: cfg_connection_destroy:
> Connection destroyed
> Feb 10 07:56:25 [5251] host1 crmd: error: ais_dispatch:
> Receiving message body failed: (2) Library error: Resource temporarily
> unavailable (11)
> Feb 10 07:56:25 [5246] host1 cib: error: ais_dispatch:
> Receiving message body failed: (2) Library error: Resource temporarily
> unavailable (11)
> Feb 10 07:56:25 [5249] host1 attrd: error: ais_dispatch:
> Receiving message body failed: (2) Library error: Resource temporarily
> unavailable (11)
> Feb 10 07:56:25 [5251] host1 crmd: error: ais_dispatch: AIS
> connection failed
> Feb 10 07:56:25 [5242] host1 pacemakerd: error: cpg_connection_destroy:
> Connection destroyed
> Feb 10 07:56:25 [5246] host1 cib: error: ais_dispatch: AIS
> connection failed
> Feb 10 07:56:25 [5251] host1 crmd: info: crmd_ais_destroy:
> connection closed
> Feb 10 07:56:25 [5249] host1 attrd: error: ais_dispatch: AIS
> connection failed
> Feb 10 07:56:25 [5247] host1 stonith-ng: error: ais_dispatch:
> Receiving message body failed: (2) Library error: Resource temporarily
> unavailable (11)
> Feb 10 07:56:25 [5246] host1 cib: error: cib_ais_destroy: AIS
> connection terminated
> Feb 10 07:56:25 [5249] host1 attrd: crit: attrd_ais_destroy: Lost
> connection to OpenAIS service!
> Feb 10 07:56:25 [5242] host1 pacemakerd: notice: pcmk_shutdown_worker:
> Shuting down Pacemaker
> Feb 10 07:56:25 [5247] host1 stonith-ng: error: ais_dispatch: AIS
> connection failed
> Feb 10 07:56:25 [5249] host1 attrd: notice: main: Exiting...
> Feb 10 07:56:25 [5247] host1 stonith-ng: error: stonith_peer_ais_destroy:
> AIS connection terminated
> Feb 10 07:56:25 [5242] host1 pacemakerd: notice: stop_child:
> Stopping crmd: Sent -15 to process 5251
> Feb 10 07:56:25 [5249] host1 attrd: error:
> attrd_cib_connection_destroy: Connection to the CIB terminated...
> Feb 10 07:56:25 [5251] host1 crmd: info: crm_signal_dispatch:
> Invoking handler for signal 15: Terminated
> Feb 10 07:56:25 [5251] host1 crmd: notice: crm_shutdown:
> Requesting shutdown, upper limit is 1200000ms
> Feb 10 07:56:25 [5251] host1 crmd: info: do_shutdown_req:
> Sending shutdown request to host2
> Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit: Child
> process stonith-ng exited (pid=5247, rc=1)
> Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message: IPC
> Channel to 5249 is not connected
> Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message: IPC
> Channel to 5246 is not connected
> Feb 10 07:56:25 [5242] host1 pacemakerd: warning: send_ipc_message: IPC
> Channel to 5247 is not connected
> Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message:
> Sending message via cpg FAILED: (rc=9) Bad handle
> Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit: Child
> process cib exited (pid=5246, rc=1)
> Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message:
> Sending message via cpg FAILED: (rc=9) Bad handle
> Feb 10 07:56:25 [5242] host1 pacemakerd: error: pcmk_child_exit: Child
> process attrd exited (pid=5249, rc=1)
> Feb 10 07:56:25 [5242] host1 pacemakerd: error: send_cpg_message:
> Sending message via cpg FAILED: (rc=9) Bad handle
> Feb 10 07:56:27 [5251] host1 crmd: error: send_ais_text:
> Sending message 68 via pcmk: FAILED (rc=2): Library error: Connection timed
> out (110)
> Feb 10 07:56:27 [5251] host1 crmd: error: do_log: FSA: Input
> I_ERROR from do_shutdown_req() received in state S_NOT_DC
> Feb 10 07:56:27 [5251] host1 crmd: notice: do_state_transition:
> State transition S_NOT_DC -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL
> origin=do_shutdown_req ]
> Feb 10 07:56:27 [5251] host1 crmd: error: do_recover:
> Action A_RECOVER (0000000001000000) not supported
> Feb 10 07:56:27 [5251] host1 crmd: error: do_log: FSA: Input
> I_TERMINATE from do_recover() received in state S_RECOVERY
> Feb 10 07:56:27 [5251] host1 crmd: notice: do_state_transition:
> State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE
> cause=C_FSA_INTERNAL origin=do_recover ]
> Feb 10 07:56:27 [5251] host1 crmd: info: do_shutdown:
> Disconnecting STONITH...
> Feb 10 07:56:27 [5251] host1 crmd: info:
> tengine_stonith_connection_destroy: Fencing daemon disconnected
> Feb 10 07:56:27 host1 lrmd: [5248]: info: cancel_op: operation monitor[25]
> on ocf::OpenStackFloatingIP::P_SESSION_IP for client 5251, its parameters:
> CRM_meta_name=[monitor] crm_feature_set=[3.0.6] CRM_meta_timeout=[20000]
> CRM_meta_interval=[5000] ip=[172.24.0.104] cancelled
> Feb 10 07:56:27 [5251] host1 crmd: error: verify_stopped:
> Resource P_SESSION_IP was active at shutdown. You may ignore this error if
> it is unmanaged.
> Feb 10 07:56:27 [5251] host1 crmd: info: do_lrm_control:
> Disconnected from the LRM
> Feb 10 07:56:27 [5251] host1 crmd: notice: terminate_ais_connection:
> Disconnecting from AIS
> Feb 10 07:56:27 [5251] host1 crmd: info: do_ha_control:
> Disconnected from OpenAIS
> Feb 10 07:56:27 [5251] host1 crmd: info: do_cib_control:
> Disconnecting CIB
> Feb 10 07:56:27 [5251] host1 crmd: error: send_ipc_message: IPC
> Channel to 5246 is not connected
> Feb 10 07:56:27 [5251] host1 crmd: error: send_ipc_message: IPC
> Channel to 5246 is not connected
> Feb 10 07:56:27 [5251] host1 crmd: error:
> cib_native_perform_op_delegate: Sending message to CIB service FAILED
> Feb 10 07:56:27 [5251] host1 crmd: info:
> crmd_cib_connection_destroy: Connection to the CIB terminated...
> Feb 10 07:56:27 [5251] host1 crmd: error: verify_stopped:
> Resource P_SESSION_IP was active at shutdown. You may ignore this error if
> it is unmanaged.
> Feb 10 07:56:27 [5251] host1 crmd: info: do_exit: Performing
> A_EXIT_0 - gracefully exiting the CRMd
> Feb 10 07:56:27 [5251] host1 crmd: error: do_exit: Could not
> recover from internal error
> Feb 10 07:56:27 [5251] host1 crmd: info: free_mem: Dropping
> I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
> Feb 10 07:56:27 [5251] host1 crmd: info: crm_xml_cleanup:
> Cleaning up memory from libxml2
> Feb 10 07:56:27 [5251] host1 crmd: info: do_exit: [crmd]
> stopped (2)
> Feb 10 07:56:27 [5242] host1 pacemakerd: error: pcmk_child_exit: Child
> process crmd exited (pid=5251, rc=2)
> Feb 10 07:56:27 [5242] host1 pacemakerd: warning: send_ipc_message: IPC
> Channel to 5251 is not connected
> Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message:
> Sending message via cpg FAILED: (rc=9) Bad handle
> Feb 10 07:56:27 [5242] host1 pacemakerd: notice: stop_child:
> Stopping pengine: Sent -15 to process 5250
> Feb 10 07:56:27 [5242] host1 pacemakerd: info: pcmk_child_exit: Child
> process pengine exited (pid=5250, rc=0)
> Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message:
> Sending message via cpg FAILED: (rc=9) Bad handle
> Feb 10 07:56:27 [5242] host1 pacemakerd: notice: stop_child:
> Stopping lrmd: Sent -15 to process 5248
> Feb 10 07:56:27 host1 lrmd: [5248]: info: lrmd is shutting down
> Feb 10 07:56:27 [5242] host1 pacemakerd: info: pcmk_child_exit: Child
> process lrmd exited (pid=5248, rc=0)
> Feb 10 07:56:27 [5242] host1 pacemakerd: error: send_cpg_message:
> Sending message via cpg FAILED: (rc=9) Bad handle
> Feb 10 07:56:27 [5242] host1 pacemakerd: notice: pcmk_shutdown_worker:
> Shutdown complete
> Feb 10 07:56:27 [5242] host1 pacemakerd: info: main: Exiting
> pacemakerd
>
>
> corosync.conf:
>
> compatibility: whitetank
>
> totem {
> version: 2
> secauth: off
> nodeid: 104
> interface {
> member {
> memberaddr: 172.17.0.104
> }
> member {
> memberaddr: 172.17.0.105
> }
> ringnumber: 0
> bindnetaddr: 172.17.0.0
> mcastport: 5426
> ttl: 1
> }
> transport: udpu
> }
>
> logging {
> fileline: off
> to_logfile: yes
> to_syslog: yes
> debug: on
> logfile: /var/log/cluster/corosync.log
> debug: off
> timestamp: on
> logger_subsys {
> subsys: AMF
> debug: off
> }
> }
> service {
> # Load the Pacemaker Cluster Resource Manager
> ver: 1
> name: pacemaker
> }
>
> aisexec {
> user: root
> group: root
> }
>
>
>
> Thank you!
>
> --
> Viacheslav Biriukov
> BR
> http://biriukov.me
>
--
Dan Frincu
CCNA, RHCE