[Pacemaker] pacemaker dies without logs
Alessandro Bono
alessandro.bono at gmail.com
Sun Sep 22 08:21:18 UTC 2013
Found logs in the corosync (!?) log directory.
These are for the primary node, ga1-ext:
Sep 22 00:45:29 corosync [TOTEM ] A processor failed, forming new configuration.
Sep 22 00:45:31 corosync [CMAN ] quorum lost, blocking activity
Sep 22 00:45:31 corosync [QUORUM] This node is within the non-primary component and will NOT provide any services.
Sep 22 00:45:31 corosync [QUORUM] Members[1]: 1
Sep 22 00:45:31 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 22 00:45:31 corosync [CPG ] chosen downlist: sender r(0) ip(10.12.23.1) ; members(old:2 left:1)
Sep 22 00:45:31 corosync [MAIN ] Completed service synchronization, ready to provide service.
Sep 22 00:45:31 [4418] ga1-ext cib: info: pcmk_cpg_membership: Left[4.0] cib.2
Sep 22 00:45:31 [4418] ga1-ext cib: info: crm_update_peer_proc: pcmk_cpg_membership: Node ga2-ext[2] - corosync-cpg is now offline
Sep 22 00:45:31 [4418] ga1-ext cib: info: pcmk_cpg_membership: Member[4.0] cib.1
Sep 22 00:45:31 [4423] ga1-ext crmd: info: pcmk_cpg_membership: Left[4.0] crmd.2
Sep 22 00:45:31 [4423] ga1-ext crmd: info: crm_update_peer_proc: pcmk_cpg_membership: Node ga2-ext[2] - corosync-cpg is now offline
Sep 22 00:45:31 [4423] ga1-ext crmd: info: peer_update_callback: Client ga2-ext/peer now has status [offline] (DC=true)
Sep 22 00:45:31 [4423] ga1-ext crmd: warning: match_down_event: No match for shutdown action on ga2-ext
Sep 22 00:45:31 [4423] ga1-ext crmd: notice: peer_update_callback: Stonith/shutdown of ga2-ext not matched
Sep 22 00:45:31 [4423] ga1-ext crmd: info: crm_update_peer_join: peer_update_callback: Node ga2-ext[2] - join-2 phase 4 -> 0
Sep 22 00:45:31 [4423] ga1-ext crmd: info: abort_transition_graph: peer_update_callback:214 - Triggered transition abort (complete=1) : Node failure
Sep 22 00:45:31 [4423] ga1-ext crmd: info: pcmk_cpg_membership: Member[4.0] crmd.1
Sep 22 00:45:31 [4423] ga1-ext crmd: notice: cman_event_callback: Membership 900: quorum lost
Sep 22 00:45:31 [4423] ga1-ext crmd: notice: crm_update_peer_state: cman_event_callback: Node ga2-ext[2] - state is now lost (was member)
Sep 22 00:45:31 [4423] ga1-ext crmd: info: peer_update_callback: ga2-ext is now lost (was member)
Sep 22 00:45:31 [4423] ga1-ext crmd: warning: match_down_event: No match for shutdown action on ga2-ext
Sep 22 00:45:31 [4423] ga1-ext crmd: notice: peer_update_callback: Stonith/shutdown of ga2-ext not matched
Sep 22 00:45:31 [4423] ga1-ext crmd: info: abort_transition_graph: peer_update_callback:214 - Triggered transition abort (complete=1) : Node failure
Sep 22 00:45:31 [4423] ga1-ext crmd: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
Sep 22 00:45:31 [4423] ga1-ext crmd: info: do_dc_join_offer_one: An unknown node joined - (re-)offer to any unconfirmed nodes
Sep 22 00:45:31 [4423] ga1-ext crmd: info: join_make_offer: Making join offers based on membership 900
Sep 22 00:45:31 [4423] ga1-ext crmd: info: join_make_offer: Skipping ga1-ext: already known 4
Sep 22 00:45:31 [4423] ga1-ext crmd: info: abort_transition_graph: do_te_invoke:158 - Triggered transition abort (complete=1) : Peer Halt
Sep 22 00:45:31 [4423] ga1-ext crmd: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Sep 22 00:45:31 [4423] ga1-ext crmd: info: crmd_join_phase_log: join-2: ga2-ext=none
Sep 22 00:45:31 [4423] ga1-ext crmd: info: crmd_join_phase_log: join-2: ga1-ext=confirmed
Sep 22 00:45:31 [4423] ga1-ext crmd: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
Sep 22 00:45:31 [4423] ga1-ext crmd: info: abort_transition_graph: do_te_invoke:151 - Triggered transition abort (complete=1) : Peer Cancelled
Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/crmd/703, version=0.143.10)
Sep 22 00:45:31 [4419] ga1-ext stonith-ng: info: pcmk_cpg_membership: Left[3.0] stonith-ng.2
Sep 22 00:45:31 [4419] ga1-ext stonith-ng: info: crm_update_peer_proc: pcmk_cpg_membership: Node ga2-ext[2] - corosync-cpg is now offline
Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/crmd/704, version=0.143.10)
Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section cib: OK (rc=0, origin=local/crmd/705, version=0.143.11)
Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=local/crmd/706, version=0.143.11)
Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/crmd/707, version=0.143.12)
Sep 22 00:45:31 [4421] ga1-ext attrd: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Sep 22 00:45:31 [4421] ga1-ext attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=local/crmd/708, version=0.143.12)
Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/crmd/709, version=0.143.13)
Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section cib: OK (rc=0, origin=local/crmd/710, version=0.143.13)
Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/crmd/711, version=0.143.13)
Sep 22 00:45:31 [4422] ga1-ext pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 22 00:45:31 [4422] ga1-ext pengine: info: determine_online_status: Node ga1-ext is online
Sep 22 00:45:31 [4422] ga1-ext pengine: info: unpack_rsc_op: Operation monitor found resource dovecot active on ga1-ext
Sep 22 00:45:31 [4422] ga1-ext pengine: notice: unpack_rsc_op: Operation monitor found resource drbd0:0 active in master mode on ga1-ext
Sep 22 00:45:31 [4422] ga1-ext pengine: info: unpack_rsc_op: Operation monitor found resource ClusterIP active on ga1-ext
Sep 22 00:45:31 [4422] ga1-ext pengine: info: unpack_rsc_op: Operation monitor found resource mail active on ga1-ext
Sep 22 00:45:31 [4422] ga1-ext pengine: info: unpack_rsc_op: Operation monitor found resource mysql active on ga1-ext
Sep 22 00:45:31 [4422] ga1-ext pengine: info: unpack_rsc_op: Operation monitor found resource drbdlinks active on ga1-ext
Sep 22 00:45:31 [4422] ga1-ext pengine: info: unpack_rsc_op: Operation monitor found resource SharedFS active on ga1-ext
Sep 22 00:45:31 [4422] ga1-ext pengine: info: clone_print: Master/Slave Set: ms_drbd0 [drbd0]
Sep 22 00:45:31 [4422] ga1-ext pengine: info: short_print: Masters: [ ga1-ext ]
Sep 22 00:45:31 [4422] ga1-ext pengine: info: short_print: Stopped: [ ga2-ext ]
Sep 22 00:45:31 [4422] ga1-ext pengine: info: group_print: Resource Group: service_group
Sep 22 00:45:31 [4422] ga1-ext pengine: info: native_print: SharedFS (ocf::heartbeat:Filesystem): Started ga1-ext
Sep 22 00:45:31 [4422] ga1-ext pengine: info: native_print: drbdlinks (ocf::tummy:drbdlinks): Started ga1-ext
Sep 22 00:45:31 [4422] ga1-ext pengine: info: native_print: ClusterIP (ocf::heartbeat:IPaddr): Started ga1-ext
Sep 22 00:45:31 [4422] ga1-ext pengine: info: native_print: mail (ocf::heartbeat:MailTo): Started ga1-ext
Sep 22 00:45:31 [4422] ga1-ext pengine: info: native_print: mysql (lsb:mysqld): Started ga1-ext
Sep 22 00:45:31 [4422] ga1-ext pengine: info: native_print: dovecot (lsb:dovecot): Started ga1-ext
Sep 22 00:45:31 [4422] ga1-ext pengine: info: native_color: Resource drbd0:1 cannot run anywhere
Sep 22 00:45:31 [4422] ga1-ext pengine: info: master_color: Promoting drbd0:0 (Master ga1-ext)
Sep 22 00:45:31 [4422] ga1-ext pengine: info: master_color: ms_drbd0: Promoted 1 instances of a possible 1 to master
Sep 22 00:45:31 [4422] ga1-ext pengine: info: LogActions: Leave drbd0:0 (Master ga1-ext)
Sep 22 00:45:31 [4422] ga1-ext pengine: info: LogActions: Leave drbd0:1 (Stopped)
Sep 22 00:45:31 [4422] ga1-ext pengine: info: LogActions: Leave SharedFS (Started ga1-ext)
Sep 22 00:45:31 [4422] ga1-ext pengine: info: LogActions: Leave drbdlinks (Started ga1-ext)
Sep 22 00:45:31 [4422] ga1-ext pengine: info: LogActions: Leave ClusterIP (Started ga1-ext)
Sep 22 00:45:31 [4422] ga1-ext pengine: info: LogActions: Leave mail (Started ga1-ext)
Sep 22 00:45:31 [4422] ga1-ext pengine: info: LogActions: Leave mysql (Started ga1-ext)
Sep 22 00:45:31 [4422] ga1-ext pengine: info: LogActions: Leave dovecot (Started ga1-ext)
Sep 22 00:45:31 [4419] ga1-ext stonith-ng: info: pcmk_cpg_membership: Member[3.0] stonith-ng.1
Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='ga1-ext']//transient_attributes//nvpair[@name='probe_complete']: OK (rc=0, origin=local/attrd/51, version=0.143.13)
Sep 22 00:45:31 [4421] ga1-ext attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd0 (10000)
Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/attrd/52, version=0.143.13)
Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='ga1-ext']//transient_attributes//nvpair[@name='master-drbd0']: OK (rc=0, origin=local/attrd/53, version=0.143.13)
Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/attrd/54, version=0.143.13)
Sep 22 00:45:31 [4422] ga1-ext pengine: notice: process_pe_message: Calculated Transition 621: /var/lib/pacemaker/pengine/pe-input-288.bz2
Sep 22 00:45:31 [4423] ga1-ext crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Sep 22 00:45:31 [4423] ga1-ext crmd: info: do_te_invoke: Processing graph 621 (ref=pe_calc-dc-1379803531-659) derived from /var/lib/pacemaker/pengine/pe-input-288.bz2
Sep 22 00:45:31 [4423] ga1-ext crmd: notice: run_graph: Transition 621 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-288.bz2): Complete
Sep 22 00:45:31 [4423] ga1-ext crmd: info: do_log: FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
Sep 22 00:45:31 [4423] ga1-ext crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Sep 22 00:45:45 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 22 00:45:45 corosync [CMAN ] quorum regained, resuming activity
Sep 22 00:45:45 corosync [QUORUM] This node is within the primary component and will provide service.
Sep 22 00:45:45 corosync [QUORUM] Members[2]: 1 2
Sep 22 00:45:45 corosync [QUORUM] Members[2]: 1 2
Sep 22 00:45:45 [4423] ga1-ext crmd: notice: cman_event_callback: Membership 904: quorum acquired
Sep 22 00:45:45 [4423] ga1-ext crmd: notice: crm_update_peer_state: cman_event_callback: Node ga2-ext[2] - state is now member (was lost)
Sep 22 00:45:45 [4423] ga1-ext crmd: info: peer_update_callback: ga2-ext is now member (was lost)
Sep 22 00:45:45 [4418] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/crmd/712, version=0.143.14)
Sep 22 00:45:45 [4423] ga1-ext crmd: info: crm_cs_flush: Sent 0 CPG messages (1 remaining, last=37): Try again (6)
Sep 22 00:45:45 [4423] ga1-ext crmd: info: cman_event_callback: Membership 904: quorum retained
Sep 22 00:45:45 [4418] ga1-ext cib: info: crm_cs_flush: Sent 0 CPG messages (1 remaining, last=64): Try again (6)
Sep 22 00:45:45 [4418] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section cib: OK (rc=0, origin=local/crmd/713, version=0.143.15)
Sep 22 00:45:45 [4418] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=local/crmd/714, version=0.143.15)
Sep 22 00:45:45 [4418] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/crmd/715, version=0.143.16)
Sep 22 00:45:45 [4418] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=local/crmd/716, version=0.143.16)
Sep 22 00:45:45 [4418] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/crmd/717, version=0.143.16)
Sep 22 00:45:45 [4423] ga1-ext crmd: info: crm_cs_flush: Sent 0 CPG messages (2 remaining, last=37): Try again (6)
Sep 22 00:45:46 [4418] ga1-ext cib: info: crm_cs_flush: Sent 0 CPG messages (3 remaining, last=64): Try again (6)
Sep 22 00:45:46 corosync [CPG ] chosen downlist: sender r(0) ip(10.12.23.1) ; members(old:1 left:0)
Sep 22 00:45:46 [4423] ga1-ext crmd: info: crm_cs_flush: Sent 0 CPG messages (2 remaining, last=37): Try again (6)
Sep 22 00:45:46 [4419] ga1-ext stonith-ng: info: pcmk_cpg_membership: Joined[4.0] stonith-ng.2
Sep 22 00:45:46 [4419] ga1-ext stonith-ng: info: pcmk_cpg_membership: Member[4.0] stonith-ng.1
Sep 22 00:45:46 [4419] ga1-ext stonith-ng: info: pcmk_cpg_membership: Member[4.1] stonith-ng.2
Sep 22 00:45:46 [4419] ga1-ext stonith-ng: info: crm_update_peer_proc: pcmk_cpg_membership: Node ga2-ext[2] - corosync-cpg is now online
Sep 22 00:45:46 [4423] ga1-ext crmd: info: pcmk_cpg_membership: Joined[5.0] crmd.2
Sep 22 00:45:46 [4423] ga1-ext crmd: info: pcmk_cpg_membership: Member[5.0] crmd.1
Sep 22 00:45:46 [4423] ga1-ext crmd: info: pcmk_cpg_membership: Member[5.1] crmd.2
Sep 22 00:45:46 [4423] ga1-ext crmd: info: crm_update_peer_proc: pcmk_cpg_membership: Node ga2-ext[2] - corosync-cpg is now online
Sep 22 00:45:46 [4423] ga1-ext crmd: info: peer_update_callback: Client ga2-ext/peer now has status [online] (DC=true)
Sep 22 00:45:46 [4418] ga1-ext cib: info: pcmk_cpg_membership: Joined[5.0] cib.2
Sep 22 00:45:46 [4418] ga1-ext cib: info: pcmk_cpg_membership: Member[5.0] cib.1
Sep 22 00:45:46 [4418] ga1-ext cib: info: pcmk_cpg_membership: Member[5.1] cib.2
Sep 22 00:45:46 [4418] ga1-ext cib: info: crm_update_peer_proc: pcmk_cpg_membership: Node ga2-ext[2] - corosync-cpg is now online
Sep 22 00:45:46 [4423] ga1-ext crmd: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=peer_update_callback ]
Sep 22 00:45:46 [4423] ga1-ext crmd: info: do_dc_join_offer_one: An unknown node joined - (re-)offer to any unconfirmed nodes
Sep 22 00:45:46 [4423] ga1-ext crmd: info: join_make_offer: Making join offers based on membership 904
Sep 22 00:45:46 [4423] ga1-ext crmd: info: join_make_offer: join-2: Sending offer to ga2-ext
Sep 22 00:45:46 [4423] ga1-ext crmd: info: crm_update_peer_join: join_make_offer: Node ga2-ext[2] - join-2 phase 0 -> 1
Sep 22 00:45:46 [4423] ga1-ext crmd: info: join_make_offer: Skipping ga1-ext: already known 4
Sep 22 00:45:46 [4423] ga1-ext crmd: info: abort_transition_graph: do_te_invoke:158 - Triggered transition abort (complete=1) : Peer Halt
Sep 22 00:45:46 [4418] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/crmd/718, version=0.143.17)
Sep 22 00:45:46 [4419] ga1-ext stonith-ng: info: crm_cs_flush: Sent 0 CPG messages (1 remaining, last=5): Try again (6)
Sep 22 00:45:46 corosync [MAIN ] Completed service synchronization, ready to provide service.
Sep 22 00:45:46 [4412] ga1-ext pacemakerd: info: crm_cs_flush: Sent 0 CPG messages (1 remaining, last=9): Try again (6)
Sep 22 00:45:46 [4418] ga1-ext cib: info: crm_cs_flush: Sent 4 CPG messages (0 remaining, last=68): OK (1)
Sep 22 00:45:46 [4423] ga1-ext crmd: info: crm_cs_flush: Sent 3 CPG messages (0 remaining, last=40): OK (1)
Sep 22 00:45:48 [4418] ga1-ext cib: info: cib_process_request: Completed cib_sync_one operation for section 'all': OK (rc=0, origin=ga2-ext/ga2-ext/(null), version=0.143.17)
Sep 22 00:45:48 [4412] ga1-ext pacemakerd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Sep 22 00:45:48 [4419] ga1-ext stonith-ng: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Sep 22 00:45:48 [4421] ga1-ext attrd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Sep 22 00:45:48 [4418] ga1-ext cib: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Sep 22 00:45:48 [4418] ga1-ext cib: error: cib_cs_destroy: Corosync connection lost! Exiting.
Sep 22 00:45:48 [4418] ga1-ext cib: info: terminate_cib: cib_cs_destroy: Exiting fast...
Sep 22 00:45:48 [4423] ga1-ext crmd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Sep 22 00:45:48 [4418] ga1-ext cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
Sep 22 00:45:48 [4418] ga1-ext cib: info: crm_client_destroy: Destroying 0 events
Sep 22 00:45:48 [4418] ga1-ext cib: info: crm_client_destroy: Destroying 0 events
Sep 22 00:45:48 [4418] ga1-ext cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
Sep 22 00:45:48 [4418] ga1-ext cib: info: crm_client_destroy: Destroying 0 events
Sep 22 00:45:48 [4418] ga1-ext cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
Sep 22 00:45:48 [4423] ga1-ext crmd: error: crmd_cs_destroy: connection terminated
Sep 22 00:45:48 [4412] ga1-ext pacemakerd: error: mcp_cpg_destroy: Connection destroyed
Sep 22 00:45:48 [4418] ga1-ext cib: info: crm_xml_cleanup: Cleaning up memory from libxml2
Sep 22 00:45:48 [4419] ga1-ext stonith-ng: error: stonith_peer_cs_destroy: Corosync connection terminated
Sep 22 00:45:48 [4419] ga1-ext stonith-ng: info: stonith_shutdown: Terminating with 1 clients
Sep 22 00:45:48 [4419] ga1-ext stonith-ng: info: cib_connection_destroy: Connection to the CIB closed.
Sep 22 00:45:48 [4419] ga1-ext stonith-ng: info: crm_client_destroy: Destroying 0 events
Sep 22 00:45:48 [4419] ga1-ext stonith-ng: info: qb_ipcs_us_withdraw: withdrawing server sockets
Sep 22 00:45:48 [4419] ga1-ext stonith-ng: info: main: Done
Sep 22 00:45:48 [4419] ga1-ext stonith-ng: info: crm_xml_cleanup: Cleaning up memory from libxml2
Sep 22 00:45:48 [4412] ga1-ext pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2
Sep 22 00:45:48 [4421] ga1-ext attrd: crit: attrd_cs_destroy: Lost connection to Corosync service!
Sep 22 00:45:48 [4421] ga1-ext attrd: notice: main: Exiting...
Sep 22 00:45:48 [4421] ga1-ext attrd: notice: main: Disconnecting client 0x1987990, pid=4423...
Sep 22 00:45:48 [4421] ga1-ext attrd: error: attrd_cib_connection_destroy: Connection to the CIB terminated...
Sep 22 00:45:48 [4423] ga1-ext crmd: info: qb_ipcs_us_withdraw: withdrawing server sockets
Sep 22 00:45:48 [4423] ga1-ext crmd: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
Sep 22 00:45:48 [4423] ga1-ext crmd: notice: crmd_exit: Forcing immediate exit: Link has been severed (67)
Sep 22 00:45:48 [4423] ga1-ext crmd: info: crm_xml_cleanup: Cleaning up memory from libxml2
Sep 22 00:45:48 [4420] ga1-ext lrmd: info: cancel_recurring_action: Cancelling operation ClusterIP_monitor_30000
Sep 22 00:45:48 [4420] ga1-ext lrmd: warning: qb_ipcs_event_sendv: new_event_notification (4420-4423-6): Bad file descriptor (9)
Sep 22 00:45:48 [4420] ga1-ext lrmd: warning: send_client_notify: Notification of client crmd/84c7e6b7-398c-40da-bec9-48b5e36dce2b failed
Sep 22 00:45:48 [4420] ga1-ext lrmd: info: crm_client_destroy: Destroying 1 events
Sep 22 00:45:48 [4422] ga1-ext pengine: info: crm_client_destroy: Destroying 0 events
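The tail of this log can be summarized mechanically. The sketch below is an illustrative Python helper, not part of Pacemaker. It parses lines of the form `Sep 22 00:45:48 [4418] ga1-ext cib: error: ...` and reports, per daemon and PID, whether that daemon logged losing its corosync/CPG connection. The regex, the `summarize` function and the list of loss messages are assumptions based only on the lines above. On this excerpt it shows that every daemon except lrmd (4420) and pengine (4422) lost CPG at 00:45:48, which is consistent with those two being the only processes left running on the primary.

```python
import re

# Prefix of Pacemaker daemon lines as they appear in the log above, e.g.
# "Sep 22 00:45:48 [4418] ga1-ext cib: error: pcmk_cpg_dispatch: ..."
# " +" tolerates any column padding between fields.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) \[(?P<pid>\d+)\] (?P<host>\S+) +"
    r"(?P<daemon>[\w-]+): +(?P<level>\w+): +(?P<msg>.*)$"
)

# Assumption: these fragments (copied from the log) mark a daemon losing
# its connection to corosync's CPG service.
LOSS_MARKERS = (
    "Connection to the CPG API failed",
    "Corosync connection lost",
    "Lost connection to Corosync",
)


def summarize(lines):
    """Return (seen, lost) for an iterable of log lines.

    seen: list of (daemon, pid) pairs in first-seen order.
    lost: dict mapping (daemon, pid) to the timestamp of its first loss message.
    """
    seen = {}  # used as an insertion-ordered set
    lost = {}
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            # corosync's own lines ("corosync [TOTEM ] ...") carry no [pid]
            continue
        key = (m["daemon"], int(m["pid"]))
        seen.setdefault(key, None)
        if any(marker in m["msg"] for marker in LOSS_MARKERS):
            lost.setdefault(key, m["ts"])
    return list(seen), lost


# Excerpt of the 00:45:46-00:45:48 lines from the log above.
SAMPLE = """\
Sep 22 00:45:46 corosync [MAIN ] Completed service synchronization, ready to provide service.
Sep 22 00:45:48 [4412] ga1-ext pacemakerd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Sep 22 00:45:48 [4419] ga1-ext stonith-ng: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Sep 22 00:45:48 [4421] ga1-ext attrd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Sep 22 00:45:48 [4418] ga1-ext cib: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Sep 22 00:45:48 [4418] ga1-ext cib: error: cib_cs_destroy: Corosync connection lost! Exiting.
Sep 22 00:45:48 [4423] ga1-ext crmd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Sep 22 00:45:48 [4420] ga1-ext lrmd: info: cancel_recurring_action: Cancelling operation ClusterIP_monitor_30000
Sep 22 00:45:48 [4422] ga1-ext pengine: info: crm_client_destroy: Destroying 0 events
"""

if __name__ == "__main__":
    seen, lost = summarize(SAMPLE.splitlines())
    for daemon, pid in seen:
        when = lost.get((daemon, pid))
        status = f"lost CPG at {when}" if when else "no CPG loss logged"
        print(f"{daemon:<11} [{pid}] {status}")
```

Against the whole file, `summarize(open(path))` works the same way, with `path` being wherever the log was found in the corosync log directory. A timestamp only shows when each daemon noticed the loss; it does not explain why corosync's CPG service failed in the first place.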
There are no logs on ga2-ext, which seems strange.
There is no corosync configuration on the cluster nodes:
[root@ga2-ext ~]# find /etc/corosync/
/etc/corosync/
/etc/corosync/uidgid.d
/etc/corosync/amf.conf.example
/etc/corosync/corosync.conf.old
/etc/corosync/corosync.conf.example
/etc/corosync/corosync.conf.example.udpu
/etc/corosync/service.d
[root@ga1-ext ~]# find /etc/corosync/
/etc/corosync/
/etc/corosync/corosync.conf.example
/etc/corosync/service.d
/etc/corosync/corosync.conf.example.udpu
/etc/corosync/uidgid.d
/etc/corosync/corosync.conf.old
/etc/corosync/amf.conf.example
The same packages are installed on both nodes:
corosync-1.4.1-15.el6_4.1.x86_64
corosynclib-1.4.1-15.el6_4.1.x86_64
drbd-bash-completion-8.3.15-1.el6.x86_64
drbdlinks-1.23-1.el6.noarch
drbd-pacemaker-8.3.15-1.el6.x86_64
drbd-udev-8.3.15-1.el6.x86_64
drbd-utils-8.3.15-1.el6.x86_64
pacemaker-1.1.10-1.el6.x86_64
pacemaker-cli-1.1.10-1.el6.x86_64
pacemaker-cluster-libs-1.1.10-1.el6.x86_64
pacemaker-debuginfo-1.1.10-1.el6.x86_64
pacemaker-libs-1.1.10-1.el6.x86_64
On Sun, 22 Sep 2013 07:14:27 +0000, Alessandro Bono wrote:
> Hi
>
> I have a problem with a cluster where pacemaker dies without leaving logs or any other trace.
> The problem started when I switched to CentOS 6.4 and converted the cluster from corosync to cman.
> This typically happens when the system is under high load.
> Tonight I received a DRBD split-brain notification and found only these processes running on the primary machine:
>
> 4420 ? Ss 1:29 /usr/libexec/pacemaker/lrmd
> 4422 ? Ss 0:42 /usr/libexec/pacemaker/pengine
>
> On the secondary machine pacemaker is OK.
> The logs contain only the DRBD disconnect and split-brain notifications.
> I tried pacemaker 1.1.8 from CentOS and 1.1.9 and 1.1.10 from clusterlabs, with the same result.
>
> How can I debug this problem?
> /etc/sysconfig/pacemaker has lots of configuration options, but I'm not sure which ones to use.
>
>
> The pacemaker configuration is:
>
> node ga1-ext \
>     attributes standby="off"
> node ga2-ext \
>     attributes standby="off"
> primitive ClusterIP ocf:heartbeat:IPaddr \
>     params ip="10.12.23.3" cidr_netmask="24" \
>     op monitor interval="30s"
> primitive SharedFS ocf:heartbeat:Filesystem \
>     params device="/dev/drbd/by-res/r0" directory="/shared" fstype="ext4" options="noatime,nobarrier"
> primitive dovecot lsb:dovecot
> primitive drbd0 ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op monitor interval="15s"
> primitive drbdlinks ocf:tummy:drbdlinks
> primitive mail ocf:heartbeat:MailTo \
>     params email="root at company.com" subject="ga-ext cluster - "
> primitive mysql lsb:mysqld
> group service_group SharedFS drbdlinks ClusterIP mail mysql dovecot \
>     meta target-role="Started"
> ms ms_drbd0 drbd0 \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> colocation service_on_drbd inf: service_group ms_drbd0:Master
> order service_after_drbd inf: ms_drbd0:promote service_group:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.10-1.el6-368c726" \
>     cluster-infrastructure="cman" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore" \
>     last-lrm-refresh="1379831462" \
>     maintenance-mode="false"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="100"
>
>
> The cman configuration:
>
> cat /etc/cluster/cluster.conf
>
> <cluster config_version="6" name="ga-ext_cluster">
>   <logging debug="off"/>
>   <clusternodes>
>     <clusternode name="ga1-ext" nodeid="1">
>       <fence>
>         <method name="pcmk-redirect">
>           <device name="pcmk" port="ga1-ext"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="ga2-ext" nodeid="2">
>       <fence>
>         <method name="pcmk-redirect">
>           <device name="pcmk" port="ga2-ext"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <fencedevices>
>     <fencedevice agent="fence_pcmk" name="pcmk"/>
>   </fencedevices>
> </cluster>
>
> Tell me if you need any other information.
>
> Thank you
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org