[Pacemaker] crmd failure

John Osborne john.osborne at arrisi.com
Fri Oct 17 14:52:47 UTC 2014


I have a two-node cluster that manages 4 resources in a resource group.
Node 1 was active and was rebooted, and the resources started on the second
node. At the exact time the first node completed rebooting, crmd failed on
the second node. Logs are below. These nodes are running the
pacemaker-1.1.10-0.15.25 rpm.

Any ideas on how to determine what happened here? Is this a problem with crmd?

Oct 15 04:46:46 vho-1-mc2 crmd[12132]:    error: crmd_node_update_complete: Node update 51 failed: Timer expired (-62)
Oct 15 04:46:46 vho-1-mc2 crmd[12132]:    error: do_log: FSA: Input I_ERROR from crmd_node_update_complete() received in state S_IDLE
Oct 15 04:46:46 vho-1-mc2 crmd[12132]:   notice: do_state_transition: State transition S_IDLE -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=crmd_node_update_complete ]
Oct 15 04:46:46 vho-1-mc2 crmd[12132]:  warning: do_recover: Fast-tracking shutdown in response to errors
Oct 15 04:46:46 vho-1-mc2 crmd[12132]:  warning: do_election_vote: Not voting in election, we're in state S_RECOVERY
Oct 15 04:46:46 vho-1-mc2 crmd[12132]:    error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
Oct 15 04:46:46 vho-1-mc2 crmd[12132]:   notice: lrm_state_verify_stopped: Stopped 0 recurring operations at shutdown (5 ops remaining)
Oct 15 04:46:46 vho-1-mc2 crmd[12132]:   notice: lrm_state_verify_stopped: Recurring action cdssRA:17 (cdssRA_monitor_15000) incomplete at shutdown
Oct 15 04:46:46 vho-1-mc2 crmd[12132]:   notice: lrm_state_verify_stopped: Recurring action mcast_IP:22 (mcast_IP_monitor_5000) incomplete at shutdown
Oct 15 04:46:46 vho-1-mc2 crmd[12132]:   notice: lrm_state_verify_stopped: Recurring action mgmt_IP:27 (mgmt_IP_monitor_5000) incomplete at shutdown
Oct 15 04:46:46 vho-1-mc2 crmd[12132]:   notice: lrm_state_verify_stopped: Recurring action cdssDB:12 (cdssDB_monitor_30000) incomplete at shutdown
Oct 15 04:46:46 vho-1-mc2 crmd[12132]:   notice: lrm_state_verify_stopped: Recurring action mcast-route:32 (mcast-route_monitor_10000) incomplete at shutdown
Oct 15 04:46:46 vho-1-mc2 crmd[12132]:    error: lrm_state_verify_stopped: 6 resources were active at shutdown.
Oct 15 04:46:46 vho-1-mc2 crmd[12132]:   notice: do_lrm_control: Disconnected from the LRM
Oct 15 04:46:46 vho-1-mc2 crmd[12132]:   notice: terminate_cs_connection: Disconnecting from Corosync
Oct 15 04:46:46 vho-1-mc2 corosync[12120]:  [pcmk  ] info: pcmk_ipc_exit: Client crmd (conn=0x65e6d0, async-conn=0x65e6d0) left
Oct 15 04:46:46 vho-1-mc2 crmd[12132]:    error: crmd_fast_exit: Could not recover from internal error
Oct 15 04:46:47 vho-1-mc2 corosync[12120]:  [pcmk  ] ERROR: pcmk_wait_dispatch: Child process crmd exited (pid=12132, rc=201)
Oct 15 04:46:47 vho-1-mc2 corosync[12120]:  [pcmk  ] info: update_member: Node vho-1-mc2 now has process list: 00000000000000000000000000151112 (1380626)
Oct 15 04:46:47 vho-1-mc2 corosync[12120]:  [pcmk  ] notice: pcmk_wait_dispatch: Respawning failed child process: crmd
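
To make the sequence in the excerpt above easier to follow, here is a rough
Python sketch that rejoins mail-wrapped log lines, keeps only the error-level
entries, and pulls out the FSA state transitions. It is not a Pacemaker tool:
the function names (join_wrapped, parse, errors, transitions) and the regular
expressions are my own assumptions, based only on the syslog layout shown in
this excerpt, and may need adjusting for other log formats.

```python
import re

# Assumed syslog header, modelled on the excerpt: "Oct 15 04:46:46 host daemon[pid]: msg"
HEADER = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<daemon>[\w-]+)\[(?P<pid>\d+)\]:\s*(?P<msg>.*)$"
)
# Optional "[pcmk  ]" tag (corosync lines), then a severity word and a colon.
SEVERITY = re.compile(
    r"^(?:\[[^\]]*\]\s*)?(?P<sev>error|warning|notice|info):\s*(?P<text>.*)$",
    re.IGNORECASE,
)
TRANSITION = re.compile(r"State transition (\S+) -> (\S+)")


def join_wrapped(lines):
    """Merge continuation lines (no timestamp header) into the previous entry."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if HEADER.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def parse(lines):
    """Return one dict per log entry that has a recognisable header and severity."""
    parsed = []
    for entry in join_wrapped(lines):
        header = HEADER.match(entry)
        if not header:
            continue
        sev = SEVERITY.match(header.group("msg"))
        if not sev:
            continue
        parsed.append({
            "ts": header.group("ts"),
            "daemon": header.group("daemon"),
            "severity": sev.group("sev").lower(),
            "text": sev.group("text"),
        })
    return parsed


def errors(entries):
    """Keep only error-level entries, in their original order."""
    return [e for e in entries if e["severity"] == "error"]


def transitions(entries):
    """Extract (from_state, to_state) pairs from do_state_transition messages."""
    found = []
    for e in entries:
        m = TRANSITION.search(e["text"])
        if m:
            found.append((m.group(1), m.group(2)))
    return found
```

Feeding the log file to parse() (for example parse(open(path)), where path is
whatever file syslog writes to on the node) and printing errors(...) should
put the first failure at the top. In the excerpt that is the
crmd_node_update_complete timeout (-62), which is followed by the
S_IDLE -> S_RECOVERY transition.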






