[Pacemaker] crmd internal error during failover

Andrew Beekhof andrew at beekhof.net
Sun Mar 23 19:35:24 EDT 2014


On 21 Mar 2014, at 3:57 am, Drapeau, Mathieu <mathieu.drapeau at intel.com> wrote:

> Hello,
> With pacemaker 1.1.8-7 from EL6, crmd died unexpectedly during a failover, generating these logs:

Please update to 1.1.10 from the EL6 update channels:
   http://blog.clusterlabs.org/blog/2014/potential-for-data-corruption-in-pacemaker-1-dot-1-6-through-1-dot-1-9/
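If you have several nodes to check, something like the following could flag the ones that still need the update. This is only a sketch: it assumes the affected range is 1.1.6 through 1.1.9, as the title of the post above states, the function names are hypothetical, and it ignores everything after the dotted numeric part of the version (such as the "-7" package release).

```python
import re

# Affected range as given in the blog post title (1.1.6 through 1.1.9).
AFFECTED_MIN = (1, 1, 6)
AFFECTED_MAX = (1, 1, 9)


def parse_version(text):
    """Return the leading dotted version as a tuple, e.g. '1.1.8-7' -> (1, 1, 8)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    return tuple(int(part) for part in match.groups())


def needs_update(text):
    """True if the version falls inside the affected range."""
    return AFFECTED_MIN <= parse_version(text) <= AFFECTED_MAX


print(needs_update("1.1.8-7"))  # the version reported in this thread
print(needs_update("1.1.10"))   # the recommended version
```

Comparing tuples of integers rather than strings matters here, since a plain string comparison would sort "1.1.10" before "1.1.9".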

> 
> 
> crmd[10419]:    error: crmd_node_update_complete: Node update 79 failed: Timer expired (-62)

It looks like your hardware is overloaded: the node update is an operation that shouldn't take very long, yet it timed out.
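To see how often this happens, you could scan the logs for the same message. The sketch below is an illustration rather than a Pacemaker tool: the regular expression is modelled only on the single line quoted above, the function name is hypothetical, and you would feed it lines from wherever your syslog stores crmd messages.

```python
import re

# Modelled on the one line quoted in this thread:
#   crmd[10419]:    error: crmd_node_update_complete: Node update 79 failed: Timer expired (-62)
FAILED_UPDATE = re.compile(
    r"crmd\[(?P<pid>\d+)\]:\s+error:\s+crmd_node_update_complete:\s+"
    r"Node update (?P<update>\d+) failed: (?P<reason>.+?) \((?P<rc>-?\d+)\)"
)


def failed_node_updates(lines):
    """Yield (pid, update number, reason, return code) for each failed node update."""
    for line in lines:
        match = FAILED_UPDATE.search(line)
        if match:
            yield (
                int(match.group("pid")),
                int(match.group("update")),
                match.group("reason"),
                int(match.group("rc")),
            )


sample = [
    "crmd[10419]:    error: crmd_node_update_complete: Node update 79 failed: Timer expired (-62)",
    "crmd[10419]:  warning: do_recover: Fast-tracking shutdown in response to errors",
]
for hit in failed_node_updates(sample):
    print(hit)
```

If these failures cluster around periods of heavy load, that would be consistent with the overload explanation.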

> crmd[10419]:    error: do_log: FSA: Input I_ERROR from crmd_node_update_complete() received in state S_IDLE
> crmd[10419]:   notice: do_state_transition: State transition S_IDLE -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=crmd_node_update_complete ]
> crmd[10419]:  warning: do_recover: Fast-tracking shutdown in response to errors
> crmd[10419]:  warning: do_election_vote: Not voting in election, we're in state S_RECOVERY
> crmd[10419]:    error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
> crmd[10419]:   notice: lrm_state_verify_stopped: Stopped 0 recurring operations at shutdown (2 ops remaining)
> crmd[10419]:   notice: lrm_state_verify_stopped: Recurring action testfs-MDT0000_6cda68:21 (testfs-MDT0000_6cda68_monitor_5000) incomplete at shutdown
> crmd[10419]:   notice: lrm_state_verify_stopped: Recurring action MGS_f055b7:30 (MGS_f055b7_monitor_5000) incomplete at shutdown
> crmd[10419]:    error: lrm_state_verify_stopped: 3 resources were active at shutdown.
> crmd[10419]:   notice: do_lrm_control: Disconnected from the LRM
> crmd[10419]:   notice: terminate_cs_connection: Disconnecting from Corosync
> corosync[10370]:   [pcmk  ] info: pcmk_ipc_exit: Client crmd (conn=0x2589f40, async-conn=0x2589f40) left
> crmd[10419]:    error: crmd_fast_exit: Could not recover from internal error
> pacemakerd[10408]:    error: pcmk_child_exit: Child process crmd (10419) exited: Generic Pacemaker error (201)
> pacemakerd[10408]:   notice: pcmk_process_exit: Respawning failed child process: crmd
> 
> What could have happened, and how can we prevent crmd from dying?
> 
> Thanks,
> Mat
> 
> 
