[Pacemaker] crmd internal error during failover
Andrew Beekhof
andrew at beekhof.net
Wed Apr 9 07:00:18 UTC 2014
On 25 Mar 2014, at 1:03 am, Drapeau, Mathieu <mathieu.drapeau at intel.com> wrote:
> Actually, I was wrong, the version used is 1.1.10.
> So, how can I find out which process is taking so long?
top :)
It will tell you where all the CPU is going.
Do you have many resources configured?
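
If an interactive top session is awkward to capture while a failover is in progress, a one-shot snapshot can be taken instead. The following is only a sketch: it assumes a procps-style `ps` that accepts `-eo pid,pcpu,comm`, and the helper names `rank_by_cpu` and `snapshot` are made up for this example. Note that `ps` reports CPU averaged over each process's lifetime, whereas top shows the most recent interval, so the two can disagree.

```python
import subprocess


def rank_by_cpu(ps_output, limit=5):
    """Return (cpu_percent, pid, command) tuples, busiest first.

    Expects text in the layout printed by `ps -eo pid,pcpu,comm`:
    a header row followed by one process per line.
    """
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # ignore truncated lines
        pid, cpu, command = parts
        try:
            rows.append((float(cpu), int(pid), command.strip()))
        except ValueError:
            continue  # ignore lines whose numeric columns do not parse
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


def snapshot():
    """Run ps once and return its raw text (assumes procps ps is installed)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `rank_by_cpu(snapshot())` from a small loop during a failover test gives a record of where the CPU went that can be compared with the timestamps in the logs.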
>
> thanks
>
> On 3/23/14, 7:35 PM, "Andrew Beekhof" <andrew at beekhof.net> wrote:
>
>>
>> On 21 Mar 2014, at 3:57 am, Drapeau, Mathieu <mathieu.drapeau at intel.com>
>> wrote:
>>
>>> Hello,
>>> With pacemaker 1.1.8-7 from EL6, crmd died unexpectedly, generating these
>>> logs during a failover:
>>
>> Please update to 1.1.10 from the EL6 update channels:
>>
>> http://blog.clusterlabs.org/blog/2014/potential-for-data-corruption-in-pacemaker-1-dot-1-6-through-1-dot-1-9/
>>
>>>
>>>
>>> crmd[10419]: error: crmd_node_update_complete: Node update 79
>>> failed: Timer expired (-62)
>>
>> It looks like your hardware is overloaded and an operation that shouldn't
>> have taken very long has timed out.
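
As an aside, the error line quoted just above can be taken apart mechanically, which helps when scanning a large log for similar timeouts. This is a rough sketch, not anything shipped with Pacemaker: the regular expression and the `parse_line` helper are assumptions based only on the line layout visible in this thread, and the line is expected to have been re-joined after mail wrapping. On Linux, errno 62 is ETIME, which matches the "Timer expired" text; on other platforms the number may map to a different name or to nothing.

```python
import errno
import re

# Layout assumed from the quoted log: daemon[pid]: level: function: message (code)
LINE_RE = re.compile(
    r"^(?P<daemon>\w+)\[(?P<pid>\d+)\]:\s+"
    r"(?P<level>\w+):\s+"
    r"(?P<func>\w+):\s+"
    r"(?P<message>.*?)"
    r"(?:\s+\((?P<code>-?\d+)\))?$"
)


def parse_line(line):
    """Split one unwrapped log line into fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["pid"] = int(record["pid"])
    if record["code"] is not None:
        record["code"] = int(record["code"])
        # Look the number up in the local errno table (platform dependent).
        record["errno_name"] = errno.errorcode.get(abs(record["code"]))
    else:
        record["errno_name"] = None
    return record
```

Filtering the parsed records for `level == "error"` and a non-empty `code` is one way to list the timed-out operations without reading the whole log by eye.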
>>
>>> crmd[10419]: error: do_log: FSA: Input I_ERROR from
>>> crmd_node_update_complete() received in state S_IDLE
>>> crmd[10419]: notice: do_state_transition: State transition S_IDLE ->
>>> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL
>>> origin=crmd_node_update_complete ]
>>> crmd[10419]: warning: do_recover: Fast-tracking shutdown in response
>>> to errors
>>> crmd[10419]: warning: do_election_vote: Not voting in election, we're
>>> in state S_RECOVERY
>>> crmd[10419]: error: do_log: FSA: Input I_TERMINATE from do_recover()
>>> received in state S_RECOVERY
>>> crmd[10419]: notice: lrm_state_verify_stopped: Stopped 0 recurring
>>> operations at shutdown (2 ops remaining)
>>> crmd[10419]: notice: lrm_state_verify_stopped: Recurring action
>>> testfs-MDT0000_6cda68:21 (testfs-MDT0000_6cda68_monitor_5000) incomplete
>>> at shutdown
>>> crmd[10419]: notice: lrm_state_verify_stopped: Recurring action
>>> MGS_f055b7:30 (MGS_f055b7_monitor_5000) incomplete at shutdown
>>> crmd[10419]: error: lrm_state_verify_stopped: 3 resources were
>>> active at shutdown.
>>> crmd[10419]: notice: do_lrm_control: Disconnected from the LRM
>>> crmd[10419]: notice: terminate_cs_connection: Disconnecting from
>>> Corosync
>>> corosync[10370]: [pcmk ] info: pcmk_ipc_exit: Client crmd
>>> (conn=0x2589f40, async-conn=0x2589f40) left
>>> crmd[10419]: error: crmd_fast_exit: Could not recover from internal
>>> error
>>> pacemakerd[10408]: error: pcmk_child_exit: Child process crmd
>>> (10419) exited: Generic Pacemaker error (201)
>>> pacemakerd[10408]: notice: pcmk_process_exit: Respawning failed child
>>> process: crmd
>>>
>>> What could have happened, and how can I keep crmd from dying?
>>>
>>> Thanks,
>>> Mat
>>>
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>