[Pacemaker] If 256 resources are load(ed), crmd will reboot.
Andrew Beekhof
andrew at beekhof.net
Thu May 29 06:30:51 UTC 2014
On 29 May 2014, at 3:40 pm, Yusuke Iida <yusk.iida at gmail.com> wrote:
> Hi, Andrew
>
> 2014-05-29 14:00 GMT+09:00 Andrew Beekhof <andrew at beekhof.net>:
>>
>> On 29 May 2014, at 12:28 pm, Yusuke Iida <yusk.iida at gmail.com> wrote:
>>
>>> Hi, Andrew
>>>
>>> I'm sorry.
>>> It seems that syslog recorded the node name differently.
>>> To avoid any misunderstanding, I collected a new report.
>>> I think the symptoms appear in vm02/ha-log.
>>
>> Got it :)
>>
>> Ok, step 1 - stop logging debug.
>> Debug is accounting for 30% of the logs and all that writing to disk would be adding significantly to the cluster's workload.
> I understand.
>
>>
>> Question: How have you got logging configured? Anything in /etc/sysconfig/pacemaker ?
>>
>> I ask because pacemaker.log appears to have a jumble of syslog and regular file output:
>>
>> May 29 10:45:26 vm02 cib[25603]: info: cib_perform_op: + /cib: @num_updates=1295
>> May 29 10:45:26 [25603] vm02 cib: info: cib_perform_op: + /cib: @num_updates=1295
> The position of the PID differs, although I seldom pay attention to that.
> I have attached the /etc/sysconfig/pacemaker from my environment.
The format isn't a problem; it just indicates that there are two mechanisms logging to the same place.
So it's redundant.
The question is how; your configs look fine to me :-/
>
>>
>>
>> Step 2 - can you try this patch:
>>
>> diff --git a/crmd/te_callbacks.c b/crmd/te_callbacks.c
>> index 4d330a6..eba5f11 100644
>> --- a/crmd/te_callbacks.c
>> +++ b/crmd/te_callbacks.c
>> @@ -381,12 +381,15 @@ te_update_diff(const char *event, xmlNode * msg)
>>
>> } else if(strstr(xpath, "/cib/configuration")) {
>> abort_transition(INFINITY, tg_restart, "Non-status change", change);
>> + break; /* Won't be packaged with any resource operations we may be waiting for */
>>
>> } else if(strstr(xpath, "/"XML_CIB_TAG_TICKETS) || safe_str_eq(name, XML_CIB_TAG_TICKETS)) {
>> abort_transition(INFINITY, tg_restart, "Ticket attribute change", change);
>> + break; /* Won't be packaged with any resource operations we may be waiting for */
>>
>> } else if(strstr(xpath, "/"XML_TAG_TRANSIENT_NODEATTRS"[") || safe_str_eq(name, XML_TAG_TRANSIENT_NODEATTRS)) {
>> abort_transition(INFINITY, tg_restart, "Transient attribute change", change);
>> + break; /* Won't be packaged with any resource operations we may be waiting for */
>>
>> } else if(strstr(xpath, "/"XML_LRM_TAG_RSC_OP"[") && safe_str_eq(op, "delete")) {
>> crm_action_t *cancel = NULL;
>
> Thank you for the patch.
> I will reply after checking how it behaves.
Do you mean it works now?
>
> Regards,
> Yusuke
>>
>>
>>>
>>> May 29 10:43:37 vm02 crmd[25608]: error: config_query_callback: Local CIB query resulted in an error: Timer expired
>>> May 29 10:43:37 vm02 crmd[25608]: info: register_fsa_error_adv: Resetting the current action list
>>> May 29 10:43:37 vm02 crmd[25608]: error: do_log: FSA: Input I_ERROR from config_query_callback() received in state S_POLICY_ENGINE
>>> May 29 10:43:37 vm02 crmd[25608]: warning: do_state_transition: State transition S_POLICY_ENGINE -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=config_query_callback ]
>>> May 29 10:43:37 vm02 crmd[25608]: warning: do_recover: Fast-tracking shutdown in response to errors
>>> May 29 10:43:37 vm02 crmd[25608]: warning: do_election_vote: Not voting in election, we're in state S_RECOVERY
>>>
>>> https://drive.google.com/file/d/0BwMFJItoO-fVSEd2MkRiOGxkelk/edit?usp=sharing
>>>
>>> Regards,
>>> Yusuke
>>>
>>> 2014-05-29 10:26 GMT+09:00 Andrew Beekhof <andrew at beekhof.net>:
>>>>
>>>> On 28 May 2014, at 6:42 pm, Yusuke Iida <yusk.iida at gmail.com> wrote:
>>>>
>>>>> Hi, Andrew
>>>>>
>>>>> Using crmsh, I loaded a configuration that starts 256 resources into the cluster.
>>>>> When I did this, crmd entered the S_RECOVERY state and restarted.
>>>>>
>>>>> May 28 17:08:00 [14194] vm02 crmd: error: config_query_callback: Local CIB query resulted in an error: Timer expired
>>>>> May 28 17:08:00 [14194] vm02 crmd: info: register_fsa_error_adv: Resetting the current action list
>>>>> May 28 17:08:00 [14194] vm02 crmd: error: do_log: FSA: Input I_ERROR from config_query_callback() received in state S_POLICY_ENGINE
>>>>> May 28 17:08:00 [14194] vm02 crmd: warning: do_state_transition: State transition S_POLICY_ENGINE -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=config_query_callback ]
>>>>> May 28 17:08:00 [14194] vm02 crmd: warning: do_recover: Fast-tracking shutdown in response to errors
>>>>> May 28 17:08:00 [14194] vm02 crmd: warning: do_election_vote: Not voting in election, we're in state S_RECOVERY
>>>>>
>>>>> I think the large number of queries being issued cannot be processed in time.
>>>>> Before cib_performance was implemented, abort_transition() was called only once.
>>>>>
>>>>> Has this been fixed?
>>>>>
>>>>> A report from when the problem occurred is attached.
>>>>> https://drive.google.com/file/d/0BwMFJItoO-fVX0gxM1ptcE52WWs/edit?usp=sharing
>>>>
>>>> That doesn't appear to match the symptoms above.
>>>>
>>>>>
>>>>> Regards,
>>>>> Yusuke
>>>>> --
>>>>> ----------------------------------------
>>>>> METRO SYSTEMS CO., LTD
>>>>>
>>>>> Yusuke Iida
>>>>> Mail: yusk.iida at gmail.com
>>>>> ----------------------------------------
>>>>>
>>>>> _______________________________________________
>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>