[Pacemaker] If 256 resources are loaded, crmd will reboot.

Thu May 29 10:43:21 UTC 2014

Hi, Andrew

2014-05-29 15:30 GMT+09:00 Andrew Beekhof <andrew at beekhof.net>:
>
> On 29 May 2014, at 3:40 pm, Yusuke Iida <yusk.iida at gmail.com> wrote:
>
>> Hi, Andrew
>>
>> 2014-05-29 14:00 GMT+09:00 Andrew Beekhof <andrew at beekhof.net>:
>>>
>>> On 29 May 2014, at 12:28 pm, Yusuke Iida <yusk.iida at gmail.com> wrote:
>>>
>>>> Hi, Andrew
>>>>
>>>> I'm sorry.
>>>> It seems that syslog recorded the node name in a different notation.
>>>> To avoid any misunderstanding, I collected a new report.
>>>> I think the symptoms appear in vm02/ha-log.
>>>
>>> Got it :)
>>>
>>> Ok, step 1 - stop logging debug.
>>> Debug is accounting for 30% of the logs and all that writing to disk would be adding significantly to the cluster's workload.
>> I understand.
>>
>>>
>>> Question:  How have you got logging configured? Anything in /etc/sysconfig/pacemaker ?
>>>
>>> I ask because pacemaker.log appears to have a jumble of syslog and regular file output:
>>>
>>> May 29 10:45:26 vm02 cib[25603]:     info: cib_perform_op: +  /cib:  @num_updates=1295
>>> May 29 10:45:26 [25603] vm02        cib:     info: cib_perform_op:      +  /cib:  @num_updates=1295
>> The position of the pid is different, although I had rarely paid attention to it.
>> I attach the /etc/sysconfig/pacemaker of my environment.
>
> The format isn't a problem, it just indicates that there are two mechanisms logging to the same place.
> So it's redundant.
>
> The question is... how, your configs look fine to me :-/
This was a configuration mistake on my side.
syslog was set up to write "local1.*" to "/var/log/pacemaker.log",
so two mechanisms were logging to the same file.
I am sorry for the confusion.

>
>>
>>>
>>>
>>> Step 2 - can you try this patch:
>>>
>>> diff --git a/crmd/te_callbacks.c b/crmd/te_callbacks.c
>>> index 4d330a6..eba5f11 100644
>>> --- a/crmd/te_callbacks.c
>>> +++ b/crmd/te_callbacks.c
>>> @@ -381,12 +381,15 @@ te_update_diff(const char *event, xmlNode * msg)
>>>
>>>         } else if(strstr(xpath, "/cib/configuration")) {
>>>             abort_transition(INFINITY, tg_restart, "Non-status change", change);
>>> +            break; /* Wont be packaged with any resource operations we may be waiting for */
>>>
>>>         } else if(strstr(xpath, "/"XML_CIB_TAG_TICKETS) || safe_str_eq(name, XML_CIB_TAG_TICKETS)) {
>>>             abort_transition(INFINITY, tg_restart, "Ticket attribute change", change);
>>> +            break; /* Wont be packaged with any resource operations we may be waiting for */
>>>
>>>         } else if(strstr(xpath, "/"XML_TAG_TRANSIENT_NODEATTRS"[") || safe_str_eq(name, XML_TAG_TRANSIENT_NODEATTRS)) {
>>>             abort_transition(INFINITY, tg_restart, "Transient attribute change", change);
>>> +            break; /* Wont be packaged with any resource operations we may be waiting for */
>>>
>>>         } else if(strstr(xpath, "/"XML_LRM_TAG_RSC_OP"[") && safe_str_eq(op, "delete")) {
>>>             crm_action_t *cancel = NULL;
>>
>> Thank you for the patch.
>> I will reply after checking its behavior.
>
> Do you mean it works now?
I think the patch works without any problems.
With the patch applied, abort_transition() is now called only once
when the configuration is loaded.
I would like this fix to be included in Pacemaker-1.1.12.
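
To illustrate the effect as I understand it, here is a minimal
standalone sketch in C. It is not the crmd code: abort_transition_stub()
and process_changes() are hypothetical names used only for this example,
and the stub merely counts how often it is called. The loop stands in
for the loop over changes in te_update_diff(), and the stop_after_abort
flag stands in for the "break" statements added by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for crmd's abort_transition(); it only counts calls. */
static int abort_calls = 0;

static void abort_transition_stub(const char *reason)
{
    (void)reason;   /* the reason string is unused in this sketch */
    abort_calls++;
}

/*
 * Walk a list of change xpaths, roughly the way te_update_diff() walks
 * the changes contained in one CIB diff (an assumption of this sketch).
 * When stop_after_abort is non-zero, the loop ends after the first
 * configuration change, mirroring the "break" added by the patch.
 * Returns the number of abort calls made for this list.
 */
static int process_changes(const char *const *xpaths, size_t n, int stop_after_abort)
{
    assert(xpaths != NULL || n == 0);

    abort_calls = 0;
    for (size_t i = 0; i < n; i++) {
        if (strstr(xpaths[i], "/cib/configuration") != NULL) {
            abort_transition_stub("Non-status change");
            if (stop_after_abort) {
                break;   /* nothing further in this diff needs another abort */
            }
        }
    }
    return abort_calls;
}
```

Calling process_changes() on a list of 256 configuration changes with
stop_after_abort set to 0 triggers one abort per change, while setting
it to 1 triggers a single abort for the whole list.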

A report taken with the patch applied is attached.
https://drive.google.com/file/d/0BwMFJItoO-fVWWV0VmxqclMzT2M/edit?usp=sharing

Regards,
Yusuke
>
>>
>> Regards,
>> Yusuke
>>>
>>>
>>>>
>>>> May 29 10:43:37 vm02 crmd[25608]:    error: config_query_callback: Local CIB query resulted in an error: Timer expired
>>>> May 29 10:43:37 vm02 crmd[25608]:     info: register_fsa_error_adv: Resetting the current action list
>>>> May 29 10:43:37 vm02 crmd[25608]:    error: do_log: FSA: Input I_ERROR from config_query_callback() received in state S_POLICY_ENGINE
>>>> May 29 10:43:37 vm02 crmd[25608]:  warning: do_state_transition: State transition S_POLICY_ENGINE -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=config_query_callback ]
>>>> May 29 10:43:37 vm02 crmd[25608]:  warning: do_recover: Fast-tracking shutdown in response to errors
>>>> May 29 10:43:37 vm02 crmd[25608]:  warning: do_election_vote: Not voting in election, we're in state S_RECOVERY
>>>>
>>>> https://drive.google.com/file/d/0BwMFJItoO-fVSEd2MkRiOGxkelk/edit?usp=sharing
>>>>
>>>> Regards,
>>>> Yusuke
>>>>
>>>> 2014-05-29 10:26 GMT+09:00 Andrew Beekhof <andrew at beekhof.net>:
>>>>>
>>>>> On 28 May 2014, at 6:42 pm, Yusuke Iida <yusk.iida at gmail.com> wrote:
>>>>>
>>>>>> Hi, Andrew
>>>>>>
>>>>>> Using crmsh, I loaded a configuration that starts 256 resources into the cluster.
>>>>>> At that point, crmd entered the S_RECOVERY state and rebooted.
>>>>>>
>>>>>> May 28 17:08:00 [14194] vm02       crmd:    error: config_query_callback: Local CIB query resulted in an error: Timer expired
>>>>>> May 28 17:08:00 [14194] vm02       crmd:     info: register_fsa_error_adv: Resetting the current action list
>>>>>> May 28 17:08:00 [14194] vm02       crmd:    error: do_log: FSA: Input I_ERROR from config_query_callback() received in state S_POLICY_ENGINE
>>>>>> May 28 17:08:00 [14194] vm02       crmd:  warning: do_state_transition: State transition S_POLICY_ENGINE -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=config_query_callback ]
>>>>>> May 28 17:08:00 [14194] vm02       crmd:  warning: do_recover: Fast-tracking shutdown in response to errors
>>>>>> May 28 17:08:00 [14194] vm02       crmd:  warning: do_election_vote: Not voting in election, we're in state S_RECOVERY
>>>>>>
>>>>>> I think the large number of queries being issued cannot be processed in time.
>>>>>> Before cib_performance was implemented, abort_transition() was called only once.
>>>>>>
>>>>>> Can this be corrected?
>>>>>>
>>>>>> A report taken when the problem occurred is attached.
>>>>>> https://drive.google.com/file/d/0BwMFJItoO-fVX0gxM1ptcE52WWs/edit?usp=sharing
>>>>>
>>>>> That doesn't appear to match the symptoms above.
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Yusuke
>>>>>> --
>>>>>> ----------------------------------------
>>>>>> METRO SYSTEMS CO., LTD
>>>>>>
>>>>>> Yusuke Iida
>>>>>> Mail: yusk.iida at gmail.com
>>>>>> ----------------------------------------
>>>>>>
>>>>>> _______________________________________________
>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>
>>>>>> Project Home: http://www.clusterlabs.org
>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>
>>>>
>>>
>>
>

-- 
----------------------------------------
METRO SYSTEMS CO., LTD

Yusuke Iida
Mail: yusk.iida at gmail.com
----------------------------------------