[Pacemaker] Handling transient resource failures
Satomi TANIGUCHI
taniguchis at intellilink.co.jp
Fri Oct 10 09:46:48 UTC 2008
Hi Andrew,
Thank you so much for your reply.
In conclusion, I now agree with you and Lars.
Andrew Beekhof wrote:
> My apologies... we (me, Lars and Keisuke) discussed this at the cluster
> summit and I was supposed to summarize the results (but I didn't find
> the time until now).
>
> Essentially we decided that my idea, which you have implemented here,
> wouldn't work :-(
Keisuke told me about part of that heated discussion. :-)
>
>
>
> - If the initial request is lost due to congestion, then the loop will
> only be executed once
> (Assuming the RA makes a request to a server/daemon as part of the
> resource's health check)
>
> This makes the loop no better than a single monitor operation with a
> long timeout.
Certainly.
>
> - Looping the monitor action as a whole (whether driven by the pengine,
> lrmd or RA) is not a good idea
> - Re-executing the complete loop is inefficient.
>
> For example, there is no need to re-check the contents of a PID or
> configuration file each time.
> This indicates that any looping should occur within the monitor
> operation itself.
I agree.
As far as I know, there are just two cases which might need a retry.
The first is the ps command: because of a kernel problem it can,
in very rare cases, report incorrect information.
The other is when a file in the /proc directory is used to check
the resource's status.
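For illustration, a bounded retry around only such a flaky read, kept
inside the monitor code, might look like this minimal sketch (the helper
name, the retry count and the sleep interval are only illustrations, not
part of my patches):

    # Retry only the flaky /proc read, not the whole monitor op.
    # Succeeds (returns 0) if the process looks healthy within 3 tries.
    proc_state_ok() {
        local pid=$1 try state
        for try in 1 2 3; do
            # "State:" line of /proc/<pid>/status, e.g. "State: S (sleeping)"
            state=`awk '/^State:/ {print $2}' /proc/$pid/status 2>/dev/null`
            case "$state" in
                R|S|D) return 0;;    # running or sleeping: healthy
            esac
            sleep 1                  # possibly a transient misreading
        done
        return 1                     # consistently bad: a real failure
    }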
>
> - It unnecessarily delays the cluster's recovery of some failures.
>
> For example, if the daemon's process doesn't exist, then no amount
> of looping will bring it back.
> In such cases, the RA should return immediately. However
> the presence of a loop prohibits this.
Yes, you're right.
This is the most serious problem with a function that retries the monitor op.
>
> - Lars also expressed the fear that others would enable this
> functionality for the wrong reasons and the general quality of the
> monitor actions would decrease as a result.
Though I thought a general retry function (like monitor-loop
or something) would be useful, his concern is understandable.
>
>
> The most important part though is that because only parts of the monitor
> operation should be repeated (and only under some circumstances), the
> loop must be _inside_ the monitor operation
>
> This rules out crmd/PE/lrmd involvement and means that each RA requiring
> this functionality would need to be modified individually.
>
> This is consistent with the idea that only the RA knows enough about the
> resource to know when it has truly failed, and therefore monitor must do
> whatever it needs to do in order to return a definitive result.
I understand.
My implementation violates the division of responsibility between the
modules and the RA, right?
The RA has to return a correct result _reliably_,
and crmd/PE/lrmd have to act on that result _without question_.
I'll modify each RA to solve these problems as the situation requires.
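For example, a modified monitor could fail fast on a definitive failure
and retry only the transient check, roughly like this (just a sketch of
the direction; the daemon name, the pidfile variable and the
proc_state_ok helper sketched above are illustrative):

    mydaemon_monitor() {
        # Definitive failure: if the daemon's process is gone, report
        # it immediately -- no amount of looping will bring it back.
        # ($PIDFILE: illustrative pidfile path set elsewhere in the RA)
        [ -f "$PIDFILE" ] || return $OCF_NOT_RUNNING
        pid=`cat "$PIDFILE"`
        [ -d /proc/$pid ] || return $OCF_NOT_RUNNING

        # Only the check known to fail transiently is retried, and the
        # loop stays inside the monitor operation itself.
        if proc_state_ok $pid; then
            return $OCF_SUCCESS
        fi
        return $OCF_ERR_GENERIC
    }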
>
>
> It might be necessary to write a small utility in C to assist the RA in
> running specific parts of the monitor action with a timeout; however,
> wget may be sufficient for the few resources that require this
> functionality (as it already allows the number of retries and timeouts
> to be specified).
Thank you for the idea.
But for the time being I'll set a longer value for the operation
timeout, as you suggested.
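For the few resources that are checked over HTTP, wget's own options
might indeed cover the retry-with-timeout part, e.g. (a sketch for
inside a monitor function; the URL and the numbers are placeholders):

    # Let wget do the retrying: up to 3 attempts, 10 seconds each.
    # A zero exit status means the server answered the health URL.
    if wget -q --tries=3 --timeout=10 -O /dev/null \
            http://localhost:8080/status; then
        return $OCF_SUCCESS
    fi
    return $OCF_ERR_GENERIC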
>
>
> Please let me know if anything above was not clear.
Now everything is clear.
Thank you very much for everything!!
Best Regards,
Satomi TANIGUCHI
>
> Andrew
>
> On Oct 7, 2008, at 12:55 PM, Satomi TANIGUCHI wrote:
>
>> Hi,
>>
>>
>> I'm posting patches to add a "monitor-loop" operation.
>> The patches' roles are:
>> (1) monitor_loop_hb.patch: adds ocf_monitor_loop() to .ocf-shellfuncs.
>> This is for Heartbeat (83a87f2b6554).
>> (2) monitor_loop_pm.patch: adds the "monitor-loop" operation to the cib.
>> This is for Pacemaker (0f6fc6f8c01f).
>>
>> 1. Specifications
>> The monitor-loop operation calls the monitor op repeatedly until:
>> (1) the monitor op returns a success value (OCF_SUCCESS or
>> OCF_RUNNING_MASTER), or
>> (2) the failure count reaches the threshold.
>>
>> To set the threshold, add a new attribute "maxfailures"
>> to each resource's <instance_attributes>.
>> If you don't set the threshold, or set it to zero,
>> the monitor-loop op does not return until it detects a monitor success,
>> so on a persistent failure the operation timeout will fire.
>>
>> 2. How to USE
>> (1) Add the following line between "case $__OCF_ACTION in" and "esac"
>> in your RA:
>> monitor-loop) ocf_monitor_loop ${OCF_RESKEY_maxfailures};;
>> As an example, I attached a patch for the Dummy resource
>> (monitor_loop_Dummy.patch).
>> (2) Edit cib.xml: add "maxfailures" to <instance_attributes>, and add a
>> "monitor-loop" operation instead of a regular monitor op.
>> ex.)
>> <primitive id="prmDummy1" class="ocf" type="Dummy" provider="heartbeat">
>>   <instance_attributes id="prmDummy1-instance-attributes">
>>     <nvpair id="prmDummy1-instance-attrs-maxfailures" name="maxfailures" value="3"/>
>>   </instance_attributes>
>>   <operations>
>>     <op id="prmDummy1-operations-start" name="start" interval="0" timeout="300" on-fail="restart"/>
>>     <op id="prmDummy1-operations-monitor-loop" name="monitor-loop" interval="10" timeout="60" on-fail="restart"/>
>>     <op id="prmDummy1-operations-stop" name="stop" interval="0" timeout="300" on-fail="block"/>
>>   </operations>
>> </primitive>
>>
>> 3. NOTE
>> The monitor-loop operation is only for OCF resources, not for STONITH
>> resources.
>>
>>
>> Thank you very much for your advice, Andrew and Lars!
>> With just a small alteration, I was able to realize what I had in mind.
>>
>> Now I would like to hear your opinions.
>> For OCF resources, it's easy to add the monitor-loop operation thanks to
>> .ocf-shellfuncs.
>> But STONITH resources don't have a common file like that,
>> so to add a monitor-loop (or status-loop) operation to STONITH
>> resources, I would have to add a function to each of them.
>> That is almost the same as modifying each of their status functions...
>>
>> Even leaving the monitor-loop operation aside, shouldn't STONITH
>> resources have a common file of this kind, like OCF resources do?
>>
>>
>> Your comments and suggestions are really appreciated.
>>
>>
>> Best Regards,
>> Satomi TANIGUCHI
>>
>>
>>
>>
>>
>> Lars Marowsky-Bree wrote:
>>> On 2008-09-17T10:09:21, Andrew Beekhof <beekhof at gmail.com> wrote:
>>>> I can't help but feel this is all a work-around for badly written
>>>> RAs and/or overly aggressive timeouts. There's nothing wrong with
>>>> setting large timeouts... if you set 1 hour and the op returns in 1
>>>> second, then we don't wait around doing nothing for the other 59
>>>> minutes and 59 seconds.
>>> Agreed. RAs shouldn't fail randomly. RAs are considered part of the
>>> "trusted" infrastructure.
>>>> But if you really really only want to report an error if N monitors
>>>> fail in M seconds (I still think this is crazy, but whatever), then
>>>> simply implement monitor_loop() which calls monitor() up to N times
>>>> looking for $OCF_SUCCESS and add:
>>>>
>>>> <op id=... name="monitor_loop" timeout="M" interval=... />
>>>>
>>>> instead of a regular monitor op. Or even in addition to a regular
>>>> monitor op with on_fail=ignore if you want.
>>> Best idea so far.
>>> Regards,
>>> Lars
>>
>> diff -r 83a87f2b6554 resources/OCF/.ocf-shellfuncs.in
>> --- a/resources/OCF/.ocf-shellfuncs.in Sat Oct 04 15:54:26 2008 +0200
>> +++ b/resources/OCF/.ocf-shellfuncs.in Tue Oct 07 17:43:38 2008 +0900
>> @@ -234,4 +234,35 @@
>> trap "rm -f $lockfile" EXIT
>> }
>>
>> +ocf_monitor_loop() {
>> +    local max=0
>> +    local cnt=0 ret=0
>> +
>> +    if [ -n "$1" ]; then
>> +        max=$1
>> +    fi
>> +
>> +    if [ ${max} -lt 0 ]; then
>> +        ocf_log err "ocf_monitor_loop: ${OCF_RESOURCE_INSTANCE}: maxfailures has invalid value ${max}."
>> +        max=0
>> +    fi
>> +
>> +    while :
>> +    do
>> +        $0 monitor
>> +        ret=$?
>> +        ocf_log debug "ocf_monitor_loop: ${OCF_RESOURCE_INSTANCE}: monitor's return code is ${ret}."
>> +
>> +        if [ ${ret} -eq $OCF_SUCCESS -o ${ret} -eq $OCF_RUNNING_MASTER ]; then
>> +            break
>> +        fi
>> +        cnt=`expr ${cnt} + 1`
>> +        ocf_log warn "ocf_monitor_loop: ${OCF_RESOURCE_INSTANCE}: monitor has failed ${cnt} times."
>> +
>> +        if [ ${max} -gt 0 -a ${cnt} -ge ${max} ]; then
>> +            break
>> +        fi
>> +    done
>> +    return ${ret}
>> +}
>> __ocf_set_defaults "$@"
>> diff -r 0f6fc6f8c01f include/crm/crm.h
>> --- a/include/crm/crm.h Mon Oct 06 18:27:13 2008 +0200
>> +++ b/include/crm/crm.h Tue Oct 07 17:43:57 2008 +0900
>> @@ -190,6 +190,7 @@
>> #define CRMD_ACTION_NOTIFIED "notified"
>>
>> #define CRMD_ACTION_STATUS "monitor"
>> +#define CRMD_ACTION_STATUS_LOOP "monitor-loop"
>>
>> /* short names */
>> #define RSC_DELETE CRMD_ACTION_DELETE
>> diff -r 0f6fc6f8c01f include/crm/pengine/common.h
>> --- a/include/crm/pengine/common.h Mon Oct 06 18:27:13 2008 +0200
>> +++ b/include/crm/pengine/common.h Tue Oct 07 17:43:57 2008 +0900
>> @@ -52,7 +52,8 @@
>> action_demote,
>> action_demoted,
>> shutdown_crm,
>> - stonith_node
>> + stonith_node,
>> + monitor_loop_rsc
>> };
>>
>> enum rsc_recovery_type {
>> diff -r 0f6fc6f8c01f lib/pengine/common.c
>> --- a/lib/pengine/common.c Mon Oct 06 18:27:13 2008 +0200
>> +++ b/lib/pengine/common.c Tue Oct 07 17:43:57 2008 +0900
>> @@ -212,6 +212,8 @@
>> return no_action;
>> } else if(safe_str_eq(task, "all_stopped")) {
>> return no_action;
>> + } else if(safe_str_eq(task, CRMD_ACTION_STATUS_LOOP)) {
>> + return monitor_loop_rsc;
>> }
>> crm_debug("Unsupported action: %s", task);
>> return no_action;
>> @@ -265,6 +267,9 @@
>> break;
>> case action_demoted:
>> result = CRMD_ACTION_DEMOTED;
>> + break;
>> + case monitor_loop_rsc:
>> + result = CRMD_ACTION_STATUS_LOOP;
>> break;
>> }
>>
>> diff -r 0f6fc6f8c01f pengine/group.c
>> --- a/pengine/group.c Mon Oct 06 18:27:13 2008 +0200
>> +++ b/pengine/group.c Tue Oct 07 17:43:57 2008 +0900
>> @@ -431,6 +431,7 @@
>> switch(task) {
>> case no_action:
>> case monitor_rsc:
>> + case monitor_loop_rsc:
>> case action_notify:
>> case action_notified:
>> case shutdown_crm:
>> diff -r 0f6fc6f8c01f pengine/utils.c
>> --- a/pengine/utils.c Mon Oct 06 18:27:13 2008 +0200
>> +++ b/pengine/utils.c Tue Oct 07 17:43:57 2008 +0900
>> @@ -335,6 +335,7 @@
>> task--;
>> break;
>> case monitor_rsc:
>> + case monitor_loop_rsc:
>> case shutdown_crm:
>> case stonith_node:
>> task = no_action;
>> diff -r 83a87f2b6554 resources/OCF/Dummy
>> --- a/resources/OCF/Dummy Sat Oct 04 15:54:26 2008 +0200
>> +++ b/resources/OCF/Dummy Tue Oct 07 19:11:31 2008 +0900
>> @@ -142,6 +142,7 @@
>> start) dummy_start;;
>> stop) dummy_stop;;
>> monitor) dummy_monitor;;
>> +monitor-loop) ocf_monitor_loop ${OCF_RESKEY_maxfailures};;
>> migrate_to) ocf_log info "Migrating ${OCF_RESOURCE_INSTANCE} to ${OCF_RESKEY_CRM_meta_migrate_to}."
>> dummy_stop
>> ;;
>> _______________________________________________
>> Pacemaker mailing list
>> Pacemaker at clusterlabs.org
>> http://list.clusterlabs.org/mailman/listinfo/pacemaker
>
>
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker at clusterlabs.org
> http://list.clusterlabs.org/mailman/listinfo/pacemaker