[Pacemaker] RFC: What part of the XML configuration do you hate the most?
Satomi TANIGUCHI
taniguchis at intellilink.co.jp
Tue Oct 7 10:55:09 UTC 2008
Hi,
I'm posting patches that add a "monitor-loop" operation.
The role of each patch is:
(1) monitor_loop_hb.patch: adds ocf_monitor_loop() to .ocf-shellfuncs.
This is for Heartbeat (83a87f2b6554).
(2) monitor_loop_pm.patch: adds the "monitor-loop" operation to the cib.
This is for Pacemaker (0f6fc6f8c01f).
1. Specifications
The monitor-loop operation calls the monitor op repeatedly until either:
(1) the monitor op returns a normal value (OCF_SUCCESS or OCF_RUNNING_MASTER), or
(2) the number of failures exceeds the threshold.
To set the threshold, add a new attribute "maxfailures"
to the resource's <instance_attributes>.
If you don't set the threshold, or if you set it to zero,
the monitor-loop op does not return until a monitor op succeeds.
If the monitor op keeps failing, the operation eventually times out.
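
To illustrate the specification, here is a minimal sketch of such a loop in shell. It is not the code from monitor_loop_hb.patch (the attachment is not reproduced here). The function name monitor_loop_sketch and the convention of passing the monitor command as arguments are assumptions made for illustration. The return codes are the standard OCF values.

```shell
# Hypothetical sketch only; the real ocf_monitor_loop() lives in the patch.
OCF_SUCCESS=0
OCF_RUNNING_MASTER=8

# Usage (assumed): monitor_loop_sketch MAXFAILURES MONITOR_CMD [ARGS...]
monitor_loop_sketch() {
    maxfailures=${1:-0}
    shift
    failures=0
    while :; do
        # Run the monitor command and capture its exit code.
        rc=0
        "$@" || rc=$?
        case $rc in
            "$OCF_SUCCESS"|"$OCF_RUNNING_MASTER")
                # Normal value: stop looping and report it.
                return $rc ;;
        esac
        failures=$((failures + 1))
        # A threshold of zero (or unset) means "keep trying"; the
        # cluster's operation timeout is then the only way out.
        if [ "$maxfailures" -gt 0 ] && [ "$failures" -gt "$maxfailures" ]; then
            return $rc
        fi
    done
}
```

With a threshold of 2, a monitor command that always fails is called three times before its last return code is passed back, because the loop stops only once the failure count is greater than the threshold.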
2. How to use
(1) Add the following line between "case $__OCF_ACTION in" and "esac"
in your RA.
monitor-loop) ocf_monitor_loop ${OCF_RESKEY_maxfailures};;
As an example, I attached a patch for the Dummy resource
(monitor_loop_Dummy.patch).
(2) Edit cib.xml.
Add "maxfailures" to <instance_attributes>, and add a "monitor-loop" operation
instead of a regular monitor op.
Example:
<primitive id="prmDummy1" class="ocf" type="Dummy" provider="heartbeat">
  <instance_attributes id="prmDummy1-instance-attributes">
    <nvpair id="prmDummy1-instance-attrs-maxfailures" name="maxfailures" value="3"/>
  </instance_attributes>
  <operations>
    <op id="prmDummy1-operations-start" name="start" interval="0" timeout="300" on-fail="restart"/>
    <op id="prmDummy1-operations-monitor-loop" name="monitor-loop" interval="10" timeout="60" on-fail="restart"/>
    <op id="prmDummy1-operations-stop" name="stop" interval="0" timeout="300" on-fail="block"/>
  </operations>
</primitive>
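
To make step (1) concrete, here is a rough sketch of how the action dispatch in an RA might look once the line is added. The stub functions (dummy_start, dummy_stop, dummy_monitor), the wrapper ra_dispatch, and the stand-in ocf_monitor_loop are hypothetical and exist only so that the fragment runs on its own. In a real RA, ocf_monitor_loop would come from the patched .ocf-shellfuncs.

```shell
# Stand-ins so the fragment is self-contained (assumptions, not real code).
ocf_monitor_loop() { echo "monitor-loop maxfailures=${1:-0}"; }
dummy_start()      { echo "start"; }
dummy_stop()       { echo "stop"; }
dummy_monitor()    { echo "monitor"; }

# Hypothetical wrapper around the RA's usual case statement.
ra_dispatch() {
    case $__OCF_ACTION in
        start)        dummy_start ;;
        stop)         dummy_stop ;;
        monitor)      dummy_monitor ;;
        # The one added line: pass the maxfailures instance attribute through.
        monitor-loop) ocf_monitor_loop ${OCF_RESKEY_maxfailures} ;;
        *)            return 3 ;;   # OCF_ERR_UNIMPLEMENTED
    esac
}
```

The cluster exports instance attributes to the RA as OCF_RESKEY_<name> environment variables, which is how the maxfailures value from cib.xml reaches ocf_monitor_loop.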
3. NOTE
The monitor-loop operation is only for OCF resources, not for STONITH resources.
Thank you very much for your advice, Andrew and Lars!
With only a small change, I was able to implement what I had in mind.
Now I would like to hear your opinions.
For OCF resources, it's easy to add the monitor-loop operation thanks to
.ocf-shellfuncs.
But STONITH resources don't have a common file like that.
So, if I want to add a monitor-loop (or status-loop) operation to
STONITH resources, I have to add a function to each of them.
That is almost the same as modifying the status function of each one...
Even leaving the monitor-loop operation aside,
shouldn't STONITH resources have a common file like OCF resources do?
Your comments and suggestions would be much appreciated.
Best Regards,
Satomi TANIGUCHI
Lars Marowsky-Bree wrote:
> On 2008-09-17T10:09:21, Andrew Beekhof <beekhof at gmail.com> wrote:
>
>> I can't help but feel this is all a work-around for badly written RAs
>> and/or overly aggressive timeouts. There's nothing wrong with setting
>> large timeouts... if you set 1 hour and the op returns in 1 second, then we
>> don't wait around doing nothing for the other 59 minutes and 59 seconds.
>
> Agreed. RAs shouldn't fail randomly. RAs are considered part of the
> "trusted" infrastructure.
>
>> But if you really really only want to report an error if N monitors fail in
>> M seconds (I still think this is crazy, but whatever), then simply
>> implement monitor_loop() which calls monitor() up to N times looking for
>> $OCF_SUCCESS and add:
>>
>> <op id=... name="monitor_loop" timeout="M" interval=... />
>>
>> instead of a regular monitor op. Or even in addition to a regular monitor
>> op with on_fail=ignore if you want.
>
> Best idea so far.
>
>
>
> Regards,
> Lars
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: monitor_loop_hb.patch
Type: text/x-patch
Size: 1079 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20081007/49114fef/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: monitor_loop_pm.patch
Type: text/x-patch
Size: 1922 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20081007/49114fef/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: monitor_loop_Dummy.patch
Type: text/x-patch
Size: 433 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20081007/49114fef/attachment-0005.bin>