[Pacemaker] Resource-Monitoring with an "On Fail"-Action

Wed Mar 17 09:17:52 UTC 2010

Hi Tom

Have a look at the logs and check whether the monitor op really returns
99 (grep for the resource ID). If it does, I'm not sure what the cluster
does with rc=99. As far as I know, rc=4 would be status=failed (strictly
speaking, "unknown").
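
For reference, here is a minimal sketch of how a status check could map
onto the LSB status exit codes (0 = running, 1 = dead but pid file
exists, 3 = not running, 4 = status unknown). As far as I know the LSB
spec reserves 5-99 for future use, so 99 has no defined meaning. The
function name lsb_status and the pid-file approach are assumptions for
illustration only; I don't know how mysql-monitor-agent actually tracks
its process.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the real init script):
# report the service state using LSB "status" exit codes.
lsb_status() {
    pidfile="$1"

    # No pid file at all: the service is not running (LSB code 3).
    if [ ! -f "$pidfile" ]; then
        return 3
    fi

    # Pid file exists: check whether that process is still alive.
    pid=$(cat "$pidfile" 2>/dev/null)
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return 0    # running
    fi

    return 1        # dead, but the pid file is still there
}
```

In the "status" branch of the init script you would call
lsb_status with the agent's pid file and exit with its return code. To
simulate a failure, returning 3 from that branch is probably a better
test than an arbitrary value like 99.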

Regards
Dominik

Tom Tux wrote:
> Thanks for your hint.
> 
> I've configured an lsb-resource like this (with migration-threshold):
> 
> primitive MySQL_MonitorAgent_Resource lsb:mysql-monitor-agent \
>         meta target-role="Started" migration-threshold="3" \
>         op monitor interval="10s" timeout="20s" on-fail="restart"
> 
> I have now modified the init script "/etc/init.d/mysql-monitor-agent"
> to exit with a non-zero return code (for example, exit 99) when the
> monitor operation queries the status. But the cluster does not
> recognise the failed monitor action. Why is that? To the cluster,
> everything seems OK.
> 
> node1:/ # showcores.sh MySQL_MonitorAgent_Resource
> Resource                     Score     Node   Stickiness  #Fail  Migration-Threshold
> MySQL_MonitorAgent_Resource  -1000000  node1  100         0      3
> MySQL_MonitorAgent_Resource  100       node2  100         0      3
> 
> I also saw that the "last-run" entry (crm_mon -fort1) for this
> resource is not up to date. It seems to me that the monitor action
> does not occur every 10 seconds. Why? Any hints about this behaviour?
> 
> Thanks a lot.
> Tom
> 
> 
> 2010/3/16 Dominik Klein <dk at in-telegence.net>:
>> Tom Tux wrote:
>>> Hi
>>>
>>> I have a question about resource monitoring:
>>> I'm monitoring an IP resource every 20 seconds. I have configured the
>>> "On Fail" action as "restart". This works fine: if the
>>> "monitor" operation fails, the resource is restarted.
>>>
>>> But how can I configure this resource to migrate to the other node if
>>> it still fails after 10 restarts? Is this possible? How does
>>> the "failcount" interact with this scenario?
>>>
>>> In the documentation I read that the resource's "fail_count"
>>> increases every time the resource restarts. But I can't see this
>>> fail_count.
>> Look at the meta attribute "migration-threshold".
>>
>> Regards
>> Dominik