[Pacemaker] Resource-Monitoring with an "On Fail"-Action
Dominik Klein
dk at in-telegence.net
Wed Mar 17 09:17:52 UTC 2010
Hi Tom
Have a look at the logs and check whether the monitor op really returns
99 (grep for the resource id). If it does, I'm not sure what the cluster
does with rc=99. As far as I know, rc=4 would mean status=failed (strictly
speaking, "unknown").
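For reference, the LSB Core specification defines these exit codes for an init script's `status` action. 99 falls in the range reserved for future LSB use, so it has no standard meaning. The sketch below is only an illustration: `lsb_status_meaning` is a hypothetical helper, not part of Pacemaker, that classifies a status exit code by the spec's ranges. It does not model how the cluster's lrmd maps LSB codes to resource states.

```python
"""Classify an LSB init script 'status' exit code by the LSB Core spec ranges.

lsb_status_meaning() is a hypothetical helper for illustration only;
it is not part of Pacemaker or of any init system.
"""

# Codes with a fixed meaning for the 'status' action.
_LSB_STATUS_CODES = {
    0: "program is running or service is OK",
    1: "program is dead and /var/run pid file exists",
    2: "program is dead and /var/lock lock file exists",
    3: "program is not running",
    4: "program or service status is unknown",
}


def lsb_status_meaning(rc: int) -> str:
    """Return the LSB meaning of a 'status' exit code."""
    if rc in _LSB_STATUS_CODES:
        return _LSB_STATUS_CODES[rc]
    # Remaining ranges are reserved and carry no defined status meaning.
    if 5 <= rc <= 99:
        return "reserved for future LSB use"
    if 100 <= rc <= 149:
        return "reserved for distribution use"
    if 150 <= rc <= 199:
        return "reserved for application use"
    if 200 <= rc <= 254:
        return "reserved"
    raise ValueError(f"not a valid LSB status exit code: {rc}")


if __name__ == "__main__":
    for rc in (0, 3, 4, 99):
        print(rc, "->", lsb_status_meaning(rc))
```

An init script that wants the cluster to see "stopped" should therefore exit 3 from `status`, rather than picking an arbitrary non-zero value.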
Regards
Dominik
Tom Tux wrote:
> Thanks for your hint.
>
> I've configured an LSB resource like this (with migration-threshold):
>
> primitive MySQL_MonitorAgent_Resource lsb:mysql-monitor-agent \
> meta target-role="Started" migration-threshold="3" \
> op monitor interval="10s" timeout="20s" on-fail="restart"
>
> I have now modified the init script "/etc/init.d/mysql-monitor-agent"
> to exit with a return code other than 0 (for example, exit 99) when the
> monitor operation queries the status. But the cluster does not
> recognise the failed monitor action. Why does it behave like this? As
> far as the cluster is concerned, everything seems OK.
>
> node1:/ # showcores.sh MySQL_MonitorAgent_Resource
> Resource                     Score     Node   Stickiness  #Fail  Migration-Threshold
> MySQL_MonitorAgent_Resource  -1000000  node1  100         0      3
> MySQL_MonitorAgent_Resource  100       node2  100         0      3
>
> I also noticed that the "last-run" entry (crm_mon -fort1) for this
> resource is not up to date. It seems to me that the monitor action
> does not run every 10 seconds. Why? Any hints about this behaviour?
>
> Thanks a lot.
> Tom
>
>
> 2010/3/16 Dominik Klein <dk at in-telegence.net>:
>> Tom Tux wrote:
>>> Hi
>>>
>>> I have a question about resource monitoring:
>>> I'm monitoring an IP resource every 20 seconds. I have configured the
>>> "On Fail" action as "restart". This works fine: if the
>>> "monitor" operation fails, the resource is restarted.
>>>
>>> But how can I configure this resource to migrate to the other node if
>>> it still fails after 10 restarts? Is this possible? How does
>>> the "failcount" interact with this scenario?
>>>
>>> In the documentation I read that the resource's "fail_count"
>>> increases every time the resource restarts. But I can't see this
>>> fail_count.
>> Look at the meta attribute "migration-threshold".
>>
>> Regards
>> Dominik
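On the original question of moving a resource after repeated restarts: migration-threshold counts failures per node, and the Pacemaker documentation describes the resource moving to another node once its fail-count on the current node reaches that value. The toy Python model below is a sketch under that assumption only. `simulate()` and its event labels are hypothetical, and it ignores stickiness, failure-timeout, constraints and everything else the real policy engine weighs. The per-node counters it keeps correspond to the fail counts that `crm_mon -f` shows.

```python
"""Toy model of a per-node fail-count combined with migration-threshold.

Assumption: once a resource's fail-count on a node reaches
migration-threshold, that node becomes ineligible and the resource moves;
earlier failures are handled by on-fail="restart" on the same node.
simulate() is a hypothetical helper, not a Pacemaker API.
"""


def simulate(nodes, migration_threshold, failures):
    """Replay `failures` monitor failures and return (events, fail_counts).

    The resource starts on nodes[0]; each failure is counted against the
    node that is running the resource at that moment.
    """
    if migration_threshold <= 0:
        # Non-positive thresholds are not modelled by this sketch.
        raise ValueError("migration_threshold must be positive")
    fail_counts = {node: 0 for node in nodes}
    current = nodes[0]
    events = []
    for _ in range(failures):
        fail_counts[current] += 1
        if fail_counts[current] < migration_threshold:
            # Below the threshold: restart in place.
            events.append(("restart", current))
            continue
        # Threshold reached: this node may no longer run the resource.
        eligible = [n for n in nodes if fail_counts[n] < migration_threshold]
        if not eligible:
            # No eligible node left: the resource stays stopped.
            events.append(("stopped", None))
            break
        current = eligible[0]
        events.append(("move", current))
    return events, fail_counts


if __name__ == "__main__":
    events, counts = simulate(["node1", "node2"], migration_threshold=3, failures=3)
    print(events)
    print(counts)
```

With Tom's migration-threshold="3", `simulate(["node1", "node2"], 3, 3)` gives two restarts on node1 followed by a move to node2. Under the same reading, a resource that should move only after ten restarts on one node would need migration-threshold="11".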