Mon Mar 29 07:57:18 UTC 2010


handle this issue.

I think the monitoring is working now, but I'm confused by the output
of the command "crm_mon -t1", which shows the "last-rc-change" and the
"last-run" of the monitor operation. I have defined the monitor
operation for a certain resource to run every 10 seconds, but the
"last-run" field of the "crm_mon -t1" output doesn't change its
value. It only changes its value when a non-zero return code comes
back and the failcount is increased. Is this behaviour correct?
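
For reference, this is how I check the failcount and the operation
history from the shell (a minimal sketch; resource and node names are
the ones from this thread, adjust as needed):

# one-shot status including failcounts and operation history
crm_mon -fo1

# show the current failcount of the resource on one node
crm resource failcount MySQL_MonitorAgent_Resource show node1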

Thanks a lot for your help.
Kind regards,
Tom


2010/3/19 Tom Tux <tomtux80 at gmail.com>:
> Hi
>
> Thanks a lot for your help.
>
> So now it's Novell's turn.....:-)
>
> Regards,
> Tom
>
>
> 2010/3/18 Dejan Muhamedagic <dejanmm at fastmail.fm>:
>> Hi,
>>
>> On Thu, Mar 18, 2010 at 02:15:07PM +0100, Tom Tux wrote:
>>> Hi Dejan
>>>
>>> hb_report -V says:
>>> cluster-glue: 1.0.2 (b75bd738dc09263a578accc69342de2cb2eb8db6)
>>
>> Yes, unfortunately that one is buggy.
>>
>>> I've opened a case with Novell. They will fix this problem by updating
>>> to the newest cluster-glue release.
>>>
>>> Could it be that I have another configuration issue in my cluster
>>> config? I think with the following setting the resource should be
>>> monitored:
>>>
>>> ...
>>> ...
>>> primitive MySQL_MonitorAgent_Resource lsb:mysql-monitor-agent \
>>>         meta migration-threshold="3" \
>>>         op monitor interval="10s" timeout="20s" on-fail="restart"
>>> op_defaults $id="op_defaults-options" \
>>>         on-fail="restart" \
>>>         enabled="true"
>>> property $id="cib-bootstrap-options" \
>>>         expected-quorum-votes="2" \
>>>         dc-version="1.0.6-c48e3360eb18c53fd68bb7e7dbe39279ccbc0354" \
>>>         cluster-infrastructure="openais" \
>>>         stonith-enabled="true" \
>>>         no-quorum-policy="ignore" \
>>>         stonith-action="reboot" \
>>>         last-lrm-refresh="1268838090"
>>> ...
>>> ...
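>>>
>>> To double-check what actually ended up in the CIB, the resource
>>> definition can be printed directly (a small sketch using the crm
>>> shell):
>>>
>>> crm configure show MySQL_MonitorAgent_Resource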
>>>
>>>
>>> And when I look at the last-run time with "crm_mon -fort1", it gives me:
>>>    MySQL_Server_Resource: migration-threshold=3
>>>     + (32) stop: last-rc-change='Wed Mar 17 10:49:55 2010' last-run='Wed Mar 17 10:49:55 2010' exec-time=5060ms queue-time=0ms rc=0 (ok)
>>>     + (40) start: last-rc-change='Wed Mar 17 11:09:06 2010' last-run='Wed Mar 17 11:09:06 2010' exec-time=4080ms queue-time=0ms rc=0 (ok)
>>>     + (41) monitor: interval=20000ms last-rc-change='Wed Mar 17 11:09:10 2010' last-run='Wed Mar 17 11:09:10 2010' exec-time=20ms queue-time=0ms rc=0 (ok)
>>>
>>> And the results above are from yesterday....
>>
>> The configuration looks fine to me.
>>
>> Cheers,
>>
>> Dejan
>>
>>> Thanks for your help.
>>> Kind regards,
>>> Tom
>>>
>>>
>>>
>>> 2010/3/18 Dejan Muhamedagic <dejanmm at fastmail.fm>:
>>> > Hi,
>>> >
>>> > On Wed, Mar 17, 2010 at 12:38:47PM +0100, Tom Tux wrote:
>>> >> Hi Dejan
>>> >>
>>> >> Thanks for your answer.
>>> >>
>>> >> I'm using this cluster with the packages from the HAE
>>> >> (HighAvailability-Extension) repository on SLES11. Is it therefore
>>> >> possible to upgrade cluster-glue from source?
>>> >
>>> > Yes, though I don't think that any SLE11 version has this bug.
>>> > When was your version released? What does hb_report -V say?
>>> >
>>> >> I think the better
>>> >> way is to wait for updates in the HAE repository from Novell. Or do
>>> >> you have experience upgrading cluster-glue from source (even if it
>>> >> was installed with zypper/rpm)?
>>> >>
>>> >> Do you know, when the HAE-Repository will be upgraded?
>>> >
>>> > Can't say. Best would be if you talk to Novell about the issue.
>>> >
>>> > Cheers,
>>> >
>>> > Dejan
>>> >
>>> >> Thanks a lot.
>>> >> Tom
>>> >>
>>> >>
>>> >> 2010/3/17 Dejan Muhamedagic <dejanmm at fastmail.fm>:
>>> >> > Hi,
>>> >> >
>>> >> > On Wed, Mar 17, 2010 at 10:57:16AM +0100, Tom Tux wrote:
>>> >> >> Hi Dominik
>>> >> >>
>>> >> The problem is that the cluster does not run the monitor action every
>>> >> 20s. The last time it ran the action was at 09:21, and now it is
>>> >> 10:37:
>>> >> >
>>> >> > There was a serious bug in some cluster-glue packages. What
>>> >> > you're experiencing sounds like that. I can't say which
>>> >> > packages (probably sth like 1.0.1, they were never released). At
>>> >> > any rate, I'd suggest upgrading to cluster-glue 1.0.3.
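>>> >> >
>>> >> > To see which cluster-glue build is installed, something like this
>>> >> > should do (a sketch; cluster-glue is the package name on SLES):
>>> >> >
>>> >> > rpm -q cluster-glue
>>> >> > hb_report -V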
>>> >> >
>>> >> > Thanks,
>>> >> >
>>> >> > Dejan
>>> >> >
>>> >> >>  MySQL_MonitorAgent_Resource: migration-threshold=3
>>> >> >>     + (479) stop: last-rc-change='Wed Mar 17 09:21:28 2010' last-run='Wed Mar 17 09:21:28 2010' exec-time=3010ms queue-time=0ms rc=0 (ok)
>>> >> >>     + (480) start: last-rc-change='Wed Mar 17 09:21:31 2010' last-run='Wed Mar 17 09:21:31 2010' exec-time=3010ms queue-time=0ms rc=0 (ok)
>>> >> >>     + (481) monitor: interval=10000ms last-rc-change='Wed Mar 17 09:21:34 2010' last-run='Wed Mar 17 09:21:34 2010' exec-time=20ms queue-time=0ms rc=0 (ok)
>>> >> >>
>>> >> >> If I restart the whole cluster, then the new return code (exit 99
>>> >> >> or exit 4) is seen by the cluster monitor.
>>> >> >>
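>>> >> >> Instead of restarting the whole cluster, a resource cleanup should
>>> >> >> also make the cluster re-probe the resource (a sketch using the crm
>>> >> >> shell):
>>> >> >>
>>> >> >> # clear the operation history and failcount for the resource
>>> >> >> crm resource cleanup MySQL_MonitorAgent_Resource
>>> >> >>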
>>> >> >>
>>> >> >> 2010/3/17 Dominik Klein <dk at in-telegence.net>:
>>> >> >> > Hi Tom
>>> >> >> >
>>> >> >> > have a look at the logs and see whether the monitor op really
>>> >> >> > returns 99 (grep for the resource-id). If so, I'm not sure what
>>> >> >> > the cluster does with rc=99. As far as I know, rc=4 would be
>>> >> >> > status=failed (unknown actually).
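>>> >> >> >
>>> >> >> > Something along these lines (a sketch; the log location is an
>>> >> >> > assumption, on SLES it is usually /var/log/messages):
>>> >> >> >
>>> >> >> > grep MySQL_MonitorAgent_Resource /var/log/messages | grep monitor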
>>> >> >> >
>>> >> >> > Regards
>>> >> >> > Dominik
>>> >> >> >
>>> >> >> > Tom Tux wrote:
>>> >> >> >> Thanks for your hint.
>>> >> >> >>
>>> >> >> >> I've configured an lsb-resource like this (with migration-threshold):
>>> >> >> >>
>>> >> >> >> primitive MySQL_MonitorAgent_Resource lsb:mysql-monitor-agent \
>>> >> >> >>         meta target-role="Started" migration-threshold="3" \
>>> >> >> >>         op monitor interval="10s" timeout="20s" on-fail="restart"
>>> >> >> >>
>>> >> >> >> I have now modified the init-script "/etc/init.d/mysql-monitor-agent"
>>> >> >> >> to exit with a return code not equal to "0" (for example, exit 99)
>>> >> >> >> when the monitor operation queries the status. But the cluster
>>> >> >> >> does not recognise a failed monitor action. Why this behaviour?
>>> >> >> >> To the cluster, everything seems ok.
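>>> >> >> >>
>>> >> >> >> The status branch of the init-script now looks roughly like this
>>> >> >> >> (a sketch; per LSB, "status" would normally return 0 for running
>>> >> >> >> and 3 for not running):
>>> >> >> >>
>>> >> >> >> case "$1" in
>>> >> >> >>     status)
>>> >> >> >>         # deliberately report a failure for testing
>>> >> >> >>         exit 99
>>> >> >> >>         ;;
>>> >> >> >> esac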
>>> >> >> >>
>>> >> >> >> node1:/ # showcores.sh MySQL_MonitorAgent_Resource
>>> >> >> >> Resource                     Score     Node   Stickiness  #Fail  Migration-Threshold
>>> >> >> >> MySQL_MonitorAgent_Resource  -1000000  node1  100         0      3
>>> >> >> >> MySQL_MonitorAgent_Resource  100       node2  100         0      3
>>> >> >> >>
>>> >> >> >> I also saw that the "last-run" entry (crm_mon -fort1) for this
>>> >> >> >> resource is not up-to-date. It seems to me that the monitor
>>> >> >> >> action does not occur every 10 seconds. Why? Any hints for this
>>> >> >> >> behaviour?
>>> >> >> >>
>>> >> >> >> Thanks a lot.
>>> >> >> >> Tom
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> 2010/3/16 Dominik Klein <dk at in-telegence.net>:
>>> >> >> >>> Tom Tux wrote:
>>> >> >> >>>> Hi
>>> >> >> >>>>
>>> >> >> >>>> I have a question about resource monitoring:
>>> >> >> >>>> I'm monitoring an ip-resource every 20 seconds. I have
>>> >> >> >>>> configured the "on-fail" action with "restart". This works
>>> >> >> >>>> fine: if the "monitor" operation fails, the resource is
>>> >> >> >>>> restarted.
>>> >> >> >>>>
>>> >> >> >>>> But how can I define this resource to migrate to the other
>>> >> >> >>>> node if it still fails after 10 restarts? Is this possible?
>>> >> >> >>>> How does the "failcount" interact with this scenario?
>>> >> >> >>>>
>>> >> >> >>>> In the documentation I read that the resource "fail_count"
>>> >> >> >>>> will increase every time the resource restarts. But I can't
>>> >> >> >>>> see this fail_count.
>>> >> >> >>> Look at the meta attribute "migration-threshold".
>>> >> >> >>>
>>> >> >> >>> Regards
>>> >> >> >>> Dominik
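>>> >> >> >>>
>>> >> >> >>> For the "move after 10 failed restarts" case, that would look
>>> >> >> >>> roughly like this (a sketch; the resource name and address are
>>> >> >> >>> made up):
>>> >> >> >>>
>>> >> >> >>> primitive Test_IP_Resource ocf:heartbeat:IPaddr2 \
>>> >> >> >>>         params ip="192.168.1.10" \
>>> >> >> >>>         meta migration-threshold="10" \
>>> >> >> >>>         op monitor interval="20s" on-fail="restart"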
>>> >> >> >
>>> >> >> >
>>> >> >>
>>> >> >
>>> >>
>>> >
>>>
>>
>


