[Pacemaker] Migration from mon to pacemaker
Florian Haas
florian.haas at linbit.com
Fri Feb 11 08:41:38 UTC 2011
On 2011-02-11 09:16, Uwe Schmeling wrote:
> Hi,
>
> I'm just migrating my existing mon/heartbeat configuration to pacemaker.
> The point of interest is the webservice behavior. Previously, mon
> checked whether the service failed twice within 20 sec, and initiated a
> switch to the other node if that happened. Now I'm trying to configure
> the same behavior using pacemaker. The webservice is monitored every 10
> seconds (interval=10), failure timeout is set to 20s (expecting to ignore
> all failures within this time frame)
That is *not* what failure-timeout means. Please reread the docs.
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-failure-migration.html
> and a switchover should only happen if a "valid
> failure" occurs twice (migration-threshold=2). "Valid failure" means: the
> service fails twice within 20s, but the failure is ignored if the service
> is back within 20s.
There is no such thing in Pacemaker as the "valid failure" you're
talking about.
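What failure-timeout and migration-threshold actually do can be sketched with a deliberately simplified model. This is an illustration under stated assumptions, not Pacemaker's code: it assumes one fail-count per resource and node, that migration-threshold moves the resource once that count reaches the threshold, and that failure-timeout does not ignore failures at all but instead forgets the accumulated count once no new failure has been recorded for that long. The function names are hypothetical.

```python
# Simplified model, NOT Pacemaker's actual implementation. Assumptions:
# - one fail-count per resource and node, plus the time of the last failure;
# - failure-timeout expires the whole count once more than `failure_timeout`
#   seconds have passed since the last failure (real Pacemaker only notices
#   this when it re-evaluates the cluster, e.g. at cluster-recheck-interval);
# - migration-threshold moves the resource once the count reaches it.


def effective_failcount(failure_times, now, failure_timeout):
    """Fail-count as seen at time `now` (all times in seconds)."""
    count = 0
    last = None
    for t in sorted(failure_times):
        if t > now:
            break  # this failure has not happened yet at `now`
        if last is not None and failure_timeout and t - last > failure_timeout:
            count = 0  # the earlier failures had already expired
        count += 1
        last = t
    if last is not None and failure_timeout and now - last > failure_timeout:
        return 0  # no failure for longer than failure-timeout: forgotten
    return count


def should_migrate(failure_times, now, failure_timeout, migration_threshold):
    """True if the resource would be moved away from this node at `now`."""
    count = effective_failcount(failure_times, now, failure_timeout)
    return count >= migration_threshold


# monitor interval=10s, failure-timeout=20s, migration-threshold=2
print(should_migrate([10], 10, 20, 2))      # one failure: stays
print(should_migrate([10, 20], 20, 20, 2))  # second failure 10s later: moves
print(should_migrate([10, 50], 50, 20, 2))  # failures 40s apart: stays
```

In this model a single monitor failure never moves the resource on its own with migration-threshold=2; the second failure does, as long as it arrives before the first one has expired.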
> This is the configuration I use to implement this
> behavior:
>
> node lbv01 \
> attributes standby="off"
> node lbv02 \
> attributes standby="off"
> primitive apacheIP ocf:heartbeat:IPaddr2 \
> params ip="10.6.151.190" \
> op monitor interval="10s" \
> meta is-managed="true"
> primitive haproxyIP ocf:heartbeat:IPaddr2 \
> params ip="10.6.151.191" \
> op monitor interval="10s"
> primitive pingd ocf:pacemaker:ping \
> params host_list="10.6.151.11" multiplier="100" \
> op monitor interval="15s" timeout="5s"
> primitive webservice ocf:heartbeat:webservices \
> op monitor on-fail="ignore" interval="10s" \
> meta failure-timeout="20s" migration-threshold="2"
> group webservice-ips haproxyIP apacheIP webservice \
> meta target-role="Started"
> colocation all-resources inf: webservice-ips pingd
> property $id="cib-bootstrap-options" \
> dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> no-quorum-policy="ignore" \
> last-lrm-refresh="1297249441" \
> cluster-delay="30"
>
> If a webservice monitoring failure is forced, the switchover is performed
> immediately, ignoring timeout and threshold.
I already pointed out that you've got a false impression of
failure-timeout, so that's irrelevant here.
Could it be that you are not just forcing the monitoring failure, but
also keeping the service from restarting? Some "chmod -x" trick? Because
that makes your monitor fail *and* the subsequent restart, and it's that
failing restart that would cause your migration.
Or else your "webservices" agent exits with $OCF_ERR_INSTALLED on your
monitor failure, which will also cause a prompt migration.
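Both explanations can be illustrated with another simplified model (an assumption-laden sketch, not Pacemaker's code). It assumes the default on-fail behaviour of restarting after a soft error, the start-failure-is-fatal cluster property at its default of true (a failed start pushes the fail-count to INFINITY), and Pacemaker's documented split of OCF exit codes into soft, hard and fatal errors. It ignores failure-timeout, assumes another node can take over, and the `outcome` helper is hypothetical.

```python
# Simplified model, NOT Pacemaker's code. Assumes the default on-fail
# (restart) for soft errors, ignores failure-timeout, and assumes another
# node is available to take over.

# OCF resource agent exit codes
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_ARGS = 2
OCF_ERR_UNIMPLEMENTED = 3
OCF_ERR_PERM = 4
OCF_ERR_INSTALLED = 5
OCF_ERR_CONFIGURED = 6
OCF_NOT_RUNNING = 7

# "Hard" errors: this node is banned for the resource right away.
HARD_ERRORS = {OCF_ERR_ARGS, OCF_ERR_UNIMPLEMENTED, OCF_ERR_PERM, OCF_ERR_INSTALLED}
INFINITY = float("inf")


def outcome(results, migration_threshold, start_failure_is_fatal=True):
    """Replay (operation, exit_code) results on one node, in order."""
    failcount = 0
    for op, rc in results:
        if rc == OCF_SUCCESS:
            continue
        if rc == OCF_ERR_CONFIGURED:
            return "stopped everywhere"  # fatal: no node may run it
        if rc in HARD_ERRORS:
            return "moved"  # hard error: the threshold is never consulted
        # Soft error, e.g. OCF_ERR_GENERIC or OCF_NOT_RUNNING from a monitor.
        if op == "start" and start_failure_is_fatal:
            failcount = INFINITY  # a failed start exceeds any threshold
        else:
            failcount += 1
        if failcount >= migration_threshold:
            return "moved"
    return "stays"


# Monitor fails, restart succeeds: one failure, below the threshold of 2.
print(outcome([("monitor", OCF_ERR_GENERIC), ("stop", OCF_SUCCESS),
               ("start", OCF_SUCCESS)], 2))
# Monitor fails and the restart fails too (the "chmod -x" case).
print(outcome([("monitor", OCF_ERR_GENERIC), ("stop", OCF_SUCCESS),
               ("start", OCF_ERR_GENERIC)], 2))
# Monitor returns OCF_ERR_INSTALLED.
print(outcome([("monitor", OCF_ERR_INSTALLED)], 2))
```

Under these assumptions only the first case keeps the resource in place; the other two move it on the very first monitor failure, regardless of migration-threshold.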
Btw, when you write your own RA, *please* don't install it into the
"heartbeat" provider directory; instead, create your own directory.
Otherwise a casual observer will think you're talking about a resource
agent that lives in our upstream repo, which for your "webservices"
agent is clearly not the case.
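The provider part of an "ocf:<provider>:<type>" resource class names a directory under $OCF_ROOT/resource.d, which is why the location matters. The sketch below assumes the common default OCF_ROOT of /usr/lib/ocf (check your installation); "mycompany" is a hypothetical provider name.

```python
import os

# Common default OCF_ROOT; your distribution may differ.
OCF_ROOT = "/usr/lib/ocf"


def agent_path(resource_class):
    """Map an 'ocf:<provider>:<type>' string to the agent script's path."""
    standard, provider, agent_type = resource_class.split(":")
    if standard != "ocf":
        raise ValueError("only the ocf standard uses provider directories")
    return os.path.join(OCF_ROOT, "resource.d", provider, agent_type)


print(agent_path("ocf:heartbeat:IPaddr2"))      # upstream agent
print(agent_path("ocf:mycompany:webservices"))  # your own (hypothetical) provider
```

With the script installed in its own provider directory, the primitive would then be declared as "primitive webservice ocf:mycompany:webservices".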
Florian