[Pacemaker] Long failover
Dmitry Matveichev
d.matveichev at mfisoft.ru
Fri Nov 14 13:33:19 UTC 2014
We've already tried to set it but it didn't help.
------------------------
Kind regards,
Dmitriy Matveichev.
-----Original Message-----
From: Andrei Borzenkov [mailto:arvidjaar at gmail.com]
Sent: Friday, November 14, 2014 4:12 PM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Long failover
On Fri, Nov 14, 2014 at 2:57 PM, Dmitry Matveichev <d.matveichev at mfisoft.ru> wrote:
> Hello,
>
> We have a cluster configured via pacemaker+corosync+crm. The
> configuration is:
>
> node master
> node slave
> primitive HA-VIP1 IPaddr2 \
>         params ip=192.168.22.71 nic=bond0 \
>         op monitor interval=1s
> primitive HA-variator lsb:variator \
>         op monitor interval=1s \
>         meta migration-threshold=1 failure-timeout=1s
> group HA-Group HA-VIP1 HA-variator
> property cib-bootstrap-options: \
>         dc-version=1.1.10-14.el6-368c726 \
>         cluster-infrastructure="classic openais (with plugin)" \
>         expected-quorum-votes=2 \
>         stonith-enabled=false \
>         no-quorum-policy=ignore \
>         last-lrm-refresh=1383871087
> rsc_defaults rsc-options: \
>         resource-stickiness=100
>
> First I take the variator service down on the master node (actually
> I delete the service binary and kill the variator process, so the
> variator fails to restart). Resources move to the slave node very
> quickly, as expected. Then I restore the binary on the master and
> restart the variator service. Next I do the same with the binary and
> service on the slave node. The crm status command quickly shows
> HA-variator (lsb:variator): Stopped. But it takes too much time (for
> us) before the resources are switched to the master node (around
> 1 min). Then the following
>
> Failed actions:
>     HA-variator_monitor_1000 on slave 'unknown error' (1): call=-1,
>     status=Timed Out, last-rc-change='Sat Dec 21 03:59:45 2013',
>     queued=0ms, exec=0ms
>
> appears in the crm status output and the resources are switched.
>
> What is that timeout? Where can I change it?
>
This is the operation timeout. You can change it in the operation definition:
op monitor interval=1s timeout=5s
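
For illustration, applied to the HA-variator primitive from the configuration quoted above, the full definition might look like the sketch below. The 5s value is only the example figure from the line above, not a recommendation; a suitable timeout depends on how long the variator init script's status action actually takes.

```
# Sketch only: the quoted HA-variator primitive with an explicit
# monitor timeout added. timeout=5s is an example value (assumption).
primitive HA-variator lsb:variator \
        op monitor interval=1s timeout=5s \
        meta migration-threshold=1 failure-timeout=1s
```

With crmsh, one way to make such a change is "crm configure edit", which opens the live configuration in an editor.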
_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org