[Pacemaker] Speed up resource failover?
Dejan Muhamedagic
dejanmm at fastmail.fm
Wed Jan 12 17:20:34 UTC 2011
Hi,
On Wed, Jan 12, 2011 at 09:33:41AM -0700, Patrick H. wrote:
> Sent: Wed Jan 12 2011 09:25:39 GMT-0700 (Mountain Standard Time)
> From: Patrick H. <pacemaker at feystorm.net>
> To: pacemaker at oss.clusterlabs.org
> Subject: Re: [Pacemaker] Speed up resource failover?
>> Sent: Wed Jan 12 2011 01:56:31 GMT-0700 (Mountain Standard Time)
>> From: Lars Ellenberg <lars.ellenberg at linbit.com>
>> To: pacemaker at oss.clusterlabs.org
>> Subject: Re: [Pacemaker] Speed up resource failover?
>>> On Wed, Jan 12, 2011 at 09:30:41AM +0100, Robert van Leeuwen wrote:
>>>
>>>> -----Original message-----
>>>> To: pacemaker at oss.clusterlabs.org; From: Patrick H.
>>>> <pacemaker at feystorm.net>
>>>> Sent: Wed 12-01-2011 00:06
>>>> Subject: [Pacemaker] Speed up resource failover?
>>>> Attachment: inline.txt
>>>>
>>>>> As it is right now, pacemaker seems to take a long time (in
>>>>> computer terms) to fail over resources from one node to the
>>>>> other. Right now, I have 477 IPaddr2 resources evenly distributed
>>>>> between 2 nodes. When I put one node in standby, it takes
>>>>> approximately 5 minutes to move half of those from one node
>>>>> to the other. And before you ask, they're there because of SSL HTTP
>>>>> virtual hosting. I have no order rules, colocations or anything
>>>>> on those resources, so it should be able to migrate the entire list
>>>>> simultaneously, but it seems to do them sequentially. Is there
>>>>> any way to make it migrate the resources in parallel? Or at the
>>>>> very least speed it up?
>>>>>
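(Just for context: each of those VIPs is presumably a plain IPaddr2
primitive along these lines; the address, netmask and operation values
below are made up:

  primitive vip_55.63 ocf:heartbeat:IPaddr2 \
    params ip="192.168.55.63" cidr_netmask="24" \
    op monitor interval="30s" timeout="20s"

Nothing exotic, just a lot of them.)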
>>>> Patrick,
>>>>
>>>> It's probably not so much the cluster suite; it has to do with
>>>> the specific resource script. For a proper takeover of an IP you
>>>> have to do an ARP "deregister/register".
>>>> This will take a few seconds.
>>>>
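(For reference, the "register" half of that is just gratuitous ARP.
IPaddr2 uses its own send_arp helper, but the effect is roughly the
following sketch; the interface, address and count are made up:

  # bring the VIP up on the interface
  ip -f inet addr add 192.168.55.63/24 brd 192.168.55.255 dev eth0
  # announce the new MAC/IP mapping with unsolicited (gratuitous) ARPs
  arping -q -U -c 5 -I eth0 192.168.55.63

Repeating that announcement a few times per address is where the time
Robert mentions goes.)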
>> That the resource script is the bottleneck is apparently not true :-/
>> I have attached a portion of the lrmd log showing an example of this.
>> Notice that on the very first line it starts the vip_55.63 resource, and
>> then immediately on the next line it exits successfully.
>> Another point of note is that somehow, after the script has already exited,
>> lrmd logs the stderr output from it. I'm not sure if it's just delayed
>> logging or what. However, even if the script is still running, notice
>> that there is a huge time gap between 16:11:01 and 16:11:25 where it's
>> just sitting there doing nothing.
>> I even did a series of `ps` commands to watch for the processes, and
>> it starts up a bunch of them, then they all exit, and it sits
>> there for a long period before starting up more. So it is definitely
>> not the resource script slowing it down.
>>
>> Also, in the log, notice that it's only starting up a few scripts every
>> second. It should be able to fire off every single script at the exact
>> same time.
>>
>>>> As long as a resource script is busy, the cluster suite will not start the next action.
>>>> Parallel execution is not possible in the cluster suite as far as I know.
>>>> (Without being a programmer myself, I would expect it is pretty tricky to implement parallelization "code-wise" while making 100% sure the cluster does not break.)
>>>>
>>>> You could consider editing the IPaddr2 resource script so it does not wait for the arp commands.
>>>> At your own risk, of course ;-)
>>>>
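(If anyone does go down that road: "not waiting" essentially means
backgrounding the ARP announcement inside the agent, something like

  # sketch only, not the actual IPaddr2 code
  arping -q -U -c 5 -I eth0 "$OCF_RESKEY_ip" &

Check first whether your resource-agents version already exposes an
arp_bg parameter for exactly this; then no editing is needed.)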
>>>
>>> There is the cluster option "batch-limit" (in the CIB); see
>>> "Configuration Explained".
>>> And there is the lrmd setting "max-children" (it can be set in some
>>> /etc/default/ or /etc/sysconfig file and should be set by the init script).
>>> You can set it manually with lrmadmin -p max-children $some_number
>>> That should help you a bit.
>>> But don't overdo it. Raise them slowly ;-)
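(For the archives, both knobs can be changed at runtime; the numbers
below are only examples:

  # cluster-wide limit on how many actions the transition engine runs in parallel
  crm configure property batch-limit=50
  # per-node limit on how many resource agents lrmd runs concurrently
  lrmadmin -p max-children 30

The lrmadmin setting does not survive a restart, so also put it in the
/etc/default or /etc/sysconfig file your init script reads.)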
>>>
>>>
>>
>> batch-limit, it says, defaults to 30, which seems like a sane value. I
>> tried playing with max-children and upped it to 30 as well, but to
>> no effect. It does seem to be launching 30 instances of the IPaddr2
>> script at a time (as can be seen from the attached log), but the
>> problem is apparently that it's sitting there for long periods of time
>> before starting up the next batch. I would think that when one of the
>> 30 completes, it would launch another to take its place. But instead
>> it launches 30, then sits there for a while, then launches another 30.
Strange. Can you please file a bugzilla with an hb_report attached?
File it initially against the LRM component (product Linux-HA).
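Something along these lines should capture the interesting window (the
times are only an example; adjust them to your test):

  hb_report -f "2011/01/12 16:05" -t "2011/01/12 16:20" /tmp/standby-failover

and attach the resulting tarball to the bug.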
> Oh, and it's not that it's waiting for the resource to stop on the other
> node before it starts it up, either.
> Here's the lrmd log for resource vip_55.63 from the 'ha02' node (the
> node I put into standby):
> Jan 12 16:10:24 ha02 lrmd: [5180]: info: rsc:vip_55.63:1444: stop
> Jan 12 16:10:24 ha02 lrmd: [5180]: info: Managed vip_55.63:stop process
> 19063 exited with return code 0.
>
>
> And here's the lrmd log for the same resource on 'ha01':
> Jan 12 16:10:50 ha01 lrmd: [4707]: info: rsc:vip_55.63:1390: start
> Jan 12 16:10:50 ha01 lrmd: [4707]: info: Managed vip_55.63:start process
> 8826 exited with return code 0.
>
>
> Notice that it stopped it a full 36 seconds before it tried to start it
> on the other node. The times on both boxes are in sync, so it's not that
> either.
Is this the case when you wanted to fail over a single resource,
or was it part of the node standby process?
Thanks,
Dejan