[Pacemaker] Speed up resource failover?
Andrew Beekhof
andrew at beekhof.net
Tue Jan 18 08:49:18 UTC 2011
On Fri, Jan 14, 2011 at 12:45 PM, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> Hi,
>
> On Wed, Jan 12, 2011 at 02:41:31PM -0700, Patrick H. wrote:
>>
>> >>Oh, and it's not waiting for the resource to stop on the other
>> >>node before it starts it up either.
>> >>Here's the lrmd log for resource vip_55.63 from the 'ha02' node
>> >>(the node I put into standby)
>> >>Jan 12 16:10:24 ha02 lrmd: [5180]: info: rsc:vip_55.63:1444: stop
>> >>Jan 12 16:10:24 ha02 lrmd: [5180]: info: Managed vip_55.63:stop
>> >>process 19063 exited with return code 0.
>> >>
>> >>
>> >>And here's the lrmd log for the same resource on 'ha01'
>> >>Jan 12 16:10:50 ha01 lrmd: [4707]: info: rsc:vip_55.63:1390: start
>> >>Jan 12 16:10:50 ha01 lrmd: [4707]: info: Managed vip_55.63:start
>> >>process 8826 exited with return code 0.
>> >>
>> >>
>> >>Notice that it stopped it a full 26 seconds before it tried to
>> >>start it on the other node. The times on both boxes are in
>> >>sync, so it's not that either.
>> >
>> >Is this the case when you wanted to fail-over a single resource
>> >or was it part of the node standby process?
>> >
>> >Thanks,
>> >
>> >Dejan
>> In that case I put the node in standby.
>>
>>
>> While digging around a bit more, I noticed this:
>> Jan 12 17:24:56 ha01 crmd: [4710]: info: te_rsc_command: Initiating
>> action 966: stop vip_55.236_stop_0 on ha01 (local)
>> Jan 12 17:24:56 ha01 crmd: [4710]: info: do_lrm_rsc_op: Performing
>> key=966:14345:0:0e860f83-8611-4873-829f-2a0c6fcf6667
>> op=vip_55.236_stop_0 )
>> Jan 12 17:24:56 ha01 lrmd: [4707]: info: rsc:vip_55.236:1714: stop
>> Jan 12 17:24:56 ha01 lrmd: [4707]: info: Managed vip_55.236:stop
>> process 11414 exited with return code 0.
>> Jan 12 17:24:56 ha01 crmd: [4710]: info: process_lrm_event: LRM
>> operation vip_55.236_stop_0 (call=1714, rc=0, cib-update=19621,
>> confirmed=true) ok
>> Jan 12 17:25:04 ha01 crmd: [4710]: info: match_graph_event: Action
>> vip_55.236_stop_0 (966) confirmed on ha01 (rc=0)
>> Jan 12 17:25:04 ha01 crmd: [4710]: info: te_rsc_command: Initiating
>> action 967: start vip_55.236_start_0 on ha02
>> Jan 12 17:25:28 ha01 crmd: [4710]: info: match_graph_event: Action
>> vip_55.236_start_0 (967) confirmed on ha02 (rc=0)
>>
>> Notice the huge delays before the match_graph_event on both stop and
>> start. So it seems everything is waiting on match_graph_event. What
>> is this?
>
> Can't say, but perhaps Andrew would know, though I'm not sure if
> there's enough information here. Best to open a bugzilla and
> attach hb_report.
Did a bug get created for this?
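
To put numbers on the delays in the quoted crmd log, here is a minimal
Python sketch that prints the gap in seconds between consecutive log
lines. The helper names (parse_stamp, gaps) are made up for illustration
and are not part of Pacemaker; the year 2011 is an assumption, since
syslog timestamps do not carry one. It only measures the gaps, it does
not explain them.

```python
from datetime import datetime

# Lines copied from the crmd/lrmd log quoted above (wrapped lines rejoined).
LOG = """\
Jan 12 17:24:56 ha01 crmd: [4710]: info: te_rsc_command: Initiating action 966: stop vip_55.236_stop_0 on ha01 (local)
Jan 12 17:24:56 ha01 lrmd: [4707]: info: Managed vip_55.236:stop process 11414 exited with return code 0.
Jan 12 17:25:04 ha01 crmd: [4710]: info: match_graph_event: Action vip_55.236_stop_0 (966) confirmed on ha01 (rc=0)
Jan 12 17:25:04 ha01 crmd: [4710]: info: te_rsc_command: Initiating action 967: start vip_55.236_start_0 on ha02
Jan 12 17:25:28 ha01 crmd: [4710]: info: match_graph_event: Action vip_55.236_start_0 (967) confirmed on ha02 (rc=0)
"""


def parse_stamp(line, year=2011):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp of a line.

    The year is an assumption (syslog omits it); it only matters for
    logs that span a year boundary.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def gaps(log):
    """Return the delay in whole seconds between consecutive log lines."""
    lines = [l for l in log.splitlines() if l.strip()]
    times = [parse_stamp(l) for l in lines]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    for line, gap in zip(LOG.splitlines()[1:], gaps(LOG)):
        # Show each gap next to the start of the line that followed it.
        print(f"+{gap:3d}s  {line[:72]}")
```

On the excerpt above, the two non-zero gaps fall right before the
match_graph_event lines: 8 seconds before the stop is confirmed and 24
seconds before the start is confirmed.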