[Pacemaker] Failure after intermittent network outage
Pavel Levshin
pavel at levshin.spb.ru
Thu Mar 10 12:03:28 UTC 2011
Hi,
No, I think you've missed the point. The RA did not answer at all. The monitor
actions were lost due to a cluster transition:
Mar 1 11:16:00 wapgw1-log crmd: [24547]: info: do_lrm_rsc_op: Performing key=33:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd op=p-drbd-mdirect1-1:0_monitor_0 )
Mar 1 11:16:00 wapgw1-log crmd: [24547]: info: do_lrm_rsc_op: Discarding attempt to perform action monitor on p-drbd-mdirect1-1:0 in state S_ELECTION
Mar 1 11:16:00 wapgw1-log crmd: [24547]: info: send_direct_ack: ACK'ing resource op p-drbd-mdirect1-1:0_monitor_0 from 33:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd: lrm_invoke-lrmd-1298967360-58
Mar 1 11:16:00 wapgw1-log crmd: [24547]: info: process_te_message: Processing (N)ACK lrm_invoke-lrmd-1298967360-58 from wapgw1-log
Mar 1 11:16:00 wapgw1-log crmd: [24547]: info: process_graph_event: Action p-drbd-mdirect1-1:0_monitor_0/33 (4:99;33:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd) initiated by a different transitioner
Mar 1 11:16:00 wapgw1-log crmd: [24547]: info: abort_transition_graph: process_graph_event:456 - Triggered transition abort (complete=1, tag=lrm_rsc_op, id=p-drbd-mdirect1-1:0_monitor_0, magic=4:99;33:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd) : Foreign event
So the RA never had a chance to answer anything.
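To make the "Foreign event" lines above easier to read, here is a small Python sketch that splits the `magic` field into its parts. It assumes the layout `op_status:rc;action:transition:target_rc:uuid` as used by Pacemaker of this era; the field names are descriptive labels chosen for this sketch, not identifiers from the Pacemaker source. Decoding the logged value shows that the probe expected 7 (`OCF_NOT_RUNNING`), while the recorded 99 is not an OCF return code at all. That fits the crmd having discarded the action rather than the RA having answered.

```python
from typing import NamedTuple


class TransitionMagic(NamedTuple):
    op_status: int      # status of the operation as recorded by the cluster
    rc: int             # return code recorded for the operation
    action_id: int      # action number inside the transition graph
    transition_id: int  # transition (graph) number
    target_rc: int      # return code the transition engine expected
    te_uuid: str        # UUID identifying the transition engine instance


def parse_magic(magic: str) -> TransitionMagic:
    """Split 'status:rc;action:transition:target_rc:uuid' (assumed layout)."""
    status_rc, key = magic.split(";", 1)
    op_status, rc = (int(x) for x in status_rc.split(":"))
    # The UUID contains dashes but no colons, so at most four fields result.
    action_id, transition_id, target_rc, te_uuid = key.split(":", 3)
    return TransitionMagic(op_status, rc, int(action_id),
                           int(transition_id), int(target_rc), te_uuid)


OCF_NOT_RUNNING = 7  # the answer a probe (monitor_0) is expected to give

magic = parse_magic("4:99;33:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd")
print(magic.target_rc == OCF_NOT_RUNNING)  # True: "not running" was expected
print(magic.rc)                            # 99: not an OCF code from the RA
```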
Apart from this, should I fake all RAs that are supposed to be unused
on particular nodes in the cluster? That seems to me like only a partial
solution.
Suppose that I want to run virtual machine "X" on hardware nodes A and
B, and VM "Y" on nodes B and C. With DRBD this is a very common
configuration, because "X" cannot access its disk device on hardware
node "C". Currently, I must configure both "X" and "Y" on every hardware
node, or the RA will fail with status "not configured". That is not a
minimal configuration, so it is more error-prone than necessary.
I would be happy to tell the cluster never to touch resource "X" on node
C in this case. What do you think?
10.03.2011 14:09, Andrew Beekhof wrote:
> Your basic problem is this...
>
> Mar 1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: Processing failed op vm-mproxy1-1_monitor_0 on wapgw1-log: unknown error (1)
>
> We asked what state the resource was in and it replied "arrrggghhhh"
> instead of "not installed".
> Had it replied with "not installed", we'd have no reason to call stop or
> fence the node to try and clean it up.
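The distinction drawn in the quoted reply is between the OCF return codes 1 (`OCF_ERR_GENERIC`, "unknown error") and 5 (`OCF_ERR_INSTALLED`, "not installed"). Resource agents are normally shell scripts, so the following is only a Python model of the decision a VM agent's probe could make. `probe_vm`, `config_path` and `is_running` are hypothetical names, and treating a missing VM definition file as "cannot run on this node" is an assumption about how such an agent might detect that case.

```python
import os
from typing import Callable

# OCF resource-agent exit codes (values from the OCF specification).
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1     # "unknown error": the cluster has to assume the worst
OCF_ERR_INSTALLED = 5   # "not installed": the resource cannot run on this node
OCF_NOT_RUNNING = 7     # cleanly stopped: the normal answer to a probe


def probe_vm(config_path: str, is_running: Callable[[], bool]) -> int:
    """Model of a probe (monitor with interval 0) for a VM resource.

    Hypothetical inputs: config_path is a VM definition file that exists only
    on nodes able to host the VM; is_running reports whether the VM is active.
    """
    if not os.path.exists(config_path):
        # The VM cannot live on this node (e.g. its DRBD disk is absent here),
        # so report "not installed" instead of failing with an unknown error.
        return OCF_ERR_INSTALLED
    try:
        return OCF_SUCCESS if is_running() else OCF_NOT_RUNNING
    except OSError:
        # A genuine failure while checking the VM's state.
        return OCF_ERR_GENERIC


# Hypothetical path standing in for a definition that is absent on node C.
print(probe_vm("/nonexistent/vm-X.xml", lambda: False))  # 5
```

Called on node C, where the definition of "X" does not exist, this sketch reports 5 rather than 1, which is the answer the quoted reply says would have avoided the stop and fencing.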
--
Pavel Levshin //flicker