[Pacemaker] Failure after intermittent network outage

Pavel Levshin pavel at levshin.spb.ru
Fri Mar 11 12:31:14 UTC 2011


Hi Andrew.


I'm sorry, but I cannot agree.

Look again at the DC log: it says "Action lost". That is why I use this
term.

It then declares every monitor action as having failed with rc=1, which
is not true. Note that even the actions directed at a nonexistent RA are
listed as failed with rc=1 (DRBD is not installed on the target server,
so there is no ocf:linbit:drbd).


Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 30]: In-flight (id: 
ilo-wapgw1-1:0_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
30: ilo-wapgw1-1:0_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 31]: In-flight (id: 
ilo-wapgw1-2:0_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
31: ilo-wapgw1-2:0_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 32]: In-flight (id: 
ilo-wapgw1-log:0_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
32: ilo-wapgw1-log:0_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 33]: In-flight (id: 
p-drbd-mdirect1-1:0_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
33: p-drbd-mdirect1-1:0_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 34]: In-flight (id: 
p-drbd-mdirect1-2:0_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
34: p-drbd-mdirect1-2:0_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 35]: In-flight (id: 
p-drbd-mproxy1-1:0_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
35: p-drbd-mproxy1-1:0_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 36]: In-flight (id: 
p-drbd-mproxy1-2:0_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
36: p-drbd-mproxy1-2:0_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 37]: In-flight (id: 
p-drbd-mrouter1-1:0_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
37: p-drbd-mrouter1-1:0_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 38]: In-flight (id: 
p-drbd-mrouter1-2:0_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
38: p-drbd-mrouter1-2:0_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 39]: In-flight (id: 
vm-mdirect1-1_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
39: vm-mdirect1-1_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 40]: In-flight (id: 
vm-mdirect1-2_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
40: vm-mdirect1-2_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 41]: In-flight (id: 
vm-mproxy1-1_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
41: vm-mproxy1-1_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 42]: In-flight (id: 
vm-mproxy1-2_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
42: vm-mproxy1-2_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 43]: In-flight (id: 
vm-mrouter1-1_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
43: vm-mrouter1-1_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 44]: In-flight (id: 
vm-mrouter1-2_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
44: vm-mrouter1-2_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 45]: In-flight (id: 
ip-puppetmaster_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
45: ip-puppetmaster_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 46]: In-flight (id: 
ip-logserver_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
46: ip-logserver_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 47]: In-flight (id: 
vm-vradius1_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
47: vm-vradius1_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 48]: In-flight (id: 
p-drbd-vradius1:0_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
48: p-drbd-vradius1:0_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 49]: In-flight (id: vm-ppg1_monitor_0, 
loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
49: vm-ppg1_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: action_timer_callback: 
Timer popped (timeout=20000, abort_level=1000000, complete=false)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: ERROR: print_elem: Aborting 
transition, action lost: [Action 50]: In-flight (id: 
p-drbd-ppg1:0_monitor_0, loc: wapgw1-log, priority: 0)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
action_timer_callback:486 - Triggered transition abort (complete=0) : 
Action lost
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: cib_action_update: rsc_op 
50: p-drbd-ppg1:0_monitor_0 on wapgw1-log timed out
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: WARN: status_from_rc: Action 30 
(ilo-wapgw1-1:0_monitor_0) on wapgw1-log failed (target: 7 vs. rc: 1): Error
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
match_graph_event:272 - Triggered transition abort (complete=0, 
tag=lrm_rsc_op, id=ilo-wapgw1-1:0_monitor_0, 
magic=2:1;30:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.8) : 
Event failed
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: match_graph_event: Action 
ilo-wapgw1-1:0_monitor_0 (30) confirmed on wapgw1-log (rc=4)
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: te_rsc_command: Initiating 
action 29: probe_complete probe_complete on wapgw1-log - no waiting
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: run_graph: 
====================================================
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: notice: run_graph: Transition 
1353 (Complete=22, Pending=0, Fired=0, Skipped=13, Incomplete=2, 
Source=/var/lib/pengine/pe-input-1525.bz2): Stopped
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: te_graph_trigger: 
Transition 1353 is now complete
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
do_te_invoke:191 - Triggered transition abort (complete=1) : Peer Cancelled
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3669: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3670: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
ilo-wapgw1-2:0_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=ilo-wapgw1-2:0_monitor_0, 
magic=2:1;31:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.9) : 
Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3671: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
ilo-wapgw1-log:0_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=ilo-wapgw1-log:0_monitor_0, 
magic=2:1;32:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.10) 
: Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3672: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
p-drbd-mdirect1-1:0_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=p-drbd-mdirect1-1:0_monitor_0, 
magic=2:1;33:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.11) 
: Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3673: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
p-drbd-mdirect1-2:0_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=p-drbd-mdirect1-2:0_monitor_0, 
magic=2:1;34:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.12) 
: Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3674: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
p-drbd-mproxy1-1:0_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=p-drbd-mproxy1-1:0_monitor_0, 
magic=2:1;35:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.13) 
: Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3675: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
p-drbd-mproxy1-2:0_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=p-drbd-mproxy1-2:0_monitor_0, 
magic=2:1;36:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.14) 
: Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3676: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
p-drbd-mrouter1-1:0_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=p-drbd-mrouter1-1:0_monitor_0, 
magic=2:1;37:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.15) 
: Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3677: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
p-drbd-mrouter1-2:0_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=p-drbd-mrouter1-2:0_monitor_0, 
magic=2:1;38:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.16) 
: Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3678: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
vm-mdirect1-1_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=vm-mdirect1-1_monitor_0, 
magic=2:1;39:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.17) 
: Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3679: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
vm-mdirect1-2_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=vm-mdirect1-2_monitor_0, 
magic=2:1;40:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.18) 
: Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3680: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
vm-mproxy1-1_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=vm-mproxy1-1_monitor_0, 
magic=2:1;41:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.19) 
: Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3681: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
vm-mproxy1-2_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=vm-mproxy1-2_monitor_0, 
magic=2:1;42:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.20) 
: Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3682: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
vm-mrouter1-1_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=vm-mrouter1-1_monitor_0, 
magic=2:1;43:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.21) 
: Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3683: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
vm-mrouter1-2_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=vm-mrouter1-2_monitor_0, 
magic=2:1;44:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.22) 
: Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3684: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
ip-puppetmaster_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=ip-puppetmaster_monitor_0, 
magic=2:1;45:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.23) 
: Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3685: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
ip-logserver_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=ip-logserver_monitor_0, 
magic=2:1;46:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.24) 
: Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3686: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
vm-vradius1_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=vm-vradius1_monitor_0, 
magic=2:1;47:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.25) 
: Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3687: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
p-drbd-vradius1:0_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=p-drbd-vradius1:0_monitor_0, 
magic=2:1;48:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.26) 
: Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3688: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
vm-ppg1_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=vm-ppg1_monitor_0, 
magic=2:1;49:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.27) 
: Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3689: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: process_graph_event: Action 
p-drbd-ppg1:0_monitor_0 arrived after a completed transition
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: abort_transition_graph: 
process_graph_event:467 - Triggered transition abort (complete=1, 
tag=lrm_rsc_op, id=p-drbd-ppg1:0_monitor_0, 
magic=2:1;50:1353:7:22dc5497-478f-49ff-b07f-9fcd6da325cd, cib=0.799.28) 
: Inactive graph
Mar  1 11:17:20 wapgw1-2 crmd: [5749]: info: do_pe_invoke: Query 3690: 
Requesting the current CIB: S_POLICY_ENGINE
Mar  1 11:17:21 wapgw1-2 crmd: [5749]: info: do_pe_invoke_callback: 
Invoking the PE: query=3690, ref=pe_calc-dc-1298967441-1625, seq=3504, 
quorate=1
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: info: unpack_config: Node 
scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: info: determine_online_status: 
Node wapgw1-log is online
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op ip-puppetmaster_monitor_0 on wapgw1-log: unknown 
error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op ilo-wapgw1-log:0_monitor_0 on wapgw1-log: unknown 
error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op p-drbd-mproxy1-2:0_monitor_0 on wapgw1-log: unknown 
error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op p-drbd-mdirect1-1:0_monitor_0 on wapgw1-log: 
unknown error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op p-drbd-mrouter1-1:0_monitor_0 on wapgw1-log: 
unknown error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op p-drbd-ppg1:0_monitor_0 on wapgw1-log: unknown 
error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op ilo-wapgw1-1:0_start_0 on wapgw1-log: unknown error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op ilo-wapgw1-1:0_monitor_0 on wapgw1-log: unknown 
error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op vm-mdirect1-1_monitor_0 on wapgw1-log: unknown 
error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op vm-mproxy1-1_monitor_0 on wapgw1-log: unknown error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op vm-mdirect1-2_monitor_0 on wapgw1-log: unknown 
error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op vm-mproxy1-2_monitor_0 on wapgw1-log: unknown error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op vm-mrouter1-1_monitor_0 on wapgw1-log: unknown 
error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op p-drbd-mdirect1-2:0_monitor_0 on wapgw1-log: 
unknown error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op p-drbd-mrouter1-2:0_monitor_0 on wapgw1-log: 
unknown error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op vm-mrouter1-2_monitor_0 on wapgw1-log: unknown 
error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op p-drbd-mproxy1-1:0_monitor_0 on wapgw1-log: unknown 
error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op vm-vradius1_monitor_0 on wapgw1-log: unknown error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op vm-ppg1_monitor_0 on wapgw1-log: unknown error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op ilo-wapgw1-2:0_start_0 on wapgw1-log: unknown error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op ilo-wapgw1-2:0_monitor_0 on wapgw1-log: unknown 
error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op ip-logserver_monitor_0 on wapgw1-log: unknown error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op p-drbd-vradius1:0_monitor_0 on wapgw1-log: unknown 
error (1)
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: info: determine_online_status: 
Node wapgw1-1 is online
Mar  1 11:17:21 wapgw1-2 pengine: [5748]: WARN: unpack_rsc_op: 
Processing failed op ilo-wapgw1-log:1_start_0 on wapgw1-1: unknown error (1)

This is the part of te_callbacks.c that is responsible for this:

===============
gboolean
action_timer_callback(gpointer data)
{
    crm_action_timer_t *timer = NULL;

    CRM_CHECK(data != NULL, return FALSE);

    timer = (crm_action_timer_t*)data;
    stop_te_timer(timer);
    crm_warn("Timer popped (timeout=%d, abort_level=%d, complete=%s)",
             timer->timeout,
             transition_graph->abort_priority,
             transition_graph->complete?"true":"false");

    CRM_CHECK(timer->action != NULL, return FALSE);

    if(transition_graph->complete) {
        crm_warn("Ignoring timeout while not in transition");

    } else if(timer->reason == timeout_action_warn) {
        print_action(
            LOG_WARNING,"Action missed its timeout: ", timer->action);

        /* Don't check the FSA state
         *
         * We might also be in S_INTEGRATION or some other state waiting for this
         * action so we can close the transition and continue
         */

    } else {
        /* fail the action */
        gboolean send_update = TRUE;
        const char *task = crm_element_value(timer->action->xml, XML_LRM_ATTR_TASK);
        print_action(LOG_ERR, "Aborting transition, action lost: ", timer->action);

        timer->action->failed = TRUE;
        timer->action->confirmed = TRUE;
        abort_transition(INFINITY, tg_restart, "Action lost", NULL);

        update_graph(transition_graph, timer->action);
        trigger_graph();

        if(timer->action->type != action_type_rsc) {
            send_update = FALSE;
        } else if(safe_str_eq(task, "cancel")) {
            /* we dont need to update the CIB with these */
            send_update = FALSE;
        }

        if(send_update) {
            /* cib_action_update(timer->action, LRM_OP_PENDING, EXECRA_STATUS_UNKNOWN); */
            cib_action_update(timer->action, LRM_OP_TIMEOUT, EXECRA_UNKNOWN_ERROR);
        }
    }
==========

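So the CIB was updated with LRM_OP_TIMEOUT and EXECRA_UNKNOWN_ERROR (rc=1), which is exactly the "2:1" at the start of each magic= field in the log, and so on.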


>  Either remove the RA, or make sure it returns something sensible when
>  tools or configuration it needs are not available.

This is what I mean by "error-prone". Such an RA may reappear with a fresh RPM, and errors in RAs simply happen.

OK, I see one way around it: I could copy each RA to a new location (e.g. ocf:safe:VirtualDomain) so that it is not touched by RPM updates.

I could even give each resource its own RA (VirtualDomain-X, VirtualDomain-Y and so on) and install it only on the nodes where that resource can run.

I just don't think it is the best way to go.

>  No.  For safety we still need to verify that X is not running on node
>  C before we allow it to be active anywhere else.
>  That you know the X is unavailable on C is one thing, but the cluster
>  needs to know too.

Therefore, I propose an addition to Pacemaker: a way to tell the cluster that resource X cannot run on node C. Currently this knowledge can only come from the status section of the CIB, i.e. from a probe result. I wish there were a way to state the same thing in the configuration. The cluster could then avoid the quirks caused by unneeded RAs.

Would anyone support this proposal?


--
Pavel Levshin //flicker