[ClusterLabs] Cluster failover failure with Unresolved dependency
Lorand Kelemen
lorand.kelemen at gmail.com
Wed Mar 16 10:49:58 UTC 2016
Dear Ken,
Thanks for the reply! I lowered migration-threshold to 1 and rearranged
contraints like you suggested:
Location Constraints:
Ordering Constraints:
promote mail-clone then start fs-services (kind:Mandatory)
promote spool-clone then start fs-services (kind:Mandatory)
start fs-services then start network-services (kind:Mandatory)
start network-services then start mail-services (kind:Mandatory)
Colocation Constraints:
fs-services with spool-clone (score:INFINITY) (rsc-role:Started)
(with-rsc-role:Master)
fs-services with mail-clone (score:INFINITY) (rsc-role:Started)
(with-rsc-role:Master)
network-services with mail-services (score:INFINITY)
mail-services with fs-services (score:INFINITY)
Now virtualip and postfix becomes stopped, I guess these are relevant but I
attach also full logs:
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
native_color: Resource postfix cannot run anywhere
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
native_color: Resource virtualip-1 cannot run anywhere
Interesting, will try to play around with ordering - colocation, the
solution must be in these settings...
Best regards,
Lorand
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: --- 0.215.7 2
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: +++ 0.215.8 (null)
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: + /cib: @num_updates=8
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: ++
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']:
<lrm_rsc_op id="postfix_last_failure_0"
operation_key="postfix_monitor_45000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="86:2962:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:7;86:2962:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail1" call-id="1333" rc-code="7"
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info:
abort_transition_graph: Transition aborted by postfix_monitor_45000
'create' on mail1: Inactive graph
(magic=0:7;86:2962:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, cib=0.215.8,
source=process_graph_event:598, 1)
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info:
update_failcount: Updating failcount for postfix on mail1 after failed
monitor: rc=7 (update=value++, time=1458124686)
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info:
process_graph_event: Detected action (2962.86)
postfix_monitor_45000.1333=not running: failed
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
attrd_client_update: Expanded fail-count-postfix=value++ to 1
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
cib_process_request: Completed cib_modify operation for section status: OK
(rc=0, origin=mail1/crmd/253, version=0.215.8)
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
attrd_peer_update: Setting fail-count-postfix[mail1]: (null) -> 1 from
mail2
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: notice:
do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [
input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
write_attribute: Sent update 406 with 2 changes for
fail-count-postfix, id=<n/a>, set=(null)
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
attrd_peer_update: Setting last-failure-postfix[mail1]: 1458124291 ->
1458124686 from mail2
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
write_attribute: Sent update 407 with 2 changes for
last-failure-postfix, id=<n/a>, set=(null)
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
cib_process_request: Forwarding cib_modify operation for section status to
master (origin=local/attrd/406)
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
cib_process_request: Forwarding cib_modify operation for section status to
master (origin=local/attrd/407)
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: --- 0.215.8 2
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: +++ 0.215.9 (null)
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: + /cib: @num_updates=9
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: ++
/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']:
<nvpair id="status-1-fail-count-postfix" name="fail-count-postfix"
value="1"/>
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
cib_process_request: Completed cib_modify operation for section status: OK
(rc=0, origin=mail2/attrd/406, version=0.215.9)
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: --- 0.215.9 2
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: +++ 0.215.10 (null)
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: + /cib: @num_updates=10
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: +
/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix']:
@value=1458124686
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
attrd_cib_callback: Update 406 for fail-count-postfix: OK (0)
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
attrd_cib_callback: Update 406 for fail-count-postfix[mail1]=1: OK (0)
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
cib_process_request: Completed cib_modify operation for section status: OK
(rc=0, origin=mail2/attrd/407, version=0.215.10)
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
attrd_cib_callback: Update 406 for fail-count-postfix[mail2]=(null): OK
(0)
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
attrd_cib_callback: Update 407 for last-failure-postfix: OK (0)
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
attrd_cib_callback: Update 407 for
last-failure-postfix[mail1]=1458124686: OK (0)
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
attrd_cib_callback: Update 407 for
last-failure-postfix[mail2]=1457610376: OK (0)
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info:
abort_transition_graph: Transition aborted by
status-1-fail-count-postfix, fail-count-postfix=1: Transient attribute
change (create cib=0.215.9, source=abort_unless_down:319,
path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1'],
1)
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info:
abort_transition_graph: Transition aborted by
status-1-last-failure-postfix, last-failure-postfix=1458124686: Transient
attribute change (modify cib=0.215.10, source=abort_unless_down:319,
path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix'],
1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice:
unpack_config: On loss of CCM Quorum: Ignore
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_online_status: Node mail1 is online
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_online_status: Node mail2 is online
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource mail:0 active in
master mode on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource spool:0 active in
master mode on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource fs-spool active on
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource fs-spool active on
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource fs-mail active on
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource fs-mail active on
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: warning:
unpack_rsc_op_failure: Processing failed op monitor for postfix on
mail1: not running (7)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource spool:1 active in
master mode on mail2
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource mail:1 active in
master mode on mail2
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
group_print: Resource Group: network-services
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
native_print: virtualip-1 (ocf::heartbeat:IPaddr2): Started
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
clone_print: Master/Slave Set: spool-clone [spool]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
short_print: Masters: [ mail1 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
short_print: Slaves: [ mail2 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
clone_print: Master/Slave Set: mail-clone [mail]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
short_print: Masters: [ mail1 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
short_print: Slaves: [ mail2 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
group_print: Resource Group: fs-services
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
native_print: fs-spool (ocf::heartbeat:Filesystem): Started mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
native_print: fs-mail (ocf::heartbeat:Filesystem): Started mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
group_print: Resource Group: mail-services
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
native_print: postfix (ocf::heartbeat:postfix): FAILED mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
master_color: Promoting mail:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
master_color: mail-clone: Promoted 1 instances of a possible 1 to master
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
master_color: Promoting spool:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
master_color: spool-clone: Promoted 1 instances of a possible 1 to master
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
RecurringOp: Start recurring monitor (45s) for postfix on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave virtualip-1 (Started mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave spool:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave spool:1 (Slave mail2)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave mail:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave mail:1 (Slave mail2)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave fs-spool (Started mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave fs-mail (Started mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice:
LogActions: Recover postfix (Started mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice:
process_pe_message: Calculated Transition 2963:
/var/lib/pacemaker/pengine/pe-input-330.bz2
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info:
handle_response: pe_calc calculation pe_calc-dc-1458124686-5541 is
obsolete
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice:
unpack_config: On loss of CCM Quorum: Ignore
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_online_status: Node mail1 is online
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_online_status: Node mail2 is online
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource mail:0 active in
master mode on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource spool:0 active in
master mode on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource fs-spool active on
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource fs-spool active on
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource fs-mail active on
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource fs-mail active on
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: warning:
unpack_rsc_op_failure: Processing failed op monitor for postfix on
mail1: not running (7)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource spool:1 active in
master mode on mail2
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource mail:1 active in
master mode on mail2
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
group_print: Resource Group: network-services
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
native_print: virtualip-1 (ocf::heartbeat:IPaddr2): Started
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
clone_print: Master/Slave Set: spool-clone [spool]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
short_print: Masters: [ mail1 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
short_print: Slaves: [ mail2 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
clone_print: Master/Slave Set: mail-clone [mail]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
short_print: Masters: [ mail1 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
short_print: Slaves: [ mail2 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
group_print: Resource Group: fs-services
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
native_print: fs-spool (ocf::heartbeat:Filesystem): Started mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
native_print: fs-mail (ocf::heartbeat:Filesystem): Started mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
group_print: Resource Group: mail-services
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
native_print: postfix (ocf::heartbeat:postfix): FAILED mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
get_failcount_full: postfix has failed 1 times on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: warning:
common_apply_stickiness: Forcing postfix away from mail1 after 1
failures (max=1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
master_color: Promoting mail:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
master_color: mail-clone: Promoted 1 instances of a possible 1 to master
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
master_color: Promoting spool:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
master_color: spool-clone: Promoted 1 instances of a possible 1 to master
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
rsc_merge_weights: fs-mail: Rolling back scores from postfix
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
rsc_merge_weights: postfix: Rolling back scores from virtualip-1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
native_color: Resource postfix cannot run anywhere
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
native_color: Resource virtualip-1 cannot run anywhere
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice:
LogActions: Stop virtualip-1 (mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave spool:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave spool:1 (Slave mail2)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave mail:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave mail:1 (Slave mail2)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave fs-spool (Started mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave fs-mail (Started mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice:
LogActions: Stop postfix (mail1)
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info:
do_state_transition: State transition S_POLICY_ENGINE ->
S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
origin=handle_response ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice:
process_pe_message: Calculated Transition 2964:
/var/lib/pacemaker/pengine/pe-input-331.bz2
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info:
do_te_invoke: Processing graph 2964 (ref=pe_calc-dc-1458124686-5542)
derived from /var/lib/pacemaker/pengine/pe-input-331.bz2
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: notice:
te_rsc_command: Initiating action 5: stop postfix_stop_0 on mail1
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: --- 0.215.10 2
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: +++ 0.215.11 (null)
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: + /cib: @num_updates=11
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: +
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_0']:
@operation_key=postfix_stop_0, @operation=stop,
@transition-key=5:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
@transition-magic=0:0;5:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
@call-id=1335, @last-run=1458124686, @last-rc-change=1458124686,
@exec-time=435
Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: info:
match_graph_event: Action postfix_stop_0 (5) confirmed on mail1 (rc=0)
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
cib_process_request: Completed cib_modify operation for section status: OK
(rc=0, origin=mail1/crmd/254, version=0.215.11)
Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: notice:
te_rsc_command: Initiating action 12: stop virtualip-1_stop_0 on mail1
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: --- 0.215.11 2
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: +++ 0.215.12 (null)
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: + /cib: @num_updates=12
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: +
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='virtualip-1']/lrm_rsc_op[@id='virtualip-1_last_0']:
@operation_key=virtualip-1_stop_0, @operation=stop,
@transition-key=12:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
@transition-magic=0:0;12:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
@call-id=1337, @last-run=1458124687, @last-rc-change=1458124687,
@exec-time=56
Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: info:
match_graph_event: Action virtualip-1_stop_0 (12) confirmed on mail1
(rc=0)
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
cib_process_request: Completed cib_modify operation for section status: OK
(rc=0, origin=mail1/crmd/255, version=0.215.12)
Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: notice:
run_graph: Transition 2964 (Complete=7, Pending=0, Fired=0, Skipped=0,
Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-331.bz2): Complete
Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: info: do_log:
FSA: Input I_TE_SUCCESS from notify_crmd() received in state
S_TRANSITION_ENGINE
Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: notice:
do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [
input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Mar 16 11:38:12 [7415] HWJ-626.domain.local cib: info:
cib_process_ping: Reporting our current digest to mail2:
ed43bc3ecf0f15853900ba49fc514870 for 0.215.12 (0x152b110 0)
On Mon, Mar 14, 2016 at 6:44 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
> On 03/10/2016 09:49 AM, Lorand Kelemen wrote:
> > Dear List,
> >
> > After the creation and testing of a simple 2 node active-passive
> > drbd+postfix cluster nearly everything works flawlessly (standby, failure
> > of a filesystem resource + failover, splitbrain + manual recovery)
> however
> > when delibarately killing the postfix instance, after reaching the
> > migration threshold failover does not occur and resources revert to the
> > Stopped state (except the master-slave drbd resource, which works as
> > expected).
> >
> > Ordering and colocation is configured, STONITH and quorum disabled, the
> > goal is to always have one node running all the resources and at any sign
> > of error it should fail over to the passive node, nothing fancy.
> >
> > Is my configuration wrong or am I hitting a bug?
> >
> > All software from centos 7 + elrepo repositories.
>
> With these versions, you can set "two_node: 1" in
> /etc/corosync/corosync.conf (which will be done automatically if you
> used "pcs cluster setup" initially), and then you don't need to ignore
> quorum in pacemaker.
>
> > Regarding STONITH: the machines are running on free ESXi instances on
> > separate machines, so the Vmware fencing agents won't work because in the
> > free version the API is read-only.
> > Still trying to figure out a way to go, until then manual recovery + huge
> > arp cache times on the upstream firewall...
> >
> > Please find pe-input*.bz files attached, logs and config below. The
> > situation: on node mail1 postfix was killed 3 times (migration
> threshold),
> > it should have failed over to mail2.
> > When killing a filesystem resource three times this happens flawlessly.
> >
> > Thanks for your input!
> >
> > Best regards,
> > Lorand
> >
> >
> > Cluster Name: mailcluster
> > Corosync Nodes:
> > mail1 mail2
> > Pacemaker Nodes:
> > mail1 mail2
> >
> > Resources:
> > Group: network-services
> > Resource: virtualip-1 (class=ocf provider=heartbeat type=IPaddr2)
> > Attributes: ip=10.20.64.10 cidr_netmask=24 nic=lan0
> > Operations: start interval=0s timeout=20s
> (virtualip-1-start-interval-0s)
> > stop interval=0s timeout=20s
> (virtualip-1-stop-interval-0s)
> > monitor interval=30s (virtualip-1-monitor-interval-30s)
> > Master: spool-clone
> > Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> > notify=true
> > Resource: spool (class=ocf provider=linbit type=drbd)
> > Attributes: drbd_resource=spool
> > Operations: start interval=0s timeout=240 (spool-start-interval-0s)
> > promote interval=0s timeout=90 (spool-promote-interval-0s)
> > demote interval=0s timeout=90 (spool-demote-interval-0s)
> > stop interval=0s timeout=100 (spool-stop-interval-0s)
> > monitor interval=10s (spool-monitor-interval-10s)
> > Master: mail-clone
> > Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> > notify=true
> > Resource: mail (class=ocf provider=linbit type=drbd)
> > Attributes: drbd_resource=mail
> > Operations: start interval=0s timeout=240 (mail-start-interval-0s)
> > promote interval=0s timeout=90 (mail-promote-interval-0s)
> > demote interval=0s timeout=90 (mail-demote-interval-0s)
> > stop interval=0s timeout=100 (mail-stop-interval-0s)
> > monitor interval=10s (mail-monitor-interval-10s)
> > Group: fs-services
> > Resource: fs-spool (class=ocf provider=heartbeat type=Filesystem)
> > Attributes: device=/dev/drbd0 directory=/var/spool/postfix fstype=ext4
> > options=nodev,nosuid,noexec
> > Operations: start interval=0s timeout=60 (fs-spool-start-interval-0s)
> > stop interval=0s timeout=60 (fs-spool-stop-interval-0s)
> > monitor interval=20 timeout=40
> (fs-spool-monitor-interval-20)
> > Resource: fs-mail (class=ocf provider=heartbeat type=Filesystem)
> > Attributes: device=/dev/drbd1 directory=/var/spool/mail fstype=ext4
> > options=nodev,nosuid,noexec
> > Operations: start interval=0s timeout=60 (fs-mail-start-interval-0s)
> > stop interval=0s timeout=60 (fs-mail-stop-interval-0s)
> > monitor interval=20 timeout=40
> (fs-mail-monitor-interval-20)
> > Group: mail-services
> > Resource: postfix (class=ocf provider=heartbeat type=postfix)
> > Operations: start interval=0s timeout=20s (postfix-start-interval-0s)
> > stop interval=0s timeout=20s (postfix-stop-interval-0s)
> > monitor interval=45s (postfix-monitor-interval-45s)
> >
> > Stonith Devices:
> > Fencing Levels:
> >
> > Location Constraints:
> > Ordering Constraints:
> > start network-services then promote mail-clone (kind:Mandatory)
> > (id:order-network-services-mail-clone-mandatory)
> > promote mail-clone then promote spool-clone (kind:Mandatory)
> > (id:order-mail-clone-spool-clone-mandatory)
> > promote spool-clone then start fs-services (kind:Mandatory)
> > (id:order-spool-clone-fs-services-mandatory)
> > start fs-services then start mail-services (kind:Mandatory)
> > (id:order-fs-services-mail-services-mandatory)
> > Colocation Constraints:
> > network-services with spool-clone (score:INFINITY) (rsc-role:Started)
> > (with-rsc-role:Master)
> (id:colocation-network-services-spool-clone-INFINITY)
> > network-services with mail-clone (score:INFINITY) (rsc-role:Started)
> > (with-rsc-role:Master)
> (id:colocation-network-services-mail-clone-INFINITY)
> > network-services with fs-services (score:INFINITY)
> > (id:colocation-network-services-fs-services-INFINITY)
> > network-services with mail-services (score:INFINITY)
> > (id:colocation-network-services-mail-services-INFINITY)
>
> I'm not sure whether it's causing your issue, but I would make the
> constraints reflect the logical relationships better.
>
> For example, network-services only needs to be colocated with
> mail-services logically; it's mail-services that needs to be with
> fs-services, and fs-services that needs to be with
> spool-clone/mail-clone master. In other words, don't make the
> highest-level resource depend on everything else, make each level depend
> on the level below it.
>
> Also, I would guess that the virtual IP only needs to be ordered before
> mail-services, and mail-clone and spool-clone could both be ordered
> before fs-services, rather than ordering mail-clone before spool-clone.
>
> > Resources Defaults:
> > migration-threshold: 3
> > Operations Defaults:
> > on-fail: restart
> >
> > Cluster Properties:
> > cluster-infrastructure: corosync
> > cluster-name: mailcluster
> > cluster-recheck-interval: 5min
> > dc-version: 1.1.13-10.el7_2.2-44eb2dd
> > default-resource-stickiness: infinity
> > have-watchdog: false
> > last-lrm-refresh: 1457613674
> > no-quorum-policy: ignore
> > pe-error-series-max: 1024
> > pe-input-series-max: 1024
> > pe-warn-series-max: 1024
> > stonith-enabled: false
> >
> >
> >
> >
> >
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: Diff: --- 0.197.15 2
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: Diff: +++ 0.197.16 (null)
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: + /cib: @num_updates=16
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: +
> >
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_failure_0']:
> > @transition-key=4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @transition-magic=0:7;4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @call-id=1274, @last-rc-change=1457613440
> > Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info:
> > abort_transition_graph: Transition aborted by postfix_monitor_45000
> > 'modify' on mail1: Inactive graph
> > (magic=0:7;4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, cib=0.197.16,
> > source=process_graph_event:598, 1)
> > Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info:
> > update_failcount: Updating failcount for postfix on mail1 after
> failed
> > monitor: rc=7 (update=value++, time=1457613440)
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
> > attrd_client_update: Expanded fail-count-postfix=value++ to 3
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> > cib_process_request: Completed cib_modify operation for section status:
> OK
> > (rc=0, origin=mail1/crmd/196, version=0.197.16)
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
> > attrd_peer_update: Setting fail-count-postfix[mail1]: 2 -> 3 from
> mail2
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
> > write_attribute: Sent update 400 with 2 changes for
> > fail-count-postfix, id=<n/a>, set=(null)
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> > cib_process_request: Forwarding cib_modify operation for section status
> to
> > master (origin=local/attrd/400)
> > Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info:
> > process_graph_event: Detected action (1234.4)
> > postfix_monitor_45000.1274=not running: failed
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
> > attrd_peer_update: Setting last-failure-postfix[mail1]: 1457613347 ->
> > 1457613440 from mail2
> > Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: notice:
> > do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [
> > input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
> > write_attribute: Sent update 401 with 2 changes for
> > last-failure-postfix, id=<n/a>, set=(null)
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: Diff: --- 0.197.16 2
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: Diff: +++ 0.197.17 (null)
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: + /cib: @num_updates=17
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: +
> >
> /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-fail-count-postfix']:
> > @value=3
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> > cib_process_request: Completed cib_modify operation for section status:
> OK
> > (rc=0, origin=mail2/attrd/400, version=0.197.17)
> > Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info:
> > abort_transition_graph: Transition aborted by
> > status-1-fail-count-postfix, fail-count-postfix=3: Transient attribute
> > change (modify cib=0.197.17, source=abort_unless_down:319,
> >
> path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-fail-count-postfix'],
> > 1)
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
> > attrd_cib_callback: Update 400 for fail-count-postfix: OK (0)
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
> > attrd_cib_callback: Update 400 for fail-count-postfix[mail1]=3: OK (0)
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
> > attrd_cib_callback: Update 400 for fail-count-postfix[mail2]=(null): OK
> > (0)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_process_request: Forwarding cib_modify operation for section status
> to
> > master (origin=local/attrd/401)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: Diff: --- 0.197.17 2
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: Diff: +++ 0.197.18 (null)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: + /cib: @num_updates=18
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: +
> >
> /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix']:
> > @value=1457613440
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_process_request: Completed cib_modify operation for section status:
> OK
> > (rc=0, origin=mail2/attrd/401, version=0.197.18)
> > Mar 10 13:37:21 [7418] HWJ-626.domain.local attrd: info:
> > attrd_cib_callback: Update 401 for last-failure-postfix: OK (0)
> > Mar 10 13:37:21 [7418] HWJ-626.domain.local attrd: info:
> > attrd_cib_callback: Update 401 for
> > last-failure-postfix[mail1]=1457613440: OK (0)
> > Mar 10 13:37:21 [7418] HWJ-626.domain.local attrd: info:
> > attrd_cib_callback: Update 401 for
> > last-failure-postfix[mail2]=1457610376: OK (0)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
> > abort_transition_graph: Transition aborted by
> > status-1-last-failure-postfix, last-failure-postfix=1457613440: Transient
> > attribute change (modify cib=0.197.18, source=abort_unless_down:319,
> >
> path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix'],
> > 1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> > unpack_config: On loss of CCM Quorum: Ignore
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > determine_online_status: Node mail1 is online
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > determine_online_status: Node mail2 is online
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > determine_op_status: Operation monitor found resource mail:0 active in
> > master mode on mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > determine_op_status: Operation monitor found resource spool:0 active in
> > master mode on mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > determine_op_status: Operation monitor found resource fs-spool active on
> > mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > determine_op_status: Operation monitor found resource fs-mail active on
> > mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning:
> > unpack_rsc_op_failure: Processing failed op monitor for postfix on
> > mail1: not running (7)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > determine_op_status: Operation monitor found resource spool:1 active in
> > master mode on mail2
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > determine_op_status: Operation monitor found resource mail:1 active in
> > master mode on mail2
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > group_print: Resource Group: network-services
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > native_print: virtualip-1 (ocf::heartbeat:IPaddr2):
> Started
> > mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > clone_print: Master/Slave Set: spool-clone [spool]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > short_print: Masters: [ mail1 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > short_print: Slaves: [ mail2 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > clone_print: Master/Slave Set: mail-clone [mail]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > short_print: Masters: [ mail1 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > short_print: Slaves: [ mail2 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > group_print: Resource Group: fs-services
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > native_print: fs-spool (ocf::heartbeat:Filesystem): Started
> mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > native_print: fs-mail (ocf::heartbeat:Filesystem): Started
> mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > group_print: Resource Group: mail-services
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > native_print: postfix (ocf::heartbeat:postfix): FAILED
> mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > get_failcount_full: postfix has failed 3 times on mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning:
> > common_apply_stickiness: Forcing postfix away from mail1 after 3
> > failures (max=3)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > master_color: Promoting mail:0 (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > master_color: mail-clone: Promoted 1 instances of a possible 1 to master
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > master_color: Promoting spool:0 (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > master_color: spool-clone: Promoted 1 instances of a possible 1 to master
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > rsc_merge_weights: postfix: Rolling back scores from virtualip-1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > native_color: Resource virtualip-1 cannot run anywhere
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > RecurringOp: Start recurring monitor (45s) for postfix on mail2
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> > LogActions: Stop virtualip-1 (mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > LogActions: Leave spool:0 (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > LogActions: Leave spool:1 (Slave mail2)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > LogActions: Leave mail:0 (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > LogActions: Leave mail:1 (Slave mail2)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> > LogActions: Stop fs-spool (Started mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> > LogActions: Stop fs-mail (Started mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> > LogActions: Stop postfix (Started mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> > process_pe_message: Calculated Transition 1235:
> > /var/lib/pacemaker/pengine/pe-input-302.bz2
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
> > handle_response: pe_calc calculation pe_calc-dc-1457613441-3756 is
> > obsolete
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> > unpack_config: On loss of CCM Quorum: Ignore
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > determine_online_status: Node mail1 is online
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > determine_online_status: Node mail2 is online
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > determine_op_status: Operation monitor found resource mail:0 active in
> > master mode on mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > determine_op_status: Operation monitor found resource spool:0 active in
> > master mode on mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > determine_op_status: Operation monitor found resource fs-spool active on
> > mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > determine_op_status: Operation monitor found resource fs-mail active on
> > mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning:
> > unpack_rsc_op_failure: Processing failed op monitor for postfix on
> > mail1: not running (7)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > determine_op_status: Operation monitor found resource spool:1 active in
> > master mode on mail2
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > determine_op_status: Operation monitor found resource mail:1 active in
> > master mode on mail2
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > group_print: Resource Group: network-services
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > native_print: virtualip-1 (ocf::heartbeat:IPaddr2):
> Started
> > mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > clone_print: Master/Slave Set: spool-clone [spool]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > short_print: Masters: [ mail1 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > short_print: Slaves: [ mail2 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > clone_print: Master/Slave Set: mail-clone [mail]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > short_print: Masters: [ mail1 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > short_print: Slaves: [ mail2 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > group_print: Resource Group: fs-services
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > native_print: fs-spool (ocf::heartbeat:Filesystem): Started
> mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > native_print: fs-mail (ocf::heartbeat:Filesystem): Started
> mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > group_print: Resource Group: mail-services
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > native_print: postfix (ocf::heartbeat:postfix): FAILED
> mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > get_failcount_full: postfix has failed 3 times on mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning:
> > common_apply_stickiness: Forcing postfix away from mail1 after 3
> > failures (max=3)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > master_color: Promoting mail:0 (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > master_color: mail-clone: Promoted 1 instances of a possible 1 to master
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > master_color: Promoting spool:0 (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > master_color: spool-clone: Promoted 1 instances of a possible 1 to master
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > rsc_merge_weights: postfix: Rolling back scores from virtualip-1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > native_color: Resource virtualip-1 cannot run anywhere
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > RecurringOp: Start recurring monitor (45s) for postfix on mail2
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> > LogActions: Stop virtualip-1 (mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > LogActions: Leave spool:0 (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > LogActions: Leave spool:1 (Slave mail2)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > LogActions: Leave mail:0 (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> > LogActions: Leave mail:1 (Slave mail2)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> > LogActions: Stop fs-spool (Started mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> > LogActions: Stop fs-mail (Started mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> > LogActions: Stop postfix (Started mail1)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
> > do_state_transition: State transition S_POLICY_ENGINE ->
> > S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
> > origin=handle_response ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> > process_pe_message: Calculated Transition 1236:
> > /var/lib/pacemaker/pengine/pe-input-303.bz2
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
> > do_te_invoke: Processing graph 1236 (ref=pe_calc-dc-1457613441-3757)
> > derived from /var/lib/pacemaker/pengine/pe-input-303.bz2
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> > te_rsc_command: Initiating action 12: stop virtualip-1_stop_0 on
> mail1
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> > te_rsc_command: Initiating action 5: stop postfix_stop_0 on mail1
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: Diff: --- 0.197.18 2
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: Diff: +++ 0.197.19 (null)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: + /cib: @num_updates=19
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: +
> >
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='virtualip-1']/lrm_rsc_op[@id='virtualip-1_last_0']:
> > @operation_key=virtualip-1_stop_0, @operation=stop,
> > @transition-key=12:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @transition-magic=0:0;12:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @call-id=1276, @last-run=1457613441, @last-rc-change=1457613441,
> > @exec-time=66
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_process_request: Completed cib_modify operation for section status:
> OK
> > (rc=0, origin=mail1/crmd/197, version=0.197.19)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
> > match_graph_event: Action virtualip-1_stop_0 (12) confirmed on mail1
> > (rc=0)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: Diff: --- 0.197.19 2
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: Diff: +++ 0.197.20 (null)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: + /cib: @num_updates=20
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: +
> >
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_0']:
> > @operation_key=postfix_stop_0, @operation=stop,
> > @transition-key=5:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @transition-magic=0:0;5:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @call-id=1278, @last-run=1457613441, @last-rc-change=1457613441,
> > @exec-time=476
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
> > match_graph_event: Action postfix_stop_0 (5) confirmed on mail1 (rc=0)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> > te_rsc_command: Initiating action 79: stop fs-mail_stop_0 on mail1
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_process_request: Completed cib_modify operation for section status:
> OK
> > (rc=0, origin=mail1/crmd/198, version=0.197.20)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: Diff: --- 0.197.20 2
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: Diff: +++ 0.197.21 (null)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: + /cib: @num_updates=21
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: +
> >
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='fs-mail']/lrm_rsc_op[@id='fs-mail_last_0']:
> > @operation_key=fs-mail_stop_0, @operation=stop,
> > @transition-key=79:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @transition-magic=0:0;79:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @call-id=1280, @last-run=1457613441, @last-rc-change=1457613441,
> > @exec-time=88, @queue-time=1
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_process_request: Completed cib_modify operation for section status:
> OK
> > (rc=0, origin=mail1/crmd/199, version=0.197.21)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
> > match_graph_event: Action fs-mail_stop_0 (79) confirmed on mail1
> (rc=0)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> > te_rsc_command: Initiating action 77: stop fs-spool_stop_0 on mail1
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: Diff: --- 0.197.21 2
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: Diff: +++ 0.197.22 (null)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: + /cib: @num_updates=22
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_perform_op: +
> >
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='fs-spool']/lrm_rsc_op[@id='fs-spool_last_0']:
> > @operation_key=fs-spool_stop_0, @operation=stop,
> > @transition-key=77:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @transition-magic=0:0;77:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @call-id=1282, @last-run=1457613441, @last-rc-change=1457613441,
> > @exec-time=86
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> > cib_process_request: Completed cib_modify operation for section status:
> OK
> > (rc=0, origin=mail1/crmd/200, version=0.197.22)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
> > match_graph_event: Action fs-spool_stop_0 (77) confirmed on mail1
> (rc=0)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: warning:
> > run_graph: Transition 1236 (Complete=11, Pending=0, Fired=0,
> Skipped=0,
> > Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-303.bz2):
> > Terminated
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: warning:
> > te_graph_trigger: Transition failed: terminated
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> > print_graph: Graph 1236 with 12 actions: batch-limit=12 jobs,
> > network-delay=0ms
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> > print_synapse: [Action 16]: Completed pseudo op
> > network-services_stopped_0 on N/A (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> > print_synapse: [Action 15]: Completed pseudo op
> > network-services_stop_0 on N/A (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> > print_synapse: [Action 12]: Completed rsc op virtualip-1_stop_0
> > on mail1 (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> > print_synapse: [Action 84]: Completed pseudo op
> > fs-services_stopped_0 on N/A (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> > print_synapse: [Action 83]: Completed pseudo op
> fs-services_stop_0
> > on N/A (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> > print_synapse: [Action 77]: Completed rsc op fs-spool_stop_0
> > on mail1 (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> > print_synapse: [Action 79]: Completed rsc op fs-mail_stop_0
> > on mail1 (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> > print_synapse: [Action 90]: Completed pseudo op
> > mail-services_stopped_0 on N/A (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> > print_synapse: [Action 89]: Completed pseudo op
> > mail-services_stop_0 on N/A (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> > print_synapse: [Action 86]: Pending rsc op postfix_monitor_45000
> > on mail2 (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> > print_synapse: * [Input 85]: Unresolved dependency rsc op
> > postfix_start_0 on mail2
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> > print_synapse: [Action 5]: Completed rsc op postfix_stop_0
> > on mail1 (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> > print_synapse: [Action 8]: Completed pseudo op all_stopped
> > on N/A (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: do_log:
> > FSA: Input I_TE_SUCCESS from notify_crmd() received in state
> > S_TRANSITION_ENGINE
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> > do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [
> > input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> > Mar 10 13:37:26 [7415] HWJ-626.domain.local cib: info:
> > cib_process_ping: Reporting our current digest to mail2:
> > 3896ee29cdb6ba128330b0ef6e41bd79 for 0.197.22 (0x1544a30 0)
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20160316/423c4da6/attachment.htm>
More information about the Users
mailing list