[ClusterLabs] Cluster failover failure with Unresolved dependency
Lorand Kelemen
lorand.kelemen at gmail.com
Thu Mar 10 15:49:29 UTC 2016
Dear List,
After the creation and testing of a simple 2 node active-passive
drbd+postfix cluster nearly everything works flawlessly (standby, failure
of a filesystem resource + failover, splitbrain + manual recovery) however
when delibarately killing the postfix instance, after reaching the
migration threshold failover does not occur and resources revert to the
Stopped state (except the master-slave drbd resource, which works as
expected).
Ordering and colocation is configured, STONITH and quorum disabled, the
goal is to always have one node running all the resources and at any sign
of error it should fail over to the passive node, nothing fancy.
Is my configuration wrong or am I hitting a bug?
All software from centos 7 + elrepo repositories.
Regarding STONITH: the machines are running on free ESXi instances on
separate machines, so the Vmware fencing agents won't work because in the
free version the API is read-only.
Still trying to figure out a way to go, until then manual recovery + huge
arp cache times on the upstream firewall...
Please find pe-input*.bz files attached, logs and config below. The
situation: on node mail1 postfix was killed 3 times (migration threshold),
it should have failed over to mail2.
When killing a filesystem resource three times this happens flawlessly.
Thanks for your input!
Best regards,
Lorand
Cluster Name: mailcluster
Corosync Nodes:
mail1 mail2
Pacemaker Nodes:
mail1 mail2
Resources:
Group: network-services
Resource: virtualip-1 (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=10.20.64.10 cidr_netmask=24 nic=lan0
Operations: start interval=0s timeout=20s (virtualip-1-start-interval-0s)
stop interval=0s timeout=20s (virtualip-1-stop-interval-0s)
monitor interval=30s (virtualip-1-monitor-interval-30s)
Master: spool-clone
Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
notify=true
Resource: spool (class=ocf provider=linbit type=drbd)
Attributes: drbd_resource=spool
Operations: start interval=0s timeout=240 (spool-start-interval-0s)
promote interval=0s timeout=90 (spool-promote-interval-0s)
demote interval=0s timeout=90 (spool-demote-interval-0s)
stop interval=0s timeout=100 (spool-stop-interval-0s)
monitor interval=10s (spool-monitor-interval-10s)
Master: mail-clone
Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
notify=true
Resource: mail (class=ocf provider=linbit type=drbd)
Attributes: drbd_resource=mail
Operations: start interval=0s timeout=240 (mail-start-interval-0s)
promote interval=0s timeout=90 (mail-promote-interval-0s)
demote interval=0s timeout=90 (mail-demote-interval-0s)
stop interval=0s timeout=100 (mail-stop-interval-0s)
monitor interval=10s (mail-monitor-interval-10s)
Group: fs-services
Resource: fs-spool (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/drbd0 directory=/var/spool/postfix fstype=ext4
options=nodev,nosuid,noexec
Operations: start interval=0s timeout=60 (fs-spool-start-interval-0s)
stop interval=0s timeout=60 (fs-spool-stop-interval-0s)
monitor interval=20 timeout=40 (fs-spool-monitor-interval-20)
Resource: fs-mail (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/drbd1 directory=/var/spool/mail fstype=ext4
options=nodev,nosuid,noexec
Operations: start interval=0s timeout=60 (fs-mail-start-interval-0s)
stop interval=0s timeout=60 (fs-mail-stop-interval-0s)
monitor interval=20 timeout=40 (fs-mail-monitor-interval-20)
Group: mail-services
Resource: postfix (class=ocf provider=heartbeat type=postfix)
Operations: start interval=0s timeout=20s (postfix-start-interval-0s)
stop interval=0s timeout=20s (postfix-stop-interval-0s)
monitor interval=45s (postfix-monitor-interval-45s)
Stonith Devices:
Fencing Levels:
Location Constraints:
Ordering Constraints:
start network-services then promote mail-clone (kind:Mandatory)
(id:order-network-services-mail-clone-mandatory)
promote mail-clone then promote spool-clone (kind:Mandatory)
(id:order-mail-clone-spool-clone-mandatory)
promote spool-clone then start fs-services (kind:Mandatory)
(id:order-spool-clone-fs-services-mandatory)
start fs-services then start mail-services (kind:Mandatory)
(id:order-fs-services-mail-services-mandatory)
Colocation Constraints:
network-services with spool-clone (score:INFINITY) (rsc-role:Started)
(with-rsc-role:Master) (id:colocation-network-services-spool-clone-INFINITY)
network-services with mail-clone (score:INFINITY) (rsc-role:Started)
(with-rsc-role:Master) (id:colocation-network-services-mail-clone-INFINITY)
network-services with fs-services (score:INFINITY)
(id:colocation-network-services-fs-services-INFINITY)
network-services with mail-services (score:INFINITY)
(id:colocation-network-services-mail-services-INFINITY)
Resources Defaults:
migration-threshold: 3
Operations Defaults:
on-fail: restart
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: mailcluster
cluster-recheck-interval: 5min
dc-version: 1.1.13-10.el7_2.2-44eb2dd
default-resource-stickiness: infinity
have-watchdog: false
last-lrm-refresh: 1457613674
no-quorum-policy: ignore
pe-error-series-max: 1024
pe-input-series-max: 1024
pe-warn-series-max: 1024
stonith-enabled: false
Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: --- 0.197.15 2
Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: +++ 0.197.16 (null)
Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: + /cib: @num_updates=16
Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: +
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_failure_0']:
@transition-key=4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
@transition-magic=0:7;4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
@call-id=1274, @last-rc-change=1457613440
Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info:
abort_transition_graph: Transition aborted by postfix_monitor_45000
'modify' on mail1: Inactive graph
(magic=0:7;4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, cib=0.197.16,
source=process_graph_event:598, 1)
Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info:
update_failcount: Updating failcount for postfix on mail1 after failed
monitor: rc=7 (update=value++, time=1457613440)
Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
attrd_client_update: Expanded fail-count-postfix=value++ to 3
Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
cib_process_request: Completed cib_modify operation for section status: OK
(rc=0, origin=mail1/crmd/196, version=0.197.16)
Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
attrd_peer_update: Setting fail-count-postfix[mail1]: 2 -> 3 from mail2
Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
write_attribute: Sent update 400 with 2 changes for
fail-count-postfix, id=<n/a>, set=(null)
Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
cib_process_request: Forwarding cib_modify operation for section status to
master (origin=local/attrd/400)
Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info:
process_graph_event: Detected action (1234.4)
postfix_monitor_45000.1274=not running: failed
Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
attrd_peer_update: Setting last-failure-postfix[mail1]: 1457613347 ->
1457613440 from mail2
Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: notice:
do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [
input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
write_attribute: Sent update 401 with 2 changes for
last-failure-postfix, id=<n/a>, set=(null)
Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: --- 0.197.16 2
Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: +++ 0.197.17 (null)
Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: + /cib: @num_updates=17
Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: +
/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-fail-count-postfix']:
@value=3
Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
cib_process_request: Completed cib_modify operation for section status: OK
(rc=0, origin=mail2/attrd/400, version=0.197.17)
Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info:
abort_transition_graph: Transition aborted by
status-1-fail-count-postfix, fail-count-postfix=3: Transient attribute
change (modify cib=0.197.17, source=abort_unless_down:319,
path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-fail-count-postfix'],
1)
Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
attrd_cib_callback: Update 400 for fail-count-postfix: OK (0)
Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
attrd_cib_callback: Update 400 for fail-count-postfix[mail1]=3: OK (0)
Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
attrd_cib_callback: Update 400 for fail-count-postfix[mail2]=(null): OK
(0)
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_process_request: Forwarding cib_modify operation for section status to
master (origin=local/attrd/401)
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: --- 0.197.17 2
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: +++ 0.197.18 (null)
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: + /cib: @num_updates=18
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: +
/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix']:
@value=1457613440
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_process_request: Completed cib_modify operation for section status: OK
(rc=0, origin=mail2/attrd/401, version=0.197.18)
Mar 10 13:37:21 [7418] HWJ-626.domain.local attrd: info:
attrd_cib_callback: Update 401 for last-failure-postfix: OK (0)
Mar 10 13:37:21 [7418] HWJ-626.domain.local attrd: info:
attrd_cib_callback: Update 401 for
last-failure-postfix[mail1]=1457613440: OK (0)
Mar 10 13:37:21 [7418] HWJ-626.domain.local attrd: info:
attrd_cib_callback: Update 401 for
last-failure-postfix[mail2]=1457610376: OK (0)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
abort_transition_graph: Transition aborted by
status-1-last-failure-postfix, last-failure-postfix=1457613440: Transient
attribute change (modify cib=0.197.18, source=abort_unless_down:319,
path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix'],
1)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
unpack_config: On loss of CCM Quorum: Ignore
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
determine_online_status: Node mail1 is online
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
determine_online_status: Node mail2 is online
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource mail:0 active in
master mode on mail1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource spool:0 active in
master mode on mail1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource fs-spool active on
mail1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource fs-mail active on
mail1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning:
unpack_rsc_op_failure: Processing failed op monitor for postfix on
mail1: not running (7)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource spool:1 active in
master mode on mail2
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource mail:1 active in
master mode on mail2
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
group_print: Resource Group: network-services
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
native_print: virtualip-1 (ocf::heartbeat:IPaddr2): Started
mail1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
clone_print: Master/Slave Set: spool-clone [spool]
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
short_print: Masters: [ mail1 ]
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
short_print: Slaves: [ mail2 ]
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
clone_print: Master/Slave Set: mail-clone [mail]
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
short_print: Masters: [ mail1 ]
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
short_print: Slaves: [ mail2 ]
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
group_print: Resource Group: fs-services
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
native_print: fs-spool (ocf::heartbeat:Filesystem): Started mail1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
native_print: fs-mail (ocf::heartbeat:Filesystem): Started mail1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
group_print: Resource Group: mail-services
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
native_print: postfix (ocf::heartbeat:postfix): FAILED mail1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
get_failcount_full: postfix has failed 3 times on mail1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning:
common_apply_stickiness: Forcing postfix away from mail1 after 3
failures (max=3)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
master_color: Promoting mail:0 (Master mail1)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
master_color: mail-clone: Promoted 1 instances of a possible 1 to master
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
master_color: Promoting spool:0 (Master mail1)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
master_color: spool-clone: Promoted 1 instances of a possible 1 to master
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
rsc_merge_weights: postfix: Rolling back scores from virtualip-1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
native_color: Resource virtualip-1 cannot run anywhere
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
RecurringOp: Start recurring monitor (45s) for postfix on mail2
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
LogActions: Stop virtualip-1 (mail1)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave spool:0 (Master mail1)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave spool:1 (Slave mail2)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave mail:0 (Master mail1)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave mail:1 (Slave mail2)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
LogActions: Stop fs-spool (Started mail1)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
LogActions: Stop fs-mail (Started mail1)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
LogActions: Stop postfix (Started mail1)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
process_pe_message: Calculated Transition 1235:
/var/lib/pacemaker/pengine/pe-input-302.bz2
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
handle_response: pe_calc calculation pe_calc-dc-1457613441-3756 is
obsolete
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
unpack_config: On loss of CCM Quorum: Ignore
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
determine_online_status: Node mail1 is online
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
determine_online_status: Node mail2 is online
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource mail:0 active in
master mode on mail1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource spool:0 active in
master mode on mail1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource fs-spool active on
mail1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource fs-mail active on
mail1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning:
unpack_rsc_op_failure: Processing failed op monitor for postfix on
mail1: not running (7)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource spool:1 active in
master mode on mail2
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
determine_op_status: Operation monitor found resource mail:1 active in
master mode on mail2
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
group_print: Resource Group: network-services
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
native_print: virtualip-1 (ocf::heartbeat:IPaddr2): Started
mail1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
clone_print: Master/Slave Set: spool-clone [spool]
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
short_print: Masters: [ mail1 ]
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
short_print: Slaves: [ mail2 ]
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
clone_print: Master/Slave Set: mail-clone [mail]
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
short_print: Masters: [ mail1 ]
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
short_print: Slaves: [ mail2 ]
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
group_print: Resource Group: fs-services
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
native_print: fs-spool (ocf::heartbeat:Filesystem): Started mail1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
native_print: fs-mail (ocf::heartbeat:Filesystem): Started mail1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
group_print: Resource Group: mail-services
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
native_print: postfix (ocf::heartbeat:postfix): FAILED mail1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
get_failcount_full: postfix has failed 3 times on mail1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning:
common_apply_stickiness: Forcing postfix away from mail1 after 3
failures (max=3)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
master_color: Promoting mail:0 (Master mail1)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
master_color: mail-clone: Promoted 1 instances of a possible 1 to master
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
master_color: Promoting spool:0 (Master mail1)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
master_color: spool-clone: Promoted 1 instances of a possible 1 to master
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
rsc_merge_weights: postfix: Rolling back scores from virtualip-1
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
native_color: Resource virtualip-1 cannot run anywhere
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
RecurringOp: Start recurring monitor (45s) for postfix on mail2
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
LogActions: Stop virtualip-1 (mail1)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave spool:0 (Master mail1)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave spool:1 (Slave mail2)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave mail:0 (Master mail1)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
LogActions: Leave mail:1 (Slave mail2)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
LogActions: Stop fs-spool (Started mail1)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
LogActions: Stop fs-mail (Started mail1)
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
LogActions: Stop postfix (Started mail1)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
do_state_transition: State transition S_POLICY_ENGINE ->
S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
origin=handle_response ]
Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
process_pe_message: Calculated Transition 1236:
/var/lib/pacemaker/pengine/pe-input-303.bz2
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
do_te_invoke: Processing graph 1236 (ref=pe_calc-dc-1457613441-3757)
derived from /var/lib/pacemaker/pengine/pe-input-303.bz2
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
te_rsc_command: Initiating action 12: stop virtualip-1_stop_0 on mail1
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
te_rsc_command: Initiating action 5: stop postfix_stop_0 on mail1
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: --- 0.197.18 2
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: +++ 0.197.19 (null)
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: + /cib: @num_updates=19
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: +
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='virtualip-1']/lrm_rsc_op[@id='virtualip-1_last_0']:
@operation_key=virtualip-1_stop_0, @operation=stop,
@transition-key=12:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
@transition-magic=0:0;12:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
@call-id=1276, @last-run=1457613441, @last-rc-change=1457613441,
@exec-time=66
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_process_request: Completed cib_modify operation for section status: OK
(rc=0, origin=mail1/crmd/197, version=0.197.19)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
match_graph_event: Action virtualip-1_stop_0 (12) confirmed on mail1
(rc=0)
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: --- 0.197.19 2
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: +++ 0.197.20 (null)
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: + /cib: @num_updates=20
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: +
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_0']:
@operation_key=postfix_stop_0, @operation=stop,
@transition-key=5:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
@transition-magic=0:0;5:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
@call-id=1278, @last-run=1457613441, @last-rc-change=1457613441,
@exec-time=476
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
match_graph_event: Action postfix_stop_0 (5) confirmed on mail1 (rc=0)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
te_rsc_command: Initiating action 79: stop fs-mail_stop_0 on mail1
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_process_request: Completed cib_modify operation for section status: OK
(rc=0, origin=mail1/crmd/198, version=0.197.20)
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: --- 0.197.20 2
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: +++ 0.197.21 (null)
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: + /cib: @num_updates=21
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: +
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='fs-mail']/lrm_rsc_op[@id='fs-mail_last_0']:
@operation_key=fs-mail_stop_0, @operation=stop,
@transition-key=79:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
@transition-magic=0:0;79:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
@call-id=1280, @last-run=1457613441, @last-rc-change=1457613441,
@exec-time=88, @queue-time=1
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_process_request: Completed cib_modify operation for section status: OK
(rc=0, origin=mail1/crmd/199, version=0.197.21)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
match_graph_event: Action fs-mail_stop_0 (79) confirmed on mail1 (rc=0)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
te_rsc_command: Initiating action 77: stop fs-spool_stop_0 on mail1
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: --- 0.197.21 2
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: Diff: +++ 0.197.22 (null)
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: + /cib: @num_updates=22
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_perform_op: +
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='fs-spool']/lrm_rsc_op[@id='fs-spool_last_0']:
@operation_key=fs-spool_stop_0, @operation=stop,
@transition-key=77:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
@transition-magic=0:0;77:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
@call-id=1282, @last-run=1457613441, @last-rc-change=1457613441,
@exec-time=86
Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
cib_process_request: Completed cib_modify operation for section status: OK
(rc=0, origin=mail1/crmd/200, version=0.197.22)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
match_graph_event: Action fs-spool_stop_0 (77) confirmed on mail1 (rc=0)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: warning:
run_graph: Transition 1236 (Complete=11, Pending=0, Fired=0, Skipped=0,
Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-303.bz2):
Terminated
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: warning:
te_graph_trigger: Transition failed: terminated
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
print_graph: Graph 1236 with 12 actions: batch-limit=12 jobs,
network-delay=0ms
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
print_synapse: [Action 16]: Completed pseudo op
network-services_stopped_0 on N/A (priority: 0, waiting: none)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
print_synapse: [Action 15]: Completed pseudo op
network-services_stop_0 on N/A (priority: 0, waiting: none)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
print_synapse: [Action 12]: Completed rsc op virtualip-1_stop_0
on mail1 (priority: 0, waiting: none)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
print_synapse: [Action 84]: Completed pseudo op
fs-services_stopped_0 on N/A (priority: 0, waiting: none)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
print_synapse: [Action 83]: Completed pseudo op fs-services_stop_0
on N/A (priority: 0, waiting: none)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
print_synapse: [Action 77]: Completed rsc op fs-spool_stop_0
on mail1 (priority: 0, waiting: none)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
print_synapse: [Action 79]: Completed rsc op fs-mail_stop_0
on mail1 (priority: 0, waiting: none)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
print_synapse: [Action 90]: Completed pseudo op
mail-services_stopped_0 on N/A (priority: 0, waiting: none)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
print_synapse: [Action 89]: Completed pseudo op
mail-services_stop_0 on N/A (priority: 0, waiting: none)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
print_synapse: [Action 86]: Pending rsc op postfix_monitor_45000
on mail2 (priority: 0, waiting: none)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
print_synapse: * [Input 85]: Unresolved dependency rsc op
postfix_start_0 on mail2
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
print_synapse: [Action 5]: Completed rsc op postfix_stop_0
on mail1 (priority: 0, waiting: none)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
print_synapse: [Action 8]: Completed pseudo op all_stopped
on N/A (priority: 0, waiting: none)
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: do_log:
FSA: Input I_TE_SUCCESS from notify_crmd() received in state
S_TRANSITION_ENGINE
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [
input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Mar 10 13:37:26 [7415] HWJ-626.domain.local cib: info:
cib_process_ping: Reporting our current digest to mail2:
3896ee29cdb6ba128330b0ef6e41bd79 for 0.197.22 (0x1544a30 0)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20160310/22941f39/attachment-0003.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pe-input-302.bz2
Type: application/x-bzip2
Size: 3462 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20160310/22941f39/attachment-0006.bz2>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pe-input-303.bz2
Type: application/x-bzip2
Size: 3460 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20160310/22941f39/attachment-0007.bz2>
More information about the Users
mailing list