[ClusterLabs] Cluster failover failure with Unresolved dependency
Lorand Kelemen
lorand.kelemen at gmail.com
Fri Mar 18 10:46:00 CET 2016
Sure thing. Just to highlight the differences from before: the current
constraints config is below, and the mail-services group now consists of
systemd resources.
What happened: mail2 was running all resources, then I killed the amavisd
master process.
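For reference, the test sequence was roughly the following (commands are
approximate; I matched the amavisd master process by hand, so the pkill
pattern below is only an illustration):

    pcs resource cleanup amavisd           # reset fail-counts before the test
    crm_mon -1                             # confirm mail2 is running the whole stack
    pkill -f amavisd                       # kill amavisd on mail2 (I targeted the master process)
    pcs resource failcount show amavisd    # fail-count is now 1, migration-threshold is 1
    crm_mon -1                             # resources end up Stopped instead of failing over to mail1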
Best regards,
Lorand
Location Constraints:
Ordering Constraints:
promote mail-clone then start fs-services (kind:Mandatory)
promote spool-clone then start fs-services (kind:Mandatory)
start network-services then start fs-services (kind:Mandatory)
start fs-services then start mail-services (kind:Mandatory)
Colocation Constraints:
fs-services with spool-clone (score:INFINITY) (rsc-role:Started)
(with-rsc-role:Master)
fs-services with mail-clone (score:INFINITY) (rsc-role:Started)
(with-rsc-role:Master)
mail-services with fs-services (score:INFINITY)
network-services with mail-services (score:INFINITY)
Group: mail-services
Resource: amavisd (class=systemd type=amavisd)
Operations: monitor interval=60s (amavisd-monitor-interval-60s)
Resource: spamassassin (class=systemd type=spamassassin)
Operations: monitor interval=60s (spamassassin-monitor-interval-60s)
Resource: clamd (class=systemd type=clamd@amavisd)
Operations: monitor interval=60s (clamd-monitor-interval-60s)
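For completeness, the constraints and the new group were built with pcs
roughly as follows (reconstructed from the config above, so treat the exact
commands as approximate):

    pcs constraint order promote spool-clone then start fs-services
    pcs constraint order promote mail-clone then start fs-services
    pcs constraint order start network-services then start fs-services
    pcs constraint order start fs-services then start mail-services
    pcs constraint colocation add fs-services with master spool-clone INFINITY
    pcs constraint colocation add fs-services with master mail-clone INFINITY
    pcs constraint colocation add mail-services with fs-services INFINITY
    pcs constraint colocation add network-services with mail-services INFINITY
    pcs resource create amavisd systemd:amavisd op monitor interval=60s --group mail-services
    pcs resource create spamassassin systemd:spamassassin op monitor interval=60s --group mail-services
    pcs resource create clamd systemd:clamd@amavisd op monitor interval=60s --group mail-services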
Cluster name: mailcluster
Last updated: Fri Mar 18 10:43:57 2016
Last change: Fri Mar 18 10:40:28 2016 by hacluster via crmd on mail1
Stack: corosync
Current DC: mail2 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with
quorum
2 nodes and 10 resources configured
Online: [ mail1 mail2 ]
Full list of resources:
Resource Group: network-services
virtualip-1 (ocf::heartbeat:IPaddr2): Stopped
Master/Slave Set: spool-clone [spool]
Masters: [ mail2 ]
Slaves: [ mail1 ]
Master/Slave Set: mail-clone [mail]
Masters: [ mail2 ]
Slaves: [ mail1 ]
Resource Group: fs-services
fs-spool (ocf::heartbeat:Filesystem): Stopped
fs-mail (ocf::heartbeat:Filesystem): Stopped
Resource Group: mail-services
amavisd (systemd:amavisd): Stopped
spamassassin (systemd:spamassassin): Stopped
clamd (systemd:clamd@amavisd): Stopped
Failed Actions:
* amavisd_monitor_60000 on mail2 'not running' (7): call=2499,
status=complete, exitreason='none',
last-rc-change='Fri Mar 18 10:42:29 2016', queued=0ms, exec=0ms
PCSD Status:
mail1: Online
mail2: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
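The full CIB follows, captured right after the failed transition with
something like (the output file name is just an example):

    cibadmin -Q > cib-after-amavisd-kill.xml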
<cib crm_feature_set="3.0.10" validate-with="pacemaker-2.3" epoch="277"
num_updates="22" admin_epoch="0" cib-last-written="Fri Mar 18 10:40:28
2016" update-origin="mail1" update-client="crmd" update-user="hacluster"
have-quorum="1" dc-uuid="2">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-have-watchdog"
name="have-watchdog" value="false"/>
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.1.13-10.el7_2.2-44eb2dd"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure"
name="cluster-infrastructure" value="corosync"/>
<nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name"
value="mailcluster"/>
<nvpair id="cib-bootstrap-options-stonith-enabled"
name="stonith-enabled" value="false"/>
<nvpair id="cib-bootstrap-options-pe-error-series-max"
name="pe-error-series-max" value="1024"/>
<nvpair id="cib-bootstrap-options-pe-warn-series-max"
name="pe-warn-series-max" value="1024"/>
<nvpair id="cib-bootstrap-options-pe-input-series-max"
name="pe-input-series-max" value="1024"/>
<nvpair id="cib-bootstrap-options-no-quorum-policy"
name="no-quorum-policy" value="ignore"/>
<nvpair id="cib-bootstrap-options-cluster-recheck-interval"
name="cluster-recheck-interval" value="5min"/>
<nvpair id="cib-bootstrap-options-last-lrm-refresh"
name="last-lrm-refresh" value="1458294028"/>
<nvpair id="cib-bootstrap-options-default-resource-stickiness"
name="default-resource-stickiness" value="infinity"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="1" uname="mail1">
<instance_attributes id="nodes-1"/>
</node>
<node id="2" uname="mail2">
<instance_attributes id="nodes-2"/>
</node>
</nodes>
<resources>
<group id="network-services">
<primitive class="ocf" id="virtualip-1" provider="heartbeat"
type="IPaddr2">
<instance_attributes id="virtualip-1-instance_attributes">
<nvpair id="virtualip-1-instance_attributes-ip" name="ip"
value="10.20.64.10"/>
<nvpair id="virtualip-1-instance_attributes-cidr_netmask"
name="cidr_netmask" value="24"/>
<nvpair id="virtualip-1-instance_attributes-nic" name="nic"
value="lan0"/>
</instance_attributes>
<operations>
<op id="virtualip-1-start-interval-0s" interval="0s"
name="start" timeout="20s"/>
<op id="virtualip-1-stop-interval-0s" interval="0s" name="stop"
timeout="20s"/>
<op id="virtualip-1-monitor-interval-30s" interval="30s"
name="monitor"/>
</operations>
</primitive>
</group>
<master id="spool-clone">
<primitive class="ocf" id="spool" provider="linbit" type="drbd">
<instance_attributes id="spool-instance_attributes">
<nvpair id="spool-instance_attributes-drbd_resource"
name="drbd_resource" value="spool"/>
</instance_attributes>
<operations>
<op id="spool-start-interval-0s" interval="0s" name="start"
timeout="240"/>
<op id="spool-promote-interval-0s" interval="0s" name="promote"
timeout="90"/>
<op id="spool-demote-interval-0s" interval="0s" name="demote"
timeout="90"/>
<op id="spool-stop-interval-0s" interval="0s" name="stop"
timeout="100"/>
<op id="spool-monitor-interval-10s" interval="10s"
name="monitor"/>
</operations>
</primitive>
<meta_attributes id="spool-clone-meta_attributes">
<nvpair id="spool-clone-meta_attributes-master-max"
name="master-max" value="1"/>
<nvpair id="spool-clone-meta_attributes-master-node-max"
name="master-node-max" value="1"/>
<nvpair id="spool-clone-meta_attributes-clone-max"
name="clone-max" value="2"/>
<nvpair id="spool-clone-meta_attributes-clone-node-max"
name="clone-node-max" value="1"/>
<nvpair id="spool-clone-meta_attributes-notify" name="notify"
value="true"/>
</meta_attributes>
</master>
<master id="mail-clone">
<primitive class="ocf" id="mail" provider="linbit" type="drbd">
<instance_attributes id="mail-instance_attributes">
<nvpair id="mail-instance_attributes-drbd_resource"
name="drbd_resource" value="mail"/>
</instance_attributes>
<operations>
<op id="mail-start-interval-0s" interval="0s" name="start"
timeout="240"/>
<op id="mail-promote-interval-0s" interval="0s" name="promote"
timeout="90"/>
<op id="mail-demote-interval-0s" interval="0s" name="demote"
timeout="90"/>
<op id="mail-stop-interval-0s" interval="0s" name="stop"
timeout="100"/>
<op id="mail-monitor-interval-10s" interval="10s"
name="monitor"/>
</operations>
</primitive>
<meta_attributes id="mail-clone-meta_attributes">
<nvpair id="mail-clone-meta_attributes-master-max"
name="master-max" value="1"/>
<nvpair id="mail-clone-meta_attributes-master-node-max"
name="master-node-max" value="1"/>
<nvpair id="mail-clone-meta_attributes-clone-max"
name="clone-max" value="2"/>
<nvpair id="mail-clone-meta_attributes-clone-node-max"
name="clone-node-max" value="1"/>
<nvpair id="mail-clone-meta_attributes-notify" name="notify"
value="true"/>
</meta_attributes>
</master>
<group id="fs-services">
<primitive class="ocf" id="fs-spool" provider="heartbeat"
type="Filesystem">
<instance_attributes id="fs-spool-instance_attributes">
<nvpair id="fs-spool-instance_attributes-device" name="device"
value="/dev/drbd0"/>
<nvpair id="fs-spool-instance_attributes-directory"
name="directory" value="/var/spool/postfix"/>
<nvpair id="fs-spool-instance_attributes-fstype" name="fstype"
value="ext4"/>
<nvpair id="fs-spool-instance_attributes-options"
name="options" value="nodev,nosuid,noexec"/>
</instance_attributes>
<operations>
<op id="fs-spool-start-interval-0s" interval="0s" name="start"
timeout="60"/>
<op id="fs-spool-stop-interval-0s" interval="0s" name="stop"
timeout="60"/>
<op id="fs-spool-monitor-interval-20" interval="20"
name="monitor" timeout="40"/>
</operations>
</primitive>
<primitive class="ocf" id="fs-mail" provider="heartbeat"
type="Filesystem">
<instance_attributes id="fs-mail-instance_attributes">
<nvpair id="fs-mail-instance_attributes-device" name="device"
value="/dev/drbd1"/>
<nvpair id="fs-mail-instance_attributes-directory"
name="directory" value="/var/spool/mail"/>
<nvpair id="fs-mail-instance_attributes-fstype" name="fstype"
value="ext4"/>
<nvpair id="fs-mail-instance_attributes-options" name="options"
value="nodev,nosuid,noexec"/>
</instance_attributes>
<operations>
<op id="fs-mail-start-interval-0s" interval="0s" name="start"
timeout="60"/>
<op id="fs-mail-stop-interval-0s" interval="0s" name="stop"
timeout="60"/>
<op id="fs-mail-monitor-interval-20" interval="20"
name="monitor" timeout="40"/>
</operations>
</primitive>
</group>
<group id="mail-services">
<primitive class="systemd" id="amavisd" type="amavisd">
<instance_attributes id="amavisd-instance_attributes"/>
<operations>
<op id="amavisd-monitor-interval-60s" interval="60s"
name="monitor"/>
</operations>
</primitive>
<primitive class="systemd" id="spamassassin" type="spamassassin">
<instance_attributes id="spamassassin-instance_attributes"/>
<operations>
<op id="spamassassin-monitor-interval-60s" interval="60s"
name="monitor"/>
</operations>
</primitive>
<primitive class="systemd" id="clamd" type="clamd at amavisd">
<instance_attributes id="clamd-instance_attributes"/>
<operations>
<op id="clamd-monitor-interval-60s" interval="60s"
name="monitor"/>
</operations>
</primitive>
</group>
</resources>
<constraints>
<rsc_order first="mail-clone" first-action="promote"
id="order-mail-clone-fs-services-mandatory" then="fs-services"
then-action="start"/>
<rsc_order first="spool-clone" first-action="promote"
id="order-spool-clone-fs-services-mandatory" then="fs-services"
then-action="start"/>
<rsc_order first="network-services" first-action="start"
id="order-network-services-fs-services-mandatory" then="fs-services"
then-action="start"/>
<rsc_order first="fs-services" first-action="start"
id="order-fs-services-mail-services-mandatory" then="mail-services"
then-action="start"/>
<rsc_colocation id="colocation-fs-services-spool-clone-INFINITY"
rsc="fs-services" rsc-role="Started" score="INFINITY"
with-rsc="spool-clone" with-rsc-role="Master"/>
<rsc_colocation id="colocation-fs-services-mail-clone-INFINITY"
rsc="fs-services" rsc-role="Started" score="INFINITY" with-rsc="mail-clone"
with-rsc-role="Master"/>
<rsc_colocation id="colocation-mail-services-fs-services-INFINITY"
rsc="mail-services" score="INFINITY" with-rsc="fs-services"/>
<rsc_colocation
id="colocation-network-services-mail-services-INFINITY"
rsc="network-services" score="INFINITY" with-rsc="mail-services"/>
</constraints>
<op_defaults>
<meta_attributes id="op_defaults-options">
<nvpair id="op_defaults-options-on-fail" name="on-fail"
value="restart"/>
</meta_attributes>
</op_defaults>
<rsc_defaults>
<meta_attributes id="rsc_defaults-options">
<nvpair id="rsc_defaults-options-migration-threshold"
name="migration-threshold" value="1"/>
</meta_attributes>
</rsc_defaults>
</configuration>
<status>
<node_state id="1" uname="mail1" in_ccm="true" crmd="online"
crm-debug-origin="do_update_resource" join="member" expected="member">
<transient_attributes id="1">
<instance_attributes id="status-1">
<nvpair id="status-1-shutdown" name="shutdown" value="0"/>
<nvpair id="status-1-probe_complete" name="probe_complete"
value="true"/>
<nvpair id="status-1-last-failure-fs-mail"
name="last-failure-fs-mail" value="1458145164"/>
<nvpair id="status-1-last-failure-amavisd"
name="last-failure-amavisd" value="1458144572"/>
<nvpair id="status-1-master-spool" name="master-spool"
value="10000"/>
<nvpair id="status-1-master-mail" name="master-mail"
value="10000"/>
</instance_attributes>
</transient_attributes>
<lrm id="1">
<lrm_resources>
<lrm_resource id="virtualip-1" type="IPaddr2" class="ocf"
provider="heartbeat">
<lrm_rsc_op id="virtualip-1_last_0"
operation_key="virtualip-1_stop_0" operation="stop"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="13:3651:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;13:3651:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail1" call-id="1930" rc-code="0" op-status="0" interval="0"
last-run="1458292925" last-rc-change="1458292925" exec-time="285"
queue-time="0" op-digest="28a9f5254eca47bbb2a9892a336ab8d6"/>
<lrm_rsc_op id="virtualip-1_monitor_30000"
operation_key="virtualip-1_monitor_30000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="13:3390:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;13:3390:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail1" call-id="1886" rc-code="0" op-status="0" interval="30000"
last-rc-change="1458216597" exec-time="46" queue-time="0"
op-digest="c2158e684c2fe8758a545e9a9387caed"/>
</lrm_resource>
<lrm_resource id="mail" type="drbd" class="ocf" provider="linbit">
<lrm_rsc_op id="mail_last_failure_0"
operation_key="mail_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="9:3026:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;9:3026:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail1" call-id="1451" rc-code="0" op-status="0" interval="0"
last-run="1458128284" last-rc-change="1458128284" exec-time="72"
queue-time="0" op-digest="98235597a9743aebee92a6c373a068d5"/>
<lrm_rsc_op id="mail_last_0" operation_key="mail_start_0"
operation="start" crm-debug-origin="do_update_resource"
crm_feature_set="3.0.10"
transition-key="50:3669:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;50:3669:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail1" call-id="2014" rc-code="0" op-status="0" interval="0"
last-run="1458294003" last-rc-change="1458294003" exec-time="270"
queue-time="0" op-digest="98235597a9743aebee92a6c373a068d5"/>
<lrm_rsc_op id="mail_monitor_10000"
operation_key="mail_monitor_10000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="50:3670:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;50:3670:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail1" call-id="2019" rc-code="0" op-status="0" interval="10000"
last-rc-change="1458294004" exec-time="79" queue-time="0"
op-digest="57464d93900365abea1493a8f6b22159"/>
</lrm_resource>
<lrm_resource id="spool" type="drbd" class="ocf"
provider="linbit">
<lrm_rsc_op id="spool_last_failure_0"
operation_key="spool_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="9:3028:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;9:3028:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail1" call-id="1459" rc-code="0" op-status="0" interval="0"
last-run="1458128289" last-rc-change="1458128289" exec-time="73"
queue-time="0" op-digest="dbbf364a9d070ebe47b97831a0be60f4"/>
<lrm_rsc_op id="spool_last_0" operation_key="spool_start_0"
operation="start" crm-debug-origin="do_update_resource"
crm_feature_set="3.0.10"
transition-key="20:3669:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;20:3669:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail1" call-id="2015" rc-code="0" op-status="0" interval="0"
last-run="1458294003" last-rc-change="1458294003" exec-time="266"
queue-time="0" op-digest="dbbf364a9d070ebe47b97831a0be60f4"/>
<lrm_rsc_op id="spool_monitor_10000"
operation_key="spool_monitor_10000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="19:3670:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;19:3670:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail1" call-id="2018" rc-code="0" op-status="0" interval="10000"
last-rc-change="1458294004" exec-time="80" queue-time="0"
op-digest="97f3ae82d78b8755a2179c6797797580"/>
</lrm_resource>
<lrm_resource id="fs-spool" type="Filesystem" class="ocf"
provider="heartbeat">
<lrm_rsc_op id="fs-spool_last_0"
operation_key="fs-spool_stop_0" operation="stop"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="78:3651:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;78:3651:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail1" call-id="1928" rc-code="0" op-status="0" interval="0"
last-run="1458292923" last-rc-change="1458292923" exec-time="1258"
queue-time="0" op-digest="54f97a4890ac973bd096580098e40914"/>
<lrm_rsc_op id="fs-spool_monitor_20000"
operation_key="fs-spool_monitor_20000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="69:3392:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;69:3392:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail1" call-id="1896" rc-code="0" op-status="0" interval="20000"
last-rc-change="1458216598" exec-time="47" queue-time="0"
op-digest="e85a7e24c0c0b05f5d196e3d363e4dfc"/>
</lrm_resource>
<lrm_resource id="fs-mail" type="Filesystem" class="ocf"
provider="heartbeat">
<lrm_rsc_op id="fs-mail_last_0" operation_key="fs-mail_stop_0"
operation="stop" crm-debug-origin="do_update_resource"
crm_feature_set="3.0.10"
transition-key="81:3651:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;81:3651:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail1" call-id="1926" rc-code="0" op-status="0" interval="0"
last-run="1458292923" last-rc-change="1458292923" exec-time="85"
queue-time="1" op-digest="57adf8df552907571679154e346a4403"/>
<lrm_rsc_op id="fs-mail_monitor_20000"
operation_key="fs-mail_monitor_20000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="71:3392:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;71:3392:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail1" call-id="1898" rc-code="0" op-status="0" interval="20000"
last-rc-change="1458216598" exec-time="67" queue-time="0"
op-digest="ad82e3ec600949a8e869e8afe9a21fef"/>
</lrm_resource>
<lrm_resource id="amavisd" type="amavisd" class="systemd">
<lrm_rsc_op id="amavisd_last_0"
operation_key="amavisd_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="9:3674:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:7;9:3674:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail1" call-id="2026" rc-code="7" op-status="0" interval="0"
last-run="1458294028" last-rc-change="1458294028" exec-time="5"
queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
</lrm_resource>
<lrm_resource id="spamassassin" type="spamassassin"
class="systemd">
<lrm_rsc_op id="spamassassin_last_0"
operation_key="spamassassin_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="10:3674:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:7;10:3674:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail1" call-id="2030" rc-code="7" op-status="0" interval="0"
last-run="1458294028" last-rc-change="1458294028" exec-time="5"
queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
</lrm_resource>
<lrm_resource id="clamd" type="clamd at amavisd" class="systemd">
<lrm_rsc_op id="clamd_last_0" operation_key="clamd_monitor_0"
operation="monitor" crm-debug-origin="do_update_resource"
crm_feature_set="3.0.10"
transition-key="11:3674:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:7;11:3674:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail1" call-id="2034" rc-code="7" op-status="0" interval="0"
last-run="1458294028" last-rc-change="1458294028" exec-time="7"
queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
</lrm_resource>
</lrm_resources>
</lrm>
</node_state>
<node_state id="2" uname="mail2" in_ccm="true" crmd="online"
crm-debug-origin="do_update_resource" join="member" expected="member">
<transient_attributes id="2">
<instance_attributes id="status-2">
<nvpair id="status-2-shutdown" name="shutdown" value="0"/>
<nvpair id="status-2-last-failure-spool"
name="last-failure-spool" value="1457364470"/>
<nvpair id="status-2-probe_complete" name="probe_complete"
value="true"/>
<nvpair id="status-2-last-failure-mail" name="last-failure-mail"
value="1457527103"/>
<nvpair id="status-2-last-failure-fs-spool"
name="last-failure-fs-spool" value="1457524256"/>
<nvpair id="status-2-last-failure-fs-mail"
name="last-failure-fs-mail" value="1457611139"/>
<nvpair id="status-2-last-failure-amavisd"
name="last-failure-amavisd" value="1458294149"/>
<nvpair id="status-2-master-mail" name="master-mail"
value="10000"/>
<nvpair id="status-2-master-spool" name="master-spool"
value="10000"/>
<nvpair id="status-2-fail-count-amavisd"
name="fail-count-amavisd" value="1"/>
</instance_attributes>
</transient_attributes>
<lrm id="2">
<lrm_resources>
<lrm_resource id="virtualip-1" type="IPaddr2" class="ocf"
provider="heartbeat">
<lrm_rsc_op id="virtualip-1_last_failure_0"
operation_key="virtualip-1_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="11:3024:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;11:3024:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="1904" rc-code="0" op-status="0" interval="0"
last-run="1458128280" last-rc-change="1458128280" exec-time="49"
queue-time="0" op-digest="28a9f5254eca47bbb2a9892a336ab8d6"/>
<lrm_rsc_op id="virtualip-1_last_0"
operation_key="virtualip-1_stop_0" operation="stop"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="14:3677:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;14:3677:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="2513" rc-code="0" op-status="0" interval="0"
last-run="1458294156" last-rc-change="1458294156" exec-time="51"
queue-time="0" op-digest="28a9f5254eca47bbb2a9892a336ab8d6"/>
<lrm_rsc_op id="virtualip-1_monitor_30000"
operation_key="virtualip-1_monitor_30000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="12:3664:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;12:3664:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="2425" rc-code="0" op-status="0" interval="30000"
last-rc-change="1458293985" exec-time="48" queue-time="0"
op-digest="c2158e684c2fe8758a545e9a9387caed"/>
</lrm_resource>
<lrm_resource id="mail" type="drbd" class="ocf" provider="linbit">
<lrm_rsc_op id="mail_last_failure_0"
operation_key="mail_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="11:3026:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:8;11:3026:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="1911" rc-code="8" op-status="0" interval="0"
last-run="1458128284" last-rc-change="1458128284" exec-time="79"
queue-time="0" op-digest="98235597a9743aebee92a6c373a068d5"/>
<lrm_rsc_op id="mail_last_0" operation_key="mail_promote_0"
operation="promote" crm-debug-origin="do_update_resource"
crm_feature_set="3.0.10"
transition-key="41:3652:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;41:3652:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="2333" rc-code="0" op-status="0" interval="0"
last-run="1458292925" last-rc-change="1458292925" exec-time="41"
queue-time="0" op-digest="98235597a9743aebee92a6c373a068d5"/>
</lrm_resource>
<lrm_resource id="spool" type="drbd" class="ocf"
provider="linbit">
<lrm_rsc_op id="spool_last_failure_0"
operation_key="spool_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="11:3028:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:8;11:3028:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="1917" rc-code="8" op-status="0" interval="0"
last-run="1458128289" last-rc-change="1458128289" exec-time="73"
queue-time="0" op-digest="dbbf364a9d070ebe47b97831a0be60f4"/>
<lrm_rsc_op id="spool_last_0" operation_key="spool_promote_0"
operation="promote" crm-debug-origin="do_update_resource"
crm_feature_set="3.0.10"
transition-key="14:3652:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;14:3652:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="2332" rc-code="0" op-status="0" interval="0"
last-run="1458292925" last-rc-change="1458292925" exec-time="45"
queue-time="0" op-digest="dbbf364a9d070ebe47b97831a0be60f4"/>
</lrm_resource>
<lrm_resource id="fs-mail" type="Filesystem" class="ocf"
provider="heartbeat">
<lrm_rsc_op id="fs-mail_last_failure_0"
operation_key="fs-mail_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="11:3150:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;11:3150:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="2281" rc-code="0" op-status="0" interval="0"
last-run="1458145187" last-rc-change="1458145187" exec-time="77"
queue-time="1" op-digest="57adf8df552907571679154e346a4403"/>
<lrm_rsc_op id="fs-mail_last_0" operation_key="fs-mail_stop_0"
operation="stop" crm-debug-origin="do_update_resource"
crm_feature_set="3.0.10"
transition-key="81:3677:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;81:3677:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="2509" rc-code="0" op-status="0" interval="0"
last-run="1458294155" last-rc-change="1458294155" exec-time="78"
queue-time="0" op-digest="57adf8df552907571679154e346a4403"/>
<lrm_rsc_op id="fs-mail_monitor_20000"
operation_key="fs-mail_monitor_20000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="76:3664:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;76:3664:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="2429" rc-code="0" op-status="0" interval="20000"
last-rc-change="1458293985" exec-time="62" queue-time="0"
op-digest="ad82e3ec600949a8e869e8afe9a21fef"/>
</lrm_resource>
<lrm_resource id="fs-spool" type="Filesystem" class="ocf"
provider="heartbeat">
<lrm_rsc_op id="fs-spool_last_failure_0"
operation_key="fs-spool_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="10:3150:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;10:3150:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="2277" rc-code="0" op-status="0" interval="0"
last-run="1458145187" last-rc-change="1458145187" exec-time="81"
queue-time="0" op-digest="54f97a4890ac973bd096580098e40914"/>
<lrm_rsc_op id="fs-spool_last_0"
operation_key="fs-spool_stop_0" operation="stop"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="79:3677:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;79:3677:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="2511" rc-code="0" op-status="0" interval="0"
last-run="1458294155" last-rc-change="1458294155" exec-time="1220"
queue-time="0" op-digest="54f97a4890ac973bd096580098e40914"/>
<lrm_rsc_op id="fs-spool_monitor_20000"
operation_key="fs-spool_monitor_20000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="74:3664:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;74:3664:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="2427" rc-code="0" op-status="0" interval="20000"
last-rc-change="1458293985" exec-time="44" queue-time="0"
op-digest="e85a7e24c0c0b05f5d196e3d363e4dfc"/>
</lrm_resource>
<lrm_resource id="amavisd" type="amavisd" class="systemd">
<lrm_rsc_op id="amavisd_last_failure_0"
operation_key="amavisd_monitor_60000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="86:3675:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:7;86:3675:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="2499" rc-code="7" op-status="0" interval="60000"
last-run="1458294028" last-rc-change="1458294149" exec-time="0"
queue-time="0" op-digest="4811cef7f7f94e3a35a70be7916cb2fd"/>
<lrm_rsc_op id="amavisd_last_0" operation_key="amavisd_stop_0"
operation="stop" crm-debug-origin="do_update_resource"
crm_feature_set="3.0.10"
transition-key="7:3677:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;7:3677:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="2507" rc-code="0" op-status="0" interval="0"
last-run="1458294153" last-rc-change="1458294153" exec-time="2068"
queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
<lrm_rsc_op id="amavisd_monitor_60000"
operation_key="amavisd_monitor_60000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="86:3675:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;86:3675:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="2499" rc-code="0" op-status="0" interval="60000"
last-rc-change="1458294028" exec-time="2" queue-time="0"
op-digest="4811cef7f7f94e3a35a70be7916cb2fd"/>
</lrm_resource>
<lrm_resource id="spamassassin" type="spamassassin"
class="systemd">
<lrm_rsc_op id="spamassassin_last_failure_0"
operation_key="spamassassin_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="14:3674:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;14:3674:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="2494" rc-code="0" op-status="0" interval="0"
last-run="1458294028" last-rc-change="1458294028" exec-time="11"
queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
<lrm_rsc_op id="spamassassin_last_0"
operation_key="spamassassin_stop_0" operation="stop"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="87:3677:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;87:3677:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="2505" rc-code="0" op-status="0" interval="0"
last-run="1458294151" last-rc-change="1458294151" exec-time="2072"
queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
<lrm_rsc_op id="spamassassin_monitor_60000"
operation_key="spamassassin_monitor_60000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="89:3675:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;89:3675:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="2500" rc-code="0" op-status="0" interval="60000"
last-rc-change="1458294028" exec-time="1" queue-time="0"
op-digest="4811cef7f7f94e3a35a70be7916cb2fd"/>
</lrm_resource>
<lrm_resource id="clamd" type="clamd at amavisd" class="systemd">
<lrm_rsc_op id="clamd_last_failure_0"
operation_key="clamd_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="15:3674:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;15:3674:7:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="2498" rc-code="0" op-status="0" interval="0"
last-run="1458294028" last-rc-change="1458294028" exec-time="10"
queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
<lrm_rsc_op id="clamd_last_0" operation_key="clamd_stop_0"
operation="stop" crm-debug-origin="do_update_resource"
crm_feature_set="3.0.10"
transition-key="88:3677:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;88:3677:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="2503" rc-code="0" op-status="0" interval="0"
last-run="1458294149" last-rc-change="1458294149" exec-time="2085"
queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
<lrm_rsc_op id="clamd_monitor_60000"
operation_key="clamd_monitor_60000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="92:3675:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:0;92:3675:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail2" call-id="2501" rc-code="0" op-status="0" interval="60000"
last-rc-change="1458294029" exec-time="2" queue-time="0"
op-digest="4811cef7f7f94e3a35a70be7916cb2fd"/>
</lrm_resource>
</lrm_resources>
</lrm>
</node_state>
</status>
</cib>
On Thu, Mar 17, 2016 at 8:30 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
> On 03/16/2016 11:20 AM, Lorand Kelemen wrote:
> > Dear Ken,
> >
> > I already modified the startup as suggested during testing, thanks! I
> > swapped the postfix ocf resource for the amavisd systemd resource, as the
> > latter, it turns out, also controls postfix startup, and having both
> > resources in the mail-services group causes conflicts (postfix is detected
> > as not running).
> >
> > Still experiencing the same behaviour: killing amavisd returns rc=7 for
> > the monitoring operation on the "victim" node, which sounds logical, but
> > the logs contain the same message: amavisd and virtualip cannot run anywhere.
> >
> > I made sure systemd is clean (amavisd = inactive, not running instead of
> > failed) and also reset the failcount on all resources before killing
> > amavisd.
> >
> > How can I make sure the resources have a clean state, besides the above
> > actions?
>
> What you did is fine. I'm not sure why amavisd and virtualip can't run.
> Can you show the output of "cibadmin -Q" when the cluster is in that state?
>
> > Also note: when causing a filesystem resource to fail (e.g. with unmount),
> > the failover happens successfully; all resources are started on the
> > "survivor" node.
> >
> > Best regards,
> > Lorand
> >
> >
> > On Wed, Mar 16, 2016 at 4:34 PM, Ken Gaillot <kgaillot at redhat.com>
> wrote:
> >
> >> On 03/16/2016 05:49 AM, Lorand Kelemen wrote:
> >>> Dear Ken,
> >>>
> >>> Thanks for the reply! I lowered migration-threshold to 1 and rearranged
> >>> constraints like you suggested:
> >>>
> >>> Location Constraints:
> >>> Ordering Constraints:
> >>> promote mail-clone then start fs-services (kind:Mandatory)
> >>> promote spool-clone then start fs-services (kind:Mandatory)
> >>> start fs-services then start network-services (kind:Mandatory)
> >>
> >> Certainly not a big deal, but I would change the above constraint to
> >> start fs-services then start mail-services. The IP doesn't care whether
> >> the filesystems are up yet or not, but postfix does.
> >>
> >>> start network-services then start mail-services (kind:Mandatory)
> >>> Colocation Constraints:
> >>> fs-services with spool-clone (score:INFINITY) (rsc-role:Started)
> >>> (with-rsc-role:Master)
> >>> fs-services with mail-clone (score:INFINITY) (rsc-role:Started)
> >>> (with-rsc-role:Master)
> >>> network-services with mail-services (score:INFINITY)
> >>> mail-services with fs-services (score:INFINITY)
> >>>
> >>> Now virtualip and postfix become stopped, I guess these are relevant, but
> >>> I also attach the full logs:
> >>>
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> native_color: Resource postfix cannot run anywhere
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> native_color: Resource virtualip-1 cannot run anywhere
> >>>
> >>> Interesting, will try to play around with ordering - colocation, the
> >>> solution must be in these settings...
> >>>
> >>> Best regards,
> >>> Lorand
> >>>
> >>> Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: Diff: --- 0.215.7 2
> >>> Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: Diff: +++ 0.215.8 (null)
> >>> Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: + /cib: @num_updates=8
> >>> Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: ++
> >>>
> >>
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']:
> >>> <lrm_rsc_op id="postfix_last_failure_0"
> >>> operation_key="postfix_monitor_45000" operation="monitor"
> >>> crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
> >>> transition-key="86:2962:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
> >>> transition-magic="0:7;86:2962:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
> >>> on_node="mail1" call-id="1333" rc-code="7"
> >>> Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info:
> >>> abort_transition_graph: Transition aborted by
> postfix_monitor_45000
> >>> 'create' on mail1: Inactive graph
> >>> (magic=0:7;86:2962:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, cib=0.215.8,
> >>> source=process_graph_event:598, 1)
> >>> Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info:
> >>> update_failcount: Updating failcount for postfix on mail1 after
> >> failed
> >>> monitor: rc=7 (update=value++, time=1458124686)
> >>
> >> I don't think your constraints are causing problems now; the above
> >> message indicates that the postfix resource failed. Postfix may not be
> >> able to run anywhere because it's already failed on both nodes, and the
> >> IP would be down because it has to be colocated with postfix, and
> >> postfix can't run.
> >>
> >> The rc=7 above indicates that the postfix agent's monitor operation
> >> returned 7, which is "not running". I'd check the logs for postfix
> errors.
> >>
> >>> Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info:
> >>> process_graph_event: Detected action (2962.86)
> >>> postfix_monitor_45000.1333=not running: failed
> >>> Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
> >>> attrd_client_update: Expanded fail-count-postfix=value++ to 1
> >>> Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
> >>> cib_process_request: Completed cib_modify operation for section
> status:
> >> OK
> >>> (rc=0, origin=mail1/crmd/253, version=0.215.8)
> >>> Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
> >>> attrd_peer_update: Setting fail-count-postfix[mail1]: (null) -> 1
> from
> >>> mail2
> >>> Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: notice:
> >>> do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [
> >>> input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> >>> Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
> >>> write_attribute: Sent update 406 with 2 changes for
> >>> fail-count-postfix, id=<n/a>, set=(null)
> >>> Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
> >>> attrd_peer_update: Setting last-failure-postfix[mail1]: 1458124291
> ->
> >>> 1458124686 from mail2
> >>> Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
> >>> write_attribute: Sent update 407 with 2 changes for
> >>> last-failure-postfix, id=<n/a>, set=(null)
> >>> Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
> >>> cib_process_request: Forwarding cib_modify operation for section
> status
> >> to
> >>> master (origin=local/attrd/406)
> >>> Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
> >>> cib_process_request: Forwarding cib_modify operation for section
> status
> >> to
> >>> master (origin=local/attrd/407)
> >>> Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: Diff: --- 0.215.8 2
> >>> Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: Diff: +++ 0.215.9 (null)
> >>> Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: + /cib: @num_updates=9
> >>> Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: ++
> >>>
> >>
> /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']:
> >>> <nvpair id="status-1-fail-count-postfix" name="fail-count-postfix"
> >>> value="1"/>
> >>> Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
> >>> cib_process_request: Completed cib_modify operation for section
> status:
> >> OK
> >>> (rc=0, origin=mail2/attrd/406, version=0.215.9)
> >>> Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: Diff: --- 0.215.9 2
> >>> Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: Diff: +++ 0.215.10 (null)
> >>> Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: + /cib: @num_updates=10
> >>> Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: +
> >>>
> >>
> /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix']:
> >>> @value=1458124686
> >>> Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
> >>> attrd_cib_callback: Update 406 for fail-count-postfix: OK (0)
> >>> Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
> >>> attrd_cib_callback: Update 406 for fail-count-postfix[mail1]=1: OK
> (0)
> >>> Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info:
> >>> cib_process_request: Completed cib_modify operation for section
> status:
> >> OK
> >>> (rc=0, origin=mail2/attrd/407, version=0.215.10)
> >>> Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
> >>> attrd_cib_callback: Update 406 for fail-count-postfix[mail2]=(null):
> OK
> >>> (0)
> >>> Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
> >>> attrd_cib_callback: Update 407 for last-failure-postfix: OK (0)
> >>> Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
> >>> attrd_cib_callback: Update 407 for
> >>> last-failure-postfix[mail1]=1458124686: OK (0)
> >>> Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info:
> >>> attrd_cib_callback: Update 407 for
> >>> last-failure-postfix[mail2]=1457610376: OK (0)
> >>> Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info:
> >>> abort_transition_graph: Transition aborted by
> >>> status-1-fail-count-postfix, fail-count-postfix=1: Transient attribute
> >>> change (create cib=0.215.9, source=abort_unless_down:319,
> >>>
> >>
> path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1'],
> >>> 1)
> >>> Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info:
> >>> abort_transition_graph: Transition aborted by
> >>> status-1-last-failure-postfix, last-failure-postfix=1458124686:
> Transient
> >>> attribute change (modify cib=0.215.10, source=abort_unless_down:319,
> >>>
> >>
> path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix'],
> >>> 1)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice:
> >>> unpack_config: On loss of CCM Quorum: Ignore
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_online_status: Node mail1 is online
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_online_status: Node mail2 is online
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_op_status: Operation monitor found resource mail:0 active in
> >>> master mode on mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_op_status: Operation monitor found resource spool:0 active
> in
> >>> master mode on mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_op_status: Operation monitor found resource fs-spool active
> on
> >>> mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_op_status: Operation monitor found resource fs-spool active
> on
> >>> mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_op_status: Operation monitor found resource fs-mail active
> on
> >>> mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_op_status: Operation monitor found resource fs-mail active
> on
> >>> mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: warning:
> >>> unpack_rsc_op_failure: Processing failed op monitor for postfix
> on
> >>> mail1: not running (7)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_op_status: Operation monitor found resource spool:1 active
> in
> >>> master mode on mail2
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_op_status: Operation monitor found resource mail:1 active in
> >>> master mode on mail2
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> group_print: Resource Group: network-services
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> native_print: virtualip-1 (ocf::heartbeat:IPaddr2):
> >> Started
> >>> mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> clone_print: Master/Slave Set: spool-clone [spool]
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> short_print: Masters: [ mail1 ]
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> short_print: Slaves: [ mail2 ]
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> clone_print: Master/Slave Set: mail-clone [mail]
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> short_print: Masters: [ mail1 ]
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> short_print: Slaves: [ mail2 ]
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> group_print: Resource Group: fs-services
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> native_print: fs-spool (ocf::heartbeat:Filesystem): Started
> >> mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> native_print: fs-mail (ocf::heartbeat:Filesystem): Started
> >> mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> group_print: Resource Group: mail-services
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> native_print: postfix (ocf::heartbeat:postfix): FAILED
> >> mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> master_color: Promoting mail:0 (Master mail1)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> master_color: mail-clone: Promoted 1 instances of a possible 1 to
> master
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> master_color: Promoting spool:0 (Master mail1)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> master_color: spool-clone: Promoted 1 instances of a possible 1 to
> master
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> RecurringOp: Start recurring monitor (45s) for postfix on mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> LogActions: Leave virtualip-1 (Started mail1)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> LogActions: Leave spool:0 (Master mail1)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> LogActions: Leave spool:1 (Slave mail2)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> LogActions: Leave mail:0 (Master mail1)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> LogActions: Leave mail:1 (Slave mail2)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> LogActions: Leave fs-spool (Started mail1)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> LogActions: Leave fs-mail (Started mail1)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice:
> >>> LogActions: Recover postfix (Started mail1)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice:
> >>> process_pe_message: Calculated Transition 2963:
> >>> /var/lib/pacemaker/pengine/pe-input-330.bz2
> >>> Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info:
> >>> handle_response: pe_calc calculation pe_calc-dc-1458124686-5541 is
> >>> obsolete
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice:
> >>> unpack_config: On loss of CCM Quorum: Ignore
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_online_status: Node mail1 is online
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_online_status: Node mail2 is online
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_op_status: Operation monitor found resource mail:0 active in
> >>> master mode on mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_op_status: Operation monitor found resource spool:0 active
> in
> >>> master mode on mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_op_status: Operation monitor found resource fs-spool active
> on
> >>> mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_op_status: Operation monitor found resource fs-spool active
> on
> >>> mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_op_status: Operation monitor found resource fs-mail active
> on
> >>> mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_op_status: Operation monitor found resource fs-mail active
> on
> >>> mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: warning:
> >>> unpack_rsc_op_failure: Processing failed op monitor for postfix
> on
> >>> mail1: not running (7)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_op_status: Operation monitor found resource spool:1 active
> in
> >>> master mode on mail2
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> determine_op_status: Operation monitor found resource mail:1 active in
> >>> master mode on mail2
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> group_print: Resource Group: network-services
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> native_print: virtualip-1 (ocf::heartbeat:IPaddr2):
> >> Started
> >>> mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> clone_print: Master/Slave Set: spool-clone [spool]
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> short_print: Masters: [ mail1 ]
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> short_print: Slaves: [ mail2 ]
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> clone_print: Master/Slave Set: mail-clone [mail]
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> short_print: Masters: [ mail1 ]
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> short_print: Slaves: [ mail2 ]
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> group_print: Resource Group: fs-services
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> native_print: fs-spool (ocf::heartbeat:Filesystem): Started
> >> mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> native_print: fs-mail (ocf::heartbeat:Filesystem): Started
> >> mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> group_print: Resource Group: mail-services
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> native_print: postfix (ocf::heartbeat:postfix): FAILED
> >> mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> get_failcount_full: postfix has failed 1 times on mail1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: warning:
> >>> common_apply_stickiness: Forcing postfix away from mail1 after 1
> >>> failures (max=1)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> master_color: Promoting mail:0 (Master mail1)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> master_color: mail-clone: Promoted 1 instances of a possible 1 to
> master
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> master_color: Promoting spool:0 (Master mail1)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> master_color: spool-clone: Promoted 1 instances of a possible 1 to
> master
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> rsc_merge_weights: fs-mail: Rolling back scores from postfix
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> rsc_merge_weights: postfix: Rolling back scores from virtualip-1
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> native_color: Resource postfix cannot run anywhere
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> native_color: Resource virtualip-1 cannot run anywhere
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice:
> >>> LogActions: Stop virtualip-1 (mail1)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> LogActions: Leave spool:0 (Master mail1)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> LogActions: Leave spool:1 (Slave mail2)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> LogActions: Leave mail:0 (Master mail1)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> LogActions: Leave mail:1 (Slave mail2)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> LogActions: Leave fs-spool (Started mail1)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info:
> >>> LogActions: Leave fs-mail (Started mail1)
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice:
> >>> LogActions: Stop postfix (mail1)
> >>> Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info:
> >>> do_state_transition: State transition S_POLICY_ENGINE ->
> >>> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
> >>> origin=handle_response ]
> >>> Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice:
> >>> process_pe_message: Calculated Transition 2964:
> >>> /var/lib/pacemaker/pengine/pe-input-331.bz2
> >>> Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info:
> >>> do_te_invoke: Processing graph 2964 (ref=pe_calc-dc-1458124686-5542)
> >>> derived from /var/lib/pacemaker/pengine/pe-input-331.bz2
> >>> Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: notice:
> >>> te_rsc_command: Initiating action 5: stop postfix_stop_0 on mail1
> >>> Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: Diff: --- 0.215.10 2
> >>> Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: Diff: +++ 0.215.11 (null)
> >>> Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: + /cib: @num_updates=11
> >>> Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: +
> >>>
> >>
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_0']:
> >>> @operation_key=postfix_stop_0, @operation=stop,
> >>> @transition-key=5:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> >>> @transition-magic=0:0;5:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> >>> @call-id=1335, @last-run=1458124686, @last-rc-change=1458124686,
> >>> @exec-time=435
> >>> Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: info:
> >>> match_graph_event: Action postfix_stop_0 (5) confirmed on mail1
> (rc=0)
> >>> Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
> >>> cib_process_request: Completed cib_modify operation for section
> status:
> >> OK
> >>> (rc=0, origin=mail1/crmd/254, version=0.215.11)
> >>> Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: notice:
> >>> te_rsc_command: Initiating action 12: stop virtualip-1_stop_0 on
> >> mail1
> >>> Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: Diff: --- 0.215.11 2
> >>> Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: Diff: +++ 0.215.12 (null)
> >>> Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: + /cib: @num_updates=12
> >>> Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
> >>> cib_perform_op: +
> >>>
> >>
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='virtualip-1']/lrm_rsc_op[@id='virtualip-1_last_0']:
> >>> @operation_key=virtualip-1_stop_0, @operation=stop,
> >>> @transition-key=12:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> >>> @transition-magic=0:0;12:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> >>> @call-id=1337, @last-run=1458124687, @last-rc-change=1458124687,
> >>> @exec-time=56
> >>> Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: info:
> >>> match_graph_event: Action virtualip-1_stop_0 (12) confirmed on mail1
> >>> (rc=0)
> >>> Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info:
> >>> cib_process_request: Completed cib_modify operation for section
> status:
> >> OK
> >>> (rc=0, origin=mail1/crmd/255, version=0.215.12)
> >>> Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: notice:
> >>> run_graph: Transition 2964 (Complete=7, Pending=0, Fired=0,
> Skipped=0,
> >>> Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-331.bz2):
> >> Complete
> >>> Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: info:
> do_log:
> >>> FSA: Input I_TE_SUCCESS from notify_crmd() received in state
> >>> S_TRANSITION_ENGINE
> >>> Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: notice:
> >>> do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [
> >>> input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> >>> Mar 16 11:38:12 [7415] HWJ-626.domain.local cib: info:
> >>> cib_process_ping: Reporting our current digest to mail2:
> >>> ed43bc3ecf0f15853900ba49fc514870 for 0.215.12 (0x152b110 0)
> >>>
> >>>
> >>> On Mon, Mar 14, 2016 at 6:44 PM, Ken Gaillot <kgaillot at redhat.com>
> >> wrote:
> >>>
> >>>> On 03/10/2016 09:49 AM, Lorand Kelemen wrote:
> >>>>> Dear List,
> >>>>>
> >>>>> After creating and testing a simple 2-node active-passive drbd+postfix
> >>>>> cluster, nearly everything works flawlessly (standby, failure of a
> >>>>> filesystem resource + failover, split-brain + manual recovery). However,
> >>>>> when deliberately killing the postfix instance, after reaching the
> >>>>> migration threshold failover does not occur and the resources revert to
> >>>>> the Stopped state (except the master-slave drbd resource, which works as
> >>>>> expected).
> >>>>>
> >>>>> Ordering and colocation are configured, STONITH and quorum disabled; the
> >>>>> goal is to always have one node running all the resources, and at any sign
> >>>>> of error it should fail over to the passive node, nothing fancy.
> >>>>>
> >>>>> Is my configuration wrong or am I hitting a bug?
> >>>>>
> >>>>> All software is from the CentOS 7 + ELRepo repositories.
> >>>>
> >>>> With these versions, you can set "two_node: 1" in
> >>>> /etc/corosync/corosync.conf (which will be done automatically if you
> >>>> used "pcs cluster setup" initially), and then you don't need to ignore
> >>>> quorum in pacemaker.
> >>>>
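For reference, a minimal corosync.conf quorum stanza with two_node enabled might
look like the sketch below (assuming corosync 2.x votequorum; the nodelist and
other sections are left as they already are):

    quorum {
        provider: corosync_votequorum
        two_node: 1
    }

With two_node: 1, votequorum keeps the surviving node quorate when its peer is
lost, so no-quorum-policy=ignore is no longer needed on the Pacemaker side.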
> >>>>> Regarding STONITH: the machines are running on free ESXi instances on
> >>>>> separate hosts, so the VMware fencing agents won't work, because in the
> >>>>> free version the API is read-only. Still trying to figure out a way
> >>>>> forward; until then, manual recovery + huge ARP cache times on the
> >>>>> upstream firewall...
> >>>>>
> >>>>> Please find the pe-input*.bz2 files attached, logs and config below. The
> >>>>> situation: on node mail1, postfix was killed 3 times (migration threshold);
> >>>>> it should have failed over to mail2. When a filesystem resource is killed
> >>>>> three times, this happens flawlessly.
> >>>>>
> >>>>> Thanks for your input!
> >>>>>
> >>>>> Best regards,
> >>>>> Lorand
> >>>>>
> >>>>>
> >>>>> Cluster Name: mailcluster
> >>>>> Corosync Nodes:
> >>>>> mail1 mail2
> >>>>> Pacemaker Nodes:
> >>>>> mail1 mail2
> >>>>>
> >>>>> Resources:
> >>>>> Group: network-services
> >>>>> Resource: virtualip-1 (class=ocf provider=heartbeat type=IPaddr2)
> >>>>> Attributes: ip=10.20.64.10 cidr_netmask=24 nic=lan0
> >>>>> Operations: start interval=0s timeout=20s
> >>>> (virtualip-1-start-interval-0s)
> >>>>> stop interval=0s timeout=20s
> >>>> (virtualip-1-stop-interval-0s)
> >>>>> monitor interval=30s
> (virtualip-1-monitor-interval-30s)
> >>>>> Master: spool-clone
> >>>>> Meta Attrs: master-max=1 master-node-max=1 clone-max=2
> >> clone-node-max=1
> >>>>> notify=true
> >>>>> Resource: spool (class=ocf provider=linbit type=drbd)
> >>>>> Attributes: drbd_resource=spool
> >>>>> Operations: start interval=0s timeout=240
> (spool-start-interval-0s)
> >>>>> promote interval=0s timeout=90
> >> (spool-promote-interval-0s)
> >>>>> demote interval=0s timeout=90
> (spool-demote-interval-0s)
> >>>>> stop interval=0s timeout=100 (spool-stop-interval-0s)
> >>>>> monitor interval=10s (spool-monitor-interval-10s)
> >>>>> Master: mail-clone
> >>>>> Meta Attrs: master-max=1 master-node-max=1 clone-max=2
> >> clone-node-max=1
> >>>>> notify=true
> >>>>> Resource: mail (class=ocf provider=linbit type=drbd)
> >>>>> Attributes: drbd_resource=mail
> >>>>> Operations: start interval=0s timeout=240 (mail-start-interval-0s)
> >>>>> promote interval=0s timeout=90
> >> (mail-promote-interval-0s)
> >>>>> demote interval=0s timeout=90
> (mail-demote-interval-0s)
> >>>>> stop interval=0s timeout=100 (mail-stop-interval-0s)
> >>>>> monitor interval=10s (mail-monitor-interval-10s)
> >>>>> Group: fs-services
> >>>>> Resource: fs-spool (class=ocf provider=heartbeat type=Filesystem)
> >>>>> Attributes: device=/dev/drbd0 directory=/var/spool/postfix
> >> fstype=ext4
> >>>>> options=nodev,nosuid,noexec
> >>>>> Operations: start interval=0s timeout=60
> >> (fs-spool-start-interval-0s)
> >>>>> stop interval=0s timeout=60
> (fs-spool-stop-interval-0s)
> >>>>> monitor interval=20 timeout=40
> >>>> (fs-spool-monitor-interval-20)
> >>>>> Resource: fs-mail (class=ocf provider=heartbeat type=Filesystem)
> >>>>> Attributes: device=/dev/drbd1 directory=/var/spool/mail
> fstype=ext4
> >>>>> options=nodev,nosuid,noexec
> >>>>> Operations: start interval=0s timeout=60
> (fs-mail-start-interval-0s)
> >>>>> stop interval=0s timeout=60 (fs-mail-stop-interval-0s)
> >>>>> monitor interval=20 timeout=40
> >>>> (fs-mail-monitor-interval-20)
> >>>>> Group: mail-services
> >>>>> Resource: postfix (class=ocf provider=heartbeat type=postfix)
> >>>>> Operations: start interval=0s timeout=20s
> >> (postfix-start-interval-0s)
> >>>>> stop interval=0s timeout=20s
> (postfix-stop-interval-0s)
> >>>>> monitor interval=45s (postfix-monitor-interval-45s)
> >>>>>
> >>>>> Stonith Devices:
> >>>>> Fencing Levels:
> >>>>>
> >>>>> Location Constraints:
> >>>>> Ordering Constraints:
> >>>>> start network-services then promote mail-clone (kind:Mandatory)
> >>>>> (id:order-network-services-mail-clone-mandatory)
> >>>>> promote mail-clone then promote spool-clone (kind:Mandatory)
> >>>>> (id:order-mail-clone-spool-clone-mandatory)
> >>>>> promote spool-clone then start fs-services (kind:Mandatory)
> >>>>> (id:order-spool-clone-fs-services-mandatory)
> >>>>> start fs-services then start mail-services (kind:Mandatory)
> >>>>> (id:order-fs-services-mail-services-mandatory)
> >>>>> Colocation Constraints:
> >>>>> network-services with spool-clone (score:INFINITY)
> (rsc-role:Started)
> >>>>> (with-rsc-role:Master)
> >>>> (id:colocation-network-services-spool-clone-INFINITY)
> >>>>> network-services with mail-clone (score:INFINITY)
> (rsc-role:Started)
> >>>>> (with-rsc-role:Master)
> >>>> (id:colocation-network-services-mail-clone-INFINITY)
> >>>>> network-services with fs-services (score:INFINITY)
> >>>>> (id:colocation-network-services-fs-services-INFINITY)
> >>>>> network-services with mail-services (score:INFINITY)
> >>>>> (id:colocation-network-services-mail-services-INFINITY)
> >>>>
> >>>> I'm not sure whether it's causing your issue, but I would make the
> >>>> constraints reflect the logical relationships better.
> >>>>
> >>>> For example, network-services only needs to be colocated with
> >>>> mail-services logically; it's mail-services that needs to be with
> >>>> fs-services, and fs-services that needs to be with
> >>>> spool-clone/mail-clone master. In other words, don't make the
> >>>> highest-level resource depend on everything else, make each level
> depend
> >>>> on the level below it.
> >>>>
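Expressed with pcs, that layering could look roughly like the sketch below (the
resource IDs are taken from the configuration above, and the existing colocation
constraints would be removed first; treat this as an illustration rather than a
drop-in command list):

    pcs constraint colocation add fs-services with master spool-clone INFINITY
    pcs constraint colocation add fs-services with master mail-clone INFINITY
    pcs constraint colocation add mail-services with fs-services INFINITY
    pcs constraint colocation add network-services with mail-services INFINITY

Here each group is colocated only with the level directly below it, instead of
everything being tied to network-services.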
> >>>> Also, I would guess that the virtual IP only needs to be ordered
> before
> >>>> mail-services, and mail-clone and spool-clone could both be ordered
> >>>> before fs-services, rather than ordering mail-clone before
> spool-clone.
> >>>>
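A matching ordering sketch, again assuming the resource IDs above and that the
old order constraints are dropped first, might be:

    pcs constraint order promote spool-clone then start fs-services
    pcs constraint order promote mail-clone then start fs-services
    pcs constraint order start fs-services then start mail-services
    pcs constraint order start network-services then start mail-services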
> >>>>> Resources Defaults:
> >>>>> migration-threshold: 3
> >>>>> Operations Defaults:
> >>>>> on-fail: restart
> >>>>>
> >>>>> Cluster Properties:
> >>>>> cluster-infrastructure: corosync
> >>>>> cluster-name: mailcluster
> >>>>> cluster-recheck-interval: 5min
> >>>>> dc-version: 1.1.13-10.el7_2.2-44eb2dd
> >>>>> default-resource-stickiness: infinity
> >>>>> have-watchdog: false
> >>>>> last-lrm-refresh: 1457613674
> >>>>> no-quorum-policy: ignore
> >>>>> pe-error-series-max: 1024
> >>>>> pe-input-series-max: 1024
> >>>>> pe-warn-series-max: 1024
> >>>>> stonith-enabled: false
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: Diff: --- 0.197.15 2
> >>>>> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: Diff: +++ 0.197.16 (null)
> >>>>> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: + /cib: @num_updates=16
> >>>>> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: +
> >>>>>
> >>>>
> >>
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_failure_0']:
> >>>>> @transition-key=4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> >>>>> @transition-magic=0:7;4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> >>>>> @call-id=1274, @last-rc-change=1457613440
> >>>>> Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info:
> >>>>> abort_transition_graph: Transition aborted by
> >> postfix_monitor_45000
> >>>>> 'modify' on mail1: Inactive graph
> >>>>> (magic=0:7;4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> cib=0.197.16,
> >>>>> source=process_graph_event:598, 1)
> >>>>> Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info:
> >>>>> update_failcount: Updating failcount for postfix on mail1 after
> >>>> failed
> >>>>> monitor: rc=7 (update=value++, time=1457613440)
> >>>>> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
> >>>>> attrd_client_update: Expanded fail-count-postfix=value++ to 3
> >>>>> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_process_request: Completed cib_modify operation for section
> >> status:
> >>>> OK
> >>>>> (rc=0, origin=mail1/crmd/196, version=0.197.16)
> >>>>> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
> >>>>> attrd_peer_update: Setting fail-count-postfix[mail1]: 2 -> 3 from
> >>>> mail2
> >>>>> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
> >>>>> write_attribute: Sent update 400 with 2 changes for
> >>>>> fail-count-postfix, id=<n/a>, set=(null)
> >>>>> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_process_request: Forwarding cib_modify operation for section
> >> status
> >>>> to
> >>>>> master (origin=local/attrd/400)
> >>>>> Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info:
> >>>>> process_graph_event: Detected action (1234.4)
> >>>>> postfix_monitor_45000.1274=not running: failed
> >>>>> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
> >>>>> attrd_peer_update: Setting last-failure-postfix[mail1]: 1457613347
> >> ->
> >>>>> 1457613440 from mail2
> >>>>> Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [
> >>>>> input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> >>>>> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
> >>>>> write_attribute: Sent update 401 with 2 changes for
> >>>>> last-failure-postfix, id=<n/a>, set=(null)
> >>>>> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: Diff: --- 0.197.16 2
> >>>>> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: Diff: +++ 0.197.17 (null)
> >>>>> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: + /cib: @num_updates=17
> >>>>> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: +
> >>>>>
> >>>>
> >>
> /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-fail-count-postfix']:
> >>>>> @value=3
> >>>>> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_process_request: Completed cib_modify operation for section
> >> status:
> >>>> OK
> >>>>> (rc=0, origin=mail2/attrd/400, version=0.197.17)
> >>>>> Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info:
> >>>>> abort_transition_graph: Transition aborted by
> >>>>> status-1-fail-count-postfix, fail-count-postfix=3: Transient
> attribute
> >>>>> change (modify cib=0.197.17, source=abort_unless_down:319,
> >>>>>
> >>>>
> >>
> path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-fail-count-postfix'],
> >>>>> 1)
> >>>>> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
> >>>>> attrd_cib_callback: Update 400 for fail-count-postfix: OK (0)
> >>>>> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
> >>>>> attrd_cib_callback: Update 400 for fail-count-postfix[mail1]=3: OK
> >> (0)
> >>>>> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:
> >>>>> attrd_cib_callback: Update 400 for
> fail-count-postfix[mail2]=(null):
> >> OK
> >>>>> (0)
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_process_request: Forwarding cib_modify operation for section
> >> status
> >>>> to
> >>>>> master (origin=local/attrd/401)
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: Diff: --- 0.197.17 2
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: Diff: +++ 0.197.18 (null)
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: + /cib: @num_updates=18
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: +
> >>>>>
> >>>>
> >>
> /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix']:
> >>>>> @value=1457613440
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_process_request: Completed cib_modify operation for section
> >> status:
> >>>> OK
> >>>>> (rc=0, origin=mail2/attrd/401, version=0.197.18)
> >>>>> Mar 10 13:37:21 [7418] HWJ-626.domain.local attrd: info:
> >>>>> attrd_cib_callback: Update 401 for last-failure-postfix: OK (0)
> >>>>> Mar 10 13:37:21 [7418] HWJ-626.domain.local attrd: info:
> >>>>> attrd_cib_callback: Update 401 for
> >>>>> last-failure-postfix[mail1]=1457613440: OK (0)
> >>>>> Mar 10 13:37:21 [7418] HWJ-626.domain.local attrd: info:
> >>>>> attrd_cib_callback: Update 401 for
> >>>>> last-failure-postfix[mail2]=1457610376: OK (0)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
> >>>>> abort_transition_graph: Transition aborted by
> >>>>> status-1-last-failure-postfix, last-failure-postfix=1457613440:
> >> Transient
> >>>>> attribute change (modify cib=0.197.18, source=abort_unless_down:319,
> >>>>>
> >>>>
> >>
> path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix'],
> >>>>> 1)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> >>>>> unpack_config: On loss of CCM Quorum: Ignore
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> determine_online_status: Node mail1 is online
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> determine_online_status: Node mail2 is online
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> determine_op_status: Operation monitor found resource mail:0 active
> in
> >>>>> master mode on mail1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> determine_op_status: Operation monitor found resource spool:0 active
> >> in
> >>>>> master mode on mail1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> determine_op_status: Operation monitor found resource fs-spool
> active
> >> on
> >>>>> mail1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> determine_op_status: Operation monitor found resource fs-mail active
> >> on
> >>>>> mail1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning:
> >>>>> unpack_rsc_op_failure: Processing failed op monitor for
> postfix
> >> on
> >>>>> mail1: not running (7)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> determine_op_status: Operation monitor found resource spool:1 active
> >> in
> >>>>> master mode on mail2
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> determine_op_status: Operation monitor found resource mail:1 active
> in
> >>>>> master mode on mail2
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> group_print: Resource Group: network-services
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> native_print: virtualip-1 (ocf::heartbeat:IPaddr2):
> >>>> Started
> >>>>> mail1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> clone_print: Master/Slave Set: spool-clone [spool]
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> short_print: Masters: [ mail1 ]
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> short_print: Slaves: [ mail2 ]
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> clone_print: Master/Slave Set: mail-clone [mail]
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> short_print: Masters: [ mail1 ]
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> short_print: Slaves: [ mail2 ]
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> group_print: Resource Group: fs-services
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> native_print: fs-spool (ocf::heartbeat:Filesystem): Started
> >>>> mail1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> native_print: fs-mail (ocf::heartbeat:Filesystem): Started
> >>>> mail1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> group_print: Resource Group: mail-services
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> native_print: postfix (ocf::heartbeat:postfix): FAILED
> >>>> mail1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> get_failcount_full: postfix has failed 3 times on mail1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning:
> >>>>> common_apply_stickiness: Forcing postfix away from mail1 after 3
> >>>>> failures (max=3)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> master_color: Promoting mail:0 (Master mail1)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> master_color: mail-clone: Promoted 1 instances of a possible 1 to
> >> master
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> master_color: Promoting spool:0 (Master mail1)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> master_color: spool-clone: Promoted 1 instances of a possible 1 to
> >> master
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> rsc_merge_weights: postfix: Rolling back scores from virtualip-1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> native_color: Resource virtualip-1 cannot run anywhere
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> RecurringOp: Start recurring monitor (45s) for postfix on mail2
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> >>>>> LogActions: Stop virtualip-1 (mail1)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> LogActions: Leave spool:0 (Master mail1)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> LogActions: Leave spool:1 (Slave mail2)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> LogActions: Leave mail:0 (Master mail1)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> LogActions: Leave mail:1 (Slave mail2)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> >>>>> LogActions: Stop fs-spool (Started mail1)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> >>>>> LogActions: Stop fs-mail (Started mail1)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> >>>>> LogActions: Stop postfix (Started mail1)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> >>>>> process_pe_message: Calculated Transition 1235:
> >>>>> /var/lib/pacemaker/pengine/pe-input-302.bz2
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
> >>>>> handle_response: pe_calc calculation pe_calc-dc-1457613441-3756
> is
> >>>>> obsolete
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> >>>>> unpack_config: On loss of CCM Quorum: Ignore
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> determine_online_status: Node mail1 is online
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> determine_online_status: Node mail2 is online
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> determine_op_status: Operation monitor found resource mail:0 active
> in
> >>>>> master mode on mail1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> determine_op_status: Operation monitor found resource spool:0 active
> >> in
> >>>>> master mode on mail1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> determine_op_status: Operation monitor found resource fs-spool
> active
> >> on
> >>>>> mail1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> determine_op_status: Operation monitor found resource fs-mail active
> >> on
> >>>>> mail1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning:
> >>>>> unpack_rsc_op_failure: Processing failed op monitor for
> postfix
> >> on
> >>>>> mail1: not running (7)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> determine_op_status: Operation monitor found resource spool:1 active
> >> in
> >>>>> master mode on mail2
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> determine_op_status: Operation monitor found resource mail:1 active
> in
> >>>>> master mode on mail2
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> group_print: Resource Group: network-services
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> native_print: virtualip-1 (ocf::heartbeat:IPaddr2):
> >>>> Started
> >>>>> mail1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> clone_print: Master/Slave Set: spool-clone [spool]
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> short_print: Masters: [ mail1 ]
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> short_print: Slaves: [ mail2 ]
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> clone_print: Master/Slave Set: mail-clone [mail]
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> short_print: Masters: [ mail1 ]
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> short_print: Slaves: [ mail2 ]
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> group_print: Resource Group: fs-services
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> native_print: fs-spool (ocf::heartbeat:Filesystem): Started
> >>>> mail1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> native_print: fs-mail (ocf::heartbeat:Filesystem): Started
> >>>> mail1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> group_print: Resource Group: mail-services
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> native_print: postfix (ocf::heartbeat:postfix): FAILED
> >>>> mail1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> get_failcount_full: postfix has failed 3 times on mail1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning:
> >>>>> common_apply_stickiness: Forcing postfix away from mail1 after 3
> >>>>> failures (max=3)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> master_color: Promoting mail:0 (Master mail1)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> master_color: mail-clone: Promoted 1 instances of a possible 1 to
> >> master
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> master_color: Promoting spool:0 (Master mail1)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> master_color: spool-clone: Promoted 1 instances of a possible 1 to
> >> master
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> rsc_merge_weights: postfix: Rolling back scores from virtualip-1
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> native_color: Resource virtualip-1 cannot run anywhere
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> RecurringOp: Start recurring monitor (45s) for postfix on mail2
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> >>>>> LogActions: Stop virtualip-1 (mail1)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> LogActions: Leave spool:0 (Master mail1)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> LogActions: Leave spool:1 (Slave mail2)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> LogActions: Leave mail:0 (Master mail1)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:
> >>>>> LogActions: Leave mail:1 (Slave mail2)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> >>>>> LogActions: Stop fs-spool (Started mail1)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> >>>>> LogActions: Stop fs-mail (Started mail1)
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> >>>>> LogActions: Stop postfix (Started mail1)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
> >>>>> do_state_transition: State transition S_POLICY_ENGINE ->
> >>>>> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
> >>>>> origin=handle_response ]
> >>>>> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:
> >>>>> process_pe_message: Calculated Transition 1236:
> >>>>> /var/lib/pacemaker/pengine/pe-input-303.bz2
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
> >>>>> do_te_invoke: Processing graph 1236 (ref=pe_calc-dc-1457613441-3757)
> >>>>> derived from /var/lib/pacemaker/pengine/pe-input-303.bz2
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> te_rsc_command: Initiating action 12: stop virtualip-1_stop_0
> on
> >>>> mail1
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> te_rsc_command: Initiating action 5: stop postfix_stop_0 on
> mail1
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: Diff: --- 0.197.18 2
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: Diff: +++ 0.197.19 (null)
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: + /cib: @num_updates=19
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: +
> >>>>>
> >>>>
> >>
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='virtualip-1']/lrm_rsc_op[@id='virtualip-1_last_0']:
> >>>>> @operation_key=virtualip-1_stop_0, @operation=stop,
> >>>>> @transition-key=12:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> >>>>> @transition-magic=0:0;12:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> >>>>> @call-id=1276, @last-run=1457613441, @last-rc-change=1457613441,
> >>>>> @exec-time=66
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_process_request: Completed cib_modify operation for section
> >> status:
> >>>> OK
> >>>>> (rc=0, origin=mail1/crmd/197, version=0.197.19)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
> >>>>> match_graph_event: Action virtualip-1_stop_0 (12) confirmed on
> mail1
> >>>>> (rc=0)
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: Diff: --- 0.197.19 2
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: Diff: +++ 0.197.20 (null)
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: + /cib: @num_updates=20
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: +
> >>>>>
> >>>>
> >>
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_0']:
> >>>>> @operation_key=postfix_stop_0, @operation=stop,
> >>>>> @transition-key=5:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> >>>>> @transition-magic=0:0;5:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> >>>>> @call-id=1278, @last-run=1457613441, @last-rc-change=1457613441,
> >>>>> @exec-time=476
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
> >>>>> match_graph_event: Action postfix_stop_0 (5) confirmed on mail1
> >> (rc=0)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> te_rsc_command: Initiating action 79: stop fs-mail_stop_0 on
> >> mail1
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_process_request: Completed cib_modify operation for section
> >> status:
> >>>> OK
> >>>>> (rc=0, origin=mail1/crmd/198, version=0.197.20)
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: Diff: --- 0.197.20 2
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: Diff: +++ 0.197.21 (null)
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: + /cib: @num_updates=21
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: +
> >>>>>
> >>>>
> >>
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='fs-mail']/lrm_rsc_op[@id='fs-mail_last_0']:
> >>>>> @operation_key=fs-mail_stop_0, @operation=stop,
> >>>>> @transition-key=79:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> >>>>> @transition-magic=0:0;79:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> >>>>> @call-id=1280, @last-run=1457613441, @last-rc-change=1457613441,
> >>>>> @exec-time=88, @queue-time=1
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_process_request: Completed cib_modify operation for section
> >> status:
> >>>> OK
> >>>>> (rc=0, origin=mail1/crmd/199, version=0.197.21)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
> >>>>> match_graph_event: Action fs-mail_stop_0 (79) confirmed on mail1
> >>>> (rc=0)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> te_rsc_command: Initiating action 77: stop fs-spool_stop_0 on
> >> mail1
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: Diff: --- 0.197.21 2
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: Diff: +++ 0.197.22 (null)
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: + /cib: @num_updates=22
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_perform_op: +
> >>>>>
> >>>>
> >>
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='fs-spool']/lrm_rsc_op[@id='fs-spool_last_0']:
> >>>>> @operation_key=fs-spool_stop_0, @operation=stop,
> >>>>> @transition-key=77:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> >>>>> @transition-magic=0:0;77:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> >>>>> @call-id=1282, @last-run=1457613441, @last-rc-change=1457613441,
> >>>>> @exec-time=86
> >>>>> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_process_request: Completed cib_modify operation for section
> >> status:
> >>>> OK
> >>>>> (rc=0, origin=mail1/crmd/200, version=0.197.22)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
> >>>>> match_graph_event: Action fs-spool_stop_0 (77) confirmed on mail1
> >>>> (rc=0)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: warning:
> >>>>> run_graph: Transition 1236 (Complete=11, Pending=0, Fired=0,
> >>>> Skipped=0,
> >>>>> Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-303.bz2):
> >>>>> Terminated
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: warning:
> >>>>> te_graph_trigger: Transition failed: terminated
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> print_graph: Graph 1236 with 12 actions: batch-limit=12 jobs,
> >>>>> network-delay=0ms
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> print_synapse: [Action 16]: Completed pseudo op
> >>>>> network-services_stopped_0 on N/A (priority: 0, waiting: none)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> print_synapse: [Action 15]: Completed pseudo op
> >>>>> network-services_stop_0 on N/A (priority: 0, waiting: none)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> print_synapse: [Action 12]: Completed rsc op
> >> virtualip-1_stop_0
> >>>>> on mail1 (priority: 0, waiting: none)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> print_synapse: [Action 84]: Completed pseudo op
> >>>>> fs-services_stopped_0 on N/A (priority: 0, waiting: none)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> print_synapse: [Action 83]: Completed pseudo op
> >>>> fs-services_stop_0
> >>>>> on N/A (priority: 0, waiting: none)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> print_synapse: [Action 77]: Completed rsc op fs-spool_stop_0
> >>>>> on mail1 (priority: 0, waiting: none)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> print_synapse: [Action 79]: Completed rsc op fs-mail_stop_0
> >>>>> on mail1 (priority: 0, waiting: none)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> print_synapse: [Action 90]: Completed pseudo op
> >>>>> mail-services_stopped_0 on N/A (priority: 0, waiting: none)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> print_synapse: [Action 89]: Completed pseudo op
> >>>>> mail-services_stop_0 on N/A (priority: 0, waiting: none)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> print_synapse: [Action 86]: Pending rsc op
> >> postfix_monitor_45000
> >>>>> on mail2 (priority: 0, waiting: none)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> print_synapse: * [Input 85]: Unresolved dependency rsc op
> >>>>> postfix_start_0 on mail2
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> print_synapse: [Action 5]: Completed rsc op postfix_stop_0
> >>>>> on mail1 (priority: 0, waiting: none)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> print_synapse: [Action 8]: Completed pseudo op all_stopped
> >>>>> on N/A (priority: 0, waiting: none)
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:
> >> do_log:
> >>>>> FSA: Input I_TE_SUCCESS from notify_crmd() received in state
> >>>>> S_TRANSITION_ENGINE
> >>>>> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:
> >>>>> do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE
> [
> >>>>> input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> >>>>> Mar 10 13:37:26 [7415] HWJ-626.domain.local cib: info:
> >>>>> cib_process_ping: Reporting our current digest to mail2:
> >>>>> 3896ee29cdb6ba128330b0ef6e41bd79 for 0.197.22 (0x1544a30 0)
> >>
> >>
> >
>
>