[Pacemaker] Pacemaker resource migration behaviour

Wed Mar 6 01:32:08 EST 2013

Unfortunately the config only tells half of the story; the really
important parts are in the status section.
Do you still happen to have
/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-156.bz2 around
on mu?  That would have what we need.
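
For anyone who wants a quick look at such a file before sending it, here is a minimal Python sketch that lists the fail counts recorded in the status section of a pe-input file. It assumes the file is a bzip2-compressed CIB and that fail counts are stored as transient node attributes named "fail-count-<resource>", which is my understanding of Pacemaker 1.1.x rather than something confirmed in this thread. The function names and the SAMPLE document are hypothetical and are not taken from the cluster discussed here.

```python
import bz2
import xml.etree.ElementTree as ET


def fail_counts(cib_xml):
    """Return {(node, attribute_name): value} for fail-count-* nvpairs.

    Assumption: fail counts live under <status>/<node_state> as nvpairs
    whose name starts with "fail-count-".
    """
    root = ET.fromstring(cib_xml)
    counts = {}
    for node_state in root.iter("node_state"):
        # Prefer the readable node name, fall back to the node id.
        node = node_state.get("uname") or node_state.get("id")
        for nvpair in node_state.iter("nvpair"):
            name = nvpair.get("name", "")
            if name.startswith("fail-count-"):
                counts[(node, name)] = nvpair.get("value")
    return counts


def load_pe_input(path):
    """Read a bzip2-compressed policy engine input file as bytes."""
    with bz2.open(path, "rb") as fh:
        return fh.read()


# Hand-written stand-in for a real pe-input file (illustrative only).
SAMPLE = """<cib>
  <status>
    <node_state id="1" uname="mu">
      <transient_attributes id="1">
        <instance_attributes id="status-1">
          <nvpair id="status-1-fc" name="fail-count-sub-squid" value="2"/>
          <nvpair id="status-1-other" name="ifcheck" value="400"/>
        </instance_attributes>
      </transient_attributes>
    </node_state>
  </status>
</cib>"""

if __name__ == "__main__":
    for (node, name), value in sorted(fail_counts(SAMPLE).items()):
        print(node, name, value)
```

To inspect a real file, pass the result of load_pe_input() with the path to the pe-input file into fail_counts() instead of SAMPLE.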

On Wed, Feb 6, 2013 at 1:12 AM, James Guthrie <jag at open.ch> wrote:
> Hi all,
>
> as a follow-up to this, I realised that I needed to slightly change the way the resource constraints are put together, but I'm still seeing the same behaviour.
>
> Below are an excerpt from the logs on the host and the revised XML configuration. In this case, I caused two failures on the host mu, which forced the resources onto nu, and then I forced two failures on nu. The logs show the two detected failures on nu (the "warning: update_failcount:" lines). After the two failures on nu, the VIP is migrated back to mu, but none of the "support" resources are promoted with it.
>
> Regards,
> James
>
> <1c>Feb  5 14:58:45 mu crmd[31482]:  warning: update_failcount: Updating failcount for sub-squid on nu after failed monitor: rc=9 (update=value++, time=1360072725)
> <1d>Feb  5 14:58:45 mu crmd[31482]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> <1d>Feb  5 14:58:45 mu pengine[31481]:   notice: unpack_config: On loss of CCM Quorum: Ignore
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: unpack_rsc_op: Processing failed op monitor for sub-squid:0 on mu: master (failed) (9)
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: unpack_rsc_op: Processing failed op monitor for sub-squid:0 on nu: master (failed) (9)
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from mu after 2 failures (max=2)
> <1d>Feb  5 14:58:45 mu pengine[31481]:   notice: LogActions: Recover sub-squid:0        (Master nu)
> <1d>Feb  5 14:58:45 mu pengine[31481]:   notice: process_pe_message: Calculated Transition 64: /opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-152.bz2
> <1d>Feb  5 14:58:45 mu pengine[31481]:   notice: unpack_config: On loss of CCM Quorum: Ignore
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: unpack_rsc_op: Processing failed op monitor for sub-squid:0 on mu: master (failed) (9)
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: unpack_rsc_op: Processing failed op monitor for sub-squid:0 on nu: master (failed) (9)
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:58:45 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from mu after 2 failures (max=2)
> <1d>Feb  5 14:58:45 mu pengine[31481]:   notice: LogActions: Recover sub-squid:0        (Master nu)
> <1d>Feb  5 14:58:45 mu pengine[31481]:   notice: process_pe_message: Calculated Transition 65: /opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-153.bz2
> <1d>Feb  5 14:58:48 mu crmd[31482]:   notice: run_graph: Transition 65 (Complete=14, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-153.bz2): Complete
> <1d>Feb  5 14:58:48 mu crmd[31482]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> <1d>Feb  5 14:58:58 mu conntrack-tools[1677]: flushing kernel conntrack table (scheduled)
> <1c>Feb  5 14:59:10 mu crmd[31482]:  warning: update_failcount: Updating failcount for sub-squid on nu after failed monitor: rc=9 (update=value++, time=1360072750)
> <1d>Feb  5 14:59:10 mu crmd[31482]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: unpack_config: On loss of CCM Quorum: Ignore
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: unpack_rsc_op: Processing failed op monitor for sub-squid:0 on mu: master (failed) (9)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: unpack_rsc_op: Processing failed op monitor for sub-squid:0 on nu: master (failed) (9)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from nu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from nu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from nu after 2 failures (max=2)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Demote  conntrackd:1       (Master -> Slave nu)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Demote  condition:1        (Master -> Slave nu)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Demote  sub-ospfd:1        (Master -> Slave nu)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Demote  sub-ripd:1 (Master -> Slave nu)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Demote  sub-squid:0        (Master -> Stopped nu)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Move    eth1-0-192.168.1.10        (Started nu -> mu)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: process_pe_message: Calculated Transition 66: /opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-154.bz2
> <1d>Feb  5 14:59:10 mu crmd[31482]:   notice: process_lrm_event: LRM operation conntrackd_notify_0 (call=996, rc=0, cib-update=0, confirmed=true) ok
> <1d>Feb  5 14:59:10 mu crmd[31482]:   notice: run_graph: Transition 66 (Complete=21, Pending=0, Fired=0, Skipped=15, Incomplete=6, Source=/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-154.bz2): Stopped
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: unpack_config: On loss of CCM Quorum: Ignore
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: unpack_rsc_op: Processing failed op monitor for sub-squid:0 on mu: master (failed) (9)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: unpack_rsc_op: Processing failed op monitor for sub-squid:0 on nu: master (failed) (9)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from nu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from nu after 2 failures (max=2)
> <1c>Feb  5 14:59:10 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from nu after 2 failures (max=2)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Demote  conntrackd:1       (Master -> Slave nu)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Stop    sub-squid:0        (nu)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: LogActions: Start   eth1-0-192.168.1.10        (mu)
> <1d>Feb  5 14:59:10 mu pengine[31481]:   notice: process_pe_message: Calculated Transition 67: /opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-155.bz2
> <1d>Feb  5 14:59:10 mu crmd[31482]:   notice: process_lrm_event: LRM operation conntrackd_notify_0 (call=1001, rc=0, cib-update=0, confirmed=true) ok
> <1e>Feb  5 14:59:10 mu IPaddr2(eth1-0-192.168.1.10)[19429]: INFO: Adding inet address 192.168.1.10/24 with broadcast address 192.168.1.255 to device eth1 (with label eth1:0)
> <1e>Feb  5 14:59:10 mu IPaddr2(eth1-0-192.168.1.10)[19429]: INFO: Bringing device eth1 up
> <1e>Feb  5 14:59:11 mu IPaddr2(eth1-0-192.168.1.10)[19429]: INFO: /opt/OSAGpcmk/resource-agents/lib/heartbeat/send_arp -i 200 -r 5 -p /opt/OSAGpcmk/resource-agents/var/run/resource-agents/send_arp-192.168.1.10 eth1 192.168.1.10 auto not_used not_used
> <1d>Feb  5 14:59:12 mu crmd[31482]:   notice: process_lrm_event: LRM operation eth1-0-192.168.1.10_start_0 (call=999, rc=0, cib-update=553, confirmed=true) ok
> <1d>Feb  5 14:59:12 mu crmd[31482]:   notice: process_lrm_event: LRM operation conntrackd_notify_0 (call=1005, rc=0, cib-update=0, confirmed=true) ok
> <1d>Feb  5 14:59:12 mu crmd[31482]:   notice: run_graph: Transition 67 (Complete=20, Pending=0, Fired=0, Skipped=3, Incomplete=0, Source=/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-155.bz2): Stopped
> <1d>Feb  5 14:59:12 mu pengine[31481]:   notice: unpack_config: On loss of CCM Quorum: Ignore
> <1c>Feb  5 14:59:12 mu pengine[31481]:  warning: unpack_rsc_op: Processing failed op monitor for sub-squid:0 on mu: master (failed) (9)
> <1c>Feb  5 14:59:12 mu pengine[31481]:  warning: unpack_rsc_op: Processing failed op monitor for sub-squid:0 on nu: master (failed) (9)
> <1c>Feb  5 14:59:12 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:59:12 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:59:12 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from mu after 2 failures (max=2)
> <1c>Feb  5 14:59:12 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from nu after 2 failures (max=2)
> <1c>Feb  5 14:59:12 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from nu after 2 failures (max=2)
> <1c>Feb  5 14:59:12 mu pengine[31481]:  warning: common_apply_stickiness: Forcing master-squid away from nu after 2 failures (max=2)
> <1d>Feb  5 14:59:12 mu pengine[31481]:   notice: process_pe_message: Calculated Transition 68: /opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-156.bz2
> <1d>Feb  5 14:59:12 mu crmd[31482]:   notice: process_lrm_event: LRM operation eth1-0-192.168.1.10_monitor_10000 (call=1008, rc=0, cib-update=555, confirmed=false) ok
> <1d>Feb  5 14:59:12 mu crmd[31482]:   notice: run_graph: Transition 68 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-156.bz2): Complete
> <1d>Feb  5 14:59:12 mu crmd[31482]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>
> <resources>
>   <!--resource for conntrackd-->
>   <master id="master-conntrackd">
>     <meta_attributes id="master-conntrackd-meta_attributes">
>       <nvpair id="master-conntrackd-meta_attributes-notify" name="notify" value="true"/>
>       <nvpair id="master-conntrackd-meta_attributes-interleave" name="interleave" value="true"/>
>       <nvpair id="master-conntrackd-meta_attributes-target-role" name="target-role" value="Master"/>
>       <nvpair id="master-conndtrakd-meta_attributes-failure-timeout" name="failure-timeout" value="600"/>
>       <nvpair id="master-conntrackd-meta_attributes-migration-threshold" name="migration-threshold" value="2"/>
>     </meta_attributes>
>     <primitive id="conntrackd" class="ocf" provider="OSAG" type="conntrackd">
>       <operations>
>         <op id="conntrackd-slave-check" name="monitor" interval="60" role="Slave" />
>         <op id="conntrackd-master-check" name="monitor" interval="61" role="Master" />
>       </operations>
>     </primitive>
>   </master>
>
>   <!--resource for condition files-->
>   <master id="master-condition">
>     <meta_attributes id="master-condition-meta_attributes">
>       <nvpair id="master-condition-meta_attributes-notify" name="notify" value="false"/>
>       <nvpair id="master-condition-meta_attributes-interleave" name="interleave" value="true"/>
>       <nvpair id="master-condition-meta_attributes-target-role" name="target-role" value="Master"/>
>       <nvpair id="master-condition-meta_attributes-failure-timeout" name="failure-timeout" value="600"/>
>       <nvpair id="master-condition-meta_attributes-migration-threshold" name="migration-threshold" value="2"/>
>     </meta_attributes>
>     <primitive id="condition" class="ocf" provider="OSAG" type="condition">
>       <instance_attributes id="condition-attrs">
>       </instance_attributes>
>       <operations>
>         <op id="condition-slave-check" name="monitor" interval="60" role="Slave" />
>         <op id="condition-master-check" name="monitor" interval="61" role="Master" />
>       </operations>
>     </primitive>
>   </master>
>
>   <!--resource for subsystem ospfd-->
>   <master id="master-ospfd">
>     <meta_attributes id="master-ospfd-meta_attributes">
>       <nvpair id="master-ospfd-meta_attributes-notify" name="notify" value="false"/>
>       <nvpair id="master-ospfd-meta_attributes-interleave" name="interleave" value="true"/>
>       <nvpair id="master-ospfd-meta_attributes-target-role" name="target-role" value="Master"/>
>       <nvpair id="master-ospfd-meta_attributes-failure-timeout" name="failure-timeout" value="600"/>
>       <nvpair id="master-ospfd-meta_attributes-migration-threshold" name="migration-threshold" value="2"/>
>     </meta_attributes>
>     <primitive id="sub-ospfd" class="ocf" provider="OSAG" type="osaginit">
>       <instance_attributes id="ospfd-attrs">
>         <nvpair id="ospfd-script" name="script" value="ospfd.init"/>
>       </instance_attributes>
>       <operations>
>         <op id="ospfd-slave-check" name="monitor" interval="10" role="Slave" />
>         <op id="ospfd-master-check" name="monitor" interval="11" role="Master" />
>       </operations>
>     </primitive>
>   </master>
>   <!--resource for subsystem ripd-->
>   <master id="master-ripd">
>     <meta_attributes id="master-ripd-meta_attributes">
>       <nvpair id="master-ripd-meta_attributes-notify" name="notify" value="false"/>
>       <nvpair id="master-ripd-meta_attributes-interleave" name="interleave" value="true"/>
>       <nvpair id="master-ripd-meta_attributes-target-role" name="target-role" value="Master"/>
>       <nvpair id="master-ripd-meta_attributes-failure-timeout" name="failure-timeout" value="600"/>
>       <nvpair id="master-ripd-meta_attributes-migration-threshold" name="migration-threshold" value="2"/>
>     </meta_attributes>
>     <primitive id="sub-ripd" class="ocf" provider="OSAG" type="osaginit">
>       <instance_attributes id="ripd-attrs">
>         <nvpair id="ripd-script" name="script" value="ripd.init"/>
>       </instance_attributes>
>       <operations>
>         <op id="ripd-slave-check" name="monitor" interval="10" role="Slave" />
>         <op id="ripd-master-check" name="monitor" interval="11" role="Master" />
>       </operations>
>     </primitive>
>   </master>
>   <!--resource for subsystem squid-->
>   <master id="master-squid">
>     <meta_attributes id="master-squid-meta_attributes">
>       <nvpair id="master-squid-meta_attributes-notify" name="notify" value="false"/>
>       <nvpair id="master-squid-meta_attributes-interleave" name="interleave" value="true"/>
>       <nvpair id="master-squid-meta_attributes-target-role" name="target-role" value="Master"/>
>       <nvpair id="master-squid-meta_attributes-failure-timeout" name="failure-timeout" value="600"/>
>       <nvpair id="master-squid-meta_attributes-migration-threshold" name="migration-threshold" value="2"/>
>     </meta_attributes>
>     <primitive id="sub-squid" class="ocf" provider="OSAG" type="osaginit">
>       <instance_attributes id="squid-attrs">
>         <nvpair id="squid-script" name="script" value="squid.init"/>
>       </instance_attributes>
>       <operations>
>         <op id="squid-slave-check" name="monitor" interval="10" role="Slave" />
>         <op id="squid-master-check" name="monitor" interval="11" role="Master" />
>       </operations>
>     </primitive>
>   </master>
>
>   <!--resource for interface checks -->
>   <clone id="clone-IFcheck">
>     <primitive id="IFcheck" class="ocf" provider="OSAG" type="ifmonitor">
>       <instance_attributes id="resIFcheck-attrs">
>         <nvpair id="IFcheck-interfaces" name="interfaces" value="eth0 eth1"/>
>         <nvpair id="IFcheck-multiplier" name="multiplier" value="200"/>
>         <nvpair id="IFcheck-dampen" name="dampen" value="16s" />
>       </instance_attributes>
>       <operations>
>         <op id="IFcheck-monitor" interval="8s" name="monitor"/>
>       </operations>
>     </primitive>
>   </clone>
>
>   <!--resource for ISP checks-->
>   <clone id="clone-ISPcheck">
>     <primitive id="ISPcheck" class="ocf" provider="OSAG" type="ispcheck">
>       <instance_attributes id="ISPcheck-attrs">
>         <nvpair id="ISPcheck-ipsec" name="ipsec-check" value="1" />
>         <nvpair id="ISPcheck-ping" name="ping-check" value="1" />
>         <nvpair id="ISPcheck-multiplier" name="multiplier" value="200"/>
>         <nvpair id="ISPcheck-dampen" name="dampen" value="60s"/>
>       </instance_attributes>
>       <operations>
>         <op id="ISPcheck-monitor" interval="30s" name="monitor"/>
>       </operations>
>     </primitive>
>   </clone>
>
>   <!--Virtual IP group-->
>   <group id="VIP-group">
>     <primitive id="eth1-0-192.168.1.10" class="ocf" provider="heartbeat" type="IPaddr2">
>       <meta_attributes id="meta-VIP-1">
>         <nvpair id="VIP-1-failure-timeout" name="failure-timeout" value="60"/>
>         <nvpair id="VIP-1-migration-threshold" name="migration-threshold" value="50"/>
>       </meta_attributes>
>       <instance_attributes id="VIP-1-instance_attributes">
>         <nvpair id="VIP-1-IP" name = "ip" value="192.168.1.10"/>
>         <nvpair id="VIP-1-nic" name="nic" value="eth1"/>
>         <nvpair id="VIP-1-cidr" name="cidr_netmask" value="24"/>
>         <nvpair id="VIP-1-iflabel" name="iflabel" value="0"/>
>         <nvpair id="VIP-1-arp-sender" name="arp_sender" value="send_arp"/>
>       </instance_attributes>
>       <operations>
>         <op id="VIP-1-monitor" interval="10s" name="monitor"/>
>       </operations>
>     </primitive>
>   </group>
> </resources>
>
> <!--resource constraints-->
> <constraints>
>   <!--set VIP location based on the following two rules-->
>   <rsc_location id="VIPs" rsc="VIP-group">
>     <!--prefer host with more interfaces-->
>     <rule id="VIP-prefer-connected-rule-1" score-attribute="ifcheck" >
>       <expression id="VIP-prefer-most-connected-1" attribute="ifcheck" operation="defined"/>
>     </rule>
>     <!--prefer host with better ISP connectivity-->
>     <rule id="VIP-prefer-connected-rule-2" score-attribute="ispcheck">
>       <expression id="VIP-prefer-most-connected-2" attribute="ispcheck" operation="defined"/>
>     </rule>
>   </rsc_location>
>
>   <!--conntrack master must run where the VIPs are-->
>   <rsc_colocation id="conntrack-master-with-VIPs" rsc="master-conntrackd" with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>   <!--condition master must run where the VIPs are-->
>   <rsc_colocation id="condition-master-with-VIPs" rsc="master-condition" with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>
>   <!--ospfd master must run with master-condition in master-->
>   <rsc_colocation id="ospfd-master-with-VIPs" rsc="master-ospfd" with-rsc="master-condition" with-rsc-role="Master" rsc-role="Master" score="INFINITY" />
>   <!--ripd master must run with master-condition in master-->
>   <rsc_colocation id="ripd-master-with-VIPs" rsc="master-ripd" with-rsc="master-condition" with-rsc-role="Master" rsc-role="Master" score="INFINITY" />
>   <!--squid master must run with master-condition in master-->
>   <rsc_colocation id="squid-master-with-VIPs" rsc="master-squid" with-rsc="master-condition" with-rsc-role="Master" rsc-role="Master" score="INFINITY" />
>
>   <!--prefer as master the following hosts in ascending order-->
>   <rsc_location id="VIP-master-xi" rsc="VIP-group" node="xi" score="0"/>
>   <rsc_location id="VIP-master-nu" rsc="VIP-group" node="nu" score="20"/>
>   <rsc_location id="VIP-master-mu" rsc="VIP-group" node="mu" score="40"/>
> </constraints>
>
> On Feb 5, 2013, at 11:13 AM, James Guthrie <jag at open.ch> wrote:
>
>> Hi Andrew,
>>
>> "The resource" in this case was master-squid.init. The resource agent serves as a master/slave OCF wrapper to a non-LSB init script. I forced the failure by manually stopping that init script on the host.
>>
>> Regards,
>> James
>>
>> On Feb 5, 2013, at 10:56 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>
>>> On Thu, Jan 31, 2013 at 3:04 AM, James Guthrie <jag at open.ch> wrote:
>>>> Hi all,
>>>>
>>>> I'm having a bit of difficulty with the way that my cluster is behaving on failure of a resource.
>>>>
>>>> The objective of my clustering setup is to provide a virtual IP, to which a number of other services are bound. The services are bound to the VIP with constraints to force the service to be running on the same host as the VIP.
>>>>
>>>> I have been testing the way that the cluster behaves if it is unable to start a resource. What I observe is the following: the cluster tries to start the resource on node 1,
>>>
>>> Can you define "the resource"?  You have a few and it matters :)
>>>
>>>> fails 10 times, reaches the migration threshold, moves the resource to the other host, fails 10 times, reaches the migration threshold. Now it has reached the migration threshold on all possible hosts. I was then expecting that it would stop the resource on all nodes and run all of the other resources as though nothing were wrong. What I see though is that the cluster demotes all master/slave resources, despite the fact that only one of them is failing.
>>>>
>>>> I wasn't able to find a parameter which would dictate what the behaviour should be if the migration failed on all available hosts. I therefore suspect that the constraints configuration I'm using isn't doing quite what I hope it's doing.
>>>>
>>>> Below is the configuration xml I am using on the hosts (no crmsh config, sorry).
>>>>
>>>> I am using Corosync 2.3.0 and Pacemaker 1.1.8, built from source.
>>>>
>>>> Regards,
>>>> James
>>>>
>>>> <!-- Configuration file for pacemaker -->
>>>> <resources>
>>>> <!--resource for conntrackd-->
>>>> <master id="master-conntrackd">
>>>>   <meta_attributes id="master-conntrackd-meta_attributes">
>>>>     <nvpair id="master-conntrackd-meta_attributes-notify" name="notify" value="true"/>
>>>>     <nvpair id="master-conntrackd-meta_attributes-interleave" name="interleave" value="true"/>
>>>>     <nvpair id="master-conntrackd-meta_attributes-target-role" name="target-role" value="Master"/>
>>>>     <nvpair id="master-conndtrakd-meta_attributes-failure-timeout" name="failure-timeout" value="600"/>
>>>>     <nvpair id="master-conntrackd-meta_attributes-migration-threshold" name="migration-threshold" value="10"/>
>>>>   </meta_attributes>
>>>>   <primitive id="conntrackd" class="ocf" provider="OSAG" type="conntrackd">
>>>>     <operations>
>>>>       <op id="conntrackd-slave-check" name="monitor" interval="60" role="Slave" />
>>>>       <op id="conntrackd-master-check" name="monitor" interval="61" role="Master" />
>>>>     </operations>
>>>>   </primitive>
>>>> </master>
>>>> <master id="master-condition">
>>>>   <meta_attributes id="master-condition-meta_attributes">
>>>>     <nvpair id="master-condition-meta_attributes-notify" name="notify" value="false"/>
>>>>     <nvpair id="master-condition-meta_attributes-interleave" name="interleave" value="true"/>
>>>>     <nvpair id="master-condition-meta_attributes-target-role" name="target-role" value="Master"/>
>>>>     <nvpair id="master-condition-meta_attributes-failure-timeout" name="failure-timeout" value="600"/>
>>>>     <nvpair id="master-condition-meta_attributes-migration-threshold" name="migration-threshold" value="10"/>
>>>>   </meta_attributes>
>>>>   <primitive id="condition" class="ocf" provider="OSAG" type="condition">
>>>>     <instance_attributes id="condition-attrs">
>>>>     </instance_attributes>
>>>>     <operations>
>>>>       <op id="condition-slave-check" name="monitor" interval="10" role="Slave" />
>>>>       <op id="condition-master-check" name="monitor" interval="11" role="Master" />
>>>>     </operations>
>>>>   </primitive>
>>>> </master>
>>>> <master id="master-ospfd.init">
>>>>   <meta_attributes id="master-ospfd-meta_attributes">
>>>>     <nvpair id="master-ospfd-meta_attributes-notify" name="notify" value="false"/>
>>>>     <nvpair id="master-ospfd-meta_attributes-interleave" name="interleave" value="true"/>
>>>>     <nvpair id="master-ospfd-meta_attributes-target-role" name="target-role" value="Master"/>
>>>>     <nvpair id="master-ospfd-meta_attributes-failure-timeout" name="failure-timeout" value="600"/>
>>>>     <nvpair id="master-ospfd-meta_attributes-migration-threshold" name="migration-threshold" value="10"/>
>>>>   </meta_attributes>
>>>>   <primitive id="ospfd" class="ocf" provider="OSAG" type="osaginit">
>>>>     <instance_attributes id="ospfd-attrs">
>>>>       <nvpair id="ospfd-script" name="script" value="ospfd.init"/>
>>>>     </instance_attributes>
>>>>     <operations>
>>>>       <op id="ospfd-slave-check" name="monitor" interval="10" role="Slave" />
>>>>       <op id="ospfd-master-check" name="monitor" interval="11" role="Master" />
>>>>     </operations>
>>>>   </primitive>
>>>> </master>
>>>> <master id="master-ripd.init">
>>>>   <meta_attributes id="master-ripd-meta_attributes">
>>>>     <nvpair id="master-ripd-meta_attributes-notify" name="notify" value="false"/>
>>>>     <nvpair id="master-ripd-meta_attributes-interleave" name="interleave" value="true"/>
>>>>     <nvpair id="master-ripd-meta_attributes-target-role" name="target-role" value="Master"/>
>>>>     <nvpair id="master-ripd-meta_attributes-failure-timeout" name="failure-timeout" value="600"/>
>>>>     <nvpair id="master-ripd-meta_attributes-migration-threshold" name="migration-threshold" value="10"/>
>>>>   </meta_attributes>
>>>>   <primitive id="ripd" class="ocf" provider="OSAG" type="osaginit">
>>>>     <instance_attributes id="ripd-attrs">
>>>>       <nvpair id="ripd-script" name="script" value="ripd.init"/>
>>>>     </instance_attributes>
>>>>     <operations>
>>>>       <op id="ripd-slave-check" name="monitor" interval="10" role="Slave" />
>>>>       <op id="ripd-master-check" name="monitor" interval="11" role="Master" />
>>>>     </operations>
>>>>   </primitive>
>>>> </master>
>>>> <master id="master-squid.init">
>>>>   <meta_attributes id="master-squid-meta_attributes">
>>>>     <nvpair id="master-squid-meta_attributes-notify" name="notify" value="false"/>
>>>>     <nvpair id="master-squid-meta_attributes-interleave" name="interleave" value="true"/>
>>>>     <nvpair id="master-squid-meta_attributes-target-role" name="target-role" value="Master"/>
>>>>     <nvpair id="master-squid-meta_attributes-failure-timeout" name="failure-timeout" value="600"/>
>>>>     <nvpair id="master-squid-meta_attributes-migration-threshold" name="migration-threshold" value="10"/>
>>>>   </meta_attributes>
>>>>   <primitive id="squid" class="ocf" provider="OSAG" type="osaginit">
>>>>     <instance_attributes id="squid-attrs">
>>>>       <nvpair id="squid-script" name="script" value="squid.init"/>
>>>>     </instance_attributes>
>>>>     <operations>
>>>>       <op id="squid-slave-check" name="monitor" interval="10" role="Slave" />
>>>>       <op id="squid-master-check" name="monitor" interval="11" role="Master" />
>>>>     </operations>
>>>>   </primitive>
>>>> </master>
>>>>
>>>> <!--resource for interface checks -->
>>>> <clone id="clone-IFcheck">
>>>>   <primitive id="IFcheck" class="ocf" provider="OSAG" type="ifmonitor">
>>>>     <instance_attributes id="resIFcheck-attrs">
>>>>       <nvpair id="IFcheck-interfaces" name="interfaces" value="eth0 eth1"/>
>>>>       <nvpair id="IFcheck-multiplier" name="multiplier" value="200"/>
>>>>       <nvpair id="IFcheck-dampen" name="dampen" value="6s" />
>>>>     </instance_attributes>
>>>>     <operations>
>>>>       <op id="IFcheck-monitor" interval="3s" name="monitor"/>
>>>>     </operations>
>>>>   </primitive>
>>>> </clone>
>>>>
>>>> <!--resource for ISP checks-->
>>>> <clone id="clone-ISPcheck">
>>>>   <primitive id="ISPcheck" class="ocf" provider="OSAG" type="ispcheck">
>>>>     <instance_attributes id="ISPcheck-attrs">
>>>>       <nvpair id="ISPcheck-ipsec" name="ipsec-check" value="1" />
>>>>       <nvpair id="ISPcheck-ping" name="ping-check" value="1" />
>>>>       <nvpair id="ISPcheck-multiplier" name="multiplier" value="200"/>
>>>>       <nvpair id="ISPcheck-dampen" name="dampen" value="60s"/>
>>>>     </instance_attributes>
>>>>     <operations>
>>>>       <op id="ISPcheck-monitor" interval="30s" name="monitor"/>
>>>>     </operations>
>>>>   </primitive>
>>>> </clone>
>>>>
>>>> <!--Virtual IP group-->
>>>> <group id="VIP-group">
>>>>   <primitive id="eth1-0-192.168.1.10" class="ocf" provider="heartbeat" type="IPaddr2">
>>>>     <meta_attributes id="meta-VIP-1">
>>>>       <nvpair id="VIP-1-failure-timeout" name="failure-timeout" value="60"/>
>>>>       <nvpair id="VIP-1-migration-threshold" name="migration-threshold" value="50"/>
>>>>     </meta_attributes>
>>>>     <instance_attributes id="VIP-1-instance_attributes">
>>>>       <nvpair id="VIP-1-IP" name = "ip" value="192.168.1.10"/>
>>>>       <nvpair id="VIP-1-nic" name="nic" value="eth1"/>
>>>>       <nvpair id="VIP-1-cidr" name="cidr_netmask" value="24"/>
>>>>       <nvpair id="VIP-1-iflabel" name="iflabel" value="0"/>
>>>>       <nvpair id="VIP-1-arp-sender" name="arp_sender" value="send_arp"/>
>>>>     </instance_attributes>
>>>>     <operations>
>>>>       <op id="VIP-1-monitor" interval="10s" name="monitor"/>
>>>>     </operations>
>>>>   </primitive>
>>>> </group>
>>>> </resources>
>>>>
>>>> <!--resource constraints-->
>>>> <constraints>
>>>> <!--set VIP location based on the following two rules-->
>>>> <rsc_location id="VIPs" rsc="VIP-group">
>>>>   <!--prefer host with more interfaces-->
>>>>   <rule id="VIP-prefer-connected-rule-1" score-attribute="ifcheck" >
>>>>     <expression id="VIP-prefer-most-connected-1" attribute="ifcheck" operation="defined"/>
>>>>   </rule>
>>>>   <!--prefer host with better ISP connectivity-->
>>>>   <rule id="VIP-prefer-connected-rule-2" score-attribute="ispcheck">
>>>>     <expression id="VIP-prefer-most-connected-2" attribute="ispcheck" operation="defined"/>
>>>>   </rule>
>>>> </rsc_location>
>>>> <!--conntrack master must run where the VIPs are-->
>>>> <rsc_colocation id="conntrack-master-with-VIPs" rsc="master-conntrackd" with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>>> <rsc_colocation id="condition-master-with-VIPs" rsc="master-condition" with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>>> <!--services masters must run where the VIPs are-->
>>>> <rsc_colocation id="ospfd-master-with-VIPs" rsc="master-ospfd.init" with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>>> <rsc_colocation id="ripd-master-with-VIPs" rsc="master-ripd.init" with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>>> <rsc_colocation id="squid-master-with-VIPs" rsc="master-squid.init" with-rsc="VIP-group" rsc-role="Master" score="INFINITY" />
>>>> <!--prefer as master the following hosts in ascending order-->
>>>> <rsc_location id="VIP-master-xi" rsc="VIP-group" node="xi" score="0"/>
>>>> <rsc_location id="VIP-master-nu" rsc="VIP-group" node="nu" score="20"/>
>>>> <rsc_location id="VIP-master-mu" rsc="VIP-group" node="mu" score="40"/>
>>>> </constraints>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org