[Pacemaker] Utilization & resource stickiness strange behaviour - sorted resources
agutxi Agustin
agutxisol at gmail.com
Thu Jan 19 18:30:04 UTC 2012
Thank you for your fast reply, Andreas. This solved the issue.
I tried it on Debian Squeeze with the squeeze-backports repositories
(source package: pacemaker 1.1.6-2~bpo60+1) and the behaviour is now correct.
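For anyone finding this thread later, pulling the package from
squeeze-backports is roughly the following (assuming the backports
archive entry is already in /etc/apt/sources.list):

  # deb http://backports.debian.org/debian-backports squeeze-backports main
  apt-get update
  apt-get -t squeeze-backports install pacemaker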
Best regards
2012/1/19 Andreas Kurz <andreas at hastexo.com>:
> Hello,
>
> On 01/19/2012 01:39 PM, agutxi Agustin wrote:
>> Hi all,
>> I am trying to set up a cluster of virtual machine hosts, and while
>> doing so I ran into some very strange behaviour (I think it may be
>> a bug). I hope you can lend me a hand in debugging this.
>> To reproduce the behaviour observed in my production environment, I set
>> up 2 fresh, simple machines with no location/colocation/order
>> constraints and replaced the Xen resource agent with the Dummy
>> resource agent; the behaviour was the same.
>
> Have you tried Pacemaker 1.1.6? There have been some utilization fixes.
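>
> (Your configuration below still shows dc-version 1.1.5, so those fixes
> are not in your build yet. If in doubt, something like "crm_mon
> --version" should print the version of the running tools.)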
>
> Regards,
> Andreas
>
> --
> Need help with Pacemaker?
> http://www.hastexo.com/now
>
>>
>> The scenario is the following:
>> - The placement strategy is "utilization".
>> - 2 nodes, vmHost1 and vmHost2, with 2 cores each, handle 5 resources
>> (DummyVM1-5, 1 core each) with resource-stickiness="INFINITY".
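>> That is: total capacity is 2 nodes * 2 cores = 4 cores, while the 5
>> resources need 5 * 1 core = 5 cores, so at most 4 of them can run at
>> any one time.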
>>
>> What happens is the following:
>> - If I start resources DummyVM1-4, everything is fine: 2 resources
>> run on each of the nodes.
>> - Now I start DummyVM5, but utilization is full, so it does not
>> start (cool :)
>> - Then, if I stop any of the running resources, everything goes
>> smoothly and DummyVM5 starts up. That's cool too.
>> * Here comes the strange part:
>> With full utilization and resource-stickiness INFINITY, starting a
>> new resource shouldn't change anything in the cluster status, BUT:
>> if the freshly started resource sorts alphabetically before one of
>> the running resources, the cluster stops the alphabetically "last"
>> running resource and starts the stopped one.
>> I don't think this is the expected behaviour; please correct me if I am wrong.
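>> (If it helps with debugging: something like "crm_simulate -sL" run
>> against the live cluster should dump the allocation scores the policy
>> engine computes, which might show where the alphabetical ordering
>> creeps in.)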
>>
>> Thank you kindly,
>> Agustin
>>
>> status:
>> ====
>> Online: [ vmHost1 vmHost2 ]
>>
>> DummyVM1 (ocf::pacemaker:Dummy): Started vmHost1
>> DummyVM2 (ocf::pacemaker:Dummy): Started vmHost1
>> DummyVM3 (ocf::pacemaker:Dummy): Started vmHost2
>> DummyVM5 (ocf::pacemaker:Dummy): Started vmHost2
>> crm(live)# resource start DummyVM4
>>
>> My configuration:
>> ============
>> crm(live)# configure show
>> node vmHost1 \
>> utilization cores="2"
>> node vmHost2 \
>> utilization cores="2"
>> primitive DummyVM1 ocf:pacemaker:Dummy \
>> op monitor interval="60s" timeout="60s" \
>> op start on-fail="restart" interval="0" \
>> op stop on-fail="ignore" interval="0" \
>> utilization cores="1" \
>> meta is-managed="true" migration-threshold="2"
>> primitive DummyVM2 ocf:pacemaker:Dummy \
>> op monitor interval="60s" timeout="60s" \
>> op start on-fail="restart" interval="0" \
>> op stop on-fail="ignore" interval="0" \
>> utilization cores="1" \
>> meta is-managed="true" migration-threshold="2"
>> primitive DummyVM3 ocf:pacemaker:Dummy \
>> op monitor interval="60s" timeout="60s" \
>> op start on-fail="restart" interval="0" \
>> op stop on-fail="ignore" interval="0" \
>> utilization cores="1" \
>> meta is-managed="true" migration-threshold="2"
>> primitive DummyVM4 ocf:pacemaker:Dummy \
>> op monitor interval="60s" timeout="60s" \
>> op start on-fail="restart" interval="0" \
>> op stop on-fail="ignore" interval="0" \
>> utilization cores="1" \
>> meta is-managed="true" migration-threshold="2" target-role="Started"
>> primitive DummyVM5 ocf:pacemaker:Dummy \
>> op monitor interval="60s" timeout="60s" \
>> op start on-fail="restart" interval="0" \
>> op stop on-fail="ignore" interval="0" \
>> utilization cores="1" \
>> meta is-managed="true" migration-threshold="2" target-role="Started"
>> property $id="cib-bootstrap-options" \
>> dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
>> cluster-infrastructure="openais" \
>> expected-quorum-votes="2" \
>> stonith-enabled="false" \
>> stop-all-resources="false" \
>> placement-strategy="utilization" \
>> no-quorum-policy="ignore" \
>> stop-orphan-resources="true" \
>> stop-orphan-actions="true" \
>> last-lrm-refresh="1326975274"
>> rsc_defaults $id="rsc-options" \
>> resource-stickiness="INFINITY"
>>
>>
>> and the relevant part of /var/log/syslog:
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- <cib
>> admin_epoch="0" epoch="75" num_updates="4" >
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- <configuration >
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- <resources >
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- <primitive
>> id="DummyVM4" >
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff-
>> <meta_attributes id="DummyVM4-meta_attributes" >
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- <nvpair
>> value="Stopped" id="DummyVM4-meta_attributes-target-role" />
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- </meta_attributes>
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- </primitive>
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- </resources>
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- </configuration>
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff- </cib>
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ <cib epoch="76"
>> num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2"
>> crm_feature_set="3.0.5" have-quorum="1" cib-last-written="Thu Jan 19
>> 12:35:47 2012" dc-uuid="vmHost1" >
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ <configuration >
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ <resources >
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ <primitive
>> class="ocf" id="DummyVM4" provider="pacemaker" type="Dummy" >
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+
>> <meta_attributes id="DummyVM4-meta_attributes" >
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ <nvpair
>> id="DummyVM4-meta_attributes-target-role" name="target-role"
>> value="Started" />
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ </meta_attributes>
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ </primitive>
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ </resources>
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ </configuration>
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib:diff+ </cib>
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: abort_transition_graph:
>> te_update_diff:131 - Triggered transition abort (complete=1, tag=diff,
>> id=(null), magic=NA, cib=0.76.1) : Non-status change
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_state_transition: State
>> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
>> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_state_transition: All 2
>> cluster nodes are eligible to run resources.
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_pe_invoke: Query 191:
>> Requesting the current CIB: S_POLICY_ENGINE
>> Jan 19 13:36:19 vmHost1 cib: [725]: info: cib_process_request:
>> Operation complete: op cib_replace for section resources
>> (origin=local/cibadmin/2, version=0.76.1): ok (rc=0)
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_pe_invoke_callback:
>> Invoking the PE: query=191, ref=pe_calc-dc-1326976579-119, seq=64,
>> quorate=1
>> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: unpack_config: On loss
>> of CCM Quorum: Ignore
>> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: native_print:
>> DummyVM1#011(ocf::pacemaker:Dummy):#011Started vmHost1
>> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: native_print:
>> DummyVM2#011(ocf::pacemaker:Dummy):#011Started vmHost1
>> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: native_print:
>> DummyVM3#011(ocf::pacemaker:Dummy):#011Started vmHost2
>> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: native_print:
>> DummyVM4#011(ocf::pacemaker:Dummy):#011Stopped
>> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: native_print:
>> DummyVM5#011(ocf::pacemaker:Dummy):#011Started vmHost2
>> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: RecurringOp: Start
>> recurring monitor (60s) for DummyVM4 on vmHost2
>> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: LogActions: Leave
>> DummyVM1#011(Started vmHost1)
>> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: LogActions: Leave
>> DummyVM2#011(Started vmHost1)
>> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: LogActions: Leave
>> DummyVM3#011(Started vmHost2)
>> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: LogActions: Start
>> DummyVM4#011(vmHost2)
>> Jan 19 13:36:19 vmHost1 pengine: [728]: notice: LogActions: Stop
>> DummyVM5#011(vmHost2)
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_state_transition: State
>> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
>> cause=C_IPC_MESSAGE origin=handle_response ]
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: unpack_graph: Unpacked
>> transition 33: 6 actions in 6 synapses
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_te_invoke: Processing
>> graph 33 (ref=pe_calc-dc-1326976579-119) derived from
>> /var/lib/pengine/pe-input-717.bz2
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_rsc_command: Initiating
>> action 19: stop DummyVM5_stop_0 on vmHost2
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_pseudo_action: Pseudo
>> action 6 fired and confirmed
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: match_graph_event: Action
>> DummyVM5_stop_0 (19) confirmed on vmHost2 (rc=0)
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_pseudo_action: Pseudo
>> action 7 fired and confirmed
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_pseudo_action: Pseudo
>> action 5 fired and confirmed
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_rsc_command: Initiating
>> action 17: start DummyVM4_start_0 on vmHost2
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: match_graph_event: Action
>> DummyVM4_start_0 (17) confirmed on vmHost2 (rc=0)
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_rsc_command: Initiating
>> action 18: monitor DummyVM4_monitor_60000 on vmHost2
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: match_graph_event: Action
>> DummyVM4_monitor_60000 (18) confirmed on vmHost2 (rc=0)
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: run_graph:
>> ====================================================
>> Jan 19 13:36:19 vmHost1 crmd: [729]: notice: run_graph: Transition 33
>> (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0,
>> Source=/var/lib/pengine/pe-input-717.bz2): Complete
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: te_graph_trigger:
>> Transition 33 is now complete
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: notify_crmd: Transition 33
>> status: done - <null>
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_state_transition: State
>> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
>> cause=C_FSA_INTERNAL origin=notify_crmd ]
>> Jan 19 13:36:19 vmHost1 crmd: [729]: info: do_state_transition:
>> Starting PEngine Recheck Timer