[Pacemaker] ordering cloned resources

Alexandre alxgomz at gmail.com
Sun Mar 23 05:46:41 EDT 2014


It was probably too late yesterday evening when I tested this. Taking a
fresh look at it this morning, the ordering constraint is working
perfectly fine now, even with cloned resources.

Thank you.

2014-03-22 23:41 GMT+01:00 Alexandre <alxgomz at gmail.com>:
> So... it took me a while to get everything packaged and so on, but
> eventually I managed to upgrade my cluster to
> corosync2/pacemaker1.1.11 (version advertised is 1.1.10-9d39a6b).
> Although communication between nodes is now much more efficient, I
> still have the same issue with this ordering constraint that uses
> clones on both sides.
> The ordering constraint works if I set a primitive as the first
> resource. But if I put this primitive in a clone resource, it stops
> working.
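>
> For reference, the two variants in crm shell syntax (the resource names
> are those from the configuration quoted further down this thread; this
> is a sketch, not the exact commands I ran):
>
>     # never triggers the second clone:
>     order ORD_SAN_MAILSTORE inf: cln_san cln_mailstore
>     # works, with the clone's inner primitive as the first resource:
>     order ORD_SAN_MAILSTORE inf: pri_aoe1 cln_mailstore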
>
> Below are the logs I get on the node where the first resource starts:
>
> Mar 22 23:29:18 sanaoe02 crmd[10989]:   notice: do_state_transition:
> State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Mar 22 23:29:18 sanaoe02 cib[10984]:   notice: cib:diff: Diff: --- 0.916.2
> Mar 22 23:29:18 sanaoe02 cib[10984]:   notice: cib:diff: Diff: +++
> 0.917.1 5da74572ddb3a247189b39d515918343
> Mar 22 23:29:18 sanaoe02 cib[10984]:   notice: cib:diff: --
> <nvpair value="Stopped" id="cln_aoe-meta_attributes-target-role"/>
> Mar 22 23:29:18 sanaoe02 cib[10984]:   notice: cib:diff: ++
> <nvpair id="cln_aoe-meta_attributes-target-role" name="target-role"
> value="Started"/>
> Mar 22 23:29:18 sanaoe02 pengine[10988]:   notice: unpack_rsc_op:
> Preventing cln_aoe from re-starting on dir01: operation monitor failed
> 'not installed' (5)
> Mar 22 23:29:18 sanaoe02 pengine[10988]:   notice: unpack_rsc_op:
> Preventing cln_aoe from re-starting on mta02: operation monitor failed
> 'not installed' (5)
> Mar 22 23:29:18 sanaoe02 pengine[10988]:   notice: unpack_rsc_op:
> Preventing cln_aoe from re-starting on ms02: operation monitor failed
> 'not installed' (5)
> Mar 22 23:29:18 sanaoe02 pengine[10988]:   notice: unpack_rsc_op:
> Preventing cln_aoe from re-starting on mx02: operation monitor failed
> 'not installed' (5)
> Mar 22 23:29:18 sanaoe02 pengine[10988]:   notice: unpack_rsc_op:
> Preventing cln_aoe from re-starting on dir02: operation monitor failed
> 'not installed' (5)
> Mar 22 23:29:18 sanaoe02 pengine[10988]:   notice: LogActions: Start
> pri_aoe1:0#011(sanaoe02)
> Mar 22 23:29:18 sanaoe02 crmd[10989]:   notice: te_rsc_command:
> Initiating action 39: start pri_aoe1_start_0 on sanaoe02 (local)
> Mar 22 23:29:18 sanaoe02 pengine[10988]:   notice: process_pe_message:
> Calculated Transition 377: /var/lib/pacemaker/pengine/pe-input-100.bz2
> Mar 22 23:29:18 sanaoe02 AoEtarget(pri_aoe1)[14285]: INFO: Exporting
> device /dev/xvdb on eth1 as shelf 2, slot 1
> Mar 22 23:29:18 sanaoe02 AoEtarget(pri_aoe1)[14285]: DEBUG: pri_aoe1 start : 0
> Mar 22 23:29:19 sanaoe02 crmd[10989]:   notice: process_lrm_event: LRM
> operation pri_aoe1_start_0 (call=194, rc=0, cib-update=982,
> confirmed=true) ok
> Mar 22 23:29:19 sanaoe02 crmd[10989]:   notice: run_graph: Transition
> 377 (Complete=8, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-100.bz2): Complete
> Mar 22 23:29:19 sanaoe02 crmd[10989]:   notice: do_state_transition:
> State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
>
> On the nodes where the second resource should start, I get absolutely
> no logs *at all*!
>
> If I modify the ordering constraint to use a primitive as the first
> resource instead of a cloned resource, then everything works fine,
> and I get the following logs on the node where the first resource
> starts (very similar to the previous ones):
>
> Mar 22 23:37:50 sanaoe02 crmd[10989]:   notice: do_state_transition:
> State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Mar 22 23:37:50 sanaoe02 cib[10984]:   notice: cib:diff: Diff: --- 0.920.3
> Mar 22 23:37:50 sanaoe02 cib[10984]:   notice: cib:diff: Diff: +++
> 0.921.1 04b8247b3c6786c3ff15f583cf725c3d
> Mar 22 23:37:50 sanaoe02 cib[10984]:   notice: cib:diff: --
> <nvpair value="Stopped" id="pri_aoe1-meta_attributes-target-role"/>
> Mar 22 23:37:50 sanaoe02 cib[10984]:   notice: cib:diff: ++
> <nvpair id="pri_aoe1-meta_attributes-target-role" name="target-role"
> value="Started"/>
> Mar 22 23:37:50 sanaoe02 pengine[10988]:   notice: unpack_rsc_op:
> Preventing pri_aoe1 from re-starting on dir01: operation monitor
> failed 'not installed' (5)
> Mar 22 23:37:50 sanaoe02 pengine[10988]:   notice: unpack_rsc_op:
> Preventing pri_aoe1 from re-starting on mta02: operation monitor
> failed 'not installed' (5)
> Mar 22 23:37:50 sanaoe02 pengine[10988]:   notice: unpack_rsc_op:
> Preventing pri_aoe1 from re-starting on ms02: operation monitor failed
> 'not installed' (5)
> Mar 22 23:37:50 sanaoe02 pengine[10988]:   notice: unpack_rsc_op:
> Preventing pri_aoe1 from re-starting on mx02: operation monitor failed
> 'not installed' (5)
> Mar 22 23:37:50 sanaoe02 pengine[10988]:   notice: unpack_rsc_op:
> Preventing pri_aoe1 from re-starting on dir02: operation monitor
> failed 'not installed' (5)
> Mar 22 23:37:50 sanaoe02 pengine[10988]:   notice: LogActions: Start
> pri_dovecot:0#011(ms02)
> Mar 22 23:37:50 sanaoe02 pengine[10988]:   notice: LogActions: Start
> pri_aoe1#011(sanaoe02)
> Mar 22 23:37:50 sanaoe02 crmd[10989]:   notice: te_rsc_command:
> Initiating action 39: start pri_aoe1_start_0 on sanaoe02 (local)
> Mar 22 23:37:50 sanaoe02 pengine[10988]:   notice: process_pe_message:
> Calculated Transition 381: /var/lib/pacemaker/pengine/pe-input-104.bz2
> Mar 22 23:37:50 sanaoe02 AoEtarget(pri_aoe1)[14379]: INFO: Exporting
> device /dev/xvdb on eth1 as shelf 2, slot 1
> Mar 22 23:37:50 sanaoe02 AoEtarget(pri_aoe1)[14379]: DEBUG: pri_aoe1 start : 0
> Mar 22 23:37:50 sanaoe02 crmd[10989]:   notice: process_lrm_event: LRM
> operation pri_aoe1_start_0 (call=198, rc=0, cib-update=1027,
> confirmed=true) ok
> Mar 22 23:37:50 sanaoe02 crmd[10989]:   notice: te_rsc_command:
> Initiating action 25: start pri_dovecot_start_0 on ms02
> Mar 22 23:37:50 sanaoe02 crmd[10989]:   notice: te_rsc_command:
> Initiating action 26: monitor pri_dovecot_monitor_5000 on ms02
> Mar 22 23:37:50 sanaoe02 crmd[10989]:   notice: run_graph: Transition
> 381 (Complete=8, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-104.bz2): Complete
> Mar 22 23:37:50 sanaoe02 crmd[10989]:   notice: do_state_transition:
> State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
>
> and on the node where the second resource starts:
>
> Mar 22 22:37:50 ms02 crmd[89496]:   notice: process_lrm_event: LRM
> operation pri_dovecot_start_0 (call=151, rc=0, cib-update=197,
> confirmed=true) ok
> Mar 22 22:37:50 ms02 dovecot: master: Dovecot v2.1.7 starting up
> Mar 22 22:37:50 ms02 dovecot: master: Warning: /home is no longer
> mounted. If this is intentional, remove it with doveadm mount
> Mar 22 22:37:50 ms02 crmd[89496]:   notice: process_lrm_event: LRM
> operation pri_dovecot_monitor_5000 (call=152, rc=0, cib-update=198,
> confirmed=false) ok
>
> I can't find anything useful in those logs, but if you think something
> is (or could be) relevant, please feel free to point it out.
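>
> In case it helps, the policy engine input referenced in the transitions
> above can be replayed offline to see what was (or wasn't) scheduled; a
> sketch, assuming the crm_simulate tool shipped with pacemaker is
> available on the node:
>
>     # replay the saved transition and show allocation scores
>     crm_simulate -S -s -x /var/lib/pacemaker/pengine/pe-input-100.bz2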
>
> 2014-03-11 2:13 GMT+01:00 Andrew Beekhof <andrew at beekhof.net>:
>>
>> On 9 Mar 2014, at 10:36 pm, Alexandre <alxgomz at gmail.com> wrote:
>>
>>> So...,
>>>
>>> It appears the problem doesn't come from the primitive but from the
>>> cloned resource. If I use the primitive instead of the clone in the
>>> order constraint (thus deleting the clone and the group), the second
>>> resource of the constraint starts up as expected.
>>>
>>> Any idea why?
>>
>> Not without logs
>>
>>>
>>> Should I upgrade this pretty old version of pacemaker?
>>
>> Yes :)
>>
>>>
>>> 2014-03-08 10:36 GMT+01:00 Alexandre <alxgomz at gmail.com>:
>>>> Hi Andrew,
>>>>
>>>> I have tried to stop and start the first resource of the ordering
>>>> constraint (cln_san), hoping it would trigger a start attempt of the
>>>> second resource of the ordering constraint (cln_mailstore).
>>>> I tailed the syslog on the node where I was expecting the second
>>>> resource to start, but really nothing appeared in those logs (I
>>>> grepped for 'pengine' as per your suggestion).
>>>>
>>>> I have done another test, where I replaced the first resource of the
>>>> ordering constraint with a very simple primitive (an lsb resource),
>>>> and it worked in this case.
>>>>
>>>> I am wondering if the issue doesn't come from the rather complicated
>>>> first resource. It is a cloned group which contains a primitive with
>>>> conditional instance attributes...
>>>> Are you aware of any specific issue in pacemaker 1.1.7 with this kind
>>>> of resource?
>>>>
>>>> I will try to simplify the resources by getting rid of the conditional
>>>> instance attributes and try again. In the meantime I'd be delighted to
>>>> hear what you guys think about that.
>>>>
>>>> Regards, Alex.
>>>>
>>>> 2014-03-07 4:21 GMT+01:00 Andrew Beekhof <andrew at beekhof.net>:
>>>>>
>>>>> On 3 Mar 2014, at 3:56 am, Alexandre <alxgomz at gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am setting up a cluster on Debian wheezy.
>>>>>> I have installed pacemaker using the Debian-provided packages (so I
>>>>>> am running 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff).
>>>>>>
>>>>>> I have roughly 10 nodes, among which some nodes act as SANs
>>>>>> (exporting block devices using the AoE protocol) and other nodes act
>>>>>> as initiators (they are actually mail servers, storing emails on the
>>>>>> exported devices).
>>>>>> Below are the defined resources for those nodes:
>>>>>>
>>>>>> xml <primitive class="ocf" id="pri_aoe1" provider="heartbeat"
>>>>>> type="AoEtarget"> \
>>>>>>       <instance_attributes id="pri_aoe1.1-instance_attributes"> \
>>>>>>               <rule id="node-sanaoe01" score="1"> \
>>>>>>                       <expression attribute="#uname"
>>>>>> id="expr-node-sanaoe01" operation="eq" value="sanaoe01"/> \
>>>>>>               </rule> \
>>>>>>               <nvpair id="pri_aoe1.1-instance_attributes-device"
>>>>>> name="device" value="/dev/xvdb"/> \
>>>>>>               <nvpair id="pri_aoe1.1-instance_attributes-nic"
>>>>>> name="nic" value="eth0"/> \
>>>>>>               <nvpair id="pri_aoe1.1-instance_attributes-shelf"
>>>>>> name="shelf" value="1"/> \
>>>>>>               <nvpair id="pri_aoe1.1-instance_attributes-slot"
>>>>>> name="slot" value="1"/> \
>>>>>>       </instance_attributes> \
>>>>>>       <instance_attributes id="pri_aoe2.1-instance_attributes"> \
>>>>>>               <rule id="node-sanaoe02" score="2"> \
>>>>>>                       <expression attribute="#uname"
>>>>>> id="expr-node-sanaoe2" operation="eq" value="sanaoe02"/> \
>>>>>>               </rule> \
>>>>>>               <nvpair id="pri_aoe2.1-instance_attributes-device"
>>>>>> name="device" value="/dev/xvdb"/> \
>>>>>>               <nvpair id="pri_aoe2.1-instance_attributes-nic"
>>>>>> name="nic" value="eth1"/> \
>>>>>>               <nvpair id="pri_aoe2.1-instance_attributes-shelf"
>>>>>> name="shelf" value="2"/> \
>>>>>>               <nvpair id="pri_aoe2.1-instance_attributes-slot"
>>>>>> name="slot" value="1"/> \
>>>>>>       </instance_attributes> \
>>>>>> </primitive>
>>>>>> primitive pri_dovecot lsb:dovecot \
>>>>>>       op start interval="0" timeout="20" \
>>>>>>       op stop interval="0" timeout="30" \
>>>>>>       op monitor interval="5" timeout="10"
>>>>>> primitive pri_spamassassin lsb:spamassassin \
>>>>>>       op start interval="0" timeout="50" \
>>>>>>       op stop interval="0" timeout="60" \
>>>>>>       op monitor interval="5" timeout="20"
>>>>>> group grp_aoe pri_aoe1
>>>>>> group grp_mailstore pri_dlm pri_clvmd pri_spamassassin pri_dovecot
>>>>>> clone cln_mailstore grp_mailstore \
>>>>>>       meta ordered="false" interleave="true" clone-max="2"
>>>>>> clone cln_san grp_aoe \
>>>>>>       meta ordered="true" interleave="true" clone-max="2"
>>>>>>
>>>>>> As I am in "opt-in cluster" mode (symmetric-cluster="false"), I
>>>>>> have the location constraints below for those hosts:
>>>>>>
>>>>>> location LOC_AOE_ETHERD_1 cln_san inf: sanaoe01
>>>>>> location LOC_AOE_ETHERD_2 cln_san inf: sanaoe02
>>>>>> location LOC_MAIL_STORE_1 cln_mailstore inf: ms01
>>>>>> location LOC_MAIL_STORE_2 cln_mailstore inf: ms02
>>>>>>
>>>>>> So far so good. I want to make sure the initiators won't try to look
>>>>>> for exported devices before the targets have actually exported them.
>>>>>> To do so, I thought I could use the following ordering constraint:
>>>>>>
>>>>>> order ORD_SAN_MAILSTORE inf: cln_san cln_mailstore
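>>>>>>
>>>>>> In low-level XML this should correspond roughly to the following (a
>>>>>> sketch, using the same id as above):
>>>>>>
>>>>>>     <rsc_order id="ORD_SAN_MAILSTORE" score="INFINITY"
>>>>>>                first="cln_san" then="cln_mailstore"/>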
>>>>>>
>>>>>> Unfortunately, if I add this constraint the clone set "cln_mailstore"
>>>>>> never starts (or even stops, if it was running when I add the
>>>>>> constraint).
>>>>>>
>>>>>> Is there something wrong with this ordering rule?
>>>>>> Where can I find information on what's going on?
>>>>>
>>>>> No errors in the logs?
>>>>> If you grep for 'pengine' does it want to start them or just leave them stopped?
>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>
>>>>>> Project Home: http://www.clusterlabs.org
>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>
>>>>>
>>>
>>
