[Pacemaker] Order of resources in a group and crm_diff
Andrew Beekhof
andrew at beekhof.net
Tue Jun 10 02:00:56 CEST 2014
On 6 Jun 2014, at 5:37 pm, Gao,Yan <ygao at suse.com> wrote:
>
>
> On 06/06/14 13:21, Gao,Yan wrote:
>> On 01/29/14 13:44, Andrew Beekhof wrote:
>>>
>>> On 28 Jan 2014, at 10:11 pm, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I just discovered that when I add a resource to the middle of a
>>>> (running) group, it is added to the end.
>>>>
>>>> I mean, if I update the following (crmsh syntax)
>>>>
>>>> group dhcp-server vip-10-5-200-244 dhcpd
>>>>
>>>> with
>>>>
>>>> group dhcp-server vip-10-5-200-244 vip-10-5-201-244 dhcpd
>>>>
>>>> via 'crm configure load update', the actual definition becomes
>>>>
>>>> group dhcp-server vip-10-5-200-244 dhcpd vip-10-5-201-244
>>>>
>>>> Also, strangely enough, if I get the XML CIB with cibadmin -Q and then edit
>>>> the order of the primitives in a text editor, crm_diff doesn't show any differences:
>>>>
>>>> cib-orig.xml:
>>>> ...
>>>> <group id="dhcp-server">
>>>> <primitive id="vip-10-5-200-244" class="ocf" provider="heartbeat" type="IPaddr2">
>>>> <instance_attributes id="vip-10-5-200-244-instance_attributes">
>>>> <nvpair name="ip" value="10.5.200.244" id="vip-10-5-200-244-instance_attributes-ip"/>
>>>> <nvpair name="cidr_netmask" value="32" id="vip-10-5-200-244-instance_attributes-cidr_netmask"/>
>>>> <nvpair name="nic" value="vlan1" id="vip-10-5-200-244-instance_attributes-nic"/>
>>>> </instance_attributes>
>>>> <operations>
>>>> <op name="start" interval="0" timeout="20" id="vip-10-5-200-244-start-0"/>
>>>> <op name="stop" interval="0" timeout="20" id="vip-10-5-200-244-stop-0"/>
>>>> <op name="monitor" interval="30" id="vip-10-5-200-244-monitor-30"/>
>>>> </operations>
>>>> </primitive>
>>>> <primitive id="dhcpd" class="lsb" type="dhcpd">
>>>> <operations>
>>>> <op name="monitor" interval="10" timeout="15" id="dhcpd-monitor-10"/>
>>>> <op name="start" interval="0" timeout="90" id="dhcpd-start-0"/>
>>>> <op name="stop" interval="0" timeout="90" id="dhcpd-stop-0"/>
>>>> </operations>
>>>> <meta_attributes id="dhcpd-meta_attributes">
>>>> <nvpair id="dhcpd-meta_attributes-target-role" name="target-role" value="Started"/>
>>>> </meta_attributes>
>>>> </primitive>
>>>> <primitive id="vip-10-5-201-244" class="ocf" provider="heartbeat" type="IPaddr2">
>>>> <instance_attributes id="vip-10-5-201-244-instance_attributes">
>>>> <nvpair name="ip" value="10.5.201.244" id="vip-10-5-201-244-instance_attributes-ip"/>
>>>> <nvpair name="cidr_netmask" value="24" id="vip-10-5-201-244-instance_attributes-cidr_netmask"/>
>>>> <nvpair name="nic" value="vlan201" id="vip-10-5-201-244-instance_attributes-nic"/>
>>>> </instance_attributes>
>>>> <operations>
>>>> <op name="start" interval="0" timeout="20" id="vip-10-5-201-244-start-0"/>
>>>> <op name="stop" interval="0" timeout="20" id="vip-10-5-201-244-stop-0"/>
>>>> <op name="monitor" interval="30" id="vip-10-5-201-244-monitor-30"/>
>>>> </operations>
>>>> </primitive>
>>>> </group>
>>>> ...
>>>>
>>>> cib.xml:
>>>> ...
>>>> <group id="dhcp-server">
>>>> <primitive id="vip-10-5-200-244" class="ocf" provider="heartbeat" type="IPaddr2">
>>>> <instance_attributes id="vip-10-5-200-244-instance_attributes">
>>>> <nvpair name="ip" value="10.5.200.244" id="vip-10-5-200-244-instance_attributes-ip"/>
>>>> <nvpair name="cidr_netmask" value="32" id="vip-10-5-200-244-instance_attributes-cidr_netmask"/>
>>>> <nvpair name="nic" value="vlan1" id="vip-10-5-200-244-instance_attributes-nic"/>
>>>> </instance_attributes>
>>>> <operations>
>>>> <op name="start" interval="0" timeout="20" id="vip-10-5-200-244-start-0"/>
>>>> <op name="stop" interval="0" timeout="20" id="vip-10-5-200-244-stop-0"/>
>>>> <op name="monitor" interval="30" id="vip-10-5-200-244-monitor-30"/>
>>>> </operations>
>>>> </primitive>
>>>> <primitive id="vip-10-5-201-244" class="ocf" provider="heartbeat" type="IPaddr2">
>>>> <instance_attributes id="vip-10-5-201-244-instance_attributes">
>>>> <nvpair name="ip" value="10.5.201.244" id="vip-10-5-201-244-instance_attributes-ip"/>
>>>> <nvpair name="cidr_netmask" value="24" id="vip-10-5-201-244-instance_attributes-cidr_netmask"/>
>>>> <nvpair name="nic" value="vlan201" id="vip-10-5-201-244-instance_attributes-nic"/>
>>>> </instance_attributes>
>>>> <operations>
>>>> <op name="start" interval="0" timeout="20" id="vip-10-5-201-244-start-0"/>
>>>> <op name="stop" interval="0" timeout="20" id="vip-10-5-201-244-stop-0"/>
>>>> <op name="monitor" interval="30" id="vip-10-5-201-244-monitor-30"/>
>>>> </operations>
>>>> </primitive>
>>>> <primitive id="dhcpd" class="lsb" type="dhcpd">
>>>> <operations>
>>>> <op name="monitor" interval="10" timeout="15" id="dhcpd-monitor-10"/>
>>>> <op name="start" interval="0" timeout="90" id="dhcpd-start-0"/>
>>>> <op name="stop" interval="0" timeout="90" id="dhcpd-stop-0"/>
>>>> </operations>
>>>> <meta_attributes id="dhcpd-meta_attributes">
>>>> <nvpair id="dhcpd-meta_attributes-target-role" name="target-role" value="Started"/>
>>>> </meta_attributes>
>>>> </primitive>
>>>> </group>
>>>> ...
>>>>
>>>> # crm_diff --original cib-orig.xml --new cib.xml
>>>>
>>>> shows nothing.
>>>>
>>>> And, 'cibadmin --replace --xml-file cib.xml' does nothing:
>>>>
>>>> Jan 28 11:01:21 booter-0 cib[2693]: notice: cib:diff: Diff: --- 0.427.2
>>>> Jan 28 11:01:21 booter-0 cib[2693]: notice: cib:diff: Diff: +++ 0.427.19 df366a02885285cc95529f402bfdac12
>>>> Jan 28 11:01:21 booter-0 cib[2693]: notice: cib:diff: -- <nvpair id="status-2-shutdown" name="shutdown" value="0"/>
>>>> Jan 28 11:01:21 booter-0 cib[2693]: notice: cib:diff: ++ <cib epoch="427" num_updates="19" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Tue Jan 28 10:46:06 2014" update-origin="booter-0" update-client="cibadmin" crm_feature_set="3.0.8" have-quorum="1" dc-uuid="1"/>
>>>
>>> That's a known deficiency in the v1 diff format (and why we need costly digests to detect ordering changes).
>>> Happily, 1.1.12 will have a new and improved diff format that will handle this correctly.
>>>
>>>>
>>>> But, after I do
>>>>
>>>> # crm_shadow --create-empty myShadow
>>>> shadow[myShadow] # cibadmin -E --force
>>>> shadow[myShadow] # cibadmin --replace --xml-file cib.xml
>>>> shadow[myShadow] # crm_shadow --commit myShadow --force
>>>> Now type Ctrl-D to exit the crm_shadow shell
>>>> shadow[myShadow] # exit
>>>>
>>>> the group becomes defined in the proper order.
>>>>
>>>> That's why the only suspect is the XML diff algorithm.
>>>>
>>>> Andrew, David, could you please look?
>>>
>>> It's also partly how crmsh is using diffs.
>>> It could be verifying the diff produces the correct result by verifying the above mentioned digest.
>>> Or it could do a replace for the group instead...
>> I'm a bit surprised that even a replace cannot successfully reorder
>> resources in a group. I tried it on 1.1.9 ~ 1.1.11.
I'm both surprised and not surprised.
The old diff format sucked in this respect but I thought it was able to sync up eventually :-(
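
To make the ordering blind spot concrete, here is a minimal Python sketch. It is not the Pacemaker v1 diff code: id_keyed_diff() and digest() are hypothetical helpers, the primitives are cut down to bare ids, and SHA-256 is used only for illustration. It assumes a diff that compares elements by id and attributes only, which is enough to reproduce the symptom: the diff reports nothing, while a digest over the serialized tree still changes.

```python
import hashlib
import xml.etree.ElementTree as ET


def id_keyed_diff(old, new):
    """Compare elements by id and attributes only; sibling order is ignored."""
    def index(root):
        return {el.get("id"): dict(el.attrib) for el in root.iter() if el.get("id")}

    a, b = index(old), index(new)
    added = sorted(set(b) - set(a))
    removed = sorted(set(a) - set(b))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return added, removed, changed


def digest(root):
    """Hash the serialized tree; serialization preserves sibling order."""
    return hashlib.sha256(ET.tostring(root)).hexdigest()


# Same group, same members, different member order (ids taken from the thread).
ORIG = (
    '<group id="dhcp-server">'
    '<primitive id="vip-10-5-200-244"/>'
    '<primitive id="dhcpd"/>'
    '<primitive id="vip-10-5-201-244"/>'
    '</group>'
)
NEW = (
    '<group id="dhcp-server">'
    '<primitive id="vip-10-5-200-244"/>'
    '<primitive id="vip-10-5-201-244"/>'
    '<primitive id="dhcpd"/>'
    '</group>'
)

old_tree = ET.fromstring(ORIG)
new_tree = ET.fromstring(NEW)

diff_result = id_keyed_diff(old_tree, new_tree)
digests_differ = digest(old_tree) != digest(new_tree)

print("diff:", diff_result)
print("digests differ:", digests_differ)
```

The id-keyed comparison comes back empty for the reordered group, so only the digest comparison notices that anything changed.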
>>
>> On DC:
>> Jun 6 12:18:51 sles11-1 cib[1814]: notice: cib_perform_op: Configuration ordering change detected
>> Jun 6 12:18:51 sles11-1 cib[1814]: notice: cib:diff: Diff: --- 0.3835.86
>> Jun 6 12:18:51 sles11-1 cib[1814]: notice: cib:diff: Diff: +++ 0.3835.1 21300207d1fe995ea0475be3dc60718f
>>
>>
>> On non-DC:
>> Jun 6 12:16:50 sles11-2 cib[32053]: warning: cib_process_diff: Diff 0.3835.81 -> 0.3835.1 from sles11-1 not applied to 0.3835.81: Failed application of an update diff
>> Jun 6 12:16:50 sles11-2 cib[32053]: warning: cib_process_replace: Replacement 0.3835.1 from sles11-1 not applied to 0.3835.81: current num_updates is greater than the replacement
>>
>>
>> I think the crm_shadow way mentioned above works because it bumps
>> "epoch" itself.
>>
>>
>> If we replace only the snippet of the group with
>>
>> cibadmin -R -o resources -x group.xml
>>
>> it'll apply the change in the DC's cib, while leaving the non-DC's cib
>> out of sync.
>>
>>
>> 1.1.12-rc goes a different way in cib_perform_cib() and works.
>>
>> So, for 1.1.10/1.1.11, is it supposed to be like:
>>
>> --- pacemaker.orig/lib/cib/cib_utils.c
>> +++ pacemaker/lib/cib/cib_utils.c
>> @@ -565,6 +565,7 @@ cib_perform_op(const char *op, int call_
>> } else if (crm_str_eq(new_digest, last_digest, TRUE) == FALSE) {
>>
>> crm_notice("Configuration ordering change detected");
>> + cib_update_counter(scratch, XML_ATTR_GENERATION, FALSE);
>> cib_update_counter(scratch, XML_ATTR_NUMUPDATES, TRUE);
The problem with this is that if you're replacing from the DC then you just leapfrogged its version.
And when the DC asks for this version it will again leapfrog yours, which will probably result in a never-ending loop.
Did you try it?
> And probably also:
> + *config_changed = TRUE;
Yep. Looks reasonable.
>>
>> crm_trace("Old: %s, New: %s", last_digest, new_digest);
>>
>> ?
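
For reference, a small Python sketch of the version ordering involved. It assumes CIB versions compare as (admin_epoch, epoch, num_updates) tuples and that a replacement is refused when the current version is greater, which is what the non-DC log message above suggests. parse_version() and accepts_replacement() are hypothetical names, not Pacemaker functions.

```python
def parse_version(text):
    """Split 'admin_epoch.epoch.num_updates' into a tuple of ints."""
    admin_epoch, epoch, num_updates = (int(part) for part in text.split("."))
    return (admin_epoch, epoch, num_updates)


def accepts_replacement(current, replacement):
    """Assumed rule: refuse a replacement that is older than the current CIB."""
    return parse_version(replacement) >= parse_version(current)


# Values from the logs: the non-DC is at 0.3835.81 and is offered 0.3835.1.
without_epoch_bump = accepts_replacement("0.3835.81", "0.3835.1")

# With the epoch bumped (what the proposed patch, or crm_shadow, would do),
# the same replacement outranks the non-DC's copy.
with_epoch_bump = accepts_replacement("0.3835.81", "0.3836.1")

# The bumped version also outranks the DC's pre-replace 0.3835.86, which is
# the leapfrogging concern.
outranks_dc = parse_version("0.3836.1") > parse_version("0.3835.86")

print(without_epoch_bump, with_epoch_bump, outranks_dc)
```

Under these assumptions the un-bumped replacement is refused purely on num_updates, and any epoch bump is enough to win the comparison on both nodes.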
>>
>> Regards,
>> Yan
>>
>>>
>>>>
>>>> Thank you,
>>>> Vladislav
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>
>>>
>>>
>>>
>>
>
> --
> Gao,Yan <ygao at suse.com>
> Software Engineer
> China Server Team, SUSE.