[Pacemaker] Order of resources in a group and crm_diff
Gao,Yan
ygao at suse.com
Fri Jun 6 09:37:32 CEST 2014
On 06/06/14 13:21, Gao,Yan wrote:
> On 01/29/14 13:44, Andrew Beekhof wrote:
>>
>> On 28 Jan 2014, at 10:11 pm, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>>
>>> Hi all,
>>>
>>> I just discovered that when I add a resource to the middle of a
>>> (running) group, it is added to the end.
>>>
>>> I mean, if I update the following (crmsh syntax)
>>>
>>> group dhcp-server vip-10-5-200-244 dhcpd
>>>
>>> with
>>>
>>> group dhcp-server vip-10-5-200-244 vip-10-5-201-244 dhcpd
>>>
>>> using 'crm configure load update', the actual definition becomes
>>>
>>> group dhcp-server vip-10-5-200-244 dhcpd vip-10-5-201-244
>>>
>>> Also, strangely enough, if I get the XML CIB with cibadmin -Q and then edit
>>> the order of primitives with a text editor, crm_diff doesn't show any differences:
>>>
>>> cib-orig.xml:
>>> ...
>>> <group id="dhcp-server">
>>> <primitive id="vip-10-5-200-244" class="ocf" provider="heartbeat" type="IPaddr2">
>>> <instance_attributes id="vip-10-5-200-244-instance_attributes">
>>> <nvpair name="ip" value="10.5.200.244" id="vip-10-5-200-244-instance_attributes-ip"/>
>>> <nvpair name="cidr_netmask" value="32" id="vip-10-5-200-244-instance_attributes-cidr_netmask"/>
>>> <nvpair name="nic" value="vlan1" id="vip-10-5-200-244-instance_attributes-nic"/>
>>> </instance_attributes>
>>> <operations>
>>> <op name="start" interval="0" timeout="20" id="vip-10-5-200-244-start-0"/>
>>> <op name="stop" interval="0" timeout="20" id="vip-10-5-200-244-stop-0"/>
>>> <op name="monitor" interval="30" id="vip-10-5-200-244-monitor-30"/>
>>> </operations>
>>> </primitive>
>>> <primitive id="dhcpd" class="lsb" type="dhcpd">
>>> <operations>
>>> <op name="monitor" interval="10" timeout="15" id="dhcpd-monitor-10"/>
>>> <op name="start" interval="0" timeout="90" id="dhcpd-start-0"/>
>>> <op name="stop" interval="0" timeout="90" id="dhcpd-stop-0"/>
>>> </operations>
>>> <meta_attributes id="dhcpd-meta_attributes">
>>> <nvpair id="dhcpd-meta_attributes-target-role" name="target-role" value="Started"/>
>>> </meta_attributes>
>>> </primitive>
>>> <primitive id="vip-10-5-201-244" class="ocf" provider="heartbeat" type="IPaddr2">
>>> <instance_attributes id="vip-10-5-201-244-instance_attributes">
>>> <nvpair name="ip" value="10.5.201.244" id="vip-10-5-201-244-instance_attributes-ip"/>
>>> <nvpair name="cidr_netmask" value="24" id="vip-10-5-201-244-instance_attributes-cidr_netmask"/>
>>> <nvpair name="nic" value="vlan201" id="vip-10-5-201-244-instance_attributes-nic"/>
>>> </instance_attributes>
>>> <operations>
>>> <op name="start" interval="0" timeout="20" id="vip-10-5-201-244-start-0"/>
>>> <op name="stop" interval="0" timeout="20" id="vip-10-5-201-244-stop-0"/>
>>> <op name="monitor" interval="30" id="vip-10-5-201-244-monitor-30"/>
>>> </operations>
>>> </primitive>
>>> </group>
>>> ...
>>>
>>> cib.xml:
>>> ...
>>> <group id="dhcp-server">
>>> <primitive id="vip-10-5-200-244" class="ocf" provider="heartbeat" type="IPaddr2">
>>> <instance_attributes id="vip-10-5-200-244-instance_attributes">
>>> <nvpair name="ip" value="10.5.200.244" id="vip-10-5-200-244-instance_attributes-ip"/>
>>> <nvpair name="cidr_netmask" value="32" id="vip-10-5-200-244-instance_attributes-cidr_netmask"/>
>>> <nvpair name="nic" value="vlan1" id="vip-10-5-200-244-instance_attributes-nic"/>
>>> </instance_attributes>
>>> <operations>
>>> <op name="start" interval="0" timeout="20" id="vip-10-5-200-244-start-0"/>
>>> <op name="stop" interval="0" timeout="20" id="vip-10-5-200-244-stop-0"/>
>>> <op name="monitor" interval="30" id="vip-10-5-200-244-monitor-30"/>
>>> </operations>
>>> </primitive>
>>> <primitive id="vip-10-5-201-244" class="ocf" provider="heartbeat" type="IPaddr2">
>>> <instance_attributes id="vip-10-5-201-244-instance_attributes">
>>> <nvpair name="ip" value="10.5.201.244" id="vip-10-5-201-244-instance_attributes-ip"/>
>>> <nvpair name="cidr_netmask" value="24" id="vip-10-5-201-244-instance_attributes-cidr_netmask"/>
>>> <nvpair name="nic" value="vlan201" id="vip-10-5-201-244-instance_attributes-nic"/>
>>> </instance_attributes>
>>> <operations>
>>> <op name="start" interval="0" timeout="20" id="vip-10-5-201-244-start-0"/>
>>> <op name="stop" interval="0" timeout="20" id="vip-10-5-201-244-stop-0"/>
>>> <op name="monitor" interval="30" id="vip-10-5-201-244-monitor-30"/>
>>> </operations>
>>> </primitive>
>>> <primitive id="dhcpd" class="lsb" type="dhcpd">
>>> <operations>
>>> <op name="monitor" interval="10" timeout="15" id="dhcpd-monitor-10"/>
>>> <op name="start" interval="0" timeout="90" id="dhcpd-start-0"/>
>>> <op name="stop" interval="0" timeout="90" id="dhcpd-stop-0"/>
>>> </operations>
>>> <meta_attributes id="dhcpd-meta_attributes">
>>> <nvpair id="dhcpd-meta_attributes-target-role" name="target-role" value="Started"/>
>>> </meta_attributes>
>>> </primitive>
>>> </group>
>>> ...
>>>
>>> # crm_diff --original cib-orig.xml --new cib.xml
>>>
>>> shows nothing.
>>>
>>> And, 'cibadmin --replace --xml-file cib.xml' does nothing:
>>>
>>> Jan 28 11:01:21 booter-0 cib[2693]: notice: cib:diff: Diff: --- 0.427.2
>>> Jan 28 11:01:21 booter-0 cib[2693]: notice: cib:diff: Diff: +++ 0.427.19 df366a02885285cc95529f402bfdac12
>>> Jan 28 11:01:21 booter-0 cib[2693]: notice: cib:diff: -- <nvpair id="status-2-shutdown" name="shutdown" value="0"/>
>>> Jan 28 11:01:21 booter-0 cib[2693]: notice: cib:diff: ++ <cib epoch="427" num_updates="19" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Tue Jan 28 10:46:06 2014" update-origin="booter-0" update-client="cibadmin" crm_feature_set="3.0.8" have-quorum="1" dc-uuid="1"/>
>>
>> That's a known deficiency in the v1 diff format (and why we need costly digests to detect ordering changes).
>> Happily, .12 will have a new and improved diff format that will handle this correctly.
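
To illustrate the ordering point, here is a minimal Python sketch. It is not Pacemaker's code: matching elements by id is only a simplified stand-in for how an id-keyed diff sees the tree, the hash choice is arbitrary, and the helper names `id_keyed_changes` and `digest` are hypothetical. It shows that a comparison keyed on ids reports nothing when primitives are merely reordered, while a digest over the serialized XML does change.

```python
import hashlib
import xml.etree.ElementTree as ET

# Trimmed-down versions of the two groups from the report above.
ORIG = """<group id="dhcp-server">
  <primitive id="vip-10-5-200-244"/>
  <primitive id="dhcpd"/>
  <primitive id="vip-10-5-201-244"/>
</group>"""

NEW = """<group id="dhcp-server">
  <primitive id="vip-10-5-200-244"/>
  <primitive id="vip-10-5-201-244"/>
  <primitive id="dhcpd"/>
</group>"""


def index_by_id(root):
    """Map (tag, id) -> attributes for every element, ignoring position."""
    return {(el.tag, el.get("id")): dict(el.attrib) for el in root.iter()}


def id_keyed_changes(old_xml, new_xml):
    """Return the (tag, id) keys that were added, removed, or modified."""
    old = index_by_id(ET.fromstring(old_xml))
    new = index_by_id(ET.fromstring(new_xml))
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


def digest(xml_text):
    """Hash of the serialized tree; sibling order affects the result."""
    return hashlib.sha256(ET.tostring(ET.fromstring(xml_text))).hexdigest()


changes = id_keyed_changes(ORIG, NEW)
order_changed = digest(ORIG) != digest(NEW)
print("id-keyed changes:", changes)
print("digest differs:", order_changed)
```

The id-keyed comparison comes back empty, which matches crm_diff showing nothing, while the digest comparison flags the reordering.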
>>
>>>
>>> But, after I do
>>>
>>> # crm_shadow --create-empty myShadow
>>> shadow[myShadow] # cibadmin -E --force
>>> shadow[myShadow] # cibadmin --replace --xml-file cib.xml
>>> shadow[myShadow] # crm_shadow --commit myShadow --force
>>> Now type Ctrl-D to exit the crm_shadow shell
>>> shadow[myShadow] # exit
>>>
>>> the group becomes defined in the proper order.
>>>
>>> That's why the only suspect is the xml-diff algorithm.
>>>
>>> Andrew, David, could you please look?
>>
>> It's also partly down to how crmsh is using diffs.
>> It could verify that the diff produces the correct result by checking the above-mentioned digest.
>> Or it could do a replace for the group instead...
> I'm a bit surprised that even a replace cannot successfully reorder
> resources in a group. I tried it on 1.1.9 through 1.1.11.
>
> On DC:
> Jun 6 12:18:51 sles11-1 cib[1814]: notice: cib_perform_op: Configuration ordering change detected
> Jun 6 12:18:51 sles11-1 cib[1814]: notice: cib:diff: Diff: --- 0.3835.86
> Jun 6 12:18:51 sles11-1 cib[1814]: notice: cib:diff: Diff: +++ 0.3835.1 21300207d1fe995ea0475be3dc60718f
>
>
> On non-DC:
> Jun 6 12:16:50 sles11-2 cib[32053]: warning: cib_process_diff: Diff 0.3835.81 -> 0.3835.1 from sles11-1 not applied to 0.3835.81: Failed application of an update diff
> Jun 6 12:16:50 sles11-2 cib[32053]: warning: cib_process_replace: Replacement 0.3835.1 from sles11-1 not applied to 0.3835.81: current num_updates is greater than the replacement
>
>
> I think the crm_shadow way mentioned above works because it bumps
> "epoch" itself.
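
The rejection in the non-DC log follows from how CIB versions are ordered. Here is a minimal Python sketch, assuming versions compare as the tuple (admin_epoch, epoch, num_updates) from left to right and that a strictly older replacement is refused. The function names are hypothetical and this is not Pacemaker's implementation.

```python
def parse_version(text):
    """Split 'admin_epoch.epoch.num_updates' into a tuple of ints."""
    admin_epoch, epoch, num_updates = (int(p) for p in text.split("."))
    return (admin_epoch, epoch, num_updates)


def replacement_accepted(current, replacement):
    """Accept a replacement only if it is not older than the current CIB."""
    return parse_version(replacement) >= parse_version(current)


# Versions taken from the non-DC log: num_updates went backwards.
stale = replacement_accepted("0.3835.81", "0.3835.1")
# With epoch bumped, the replacement is newer regardless of num_updates.
bumped = replacement_accepted("0.3835.81", "0.3836.1")

print(stale)   # False
print(bumped)  # True
```

Under this ordering, anything that bumps "epoch" is accepted no matter how num_updates was reset, which would explain why the crm_shadow route works.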
>
>
> If we replace only the group snippet with
> cibadmin -R -o resources -x group.xml
>
> the change is applied in the DC's CIB, but the non-DC's CIB is left
> out of sync.
>
>
> 1.1.12-rc goes a different way in cib_perform_op() and works.
>
> So, for 1.1.10/1.1.11, should the fix look like this:
>
> --- pacemaker.orig/lib/cib/cib_utils.c
> +++ pacemaker/lib/cib/cib_utils.c
> @@ -565,6 +565,7 @@ cib_perform_op(const char *op, int call_
> } else if (crm_str_eq(new_digest, last_digest, TRUE) == FALSE) {
>
> crm_notice("Configuration ordering change detected");
> + cib_update_counter(scratch, XML_ATTR_GENERATION, FALSE);
> cib_update_counter(scratch, XML_ATTR_NUMUPDATES, TRUE);
And probably also:
+ *config_changed = TRUE;
>
> crm_trace("Old: %s, New: %s", last_digest, new_digest);
>
> ?
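
The effect of the proposed change can be sketched in Python. This is a toy model under stated assumptions, not the real cib_update_counter(): it assumes that passing TRUE resets the counter to 1 (consistent with the DC log above, where 0.3835.86 became 0.3835.1), that passing FALSE increments it, and that XML_ATTR_GENERATION refers to "epoch". The names `update_counter` and `on_ordering_change` are hypothetical.

```python
def update_counter(version, field, reset):
    """Toy counter update: reset sets the field to 1, otherwise increment."""
    updated = dict(version)
    updated[field] = 1 if reset else updated[field] + 1
    return updated


def as_tuple(version):
    """Order versions as (admin_epoch, epoch, num_updates)."""
    return (version["admin_epoch"], version["epoch"], version["num_updates"])


def on_ordering_change(version, bump_epoch):
    """Model the 'ordering change detected' branch, with or without the patch."""
    if bump_epoch:
        # Assumed effect of the added line in the patch.
        version = update_counter(version, "epoch", reset=False)
    return update_counter(version, "num_updates", reset=True)


current = {"admin_epoch": 0, "epoch": 3835, "num_updates": 86}
unpatched = on_ordering_change(current, bump_epoch=False)
patched = on_ordering_change(current, bump_epoch=True)

# Without the epoch bump the new version sorts below the old one.
print(as_tuple(unpatched), as_tuple(unpatched) > as_tuple(current))
# With the epoch bump the new version sorts above the old one.
print(as_tuple(patched), as_tuple(patched) > as_tuple(current))
```

In this model the unpatched branch produces a version that peers holding the old CIB would treat as older, while the patched branch produces one they would treat as newer.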
>
> Regards,
> Yan
>
>>
>>>
>>> Thank you,
>>> Vladislav
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>
>>
>>
>>
>
--
Gao,Yan <ygao at suse.com>
Software Engineer
China Server Team, SUSE.