[ClusterLabs] Updated attribute is not displayed in crm_mon
Jan Friesse
jfriesse at redhat.com
Tue Aug 15 02:42:30 EDT 2017
Ken Gaillot wrote:
> On Mon, 2017-08-14 at 12:33 -0500, Ken Gaillot wrote:
>> On Wed, 2017-08-02 at 09:59 +0000, 井上 和徳 wrote:
>>> Hi,
>>>
>>> In Pacemaker-1.1.17, an attribute updated while pacemaker is starting is not displayed in crm_mon.
>>> In Pacemaker-1.1.16 it is displayed, so the two versions behave differently.
>>>
>>> https://github.com/ClusterLabs/pacemaker/commit/fe44f400a3116a158ab331a92a49a4ad8937170d
>>> This commit is the cause, but is the result below (3.) the expected behavior?
>>
>> This turned out to be an odd one. The sequence of events is:
>>
>> 1. When the node leaves the cluster, the DC (correctly) wipes all its
>> transient attributes from attrd and the CIB.
>>
>> 2. Pacemaker is newly started on the node, and a transient attribute is
>> set before the node joins the cluster.
>>
>> 3. The node joins the cluster, and its transient attributes (including
>> the new value) are sync'ed with the rest of the cluster, in both attrd
>> and the CIB. So far, so good.
>>
>> 4. Because this is the node's first join since its crmd started, its
>> crmd wipes all of its transient attributes again. The idea is that the
>> node may have restarted so quickly that the DC hasn't yet done it (step
>> 1 here), so clear them now to avoid any problems with old values.
>> However, the crmd wipes only the CIB -- not attrd (arguably a bug).
>
> Whoops, clarification: the node may have restarted so quickly that
> corosync didn't notice it left, so the DC would never have gotten the
Corosync always notices that a node has left, no matter whether the node is
gone for longer than the token timeout or comes back within it.
> "peer lost" message that triggers wiping its transient attributes.
>
> I suspect the crmd wipes only the CIB in this case because we assumed
> attrd would be empty at this point -- missing exactly this case where a
> value was set between start-up and first join.
>
>> 5. With the older pacemaker version, both the joining node and the DC
>> would request a full write-out of all values from attrd. Because step 4
>> only wiped the CIB, this ends up restoring the new value. With the newer
>> pacemaker version, this step is no longer done, so the value winds up
>> staying in attrd but not in CIB (until the next write-out naturally
>> occurs).
>>
>> I don't have a solution yet, but step 4 is clearly the problem (rather
>> than the new code that skips step 5, which is still a good idea
>> performance-wise). I'll keep working on it.
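
For illustration, one way to see the divergence described in steps 4 and 5 is to compare what attrd reports with what is actually in the CIB's status section. This is only a rough sketch using standard Pacemaker CLI tools; the attribute name KEY and node name node1 follow the example below, and the XPath may need adjusting to your CIB layout:

  [root@node1 ~]# attrd_updater -Q -n KEY -A
  [root@node1 ~]# cibadmin --query --xpath "//node_state[@uname='node1']//nvpair[@name='KEY']"

If the first command returns a value for node1 but the second reports no match, attrd and the CIB are out of sync in the way described above.
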
>>
>>> [test case]
>>> 1. Start pacemaker on two nodes at the same time and update the attribute during startup.
>>> In this case, the attribute is displayed in crm_mon.
>>>
>>> [root@node1 ~]# ssh -f node1 'systemctl start pacemaker ; attrd_updater -n KEY -U V-1' ; \
>>> ssh -f node3 'systemctl start pacemaker ; attrd_updater -n KEY -U V-3'
>>> [root@node1 ~]# crm_mon -QA1
>>> Stack: corosync
>>> Current DC: node3 (version 1.1.17-1.el7-b36b869) - partition with quorum
>>>
>>> 2 nodes configured
>>> 0 resources configured
>>>
>>> Online: [ node1 node3 ]
>>>
>>> No active resources
>>>
>>>
>>> Node Attributes:
>>> * Node node1:
>>> + KEY : V-1
>>> * Node node3:
>>> + KEY : V-3
>>>
>>>
>>> 2. Restart pacemaker on node1, and update the attribute during startup.
>>>
>>> [root@node1 ~]# systemctl stop pacemaker
>>> [root@node1 ~]# systemctl start pacemaker ; attrd_updater -n KEY -U V-10
>>>
>>>
>>> 3. The attribute is registered in attrd but it is not registered in CIB,
>>> so the updated attribute is not displayed in crm_mon.
>>>
>>> [root@node1 ~]# attrd_updater -Q -n KEY -A
>>> name="KEY" host="node3" value="V-3"
>>> name="KEY" host="node1" value="V-10"
>>>
>>> [root@node1 ~]# crm_mon -QA1
>>> Stack: corosync
>>> Current DC: node3 (version 1.1.17-1.el7-b36b869) - partition with quorum
>>>
>>> 2 nodes configured
>>> 0 resources configured
>>>
>>> Online: [ node1 node3 ]
>>>
>>> No active resources
>>>
>>>
>>> Node Attributes:
>>> * Node node1:
>>> * Node node3:
>>> + KEY : V-3
>>>
>>>
>>> Best Regards
>>>
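
As a possible stop-gap until this is fixed, forcing attrd to write all of its current values back to the CIB should make the missing value show up again, since the issue above is only that the automatic write-out no longer happens at join time. A rough sketch, assuming the attrd_updater -R/--refresh option is available in this build:

  [root@node1 ~]# attrd_updater -R
  [root@node1 ~]# crm_mon -QA1

After the refresh, node1's KEY attribute should appear under Node Attributes again.
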