[ClusterLabs] Updated attribute is not displayed in crm_mon
Ken Gaillot
kgaillot at redhat.com
Mon Aug 14 13:41:36 EDT 2017
On Mon, 2017-08-14 at 12:33 -0500, Ken Gaillot wrote:
> On Wed, 2017-08-02 at 09:59 +0000, 井上 和徳 wrote:
> > Hi,
> >
> > In Pacemaker-1.1.17, an attribute that is updated while pacemaker is starting is not displayed in crm_mon.
> > In Pacemaker-1.1.16 it is displayed, so the results differ between the two versions.
> >
> > https://github.com/ClusterLabs/pacemaker/commit/fe44f400a3116a158ab331a92a49a4ad8937170d
> > This commit is the cause, but is the following result (3.) the expected behavior?
>
> This turned out to be an odd one. The sequence of events is:
>
> 1. When the node leaves the cluster, the DC (correctly) wipes all its
> transient attributes from attrd and the CIB.
>
> 2. Pacemaker is newly started on the node, and a transient attribute is
> set before the node joins the cluster.
>
> 3. The node joins the cluster, and its transient attributes (including
> the new value) are sync'ed with the rest of the cluster, in both attrd
> and the CIB. So far, so good.
>
> 4. Because this is the node's first join since its crmd started, its
> crmd wipes all of its transient attributes again. The idea is that the
> node may have restarted so quickly that the DC hasn't yet done it (step
> 1 here), so clear them now to avoid any problems with old values.
> However, the crmd wipes only the CIB -- not attrd (arguably a bug).

Whoops, clarification: the node may have restarted so quickly that
corosync didn't notice it left, so the DC would never have gotten the
"peer lost" message that triggers wiping its transient attributes.

I suspect the crmd wipes only the CIB in this case because we assumed
attrd would be empty at this point -- missing exactly this case where a
value was set between start-up and first join.

> 5. With the older pacemaker version, both the joining node and the DC
> would request a full write-out of all values from attrd. Because step 4
> only wiped the CIB, this ends up restoring the new value. With the newer
> pacemaker version, this step is no longer done, so the value winds up
> staying in attrd but not in CIB (until the next write-out naturally
> occurs).
>
> I don't have a solution yet, but step 4 is clearly the problem (rather
> than the new code that skips step 5, which is still a good idea
> performance-wise). I'll keep working on it.
>
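To make the sequence above easier to follow, here is a toy model of it in Python. It is only a sketch: it is not Pacemaker code, the names `ToyCluster`, `wipe_node`, `write_out` and `restart_and_set` are made up for illustration, and attrd and the CIB are each reduced to a plain dict keyed by (node, attribute name). The flag `full_write_out_on_join` stands in for step 5, which the older version performs and the newer version skips.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy stand-in for the two places a transient attribute lives."""
    attrd: dict = field(default_factory=dict)  # (node, name) -> value
    cib: dict = field(default_factory=dict)    # (node, name) -> value

    def wipe_node(self, node, stores):
        """Remove every attribute of `node` from each store in `stores`."""
        for store in stores:
            for key in [k for k in store if k[0] == node]:
                del store[key]

    def write_out(self):
        """Full write-out: copy everything attrd holds into the CIB."""
        self.cib.update(self.attrd)


def restart_and_set(full_write_out_on_join):
    """Replay steps 1-5 for node1 and return the resulting toy cluster."""
    cluster = ToyCluster()
    # Before the restart, node1 has an old value in both stores.
    cluster.attrd[("node1", "KEY")] = "V-1"
    cluster.cib[("node1", "KEY")] = "V-1"

    # Step 1: node1 leaves; the DC wipes its attributes from attrd and the CIB.
    cluster.wipe_node("node1", [cluster.attrd, cluster.cib])

    # Step 2: pacemaker starts on node1 and a value is set before the join.
    pending = {("node1", "KEY"): "V-10"}

    # Step 3: node1 joins; the value is synced into attrd and the CIB.
    cluster.attrd.update(pending)
    cluster.cib.update(pending)

    # Step 4: first join since crmd start, so the crmd wipes the CIB only.
    cluster.wipe_node("node1", [cluster.cib])

    # Step 5: only the older version requests a full write-out here.
    if full_write_out_on_join:
        cluster.write_out()
    return cluster


old = restart_and_set(full_write_out_on_join=True)
new = restart_and_set(full_write_out_on_join=False)
print("old CIB:", old.cib)
print("new CIB:", new.cib, "/ new attrd:", new.attrd)
```

In this model the old behavior ends with the new value back in the CIB, while the new behavior leaves it in attrd only, which is the state the test case below reports.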
> > [test case]
> > 1. Start pacemaker on two nodes at the same time and update the attribute during startup.
> > In this case, the attribute is displayed in crm_mon.
> >
> > [root@node1 ~]# ssh -f node1 'systemctl start pacemaker ; attrd_updater -n KEY -U V-1' ; \
> > ssh -f node3 'systemctl start pacemaker ; attrd_updater -n KEY -U V-3'
> > [root@node1 ~]# crm_mon -QA1
> > Stack: corosync
> > Current DC: node3 (version 1.1.17-1.el7-b36b869) - partition with quorum
> >
> > 2 nodes configured
> > 0 resources configured
> >
> > Online: [ node1 node3 ]
> >
> > No active resources
> >
> >
> > Node Attributes:
> > * Node node1:
> > + KEY : V-1
> > * Node node3:
> > + KEY : V-3
> >
> >
> > 2. Restart pacemaker on node1, and update the attribute during startup.
> >
> > [root@node1 ~]# systemctl stop pacemaker
> > [root@node1 ~]# systemctl start pacemaker ; attrd_updater -n KEY -U V-10
> >
> >
> > 3. The attribute is registered in attrd but not in the CIB,
> > so the updated attribute is not displayed in crm_mon.
> >
> > [root@node1 ~]# attrd_updater -Q -n KEY -A
> > name="KEY" host="node3" value="V-3"
> > name="KEY" host="node1" value="V-10"
> >
> > [root@node1 ~]# crm_mon -QA1
> > Stack: corosync
> > Current DC: node3 (version 1.1.17-1.el7-b36b869) - partition with quorum
> >
> > 2 nodes configured
> > 0 resources configured
> >
> > Online: [ node1 node3 ]
> >
> > No active resources
> >
> >
> > Node Attributes:
> > * Node node1:
> > * Node node3:
> > + KEY : V-3
> >
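For anyone who wants to spot this mismatch without comparing the two outputs by eye, a small Python sketch along these lines might help. It assumes the text formats are exactly as pasted in step 3 above (other versions may print them differently), it works on captured text rather than running any command itself, and the function names are made up for illustration.

```python
import re

# Matches lines such as: name="KEY" host="node3" value="V-3"
ATTRD_LINE = re.compile(r'name="([^"]*)" host="([^"]*)" value="([^"]*)"')


def parse_attrd(text):
    """Return {(host, name): value} from `attrd_updater -Q -n KEY -A` output."""
    return {(m.group(2), m.group(1)): m.group(3)
            for m in ATTRD_LINE.finditer(text)}


def parse_crm_mon_attrs(text):
    """Return {(node, name): value} from the 'Node Attributes:' section."""
    attrs, node, in_section = {}, None, False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Node Attributes:":
            in_section = True
        elif not in_section:
            continue
        elif line.startswith("* Node ") and line.endswith(":"):
            node = line[len("* Node "):-1]
        elif line.startswith("+ ") and node is not None:
            name, _, value = line[2:].partition(":")
            attrs[(node, name.strip())] = value.strip()
    return attrs


def missing_from_cib(attrd_text, crm_mon_text):
    """Attributes attrd knows about that crm_mon does not show with that value."""
    shown = parse_crm_mon_attrs(crm_mon_text)
    return {k: v for k, v in parse_attrd(attrd_text).items()
            if shown.get(k) != v}


# Sample text copied from step 3 of the test case.
ATTRD_OUTPUT = '''name="KEY" host="node3" value="V-3"
name="KEY" host="node1" value="V-10"'''

CRM_MON_OUTPUT = '''Node Attributes:
* Node node1:
* Node node3:
+ KEY : V-3'''

print(missing_from_cib(ATTRD_OUTPUT, CRM_MON_OUTPUT))
# prints {('node1', 'KEY'): 'V-10'}
```

With the step 3 output as input, the only entry reported is node1's KEY, which matches what crm_mon fails to display.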
> >
> > Best Regards
> >
>
--
Ken Gaillot <kgaillot at redhat.com>