[Pacemaker] different behavior cibadmin -Ql with cman and corosync2
Andrew Beekhof
andrew at beekhof.net
Thu Sep 12 02:59:46 EDT 2013
On 11/09/2013, at 2:57 PM, Andrey Groshev <greenx at yandex.ru> wrote:
> Hello Christine, Andrew and all.
>
> I'm sorry, I was a little unwell, so I did not answer.
> How do we conclude this thread?
> Who will change: corosync or pacemaker?
For now make sure you specify a nodeid and name.
Longer term, Chrissie is looking at making the combined data set available in a different namespace for pacemaker to use.
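
A minimal sketch of checking that advice before starting the cluster, assuming the nodelist uses the plain "node { key: value }" layout quoted further down. The helper names (parse_nodes, check_nodelist) and the second node's address are hypothetical, and the regex is not a full corosync.conf parser:

```python
import re

# Sample nodelist; the second node and its address are made up for illustration.
SAMPLE_CONF = """
nodelist {
    node {
        name: dev-cluster2-node2
        ring0_addr: 10.76.157.17
        nodeid: 2
    }
    node {
        ring0_addr: 192.0.2.18
    }
}
"""


def parse_nodes(conf_text):
    """Return one dict of key/value pairs per 'node { ... }' block."""
    nodes = []
    # Matches 'node {' but not 'nodelist {', and only brace-free bodies.
    for body in re.findall(r"\bnode\s*\{([^{}]*)\}", conf_text):
        entry = {}
        for line in body.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition(":")
            entry[key.strip()] = value.strip()
        nodes.append(entry)
    return nodes


def check_nodelist(conf_text, required=("name", "nodeid")):
    """Return (index, problems) for every node entry that needs attention."""
    report = []
    for index, node in enumerate(parse_nodes(conf_text)):
        problems = [key for key in required if not node.get(key)]
        if node.get("nodeid") == "0":
            problems.append("nodeid (0 is not valid)")
        if problems:
            report.append((index, problems))
    return report


if __name__ == "__main__":
    for index, problems in check_nodelist(SAMPLE_CONF):
        print(f"nodelist.node.{index}: missing {', '.join(problems)}")
```

Run against the sample, only the second entry is reported, since it has neither a name nor a nodeid.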
>
>
> 05.09.2013, 15:49, "Christine Caulfield" <ccaulfie at redhat.com>:
>> On 05/09/13 11:33, Andrew Beekhof wrote:
>>
>>> On 05/09/2013, at 6:37 PM, Christine Caulfield <ccaulfie at redhat.com> wrote:
>>>> On 03/09/13 22:03, Andrew Beekhof wrote:
>>>>> On 03/09/2013, at 11:49 PM, Christine Caulfield <ccaulfie at redhat.com> wrote:
>>>>>> On 03/09/13 05:20, Andrew Beekhof wrote:
>>>>>>> On 02/09/2013, at 5:27 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>> 30.08.2013, 07:18, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>> On 29/08/2013, at 7:31 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>> 29.08.2013, 12:25, "Andrey Groshev" <greenx at yandex.ru>:
>>>>>>>>>>> 29.08.2013, 02:55, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>>>> On 28/08/2013, at 5:38 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>>>> 28.08.2013, 04:06, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>>>>>> On 27/08/2013, at 1:13 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>>>>>> 27.08.2013, 05:39, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>>>>>>>> On 26/08/2013, at 3:09 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>>>>>>>> 26.08.2013, 03:34, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>>>>>>>>>> On 23/08/2013, at 9:39 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Today I tried to rebuild my test cluster, moving from cman to corosync2.
>>>>>>>>>>>>>>>>>>> I noticed the following:
>>>>>>>>>>>>>>>>>>> If I reset the cman cluster with cibadmin --erase --force,
>>>>>>>>>>>>>>>>>>> the node names still exist in the cib.
>>>>>>>>>>>>>>>>>> Yes, the cluster puts back entries for all the nodes it knows about automagically.
>>>>>>>>>>>>>>>>>>> cibadmin -Ql
>>>>>>>>>>>>>>>>>>> .....
>>>>>>>>>>>>>>>>>>> <nodes>
>>>>>>>>>>>>>>>>>>> <node id="dev-cluster2-node2.unix.tensor.ru" uname="dev-cluster2-node2"/>
>>>>>>>>>>>>>>>>>>> <node id="dev-cluster2-node4.unix.tensor.ru" uname="dev-cluster2-node4"/>
>>>>>>>>>>>>>>>>>>> <node id="dev-cluster2-node3.unix.tensor.ru" uname="dev-cluster2-node3"/>
>>>>>>>>>>>>>>>>>>> </nodes>
>>>>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Even if cman and pacemaker are running on only one node.
>>>>>>>>>>>>>>>>>> I'm assuming all three are configured in cluster.conf?
>>>>>>>>>>>>>>>>> Yes, the list of nodes is there.
>>>>>>>>>>>>>>>>>>> And if I do the same on a cluster with corosync2,
>>>>>>>>>>>>>>>>>>> I see only the names of nodes that are running corosync and pacemaker.
>>>>>>>>>>>>>>>>>> Since you haven't included your config, I can only guess that your corosync.conf does not have a nodelist.
>>>>>>>>>>>>>>>>>> If it did, you should get the same behaviour.
>>>>>>>>>>>>>>>>> I tried both expected_node and nodelist.
>>>>>>>>>>>>>>>> And it didn't work? What version of pacemaker?
>>>>>>>>>>>>>>> It does not work as I expected.
>>>>>>>>>>>>>> That's because you've used IP addresses in the node list.
>>>>>>>>>>>>>> ie.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> node {
>>>>>>>>>>>>>>     ring0_addr: 10.76.157.17
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> try including the node name as well, eg.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> node {
>>>>>>>>>>>>>>     name: dev-cluster2-node2
>>>>>>>>>>>>>>     ring0_addr: 10.76.157.17
>>>>>>>>>>>>>> }
>>>>>>>>>>>>> The same thing.
>>>>>>>>>>>> I don't know what to say. I tested it here yesterday and it worked as expected.
>>>>>>>>>>> I found the reason you and I got different results: I did not have a reverse DNS zone for these nodes.
>>>>>>>>>>> I know there should be one, but (PACEMAKER + CMAN) worked without a reverse zone!
>>>>>>>>>> I was too hasty. Deleted everything. Reinstalled. Configured. Not working again. Damn!
>>>>>>>>> It would have surprised me... pacemaker 1.1.11 doesn't do any dns lookups - reverse or otherwise.
>>>>>>>>> Can you set
>>>>>>>>>
>>>>>>>>> PCMK_trace_files=corosync.c
>>>>>>>>>
>>>>>>>>> in your environment and retest?
>>>>>>>>>
>>>>>>>>> On RHEL6 that means putting the following in /etc/sysconfig/pacemaker
>>>>>>>>> export PCMK_trace_files=corosync.c
>>>>>>>>>
>>>>>>>>> It should produce additional logging[1] that will help diagnose the issue.
>>>>>>>>>
>>>>>>>>> [1] http://blog.clusterlabs.org/blog/2013/pacemaker-logging/
>>>>>>>> Hello, Andrew.
>>>>>>>>
>>>>>>>> You misunderstood me a little.
>>>>>>> No, I understood you fine.
>>>>>>>> I wrote that I rushed to judgment.
>>>>>>>> After I created the reverse DNS zone, the cluster behaved correctly.
>>>>>>>> BUT after I tore the cluster down, dropped the configs, and started again as a new cluster,
>>>>>>>> the cluster again did not show all the nodes in <nodes> (only the node with pacemaker running).
>>>>>>>>
>>>>>>>> A small portion of the log. Full log
>>>>>>>> In which (I thought) there is something interesting.
>>>>>>>>
>>>>>>>> Aug 30 12:31:11 [9986] dev-cluster2-node4 cib: ( corosync.c:423 ) trace: check_message_sanity: Verfied message 4: (dest=<all>:cib, from=dev-cluster2-node4:cib.9986, compressed=0, size=1551, total=2143)
>>>>>>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: ( corosync.c:96 ) trace: corosync_node_name: Checking 172793107 vs 0 from nodelist.node.0.nodeid
>>>>>>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: ( ipcc.c:378 ) debug: qb_ipcc_disconnect: qb_ipcc_disconnect()
>>>>>>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: (ringbuffer.c:294 ) debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cmap-request-9616-9989-27-header
>>>>>>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: (ringbuffer.c:294 ) debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cmap-response-9616-9989-27-header
>>>>>>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: (ringbuffer.c:294 ) debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cmap-event-9616-9989-27-header
>>>>>>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: ( corosync.c:134 ) notice: corosync_node_name: Unable to get node name for nodeid 172793107
>>>>>>> I wonder if you need to be including the nodeid too. ie.
>>>>>>>
>>>>>>> node {
>>>>>>>     name: dev-cluster2-node2
>>>>>>>     ring0_addr: 10.76.157.17
>>>>>>>     nodeid: 2
>>>>>>> }
>>>>>>>
>>>>>>> I _thought_ that was implicit.
>>>>>>> Chrissie: is "nodelist.node.%d.nodeid" always available for corosync2 or only if explicitly defined in the config?
>>>>>> You do need to specify a nodeid if you don't want corosync to imply it from the IP address (or you're using IPv6). corosync won't imply a nodeid from the order of the nodes in corosync.conf - that's not reliable enough.
>>>>> Right, but is that implied nodeid available as "nodelist.node.%d.nodeid"?
>>>>> Andrey's results suggest "no" and I would claim this is not expected/good :)
>>>> If you want to get the nodeid of the node you are on
>>> No, we're trying to use a known nodeid to look up the other information in the node list - such as 'ring0_addr' or 'name'.
>>
>> votequorum_get_info()
>>
>> Chrissie
>>
>>>> there is both a corosync API call for it - totem_nodeid_get() - or you can get it from votequorum via cmap - runtime.votequorum.this_node_id
>>>>
>>>> The nodelist.* section of cmap is really meant to reflect what is in corosync.conf and I don't really want to be writing into it. I know there is already nodelist.our_node_pos, but I'm not a fan of that either :P
>>>>
>>>> Chrissie
>>>>>> Also bear in mind that 0 is not a valid node number :-)
>>>>>>
>>>>>> Chrissie
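
To illustrate the implied nodeid discussed above: the value 172793107 in the log is what you get when 10.76.157.19 is read as a 32-bit big-endian integer. The thread never states node4's address, so that address is an inference from the log value and from node2's 10.76.157.17. The function name is hypothetical, and this is a sketch of the arithmetic only, not corosync's own code:

```python
import ipaddress


def implied_nodeid(ring0_addr):
    """Read an IPv4 address as an unsigned 32-bit big-endian integer."""
    return int(ipaddress.IPv4Address(ring0_addr))


def address_for_nodeid(nodeid):
    """Inverse mapping: turn a 32-bit nodeid back into dotted-quad form."""
    return str(ipaddress.IPv4Address(nodeid))


if __name__ == "__main__":
    # Nodeid reported in the trace log from dev-cluster2-node4.
    print(address_for_nodeid(172793107))   # 10.76.157.19
    # Address used for dev-cluster2-node2 in the nodelist examples.
    print(implied_nodeid("10.76.157.17"))  # 172793105
```

This would also explain the "Checking 172793107 vs 0" trace line: the lookup compares the implied id against nodelist.node.0.nodeid, which reads as 0 when no nodeid is written in corosync.conf, so no entry can match.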