[Pacemaker] different behavior cibadmin -Ql with cman and corosync2
Andrey Groshev
greenx at yandex.ru
Tue Sep 3 10:10:21 EDT 2013
03.09.2013, 17:52, "Christine Caulfield" <ccaulfie at redhat.com>:
> On 03/09/13 05:20, Andrew Beekhof wrote:
>
>> On 02/09/2013, at 5:27 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>> 30.08.2013, 07:18, "Andrew Beekhof" <andrew at beekhof.net>:
>>>> On 29/08/2013, at 7:31 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>> 29.08.2013, 12:25, "Andrey Groshev" <greenx at yandex.ru>:
>>>>>> 29.08.2013, 02:55, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>> On 28/08/2013, at 5:38 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>> 28.08.2013, 04:06, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>> On 27/08/2013, at 1:13 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>> 27.08.2013, 05:39, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>>> On 26/08/2013, at 3:09 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>>> 26.08.2013, 03:34, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>>>>> On 23/08/2013, at 9:39 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Today I tried to convert my test cluster from cman to corosync2,
>>>>>>>>>>>>>> and I noticed the following:
>>>>>>>>>>>>>> If I wipe a cluster running cman with cibadmin --erase --force,
>>>>>>>>>>>>>> the node names still remain in the CIB.
>>>>>>>>>>>>> Yes, the cluster automagically puts back entries for all the nodes it knows about.
>>>>>>>>>>>>>> cibadmin -Ql
>>>>>>>>>>>>>> .....
>>>>>>>>>>>>>> <nodes>
>>>>>>>>>>>>>> <node id="dev-cluster2-node2.unix.tensor.ru" uname="dev-cluster2-node2"/>
>>>>>>>>>>>>>> <node id="dev-cluster2-node4.unix.tensor.ru" uname="dev-cluster2-node4"/>
>>>>>>>>>>>>>> <node id="dev-cluster2-node3.unix.tensor.ru" uname="dev-cluster2-node3"/>
>>>>>>>>>>>>>> </nodes>
>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Even if cman and pacemaker are running on only one node.
>>>>>>>>>>>>> I'm assuming all three are configured in cluster.conf?
>>>>>>>>>>>> Yes, all the nodes are listed there.
>>>>>>>>>>>>>> And if I do the same on a cluster with corosync2,
>>>>>>>>>>>>>> I see only the names of the nodes that are running corosync and pacemaker.
>>>>>>>>>>>>> Since you've not included your config, I can only guess that your corosync.conf does not have a nodelist.
>>>>>>>>>>>>> If it did, you should get the same behaviour.
>>>>>>>>>>>> I tried both expected_node and a nodelist.
>>>>>>>>>>> And it didn't work? What version of pacemaker?
>>>>>>>>>> It does not work as I expected.
>>>>>>>>> That's because you've used IP addresses in the node list.
>>>>>>>>> ie.
>>>>>>>>>
>>>>>>>>> node {
>>>>>>>>>         ring0_addr: 10.76.157.17
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> try including the node name as well, eg.
>>>>>>>>>
>>>>>>>>> node {
>>>>>>>>>         name: dev-cluster2-node2
>>>>>>>>>         ring0_addr: 10.76.157.17
>>>>>>>>> }
>>>>>>>> The same thing.
>>>>>>> I don't know what to say. I tested it here yesterday and it worked as expected.
>>>>>> I found the reason that you and I have different results - I did not have a reverse DNS zone for these nodes.
>>>>>> I know there should be one, but (PACEMAKER + CMAN) worked without a reverse zone!
>>>>> I was hasty. I deleted everything, reinstalled, and reconfigured. It's not working again. Damn!
>>>> It would have surprised me... pacemaker 1.1.11 doesn't do any dns lookups - reverse or otherwise.
>>>> Can you set
>>>>
>>>> PCMK_trace_files=corosync.c
>>>>
>>>> in your environment and retest?
>>>>
>>>> On RHEL6 that means putting the following in /etc/sysconfig/pacemaker
>>>> export PCMK_trace_files=corosync.c
>>>>
>>>> It should produce additional logging[1] that will help diagnose the issue.
>>>>
>>>> [1] http://blog.clusterlabs.org/blog/2013/pacemaker-logging/
>>> Hello, Andrew.
>>>
>>> You misunderstood me a little.
>> No, I understood you fine.
>>> I wrote that I had rushed to judgment.
>>> After I set up the reverse DNS zone, the cluster behaved correctly.
>>> BUT after I tore the cluster down, dropped the configs, and restarted it as a new cluster,
>>> the cluster again did not show all the nodes (only the node with pacemaker running).
>>>
>>> A small portion of the log. Full log:
>>> The part in which (I thought) there is something interesting:
>>>
>>> Aug 30 12:31:11 [9986] dev-cluster2-node4 cib: ( corosync.c:423 ) trace: check_message_sanity: Verfied message 4: (dest=<all>:cib, from=dev-cluster2-node4:cib.9986, compressed=0, size=1551, total=2143)
>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: ( corosync.c:96 ) trace: corosync_node_name: Checking 172793107 vs 0 from nodelist.node.0.nodeid
>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: ( ipcc.c:378 ) debug: qb_ipcc_disconnect: qb_ipcc_disconnect()
>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: (ringbuffer.c:294 ) debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cmap-request-9616-9989-27-header
>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: (ringbuffer.c:294 ) debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cmap-response-9616-9989-27-header
>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: (ringbuffer.c:294 ) debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cmap-event-9616-9989-27-header
>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: ( corosync.c:134 ) notice: corosync_node_name: Unable to get node name for nodeid 172793107
>> I wonder if you need to include the nodeid too, i.e.
>>
>> node {
>>         name: dev-cluster2-node2
>>         ring0_addr: 10.76.157.17
>>         nodeid: 2
>> }
>>
>> I _thought_ that was implicit.
>> Chrissie: is "nodelist.node.%d.nodeid" always available for corosync2 or only if explicitly defined in the config?
>
> You do need to specify a nodeid if you don't want corosync to imply it
> from the IP address (or if you're using IPv6). corosync won't imply a
> nodeid from the order of the nodes in corosync.conf - that's not
> reliable enough. Also bear in mind that 0 is not a valid node number :-)
>
> Chrissie
But if we do not specify the "nodeid" by hand, pacemaker cannot see the nodelist.
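
Putting the thread's advice together, a corosync.conf nodelist with explicit names and nodeids would look something like this sketch. Only 10.76.157.17 appears in the thread; the addresses for node3 and node4 below are placeholders, and the nodeid values are illustrative:

nodelist {
        node {
                name: dev-cluster2-node2
                ring0_addr: 10.76.157.17
                nodeid: 2
        }
        node {
                name: dev-cluster2-node3
                ring0_addr: 10.76.157.18   # placeholder address
                nodeid: 3
        }
        node {
                name: dev-cluster2-node4
                ring0_addr: 10.76.157.19   # placeholder address
                nodeid: 4
        }
}

With name and nodeid set explicitly for every node, pacemaker can populate the nodes section without DNS lookups, and you can inspect the keys corosync actually exposes (the nodelist.node.X.nodeid values mentioned above) with "corosync-cmapctl | grep nodelist.node".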