[Pacemaker] different behavior cibadmin -Ql with cman and corosync2

Tue Sep 3 13:49:13 UTC 2013

On 03/09/13 05:20, Andrew Beekhof wrote:
>
> On 02/09/2013, at 5:27 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>
>>
>>
>> 30.08.2013, 07:18, "Andrew Beekhof" <andrew at beekhof.net>:
>>> On 29/08/2013, at 7:31 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>
>>>>   29.08.2013, 12:25, "Andrey Groshev" <greenx at yandex.ru>:
>>>>>   29.08.2013, 02:55, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>    On 28/08/2013, at 5:38 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>     28.08.2013, 04:06, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>     On 27/08/2013, at 1:13 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>      27.08.2013, 05:39, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>>      On 26/08/2013, at 3:09 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>>       26.08.2013, 03:34, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>>>>       On 23/08/2013, at 9:39 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>>>>        Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>>        Today I tried to rebuild my test cluster, moving from cman to corosync2.
>>>>>>>>>>>>>        I noticed the following:
>>>>>>>>>>>>>        If I reset the cluster with cman through cibadmin --erase --force,
>>>>>>>>>>>>>        the node names still exist in the CIB.
>>>>>>>>>>>>       Yes, the cluster puts back entries for all the nodes it knows about automagically.
>>>>>>>>>>>>>        cibadmin -Ql
>>>>>>>>>>>>>        .....
>>>>>>>>>>>>>           <nodes>
>>>>>>>>>>>>>             <node id="dev-cluster2-node2.unix.tensor.ru" uname="dev-cluster2-node2"/>
>>>>>>>>>>>>>             <node id="dev-cluster2-node4.unix.tensor.ru" uname="dev-cluster2-node4"/>
>>>>>>>>>>>>>             <node id="dev-cluster2-node3.unix.tensor.ru" uname="dev-cluster2-node3"/>
>>>>>>>>>>>>>           </nodes>
>>>>>>>>>>>>>        ....
>>>>>>>>>>>>>
>>>>>>>>>>>>>        Even if cman and pacemaker are running on only one node.
>>>>>>>>>>>>       I'm assuming all three are configured in cluster.conf?
>>>>>>>>>>>       Yes, the list of nodes is there.
>>>>>>>>>>>>>        And if I do the same on a cluster with corosync2,
>>>>>>>>>>>>>        I see only the names of nodes that are running corosync and pacemaker.
>>>>>>>>>>>>       Since you've not included your config, I can only guess that your corosync.conf does not have a nodelist.
>>>>>>>>>>>>       If it did, you should get the same behaviour.
>>>>>>>>>>>       I tried both expected_node and nodelist.
>>>>>>>>>>      And it didn't work? What version of pacemaker?
>>>>>>>>>      It does not work as I expected.
>>>>>>>>     That's because you've used IP addresses in the node list.
>>>>>>>>     ie.
>>>>>>>>
>>>>>>>>     node {
>>>>>>>>       ring0_addr: 10.76.157.17
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     try including the node name as well, eg.
>>>>>>>>
>>>>>>>>     node {
>>>>>>>>       name: dev-cluster2-node2
>>>>>>>>       ring0_addr: 10.76.157.17
>>>>>>>>     }
>>>>>>>     The same thing.
>>>>>>    I don't know what to say.  I tested it here yesterday and it worked as expected.
>>>>>   I found the reason you and I got different results: I did not have a reverse DNS zone for these nodes.
>>>>>   I know it should be there, but (PACEMAKER + CMAN) worked without a reverse zone!
>>>>   I was too hasty. Deleted everything. Reinstalled. Configured. Not working again. Damn!
>>>
>>> It would have surprised me... pacemaker 1.1.11 doesn't do any DNS lookups - reverse or otherwise.
>>> Can you set
>>>
>>>   PCMK_trace_files=corosync.c
>>>
>>> in your environment and retest?
>>>
>>> On RHEL6 that means putting the following in /etc/sysconfig/pacemaker
>>>    export PCMK_trace_files=corosync.c
>>>
>>> It should produce additional logging[1] that will help diagnose the issue.
>>>
>>> [1] http://blog.clusterlabs.org/blog/2013/pacemaker-logging/
>>>
>>
>> Hello, Andrew.
>>
>> You misunderstood me a little.
>
> No, I understood you fine.
>
>> I wrote that I had rushed to judgment.
>> After I set up the reverse DNS zone, the cluster behaved correctly.
>> BUT after I took the cluster apart, dropped the configs and started a new cluster,
>> it again did not show all the nodes in <nodes> (only the node with pacemaker running).
>>
>> A small portion of the log (full log),
>> in which (I thought) there is something interesting.
>>
>> Aug 30 12:31:11 [9986] dev-cluster2-node4        cib: (  corosync.c:423   )   trace: check_message_sanity:      Verfied message 4: (dest=<all>:cib, from=dev-cluster2-node4:cib.9986, compressed=0, size=1551, total=2143)
>> Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (  corosync.c:96    )   trace: corosync_node_name:        Checking 172793107 vs 0 from nodelist.node.0.nodeid
>> Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (      ipcc.c:378   )   debug: qb_ipcc_disconnect:        qb_ipcc_disconnect()
>> Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (ringbuffer.c:294   )   debug: qb_rb_close:       Closing ringbuffer: /dev/shm/qb-cmap-request-9616-9989-27-header
>> Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (ringbuffer.c:294   )   debug: qb_rb_close:       Closing ringbuffer: /dev/shm/qb-cmap-response-9616-9989-27-header
>> Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (ringbuffer.c:294   )   debug: qb_rb_close:       Closing ringbuffer: /dev/shm/qb-cmap-event-9616-9989-27-header
>> Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (  corosync.c:134   )  notice: corosync_node_name:        Unable to get node name for nodeid 172793107
>
> I wonder if you need to be including the nodeid too. ie.
>
> node {
>   name: dev-cluster2-node2
>   ring0_addr: 10.76.157.17
>   nodeid: 2
> }
>
> I _thought_ that was implicit.
> Chrissie: is "nodelist.node.%d.nodeid" always available for corosync2 or only if explicitly defined in the config?
>

You do need to specify a nodeid if you don't want corosync to imply it
from the IP address (or you're using IPv6). corosync won't imply a
nodeid from the order of the nodes in corosync.conf - that's not
reliable enough. Also bear in mind that 0 is not a valid node number :-)

Chrissie
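
To illustrate the "imply it from the IP address" case, here is a minimal
Python sketch. It assumes the implied nodeid is simply the ring0 IPv4
address read as a 32-bit big-endian integer. That assumption is consistent
with the nodeid 172793107 in the log above, but it is not confirmed
anywhere in this thread. The helper names are hypothetical and are not part
of corosync or pacemaker.

```python
import ipaddress


def ipv4_to_nodeid(addr: str) -> int:
    """Read a dotted-quad IPv4 address as a 32-bit big-endian integer.

    Assumption: this mirrors how an implied nodeid is derived when no
    nodeid is set in the nodelist.
    """
    return int(ipaddress.IPv4Address(addr))


def nodeid_to_ipv4(nodeid: int) -> str:
    """Reverse mapping: turn a logged nodeid back into a dotted quad."""
    return str(ipaddress.IPv4Address(nodeid))


if __name__ == "__main__":
    # nodeid taken from the "Unable to get node name" log line
    print(nodeid_to_ipv4(172793107))
    # ring0_addr taken from the node { } example in the thread
    print(ipv4_to_nodeid("10.76.157.17"))
```

Under that assumption the logged nodeid maps back to 10.76.157.19, and
10.76.157.17 maps to 172793105. Neither would ever equal the 0 that
pacemaker read from nodelist.node.0.nodeid in the trace line.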