[Pacemaker] different behavior cibadmin -Ql with cman and corosync2
Andrew Beekhof
andrew at beekhof.net
Tue Sep 3 04:20:30 UTC 2013
On 02/09/2013, at 5:27 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>
>
> 30.08.2013, 07:18, "Andrew Beekhof" <andrew at beekhof.net>:
>> On 29/08/2013, at 7:31 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>
>>> 29.08.2013, 12:25, "Andrey Groshev" <greenx at yandex.ru>:
>>>> 29.08.2013, 02:55, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>> On 28/08/2013, at 5:38 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>> 28.08.2013, 04:06, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>> On 27/08/2013, at 1:13 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>> 27.08.2013, 05:39, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>> On 26/08/2013, at 3:09 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>> 26.08.2013, 03:34, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>>> On 23/08/2013, at 9:39 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> Today I tried to convert my test cluster from cman to corosync2.
>>>>>>>>>>>> I noticed the following:
>>>>>>>>>>>> If I reset a cman-based cluster with cibadmin --erase --force,
>>>>>>>>>>>> the node names are still present in the CIB.
>>>>>>>>>>> Yes, the cluster puts back entries for all the nodes it knows about automagically.
>>>>>>>>>>>> cibadmin -Ql
>>>>>>>>>>>> .....
>>>>>>>>>>>> <nodes>
>>>>>>>>>>>> <node id="dev-cluster2-node2.unix.tensor.ru" uname="dev-cluster2-node2"/>
>>>>>>>>>>>> <node id="dev-cluster2-node4.unix.tensor.ru" uname="dev-cluster2-node4"/>
>>>>>>>>>>>> <node id="dev-cluster2-node3.unix.tensor.ru" uname="dev-cluster2-node3"/>
>>>>>>>>>>>> </nodes>
>>>>>>>>>>>> ....
>>>>>>>>>>>>
>>>>>>>>>>>> Even if cman and pacemaker are running on only one node.
>>>>>>>>>>> I'm assuming all three are configured in cluster.conf?
>>>>>>>>>> Yes, the node list exists there.
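For context, a cman cluster.conf node list of that shape would look roughly like this (a sketch only, not the poster's actual file; the nodeid values are illustrative):

    <clusternodes>
      <!-- illustrative ids; the real cluster.conf was not posted -->
      <clusternode name="dev-cluster2-node2" nodeid="1"/>
      <clusternode name="dev-cluster2-node3" nodeid="2"/>
      <clusternode name="dev-cluster2-node4" nodeid="3"/>
    </clusternodes>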
>>>>>>>>>>>> And if I do the same on a cluster with corosync2,
>>>>>>>>>>>> I see only the names of the nodes that are running corosync and pacemaker.
>>>>>>>>>>> Since you've not included your config, I can only guess that your corosync.conf does not have a nodelist.
>>>>>>>>>>> If it did, you should get the same behaviour.
>>>>>>>>>> I tried both expected_node and nodelist.
>>>>>>>>> And it didn't work? What version of pacemaker?
>>>>>>>> It does not work as I expected.
>>>>>>> That's because you've used IP addresses in the node list.
>>>>>>> ie.
>>>>>>>
>>>>>>> node {
>>>>>>>     ring0_addr: 10.76.157.17
>>>>>>> }
>>>>>>>
>>>>>>> try including the node name as well, eg.
>>>>>>>
>>>>>>> node {
>>>>>>>     name: dev-cluster2-node2
>>>>>>>     ring0_addr: 10.76.157.17
>>>>>>> }
>>>>>> The same thing.
>>>>> I don't know what to say. I tested it here yesterday and it worked as expected.
>>>> I found the reason that you and I get different results - I did not have a reverse DNS zone for these nodes.
>>>> I know there should be one, but (PACEMAKER + CMAN) worked without the reverse zone!
>>> I was hasty. Deleted everything. Reinstalled. Reconfigured. Not working again. Damn!
>>
>> It would have surprised me... pacemaker 1.1.11 doesn't do any DNS lookups - reverse or otherwise.
>> Can you set
>>
>> PCMK_trace_files=corosync.c
>>
>> in your environment and retest?
>>
>> On RHEL6 that means putting the following in /etc/sysconfig/pacemaker
>> export PCMK_trace_files=corosync.c
>>
>> It should produce additional logging[1] that will help diagnose the issue.
>>
>> [1] http://blog.clusterlabs.org/blog/2013/pacemaker-logging/
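A rough sketch of the steps on RHEL6 (assuming the "pacemaker" init script name and the log file configured in the corosync.conf quoted further down):

    # add the trace setting to pacemaker's environment file
    echo 'export PCMK_trace_files=corosync.c' >> /etc/sysconfig/pacemaker
    # restart so the daemons pick it up
    service pacemaker restart
    # the extra trace messages carry the function name, e.g. corosync_node_name
    grep corosync_node_name /var/log/cluster/corosync.log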
>>
>
> Hello, Andrew.
>
> You misunderstood me a little.
No, I understood you fine.
> I wrote that I had rushed to judgment.
> After I set up the reverse DNS zone, the cluster behaved correctly.
> BUT after I tore the cluster apart, dropped the configs, and restarted it as a new cluster,
> the cluster again did not show all the nodes in <nodes> (only the node with pacemaker running).
>
> A small portion of the log. Full log:
> The part in which (I thought) there is something interesting.
>
> Aug 30 12:31:11 [9986] dev-cluster2-node4 cib: ( corosync.c:423 ) trace: check_message_sanity: Verfied message 4: (dest=<all>:cib, from=dev-cluster2-node4:cib.9986, compressed=0, size=1551, total=2143)
> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: ( corosync.c:96 ) trace: corosync_node_name: Checking 172793107 vs 0 from nodelist.node.0.nodeid
> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: ( ipcc.c:378 ) debug: qb_ipcc_disconnect: qb_ipcc_disconnect()
> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: (ringbuffer.c:294 ) debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cmap-request-9616-9989-27-header
> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: (ringbuffer.c:294 ) debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cmap-response-9616-9989-27-header
> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: (ringbuffer.c:294 ) debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cmap-event-9616-9989-27-header
> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: ( corosync.c:134 ) notice: corosync_node_name: Unable to get node name for nodeid 172793107
I wonder if you need to include the nodeid too, ie.
node {
    name: dev-cluster2-node2
    ring0_addr: 10.76.157.17
    nodeid: 2
}
I _thought_ that was implicit.
Chrissie: is "nodelist.node.%d.nodeid" always available for corosync2 or only if explicitly defined in the config?
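One quick way to check from the shell (a sketch; the first grep mirrors the corosync-cmapctl output quoted further down, the second assumes the runtime.totem.pg.mrp.srp.members.* keys of corosync 2.x):

    # if the nodeid is only auto-assigned, no nodelist.node.X.nodeid key may
    # show up here next to name/ring0_addr
    corosync-cmapctl | grep nodelist.node
    # the runtime membership keys should still carry the computed id
    corosync-cmapctl | grep members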
> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: ( cluster.c:338 ) notice: get_node_name: Defaulting to uname -n for the local corosync node name
> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: ( attrd.c:651 ) debug: attrd_cib_callback: Update 4 for probe_complete=true passed
> Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug [QB ] HUP conn (9616-9989-27)
> Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug [QB ] qb_ipcs_disconnect(9616-9989-27) state:2
> Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug [QB ] epoll_ctl(del): Bad file descriptor (9)
> Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug [MAIN ] cs_ipcs_connection_closed()
> Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug [CMAP ] exit_fn for conn=0x7fa96bcb31b0
> Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug [MAIN ] cs_ipcs_connection_destroyed()
> Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug [QB ] Free'ing ringbuffer: /dev/shm/qb-cmap-response-9616-9989-27-header
> Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug [QB ] Free'ing ringbuffer: /dev/shm/qb-cmap-event-9616-9989-27-header
> Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug [QB ] Free'ing ringbuffer: /dev/shm/qb-cmap-request-9616-9989-27-header
> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: ( corosync.c:423 ) trace: check_message_sanity: Verfied message 1: (dest=<all>:attrd, from=dev-cluster2-node4:attrd.9989, compressed=0, size=181, total=773)
> Aug 30 12:31:42 [9984] dev-cluster2-node4 pacemakerd: ( mainloop.c:270 ) info: crm_signal_dispatch: Invoking handler for signal 10: User defined signal 1
> Aug 30 12:31:59 [9986] dev-cluster2-node4 cib: ( ipc.c:307 ) info: crm_client_new: Connecting 0x16c98e0 for uid=0 gid=0 pid=10007 id=f2f15044-8f76-4ea7-a714-984660619ae7
> Aug 30 12:31:59 [9986] dev-cluster2-node4 cib: ( ipc_setup.c:476 ) debug: handle_new_connection: IPC credentials authenticated (9986-10007-13)
> Aug 30 12:31:59 [9986] dev-cluster2-node4 cib: ( ipc_shm.c:294 ) debug: qb_ipcs_shm_connect: connecting to client [10007]
> Aug 30 12:31:59 [9986] dev-cluster2-node4 cib: (ringbuffer.c:227 ) debug: qb_rb_open_2: shm size:524288; real_size:524288; rb->word_size:131072
> Aug 30 12:31:59 [9986] dev-cluster2-node4 cib: (ringbuffer.c:227 ) debug: qb_rb_open_2: shm size:524288; real_size:524288; rb->word_size:131072
> Aug 30 12:31:59 [9986] dev-cluster2-node4 cib: (ringbuffer.c:227 ) debug: qb_rb_open_2: shm size:524288; real_size:524288; rb->word_size:131072
> Aug 30 12:31:59 [9986] dev-cluster2-node4 cib: ( io.c:579 ) debug: activateCibXml: Triggering CIB write for cib_erase op
> Aug 30 12:31:59 [9991] dev-cluster2-node4 crmd: (te_callbacks:122 ) debug: te_update_diff: Processing diff (cib_erase): 0.9.3 -> 0.11.1 (S_IDLE)
> Aug 30 12:31:59 [9991] dev-cluster2-node4 crmd: ( te_utils.c:423 ) info: abort_transition_graph: te_update_diff:126 - Triggered transition abort (complete=1, node=, tag=diff, id=(null), magic=NA, cib=0.11.1) : Non-status change
>
>
>
>>>>>> # corosync-cmapctl |grep nodelist
>>>>>> nodelist.local_node_pos (u32) = 2
>>>>>> nodelist.node.0.name (str) = dev-cluster2-node2
>>>>>> nodelist.node.0.ring0_addr (str) = 10.76.157.17
>>>>>> nodelist.node.1.name (str) = dev-cluster2-node3
>>>>>> nodelist.node.1.ring0_addr (str) = 10.76.157.18
>>>>>> nodelist.node.2.name (str) = dev-cluster2-node4
>>>>>> nodelist.node.2.ring0_addr (str) = 10.76.157.19
>>>>>>
>>>>>> # corosync-quorumtool -s
>>>>>> Quorum information
>>>>>> ------------------
>>>>>> Date: Wed Aug 28 11:29:49 2013
>>>>>> Quorum provider: corosync_votequorum
>>>>>> Nodes: 1
>>>>>> Node ID: 172793107
>>>>>> Ring ID: 52
>>>>>> Quorate: No
>>>>>>
>>>>>> Votequorum information
>>>>>> ----------------------
>>>>>> Expected votes: 3
>>>>>> Highest expected: 3
>>>>>> Total votes: 1
>>>>>> Quorum: 2 Activity blocked
>>>>>> Flags:
>>>>>>
>>>>>> Membership information
>>>>>> ----------------------
>>>>>> Nodeid Votes Name
>>>>>> 172793107 1 dev-cluster2-node4 (local)
>>>>>>
>>>>>> # cibadmin -Q
>>>>>> <cib epoch="25" num_updates="3" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.7" cib-last-written="Wed Aug 28 11:24:06 2013" update-origin="dev-cluster2-node4" update-client="crmd" have-quorum="0" dc-uuid="172793107">
>>>>>>   <configuration>
>>>>>>     <crm_config>
>>>>>>       <cluster_property_set id="cib-bootstrap-options">
>>>>>>         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.11-1.el6-4f672bc"/>
>>>>>>         <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
>>>>>>       </cluster_property_set>
>>>>>>     </crm_config>
>>>>>>     <nodes>
>>>>>>       <node id="172793107" uname="dev-cluster2-node4"/>
>>>>>>     </nodes>
>>>>>>     <resources/>
>>>>>>     <constraints/>
>>>>>>   </configuration>
>>>>>>   <status>
>>>>>>     <node_state id="172793107" uname="dev-cluster2-node4" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
>>>>>>       <lrm id="172793107">
>>>>>>         <lrm_resources/>
>>>>>>       </lrm>
>>>>>>       <transient_attributes id="172793107">
>>>>>>         <instance_attributes id="status-172793107">
>>>>>>           <nvpair id="status-172793107-probe_complete" name="probe_complete" value="true"/>
>>>>>>         </instance_attributes>
>>>>>>       </transient_attributes>
>>>>>>     </node_state>
>>>>>>   </status>
>>>>>> </cib>
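(As an aside, a quicker way to see just the node entries pacemaker knows about - assuming the standard tools shipped with the same pacemaker build:)

    # list the nodes known to the cluster (id and name)
    crm_node -l
    # or query only the nodes section of the CIB
    cibadmin -Q -o nodes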
>>>>>>>> I figured out a way to get around this, but it would be easier if the CIB worked the same way it does with CMAN.
>>>>>>>> I just do not start the main resource if the attribute is not defined or is not true.
>>>>>>>> This slightly changes the logic of the cluster,
>>>>>>>> but I'm not sure which behavior is correct.
>>>>>>>>
>>>>>>>> libqb 0.14.4
>>>>>>>> corosync 2.3.1
>>>>>>>> pacemaker 1.1.11
>>>>>>>>
>>>>>>>> All built from source within the previous week.
>>>>>>>>>> Now in corosync.conf:
>>>>>>>>>>
>>>>>>>>>> totem {
>>>>>>>>>>     version: 2
>>>>>>>>>>     crypto_cipher: none
>>>>>>>>>>     crypto_hash: none
>>>>>>>>>>     interface {
>>>>>>>>>>         ringnumber: 0
>>>>>>>>>>         bindnetaddr: 10.76.157.18
>>>>>>>>>>         mcastaddr: 239.94.1.56
>>>>>>>>>>         mcastport: 5405
>>>>>>>>>>         ttl: 1
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>> logging {
>>>>>>>>>>     fileline: off
>>>>>>>>>>     to_stderr: no
>>>>>>>>>>     to_logfile: yes
>>>>>>>>>>     logfile: /var/log/cluster/corosync.log
>>>>>>>>>>     to_syslog: yes
>>>>>>>>>>     debug: on
>>>>>>>>>>     timestamp: on
>>>>>>>>>>     logger_subsys {
>>>>>>>>>>         subsys: QUORUM
>>>>>>>>>>         debug: on
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>> quorum {
>>>>>>>>>>     provider: corosync_votequorum
>>>>>>>>>> }
>>>>>>>>>> nodelist {
>>>>>>>>>>     node {
>>>>>>>>>>         ring0_addr: 10.76.157.17
>>>>>>>>>>     }
>>>>>>>>>>     node {
>>>>>>>>>>         ring0_addr: 10.76.157.18
>>>>>>>>>>     }
>>>>>>>>>>     node {
>>>>>>>>>>         ring0_addr: 10.76.157.19
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>>
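For comparison, a sketch of the same nodelist with the changes suggested in this thread applied (the node names come from the corosync-cmapctl output above; the explicit nodeid values are illustrative assumptions, not confirmed settings):

    nodelist {
        node {
            name: dev-cluster2-node2
            ring0_addr: 10.76.157.17
            nodeid: 1
        }
        node {
            name: dev-cluster2-node3
            ring0_addr: 10.76.157.18
            nodeid: 2
        }
        node {
            name: dev-cluster2-node4
            ring0_addr: 10.76.157.19
            nodeid: 3
        }
    }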