[Pacemaker] errors in corosync.log
Shravan Mishra
shravan.mishra at gmail.com
Mon Jan 18 16:20:51 UTC 2010
Hi,
I'm seeing the following messages in corosync.log:
=============
Jan 18 09:50:41 corosync [pcmk ] ERROR: check_message_sanity: Message
payload is corrupted: expected 1929 bytes, got 669
Jan 18 09:50:41 corosync [pcmk ] ERROR: check_message_sanity: Child
28857 spawned to record non-fatal assertion failure line 1286: sane
Jan 18 09:50:41 corosync [pcmk ] ERROR: check_message_sanity: Invalid
message 70: (dest=local:cib, from=node1.itactics.com:cib.22575,
compressed=0, size=1929, total=2521)
......
========
I'm not entirely sure what's causing them.
Thanks
Shravan
On Mon, Jan 18, 2010 at 9:03 AM, Shravan Mishra
<shravan.mishra at gmail.com> wrote:
> Hi ,
>
> The interfaces on the two nodes are connected via a crossover
> cable, so there is no chance of that happening on ring 0. And since
> I'm using rrp_mode: passive, the other ring, i.e. ring 1, should only
> come into play when ring 0 fails, I assume. I mention this because
> the ring 1 interface is on the network.
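>
> As a sanity check on the rings, corosync-cfgtool (shipped with
> corosync) shows the status of both rings on each node:
>
>     # print ring status on the local node; a healthy ring
>     # reports "active with no faults"
>     corosync-cfgtool -s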
>
>
> One interesting thing that I observed was that
> libtomcrypt is used for crypto, because I have secauth: on.
>
> But I couldn't find that library on my machine.
>
> I'm wondering if it's because of that.
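>
> A quick way to check for the shared library (just a sketch; I believe
> corosync can also build the libtomcrypt-derived code in statically,
> in which case no separate library will show up):
>
>     # look for libtomcrypt in the dynamic linker cache
>     ldconfig -p | grep -i tomcrypt
>     # and check what the corosync binary links against
>     ldd $(which corosync) | grep -i crypt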
>
> Basically we are using 3 interfaces eth0, eth1 and eth2.
>
> eth0 and eth2 are for ring 0 and ring 1 respectively. eth1 is the
> primary interface.
>
> This is what my drbd.conf looks like:
>
>
> ==================
> # please have a look at the example configuration file in
> # /usr/share/doc/drbd82/drbd.conf
> #
> global {
>     usage-count no;
> }
>
> common {
>     protocol C;
>     startup {
>         wfc-timeout 120;
>         degr-wfc-timeout 120;
>     }
> }
>
> resource var_nsm {
>     syncer {
>         rate 333M;
>     }
>     handlers {
>         fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>         after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>     }
>     net {
>         after-sb-1pri discard-secondary;
>     }
>     on node1.itactics.com {
>         device /dev/drbd1;
>         disk /dev/sdb3;
>         address 172.20.20.1:7791;
>         meta-disk internal;
>     }
>     on node2.itactics.com {
>         device /dev/drbd1;
>         disk /dev/sdb3;
>         address 172.20.20.2:7791;
>         meta-disk internal;
>     }
> }
> =================
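>
> In case it's relevant, the DRBD state itself can be checked on either
> node (a standard check, nothing specific to this setup):
>
>     # connection state and sync status
>     cat /proc/drbd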
>
>
> The eth0s of the two nodes are connected via crossover as I
> mentioned, and eth1 and eth2 are on the network.
>
> I'm not a networking expert, but is it possible that a broadcast
> from a node that is not in my cluster could still reach my nodes
> through the other interfaces that are attached to the network?
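>
> One way to check that (a sketch; it assumes the totem traffic is on
> UDP port 5405, as in the corosync.conf further down) is to capture on
> the networked interfaces and look for packets from addresses outside
> the cluster:
>
>     # watch for totem traffic arriving on the shared-network interfaces
>     tcpdump -ni eth1 udp port 5405
>     tcpdump -ni eth2 udp port 5405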
>
>
> We in dev and the QA guys are testing this in parallel.
>
> Let's say there is a QA cluster of two nodes and a dev cluster of two nodes.
>
> The interfaces for both of them are hooked up as I mentioned above, and
> corosync.conf for both clusters has "bindnetaddr: 192.168.2.0".
>
> Is there a possibility of bad messages in one cluster being caused by the other?
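>
> If that is possible, I suppose the clusters could be isolated by
> giving each its own mcastport (a sketch; the corosync.conf man page
> notes that totem also uses mcastport - 1, so the values should be at
> least two apart):
>
>     # dev cluster, in each interface block:
>     mcastport: 5405
>
>     # QA cluster, in each interface block:
>     mcastport: 5409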
>
>
> We are in the final leg of the testing and this came up.
>
> Thanks for the help.
>
>
> Shravan
>
>
>
>
>
>
> On Mon, Jan 18, 2010 at 2:58 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>> On Sat, Jan 16, 2010 at 9:20 PM, Shravan Mishra
>> <shravan.mishra at gmail.com> wrote:
>>> Hi Guys,
>>>
>>> I'm running the following versions of pacemaker and corosync:
>>> corosync=1.1.1-1-2
>>> pacemaker=1.0.9-2-1
>>>
>>> Everything had been running fine for quite some time, but then I
>>> started seeing the following errors in the corosync logs:
>>>
>>>
>>> =========
>>> Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid
>>> digest... ignoring.
>>> Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
>>> Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid
>>> digest... ignoring.
>>> Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
>>> Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid
>>> digest... ignoring.
>>> Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
>>> ========
>>>
>>> I can run all the crm shell commands and whatnot, but it's
>>> troubling that the above is happening.
>>>
>>> My crm_mon output looks good.
>>>
>>>
>>> I also checked the authkey and ran md5sum on both nodes; it's the same.
>>>
>>> Then I stopped corosync, regenerated the authkey with
>>> corosync-keygen, and copied it to the other machine, but I still get
>>> the above messages in the corosync log.
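>>>
>>> For reference, this is roughly the procedure I used (hostnames are
>>> just examples):
>>>
>>>     # with corosync stopped on both nodes, on node1:
>>>     corosync-keygen                     # writes /etc/corosync/authkey
>>>     scp /etc/corosync/authkey node2:/etc/corosync/authkey
>>>     # then compare on both nodes:
>>>     md5sum /etc/corosync/authkey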
>>
>> Are you sure there's not a third node somewhere broadcasting on that
>> mcast and port combination?
>>
>>>
>>> Is there anything other than the authkey that I should look into?
>>>
>>>
>>> corosync.conf
>>>
>>> ============
>>>
>>> # Please read the corosync.conf.5 manual page
>>> compatibility: whitetank
>>>
>>> totem {
>>>     version: 2
>>>     token: 3000
>>>     token_retransmits_before_loss_const: 10
>>>     join: 60
>>>     consensus: 1500
>>>     vsftype: none
>>>     max_messages: 20
>>>     clear_node_high_bit: yes
>>>     secauth: on
>>>     threads: 0
>>>     rrp_mode: passive
>>>
>>>     interface {
>>>         ringnumber: 0
>>>         bindnetaddr: 192.168.2.0
>>>         #mcastaddr: 226.94.1.1
>>>         broadcast: yes
>>>         mcastport: 5405
>>>     }
>>>     interface {
>>>         ringnumber: 1
>>>         bindnetaddr: 172.20.20.0
>>>         #mcastaddr: 226.94.1.1
>>>         broadcast: yes
>>>         mcastport: 5405
>>>     }
>>> }
>>>
>>> logging {
>>>     fileline: off
>>>     to_stderr: yes
>>>     to_logfile: yes
>>>     to_syslog: yes
>>>     logfile: /tmp/corosync.log
>>>     debug: off
>>>     timestamp: on
>>>     logger_subsys {
>>>         subsys: AMF
>>>         debug: off
>>>     }
>>> }
>>>
>>> service {
>>>     name: pacemaker
>>>     ver: 0
>>> }
>>>
>>> aisexec {
>>>     user: root
>>>     group: root
>>> }
>>>
>>> amf {
>>>     mode: disabled
>>> }
>>>
>>>
>>> ===============
>>>
>>>
>>> Thanks
>>> Shravan