[Pacemaker] errors in corosync.log

Shravan Mishra shravan.mishra at gmail.com
Mon Jan 18 09:03:34 EST 2010


Hi,

The ring 0 interfaces on the two nodes are connected via a crossover
cable, so there should be no chance of that happening on ring 0. Also,
I'm using rrp_mode: passive, which, as I understand it, means that the
other ring (ring 1) only comes into play when ring 0 fails. I mention
this because the ring 1 interface is on the shared network.


One interesting thing I observed is that libtomcrypt is supposed to be
used for the crypto, since I have secauth: on, but I couldn't find that
library installed on my machine.

I'm wondering whether that could be the cause.
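
One way to check whether the crypto/authentication layer is where these
messages come from (just a test sketch, not something I've tried yet)
would be to temporarily set secauth: off on both nodes and see if the
"invalid digest" errors stop. If the errors really come from another
cluster's traffic, turning authentication off is only safe as a
short-lived test. Something like:

==================
totem {
        version: 2
        # test only: disables packet authentication and encryption,
        # so apply it to BOTH nodes, restart corosync, and revert it
        # once the test is done
        secauth: off
        threads: 0
        rrp_mode: passive
        # interface sections unchanged
}
==================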

Basically we are using three interfaces: eth0, eth1 and eth2.

eth0 and eth2 are used for ring 0 and ring 1 respectively, and eth1 is
the primary interface.

This is what my drbd.conf looks like:


==================
# please have a look at the example configuration file in
# /usr/share/doc/drbd82/drbd.conf
#
global {
        usage-count no;
}

common {
        protocol C;
        startup {
                wfc-timeout      120;
                degr-wfc-timeout 120;
        }
}

resource var_nsm {
        syncer {
                rate 333M;
        }
        handlers {
                fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
                after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
        net {
                after-sb-1pri discard-secondary;
        }
        on node1.itactics.com {
                device    /dev/drbd1;
                disk      /dev/sdb3;
                address   172.20.20.1:7791;
                meta-disk internal;
        }
        on node2.itactics.com {
                device    /dev/drbd1;
                disk      /dev/sdb3;
                address   172.20.20.2:7791;
                meta-disk internal;
        }
}
=================


As I mentioned, the eth0 interfaces of the two nodes are connected via
the crossover cable, while eth1 and eth2 are on the network.

I'm not a networking expert, but is it possible that a broadcast from,
let's say, a node that is not in my cluster could still reach my nodes
through the other interfaces that are attached to the network? (See the
sketch below.)


Those of us in dev and the QA team are testing this in parallel.

Let's say there is a QA cluster of two nodes and a dev cluster of two
nodes, the interfaces for both are hooked up as I described above, and
corosync.conf on both clusters has "bindnetaddr: 192.168.2.0".

Is there a possibility of one cluster causing bad messages on the other?
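
If that kind of cross-talk turns out to be the cause, one possible fix
(a sketch only, assuming both clusters really do share the 192.168.2.0
network and UDP port 5405) would be to give each cluster its own
mcastport so their totem traffic can't collide. If the two clusters have
different authkeys, packets from the other cluster would be rejected
with exactly this kind of "invalid digest" error. For example:

==================
# hypothetical dev cluster, ring 0 -- unchanged except for the port
interface {
        ringnumber: 0
        bindnetaddr: 192.168.2.0
        broadcast: yes
        mcastport: 5405
}

# hypothetical QA cluster, ring 0 -- a different port, kept several
# numbers away since corosync reportedly also uses mcastport - 1
interface {
        ringnumber: 0
        bindnetaddr: 192.168.2.0
        broadcast: yes
        mcastport: 5415
}
==================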


We are in the final leg of testing, and this came up.

Thanks for the help.


Shravan






On Mon, Jan 18, 2010 at 2:58 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
> On Sat, Jan 16, 2010 at 9:20 PM, Shravan Mishra
> <shravan.mishra at gmail.com> wrote:
>> Hi Guys,
>>
>> I'm running the following versions of pacemaker and corosync:
>> corosync=1.1.1-1-2
>> pacemaker=1.0.9-2-1
>>
>> Everything had been running fine for quite some time, but then I
>> started seeing the following errors in the corosync logs:
>>
>>
>> =========
>> Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid
>> digest... ignoring.
>> Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
>> Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid
>> digest... ignoring.
>> Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
>> Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid
>> digest... ignoring.
>> Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
>> ========
>>
>> I can still run all the crm shell commands and so on, but it's
>> troubling that the above is happening.
>>
>> My crm_mon output looks good.
>>
>>
>> I also checked the authkey and ran md5sum on both nodes; it's the same.
>>
>> Then I stopped corosync, regenerated the authkey with
>> corosync-keygen, and copied it to the other machine, but I still get
>> the above messages in the corosync log.
>
> Are you sure there's not a third node somewhere broadcasting on that
> mcast and port combination?
>
>>
>> Is there anything other than the authkey that I should look into?
>>
>>
>> corosync.conf
>>
>> ============
>>
>> # Please read the corosync.conf.5 manual page
>> compatibility: whitetank
>>
>> totem {
>>        version: 2
>>        token: 3000
>>        token_retransmits_before_loss_const: 10
>>        join: 60
>>        consensus: 1500
>>        vsftype: none
>>        max_messages: 20
>>        clear_node_high_bit: yes
>>        secauth: on
>>        threads: 0
>>        rrp_mode: passive
>>
>>        interface {
>>                ringnumber: 0
>>                bindnetaddr: 192.168.2.0
>>                #mcastaddr: 226.94.1.1
>>                broadcast: yes
>>                mcastport: 5405
>>        }
>>        interface {
>>                ringnumber: 1
>>                bindnetaddr: 172.20.20.0
>>                #mcastaddr: 226.94.1.1
>>                broadcast: yes
>>                mcastport: 5405
>>        }
>> }
>>
>>
>> logging {
>>        fileline: off
>>        to_stderr: yes
>>        to_logfile: yes
>>        to_syslog: yes
>>        logfile: /tmp/corosync.log
>>        debug: off
>>        timestamp: on
>>        logger_subsys {
>>                subsys: AMF
>>                debug: off
>>        }
>> }
>>
>> service {
>>        name: pacemaker
>>        ver: 0
>> }
>>
>> aisexec {
>>        user: root
>>        group: root
>> }
>>
>> amf {
>>        mode: disabled
>> }
>>
>>
>> ===============
>>
>>
>> Thanks
>> Shravan
>>
>> _______________________________________________
>> Pacemaker mailing list
>> Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>



