[ClusterLabs] Is there a Trick to Making Corosync Work on Microsoft Azure?
Eric Robinson
eric.robinson at psmnv.com
Wed Aug 23 21:51:01 EDT 2017
I figured out the cause. CMAN got pulled in by yum, and on a cman stack corosync takes its configuration from cman rather than from corosync.conf, so none of my changes to corosync.conf had any effect, including the udpu transport directive. Now I just have to figure out how to enable unicast in cman.
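(For the record, my understanding so far is that on RHEL 6 this is done by setting the transport on the cman element in /etc/cluster/cluster.conf, i.e. something like

    <cman transport="udpu"/>

inside the existing <cluster> element, or by rebuilding the cluster with something like "pcs cluster setup --name ha001 ha001a ha001b --transport udpu". I haven't verified either of those on these nodes yet, so take that with a grain of salt.)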
--
Eric Robinson
From: Eric Robinson [mailto:eric.robinson at psmnv.com]
Sent: Wednesday, August 23, 2017 3:16 PM
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Subject: [ClusterLabs] Is there a Trick to Making Corosync Work on Microsoft Azure?
I created two nodes on Microsoft Azure, but I can't get them to join a cluster. Any thoughts?
OS: RHEL 6.9
Corosync version: 1.4.7-5.el6.x86_64
Node names: ha001a (172.28.0.4/23), ha001b (172.28.0.5/23)
The nodes are on the same subnet and can ping and ssh to each other just fine by either host name or IP address.
I have configured corosync to use unicast.
corosync-cfgtool looks fine...
[root@ha001b corosync]# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
id = 172.28.0.5
status = ring 0 active with no faults
...but corosync-objctl only shows the local node...
[root@ha001b corosync]# corosync-objctl |grep join
totem.join=60
runtime.totem.pg.mrp.srp.memb_join_tx=1
runtime.totem.pg.mrp.srp.memb_join_rx=1
runtime.totem.pg.mrp.srp.members.2.join_count=1
runtime.totem.pg.mrp.srp.members.2.status=joined
...pcs status shows...
Cluster name: ha001
Stack: cman
Current DC: ha001b (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Wed Aug 23 18:04:33 2017 Last change: Wed Aug 23 17:51:07 2017 by root via cibadmin on ha001b
2 nodes and 0 resources configured
Online: [ ha001b ]
OFFLINE: [ ha001a ]
No resources
Daemon Status:
cman: active/disabled
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/disabled
...it shows the opposite on the other node...
[root@ha001a ~]# corosync-objctl |grep join
totem.join=60
runtime.totem.pg.mrp.srp.memb_join_tx=1
runtime.totem.pg.mrp.srp.memb_join_rx=1
runtime.totem.pg.mrp.srp.members.1.join_count=1
runtime.totem.pg.mrp.srp.members.1.status=joined
[root@ha001a ~]# pcs status
Cluster name: ha001
Stack: cman
Current DC: ha001a (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Wed Aug 23 18:06:04 2017 Last change: Wed Aug 23 17:51:03 2017 by root via cibadmin on ha001a
2 nodes and 0 resources configured
Online: [ ha001a ]
OFFLINE: [ ha001b ]
No resources
Daemon Status:
cman: active/disabled
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/disabled
...here is my corosync.conf...
compatibility: whitetank

totem {
        version: 2
        secauth: off
        interface {
                member {
                        memberaddr: 172.28.0.4
                }
                member {
                        memberaddr: 172.28.0.5
                }
                ringnumber: 0
                bindnetaddr: 172.28.0.0
                mcastport: 5405
                ttl: 1
        }
        transport: udpu
}

logging {
        fileline: off
        to_logfile: yes
        to_syslog: yes
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}
I used tcpdump and I see a lot of traffic between them on port 2224, but nothing else.
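(As far as I know, port 2224 is just pcsd chatter between the nodes; the corosync udpu traffic should show up as UDP on the mcastport, 5405. Something like "tcpdump -n -i eth0 udp port 5405" on each node, substituting whatever interface actually carries 172.28.0.x for eth0, should confirm whether the totem packets are going out and coming back. I still need to try that.)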
Is there an issue because bindnetaddr is 172.28.0.0 but the member addresses are on a /23 subnet?
--
Eric Robinson