[Pacemaker] No communication between nodes (setup problem)

Keith Ouellette Keith.Ouellette at Airgas.com
Wed Jan 30 14:27:24 CET 2013


Hans,



   Is multicast port 5405 open in the firewall? That has bitten me before.
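
For what it's worth, Fedora 18 uses firewalld by default, so something along these lines should tell you (these are generic firewalld commands, adjust to your setup):

[root@server1 ~]# firewall-cmd --list-all            # show rules in the active zone
[root@server1 ~]# firewall-cmd --add-port=5405/udp   # open the totem port (runtime only)
[root@server1 ~]# firewall-cmd --add-port=5404/udp   # corosync also uses mcastport - 1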



Thanks,

Keith



________________________________
From: Hans Bert [dadeda2002 at yahoo.de]
Sent: Wednesday, January 30, 2013 8:22 AM
To: and k; The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] No communication between nodes (setup problem)

Hi,

In the meantime I modified the configuration to check whether it works with multicast:

totem {
  version: 2
  secauth: off
  cluster_name: mcscluster
  interface {
    ringnumber: 0
    bindnetaddr: 192.168.100.0
    mcastaddr: 239.255.1.12
    mcastport: 5405
    ttl: 1
  }
}

but unfortunately it is still not working.
I started Wireshark, and on both hosts I can see multicast packets from both hosts.
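
As a further check, omping (assuming it is installed on both nodes) is made for exactly this kind of corosync multicast test; run it on both nodes at the same time:

[root@server1 ~]# omping -m 239.255.1.12 -p 5405 192.168.100.111 192.168.100.112
# both sides should report unicast and multicast replies from the other node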

Yes, SELinux is disabled on both nodes:

[root@server1 corosync]# selinuxenabled
[root@server1 corosync]# echo $?
1
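
(Exit status 1 from selinuxenabled means disabled; getenforce reports the same thing more readably:)

[root@server1 corosync]# getenforce   # should print "Disabled"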


Something else I found out:

[root@server1 corosync]# pcs status nodes both
Error mapping 192.168.100.111
Error mapping 192.168.100.112
Corosync Nodes:
 Online:
 Offline: 192.168.100.111 192.168.100.112
Pacemaker Nodes:
 Online: server1
 Standby:
 Offline:


[root@server2 corosync]# pcs status nodes both
Error mapping 192.168.100.111
Error mapping 192.168.100.112
Corosync Nodes:
 Online:
 Offline: 192.168.100.111 192.168.100.112
Pacemaker Nodes:
 Online: server2
 Standby:
 Offline:
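
Could the "Error mapping" lines be a name-resolution problem? Just a guess, since I do not know exactly what pcs expects, but this would show whether the addresses resolve to names:

[root@server1 ~]# getent hosts 192.168.100.111 192.168.100.112
# no output means the IPs do not resolve; entries in /etc/hosts such as
#   192.168.100.111 server1
#   192.168.100.112 server2
# might help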



any further hints?


Best regards,
Hans




________________________________


Hi,

It seems to be a problem with network traffic.

Have you tried sniffing the network traffic to be sure that UDP packets get from one node to the other?

Try on server1:

tcpdump -n -i <interface> udp and src host 192.168.100.112

on server2:

tcpdump -n -i <interface> udp and src host 192.168.100.111


If no packets show up, you have a network issue somewhere.

BTW: Is SELinux enabled on the nodes?

--
Regards
Andrew



2013/1/30 Hans Bert
Hello,

we had to move from Fedora 16 to Fedora 18 and wanted to set up Corosync with Pacemaker and pcs as the
management tool. With F16 our cluster was running pretty well, but with F18 we have, after 5 days, reached
the point where we are out of ideas about what the problem(s) might be.


The cluster is built from two servers (server1=192.168.100.111; server2=192.168.100.112).

Based on the howto for F18 with pcs, we created the following corosync.conf:

totem {
  version: 2
  secauth: off
  cluster_name: mcscluster
  transport: udpu
}

nodelist {
  node {
    ring0_addr: 192.168.100.111
  }
  node {
    ring0_addr: 192.168.100.112
  }
}

quorum {
  provider: corosync_votequorum
}

logging {
  fileline: off
  to_stderr: no
  to_logfile: yes
  to_syslog: yes
  logfile: /var/log/cluster/corosync.log
  debug: on
  timestamp: on
}
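
(One generic way to confirm that corosync really binds to the expected address and port on each node, not specific to this configuration:)

[root@server1 ~]# ss -ulpn | grep corosync
# should show UDP sockets on 192.168.100.111:5405 for transport udpu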



After we started the servers, a status check showed:


[root@server1 corosync]# pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
1868867776          1 server1 (local)

[root@server1 ~]# pcs status
Last updated: Wed Jan 30 10:45:17 2013
Last change: Wed Jan 30 10:18:56 2013 via cibadmin on server1
Stack: corosync
Current DC: server1 (1868867776) - partition WITHOUT quorum
Version: 1.1.8-3.fc18-394e906
1 Nodes configured, unknown expected votes
0 Resources configured.


Online: [ server1 ]

Full list of resources:



And on the other server:


[root@server2 corosync]# pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
1885644992          1 server2 (local)

[root@server2 corosync]# pcs status
Last updated: Wed Jan 30 10:44:40 2013
Last change: Wed Jan 30 10:19:36 2013 via cibadmin on server2
Stack: corosync
Current DC: server2 (1885644992) - partition WITHOUT quorum
Version: 1.1.8-3.fc18-394e906
1 Nodes configured, unknown expected votes
0 Resources configured.


Online: [ server2 ]
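
So each node only sees itself as a member. For completeness, corosync's own quorum view (it should show the same single-member ring):

[root@server1 ~]# corosync-quorumtool -s
# prints expected/total votes and the membership list as corosync sees it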






The only warnings and errors in the logfile are:

[root@server1 ~]# cat /var/log/cluster/corosync.log | egrep "warning|error"
Jan 30 10:25:59 [1608] server1       crmd:  warning: do_log:    FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Jan 30 10:25:59 [1607] server1    pengine:  warning: cluster_status:    We do not have quorum - fencing and resource management disabled
Jan 30 10:28:25 [1525] server1 corosync debug   [QUORUM] getinfo response error: 1
Jan 30 10:40:59 [1607] server1    pengine:  warning: cluster_status:    We do not have quorum - fencing and resource management disabled


[root@server2 corosync]# cat /var/log/cluster/corosync.log | egrep "warning|error"
Jan 30 10:27:18 [1458] server2       crmd:  warning: do_log:    FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Jan 30 10:27:18 [1457] server2    pengine:  warning: cluster_status:    We do not have quorum - fencing and resource management disabled
Jan 30 10:29:19 [1349] server2 corosync debug   [QUORUM] getinfo response error: 1
Jan 30 10:42:18 [1457] server2    pengine:  warning: cluster_status:    We do not have quorum - fencing and resource management disabled
Jan 30 10:44:36 [1349] server2 corosync debug   [QUORUM] getinfo response error: 1
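
The runtime membership that corosync keeps can also be dumped directly; on a healthy two-node cluster this should list both node addresses:

[root@server1 ~]# corosync-cmapctl | grep members
# look for runtime.totem.pg.mrp.srp.members entries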




We have installed the following packages:

corosync-2.2.0-1.fc18.i686
corosynclib-2.2.0-1.fc18.i686
drbd-bash-completion-8.3.13-1.fc18.i686
drbd-pacemaker-8.3.13-1.fc18.i686
drbd-utils-8.3.13-1.fc18.i686
pacemaker-1.1.8-3.fc18.i686
pacemaker-cli-1.1.8-3.fc18.i686
pacemaker-cluster-libs-1.1.8-3.fc18.i686
pacemaker-libs-1.1.8-3.fc18.i686
pcs-0.9.27-3.fc18.i686



Firewalls are disabled, and ping and SSH between the nodes work without any problems.
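
Note that ping (ICMP) and SSH (TCP) working does not prove that UDP gets through. A crude end-to-end test, assuming the nmap-ncat variant of nc that Fedora ships:

[root@server2 ~]# nc -u -l 5405                            # listen for UDP on server2
[root@server1 ~]# echo test | nc -u 192.168.100.112 5405   # send a datagram from server1
# if "test" shows up on server2, plain UDP on port 5405 gets through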

With best regards

_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


