[Pacemaker] 4th node does not return to cluster
Gabriel Gomiz
ggomiz at cooperativaobrera.com.ar
Tue Aug 16 23:27:14 CET 2011
Hi to all... :)
We are experiencing some difficulties with a 4-node Pacemaker cluster. 3 nodes are OK, but the 4th
node, after some corosync failures (with core dumps) and the pacemaker restarts that went with them,
does not return to the cluster.
On the other 3 nodes the 4th appears online, but on the 4th node itself the CIB is empty when I
display it with crm.
Something odd in the logs is this kind of message:
Aug 16 19:07:15 lorien.cooperativaobrera.com.ar cib: [28120]: WARN: cib_peer_callback: Discarding
cib_modify message (421) from mordor.cooperativaobrera.com.ar: not in our membership
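A minimal sketch for tallying such warnings per sending node, assuming every occurrence follows the exact format of the line quoted above (unwrapped onto a single line, as it would appear in the log file). The pattern and the function name summarize_discards are illustrative and not part of Pacemaker.

```python
import re
from collections import Counter

# Pattern modelled only on the single WARN line quoted above;
# other log formats or wordings are not handled (assumption).
DISCARD_RE = re.compile(
    r"cib_peer_callback: Discarding (?P<op>\S+) message \((?P<id>\d+)\) "
    r"from (?P<sender>\S+): not in our membership"
)


def summarize_discards(lines):
    """Count discarded CIB messages per sending node (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = DISCARD_RE.search(line)
        if match:
            counts[match.group("sender")] += 1
    return counts


# The log line from this post, joined back into one line.
sample = [
    "Aug 16 19:07:15 lorien.cooperativaobrera.com.ar cib: [28120]: WARN: "
    "cib_peer_callback: Discarding cib_modify message (421) from "
    "mordor.cooperativaobrera.com.ar: not in our membership",
]

for sender, count in summarize_discards(sample).items():
    print(sender, count)
```

Feeding it the lines of /var/log/cluster/corosync.log from node 4 would show which peers that node is rejecting messages from.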
It seems the 4th node does not consider itself a member of the cluster. How can I make it rejoin
the cluster?
Any help you can give me will be highly appreciated.
Many thanks in advance
PS: If you need any additional logs or want me to run any tests, I'm happy to do so.
-----
DATA:
OS is CentOS 6.0, 64-bit
PACEMAKER version 1.1.5
COROSYNC 1.2.3-21
NODE 1:
[DB1] gandalf # crm_mon -1
============
Last updated: Tue Aug 16 19:21:05 2011
Stack: openais
Current DC: gandalf.cooperativaobrera.com.ar - partition with quorum
Version: 1.1.5-1.1.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
4 Nodes configured, 4 expected votes
1 Resources configured.
============
Online: [ isildur.cooperativaobrera.com.ar gandalf.cooperativaobrera.com.ar
mordor.cooperativaobrera.com.ar lorien.cooperativaobrera.com.ar ]
Resource Group: dashboard
fs_dashboard (ocf::heartbeat:Filesystem): Started isildur.cooperativaobrera.com.ar
ip_dashboard (ocf::heartbeat:IPaddr): Started isildur.cooperativaobrera.com.ar
srv_httpd_dashboard (lsb:httpd.dashboard): Started isildur.cooperativaobrera.com.ar
srv_dashjobs (lsb:dashjobs): Started isildur.cooperativaobrera.com.ar
NODE 2:
[DB2] isildur # crm_mon -1
============
Last updated: Tue Aug 16 19:21:28 2011
Stack: openais
Current DC: gandalf.cooperativaobrera.com.ar - partition with quorum
Version: 1.1.5-1.1.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
4 Nodes configured, 4 expected votes
1 Resources configured.
============
Online: [ isildur.cooperativaobrera.com.ar gandalf.cooperativaobrera.com.ar
mordor.cooperativaobrera.com.ar lorien.cooperativaobrera.com.ar ]
Resource Group: dashboard
fs_dashboard (ocf::heartbeat:Filesystem): Started isildur.cooperativaobrera.com.ar
ip_dashboard (ocf::heartbeat:IPaddr): Started isildur.cooperativaobrera.com.ar
srv_httpd_dashboard (lsb:httpd.dashboard): Started isildur.cooperativaobrera.com.ar
srv_dashjobs (lsb:dashjobs): Started isildur.cooperativaobrera.com.ar
NODE 3:
[VM1] mordor # crm_mon -1
============
Last updated: Tue Aug 16 19:21:40 2011
Stack: openais
Current DC: gandalf.cooperativaobrera.com.ar - partition with quorum
Version: 1.1.5-1.1.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
4 Nodes configured, 4 expected votes
1 Resources configured.
============
Online: [ isildur.cooperativaobrera.com.ar gandalf.cooperativaobrera.com.ar
mordor.cooperativaobrera.com.ar lorien.cooperativaobrera.com.ar ]
Resource Group: dashboard
fs_dashboard (ocf::heartbeat:Filesystem): Started isildur.cooperativaobrera.com.ar
ip_dashboard (ocf::heartbeat:IPaddr): Started isildur.cooperativaobrera.com.ar
srv_httpd_dashboard (lsb:httpd.dashboard): Started isildur.cooperativaobrera.com.ar
srv_dashjobs (lsb:dashjobs): Started isildur.cooperativaobrera.com.ar
NODE 4:
[VM2] lorien # crm_mon -1
============
Last updated: Tue Aug 16 19:21:54 2011
Current DC: NONE
0 Nodes configured, unknown expected votes
0 Resources configured.
============
LOGS ON NODE 4:
<attached>
CONFIG COROSYNC (NODE 4, other nodes are the same but changing bindnetaddr):
compatibility: whitetank
totem {
    version: 2
    secauth: off
    threads: 0
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.238.43
        mcastaddr: 226.94.2.1
        mcastport: 5405
    }
}
logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
amf {
    mode: disabled
}
service {
    # Load the Pacemaker Cluster Resource Manager
    name: pacemaker
    ver: 1
}
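Since the nodes are supposed to differ only in bindnetaddr, one way to confirm that is to read each node's corosync.conf into a nested dictionary and compare the totem interface settings. The sketch below is a simplified reader written for this purpose, not corosync's own parser; the name parse_corosync_conf is hypothetical, and it assumes one setting or brace per line and no repeated section names (a second interface block would overwrite the first).

```python
def parse_corosync_conf(text):
    """Parse 'key: value' lines and 'name { ... }' sections into nested dicts.

    Simplified, hypothetical reader: one item per line, '#' starts a comment,
    repeated section names overwrite each other.
    """
    root = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.endswith("{"):
            section = {}
            stack[-1][line[:-1].strip()] = section
            stack.append(section)
        elif line == "}":
            stack.pop()
        else:
            key, _, value = line.partition(":")
            stack[-1][key.strip()] = value.strip()
    return root


# The totem section from node 4, as posted above.
node4 = """
totem {
    version: 2
    secauth: off
    threads: 0
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.238.43
        mcastaddr: 226.94.2.1
        mcastport: 5405
    }
}
"""

iface = parse_corosync_conf(node4)["totem"]["interface"]
print(iface["bindnetaddr"], iface["mcastaddr"], iface["mcastport"])
```

Running the same function on the file from each of the other nodes and comparing the resulting dictionaries would show any difference besides bindnetaddr.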
--
.^. Lic. Gabriel Gomiz - Red Hat Certified Engineer (RHCE)
/V\ Jefe de Sistemas - Administrador Red y Servidores
// \\ Gerencia de Sistemas - Cooperativa Obrera Ltda.
/( )\ Tel (0291) 456-0084
^^-^^ s/Window[$s]/LINUX!!/g or die;
PGP: http://admin.cooperativaobrera.com.ar/pgp/ggomiz.txt
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: corosync.log
URL: <http://oss.clusterlabs.org/pipermail/pacemaker/attachments/20110816/fd8c4605/attachment-0001.ksh>