[Pacemaker] Node name problems after upgrading to 1.1.9
Bernardo Cabezas Serra
bcabezas at apsl.net
Thu Jun 27 10:20:08 UTC 2013
Hello,
Our cluster was working fine on the corosync stack, with corosync 2.3.0 and
pacemaker 1.1.8.
After upgrading (full versions and configs below), we began to have
problems with node names.
It's a two-node cluster, with node names "turifel" (the DC) and "selavi".
When selavi joins the cluster, we see this warning in selavi's log:
-----
Jun 27 11:54:29 selavi attrd[11998]: notice: corosync_node_name:
Unable to get node name for nodeid 168385827
Jun 27 11:54:29 selavi attrd[11998]: notice: get_node_name: Defaulting
to uname -n for the local corosync node name
-----
This is OK, and also happened with version 1.1.8.
At the corosync level, all seems OK:
----
Jun 27 11:51:18 turifel corosync[6725]: [TOTEM ] A processor joined or
left the membership and a new membership (10.9.93.35:1184) was formed.
Jun 27 11:51:18 turifel corosync[6725]: [QUORUM] Members[2]: 168385827
168385835
Jun 27 11:51:18 turifel corosync[6725]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 27 11:51:18 turifel crmd[19526]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node selavi[168385827] - state is now member
(was lost)
-------
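As an aside, those numeric nodeids are just the ring0 IPv4 addresses packed into 32 bits, so they can be decoded with a bit of shell (my own throwaway helper, not a cluster tool):

```shell
# Decode a corosync auto-generated nodeid back into the IPv4 address
# it was derived from (the address interpreted as a big-endian 32-bit int).
nodeid_to_ip() {
    local id=$1
    printf '%d.%d.%d.%d\n' \
        $(( (id >> 24) & 255 )) \
        $(( (id >> 16) & 255 )) \
        $(( (id >> 8)  & 255 )) \
        $((  id        & 255 ))
}

nodeid_to_ip 168385827   # selavi  -> 10.9.93.35
nodeid_to_ip 168385835   # turifel -> 10.9.93.43
```

So the two members in the quorum message match the two expected ring0 addresses on the 10.9.93.0 network.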
But when pacemaker starts on selavi (the new node), turifel's log shows
this:
----
Jun 27 11:54:28 turifel crmd[19526]: notice: do_state_transition:
State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN
cause=C_FSA_INTERNAL origin=peer_update_callback ]
Jun 27 11:54:28 turifel crmd[19526]: warning: crm_get_peer: Node
'selavi' and 'selavi' share the same cluster nodeid: 168385827
Jun 27 11:54:28 turifel crmd[19526]: warning: crmd_cs_dispatch:
Recieving messages from a node we think is dead: selavi[0]
Jun 27 11:54:29 turifel crmd[19526]: warning: crm_get_peer: Node
'selavi' and 'selavi' share the same cluster nodeid: 168385827
Jun 27 11:54:29 turifel crmd[19526]: warning: do_state_transition: Only
1 of 2 cluster nodes are eligible to run resources - continue 0
Jun 27 11:54:29 turifel attrd[19524]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
----
And selavi remains in the "pending" state. Sometimes turifel (the DC) fences
selavi, but other times it remains pending forever.
On the turifel node, all resources give warnings like this one:
warning: custom_action: Action p_drbd_ha0:0_monitor_0 on selavi is
unrunnable (pending)
On both nodes, uname -n and crm_node -n give the correct node names (selavi
and turifel respectively).
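For reference, these are the checks I ran on each node (corosync-cmapctl added as an extra cross-check; if it prints no nodelist keys, that would match the corosync_node_name warning above, but that's my reading, not verified):
----
uname -n                        # kernel hostname
crm_node -n                     # name pacemaker uses for the local node
corosync-cmapctl | grep nodelist  # any node names in corosync's runtime db?
----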
Do you think it's a configuration problem?
Below I give information about versions and configurations.
Best regards,
Bernardo.
-----
Versions (compiled from git/hg):
corosync: 2.3.0.66-615d
pacemaker: 1.1.9-61e4b8f
cluster-glue: 1.0.11
libqb: 0.14.4.43-bb4c3
resource-agents: 3.9.5.98-3b051
crmsh: 1.2.5
The cluster also runs drbd, dlm and gfs2, but I think those versions are
irrelevant here.
--------
Output of pacemaker configuration:
./configure --prefix=/opt/ha --without-cman \
--without-heartbeat --with-corosync \
--enable-fatal-warnings=no --with-lcrso-dir=/opt/ha/libexec/lcrso
pacemaker configuration:
Version = 1.1.9 (Build: 61e4b8f)
Features = generated-manpages ascii-docs ncurses
libqb-logging libqb-ipc lha-fencing upstart nagios corosync-native snmp
libesmtp
Prefix = /opt/ha
Executables = /opt/ha/sbin
Man pages = /opt/ha/share/man
Libraries = /opt/ha/lib
Header files = /opt/ha/include
Arch-independent files = /opt/ha/share
State information = /opt/ha/var
System configuration = /opt/ha/etc
Corosync Plugins = /opt/ha/lib
Use system LTDL = yes
HA group name = haclient
HA user name = hacluster
CFLAGS = -I/opt/ha/include -I/opt/ha/include
-I/opt/ha/include/heartbeat -I/opt/ha/include -I/opt/ha/include
-ggdb -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return
-Wbad-function-cast -Wcast-align -Wdeclaration-after-statement
-Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security
-Wformat-nonliteral -Wmissing-prototypes -Wmissing-declarations
-Wnested-externs -Wno-long-long -Wno-strict-aliasing
-Wunused-but-set-variable -Wpointer-arith -Wstrict-prototypes
-Wwrite-strings
Libraries = -lgnutls -lcorosync_common -lplumb -lpils
-lqb -lbz2 -lxslt -lxml2 -lc -luuid -lpam -lrt -ldl -lglib-2.0 -lltdl
-L/opt/ha/lib -lqb -ldl -lrt -lpthread
Stack Libraries = -L/opt/ha/lib -lqb -ldl -lrt -lpthread
-L/opt/ha/lib -lcpg -L/opt/ha/lib -lcfg -L/opt/ha/lib -lcmap
-L/opt/ha/lib -lquorum
----
Corosync config:
totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    cluster_name: fiestaha

    interface {
        ringnumber: 0
        ttl: 1
        bindnetaddr: 10.9.93.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}

logging {
    fileline: off
    to_stderr: yes
    to_logfile: no
    to_syslog: yes
    syslog_facility: local7
    debug: off
    timestamp: on

    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
    wait_for_all: 0
}
--
APSL
*Bernardo Cabezas Serra*
*Responsable Sistemas*
Camí Vell de Bunyola 37, esc. A, local 7
07009 Polígono de Son Castelló, Palma
Mail: bcabezas at apsl.net
Skype: bernat.cabezas
Tel: 971439771