[Pacemaker] Node name problems after upgrading to 1.1.9
emmanuel segura
emi2fast at gmail.com
Thu Jun 27 10:26:18 UTC 2013
Hello Bernardo
I don't know if this is the problem, but try the clear_node_high_bit
option. The documentation for it says:

    This configuration option is optional and is only relevant when no
    nodeid is specified. Some openais clients require a signed 32-bit
    nodeid that is greater than zero; however, by default openais uses
    all 32 bits of the IPv4 address space when generating a nodeid. Set
    this option to yes to force the high bit to be zero and therefore
    ensure the nodeid is a positive signed 32-bit integer.

    WARNING: The cluster's behavior is undefined if this option is
    enabled on only a subset of the cluster (for example during a
    rolling upgrade).
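If you want to try it, I believe it goes in the totem section of
corosync.conf. A minimal sketch against the config you posted (only the
clear_node_high_bit line is new, and per the warning above it should be
enabled on both nodes at the same time):

totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    cluster_name: fiestaha
    clear_node_high_bit: yes
    ...
}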
Thanks
2013/6/27 Bernardo Cabezas Serra <bcabezas at apsl.net>
> Hello,
>
> Our cluster was working OK on corosync stack, with corosync 2.3.0 and
> pacemaker 1.1.8.
>
> After upgrading (full versions and configs below), we began to have
> problems with node names.
> It's a two node cluster, with node names "turifel" (DC) and "selavi".
>
> When selavi joins the cluster, we get this warning in selavi's log:
>
> -----
> Jun 27 11:54:29 selavi attrd[11998]: notice: corosync_node_name:
> Unable to get node name for nodeid 168385827
> Jun 27 11:54:29 selavi attrd[11998]: notice: get_node_name: Defaulting
> to uname -n for the local corosync node name
> -----
>
> This is OK, and it also happened with version 1.1.8.
>
> At the corosync level, all seems fine:
> ----
> Jun 27 11:51:18 turifel corosync[6725]: [TOTEM ] A processor joined or
> left the membership and a new membership (10.9.93.35:1184) was formed.
> Jun 27 11:51:18 turifel corosync[6725]: [QUORUM] Members[2]: 168385827
> 168385835
> Jun 27 11:51:18 turifel corosync[6725]: [MAIN ] Completed service
> synchronization, ready to provide service.
> Jun 27 11:51:18 turifel crmd[19526]: notice: crm_update_peer_state:
> pcmk_quorum_notification: Node selavi[168385827] - state is now member
> (was lost)
> -------
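> (Those nodeids appear to be our IPv4 addresses packed into 32 bits,
> which is what corosync does when no nodeid is configured; decoding
> them:
>
>     168385827 = 0x0A095D23 = 10.9.93.35 (selavi)
>     168385835 = 0x0A095D2B = 10.9.93.43 (turifel)
>
> and selavi's matches the membership address above.)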
>
> But when pacemaker starts on selavi (the new node), turifel's log
> shows this:
>
> ----
> Jun 27 11:54:28 turifel crmd[19526]: notice: do_state_transition:
> State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN
> cause=C_FSA_INTERNAL origin=peer_update_callback ]
> Jun 27 11:54:28 turifel crmd[19526]: warning: crm_get_peer: Node
> 'selavi' and 'selavi' share the same cluster nodeid: 168385827
> Jun 27 11:54:28 turifel crmd[19526]: warning: crmd_cs_dispatch:
> Recieving messages from a node we think is dead: selavi[0]
> Jun 27 11:54:29 turifel crmd[19526]: warning: crm_get_peer: Node
> 'selavi' and 'selavi' share the same cluster nodeid: 168385827
> Jun 27 11:54:29 turifel crmd[19526]: warning: do_state_transition: Only
> 1 of 2 cluster nodes are eligible to run resources - continue 0
> Jun 27 11:54:29 turifel attrd[19524]: notice: attrd_local_callback:
> Sending full refresh (origin=crmd)
> ----
>
> And selavi remains in pending state. Sometimes turifel (the DC) fences
> selavi, but other times it stays pending forever.
>
> On the turifel node, all resources give warnings like this one:
> warning: custom_action: Action p_drbd_ha0:0_monitor_0 on selavi is
> unrunnable (pending)
>
> On both nodes, uname -n and crm_node -n give the correct node names
> (selavi and turifel, respectively).
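> For example, on selavi:
>
> # uname -n
> selavi
> # crm_node -n
> selavi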
>
> Do you think it's a configuration problem?
>
>
> Below I give information about versions and configurations.
>
> Best regards,
> Bernardo.
>
>
> -----
> Versions (git/hg compiled versions):
>
> corosync: 2.3.0.66-615d
> pacemaker: 1.1.9-61e4b8f
> cluster-glue: 1.0.11
> libqb: 0.14.4.43-bb4c3
> resource-agents: 3.9.5.98-3b051
> crmsh: 1.2.5
>
> The cluster also has drbd, dlm and gfs2, but I think their versions
> are irrelevant here.
>
> --------
> Pacemaker configure command and its output:
> ./configure --prefix=/opt/ha --without-cman \
> --without-heartbeat --with-corosync \
> --enable-fatal-warnings=no --with-lcrso-dir=/opt/ha/libexec/lcrso
>
> pacemaker configuration:
> Version = 1.1.9 (Build: 61e4b8f)
> Features = generated-manpages ascii-docs ncurses
> libqb-logging libqb-ipc lha-fencing upstart nagios corosync-native snmp
> libesmtp
>
> Prefix = /opt/ha
> Executables = /opt/ha/sbin
> Man pages = /opt/ha/share/man
> Libraries = /opt/ha/lib
> Header files = /opt/ha/include
> Arch-independent files = /opt/ha/share
> State information = /opt/ha/var
> System configuration = /opt/ha/etc
> Corosync Plugins = /opt/ha/lib
>
> Use system LTDL = yes
>
> HA group name = haclient
> HA user name = hacluster
>
> CFLAGS = -I/opt/ha/include -I/opt/ha/include
> -I/opt/ha/include/heartbeat -I/opt/ha/include -I/opt/ha/include
> -ggdb -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return
> -Wbad-function-cast -Wcast-align -Wdeclaration-after-statement
> -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security
> -Wformat-nonliteral -Wmissing-prototypes -Wmissing-declarations
> -Wnested-externs -Wno-long-long -Wno-strict-aliasing
> -Wunused-but-set-variable -Wpointer-arith -Wstrict-prototypes
> -Wwrite-strings
> Libraries = -lgnutls -lcorosync_common -lplumb -lpils
> -lqb -lbz2 -lxslt -lxml2 -lc -luuid -lpam -lrt -ldl -lglib-2.0 -lltdl
> -L/opt/ha/lib -lqb -ldl -lrt -lpthread
> Stack Libraries = -L/opt/ha/lib -lqb -ldl -lrt -lpthread
> -L/opt/ha/lib -lcpg -L/opt/ha/lib -lcfg -L/opt/ha/lib -lcmap
> -L/opt/ha/lib -lquorum
>
> ----
> Corosync config:
>
> totem {
>     version: 2
>     crypto_cipher: none
>     crypto_hash: none
>     cluster_name: fiestaha
>
>     interface {
>         ringnumber: 0
>         ttl: 1
>         bindnetaddr: 10.9.93.0
>         mcastaddr: 226.94.1.1
>         mcastport: 5405
>     }
> }
>
> logging {
>     fileline: off
>     to_stderr: yes
>     to_logfile: no
>     to_syslog: yes
>     syslog_facility: local7
>     debug: off
>     timestamp: on
>
>     logger_subsys {
>         subsys: QUORUM
>         debug: off
>     }
> }
>
> quorum {
>     provider: corosync_votequorum
>     expected_votes: 2
>     two_node: 1
>     wait_for_all: 0
> }
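>
> One thing I have not tried: this config has no nodelist section. If I
> read the docs right, pacemaker can pick node names up from an explicit
> nodelist, so something like the following might help (addresses taken
> from the logs above; untested):
>
> nodelist {
>     node {
>         ring0_addr: 10.9.93.35
>         name: selavi
>     }
>     node {
>         ring0_addr: 10.9.93.43
>         name: turifel
>     }
> }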
>
> --
> APSL
> *Bernardo Cabezas Serra*
> *Responsable Sistemas*
> Camí Vell de Bunyola 37, esc. A, local 7
> 07009 Polígono de Son Castelló, Palma
> Mail: bcabezas at apsl.net
> Skype: bernat.cabezas
> Tel: 971439771
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
--
this is my life and I live it as long as God wills