[Pacemaker] Node name problems after upgrading to 1.1.9

Thu Jun 27 10:26:18 UTC 2013

Hello Bernardo,

I'm not sure this is the cause, but try the following corosync.conf option:

      clear_node_high_bit
              This configuration option is optional and is only relevant
              when no nodeid is specified.  Some openais clients require a
              signed 32 bit nodeid that is greater than zero; however, by
              default openais uses all 32 bits of the IPv4 address space
              when generating a nodeid.  Set this option to yes to force
              the high bit to be zero and therefore ensure the nodeid is a
              positive signed 32 bit integer.

              WARNING: The cluster's behavior is undefined if this option
              is enabled on only a subset of the cluster (for example
              during a rolling upgrade).
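
For what it's worth, the nodeids in your log look like the ring0 IPv4
addresses read as 32-bit integers: 168385827 decodes to 10.9.93.35, the
address shown in the TOTEM membership line. Below is a minimal Python sketch
of that mapping and of what clearing the high bit would do. The function
names are my own, and the exact derivation inside corosync is an assumption
on my part; I only checked it against the numbers in your log.

```python
import ipaddress


def nodeid_from_ipv4(addr, clear_high_bit=False):
    """Read a dotted-quad IPv4 address as an unsigned 32-bit nodeid.

    Assumption: this mirrors how an auto-generated nodeid is derived;
    it matches the IDs in the quoted log but is not taken from corosync.
    """
    nodeid = int(ipaddress.IPv4Address(addr))
    if clear_high_bit:
        # Force bit 31 to zero so the value fits a positive signed 32-bit int.
        nodeid &= 0x7FFFFFFF
    return nodeid


def ipv4_from_nodeid(nodeid):
    """Inverse mapping: turn a 32-bit nodeid back into a dotted quad."""
    return str(ipaddress.IPv4Address(nodeid))


def as_signed32(nodeid):
    """Show how a client using a signed 32-bit integer would see the nodeid."""
    return nodeid - (1 << 32) if nodeid & 0x80000000 else nodeid


if __name__ == "__main__":
    # The two members reported by the QUORUM line in the quoted log.
    for nodeid in (168385827, 168385835):
        print(nodeid, "->", ipv4_from_nodeid(nodeid))

    # A hypothetical address with the high bit set, for comparison only.
    raw = nodeid_from_ipv4("192.168.1.10")
    print(raw, as_signed32(raw), nodeid_from_ipv4("192.168.1.10", True))
```

Both IDs in your log are already below 2^31, so in this cluster the option
would probably leave them unchanged.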

Thanks


2013/6/27 Bernardo Cabezas Serra <bcabezas at apsl.net>

> Hello,
>
> Our cluster was working OK on the corosync stack, with corosync 2.3.0 and
> pacemaker 1.1.8.
>
> After upgrading (full versions and configs below), we began to have
> problems with node names.
> It's a two-node cluster, with node names "turifel" (DC) and "selavi".
>
> When selavi joins the cluster, we see this notice in the selavi log:
>
> -----
> Jun 27 11:54:29 selavi attrd[11998]:   notice: corosync_node_name:
> Unable to get node name for nodeid 168385827
> Jun 27 11:54:29 selavi attrd[11998]:   notice: get_node_name: Defaulting
> to uname -n for the local corosync node name
> -----
>
> This is OK; it also happened with version 1.1.8.
>
> At the corosync level, everything seems OK:
> ----
> Jun 27 11:51:18 turifel corosync[6725]:   [TOTEM ] A processor joined or
> left the membership and a new membership (10.9.93.35:1184) was formed.
> Jun 27 11:51:18 turifel corosync[6725]:   [QUORUM] Members[2]: 168385827
> 168385835
> Jun 27 11:51:18 turifel corosync[6725]:   [MAIN  ] Completed service
> synchronization, ready to provide service.
> Jun 27 11:51:18 turifel crmd[19526]:   notice: crm_update_peer_state:
> pcmk_quorum_notification: Node selavi[168385827] - state is now member
> (was lost)
> -------
>
> But when pacemaker starts on selavi (the new node), the turifel log shows
> this:
>
> ----
> Jun 27 11:54:28 turifel crmd[19526]:   notice: do_state_transition:
> State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN
> cause=C_FSA_INTERNAL origin=peer_update_callback ]
> Jun 27 11:54:28 turifel crmd[19526]:  warning: crm_get_peer: Node
> 'selavi' and 'selavi' share the same cluster nodeid: 168385827
> Jun 27 11:54:28 turifel crmd[19526]:  warning: crmd_cs_dispatch:
> Recieving messages from a node we think is dead: selavi[0]
> Jun 27 11:54:29 turifel crmd[19526]:  warning: crm_get_peer: Node
> 'selavi' and 'selavi' share the same cluster nodeid: 168385827
> Jun 27 11:54:29 turifel crmd[19526]:  warning: do_state_transition: Only
> 1 of 2 cluster nodes are eligible to run resources - continue 0
> Jun 27 11:54:29 turifel attrd[19524]:   notice: attrd_local_callback:
> Sending full refresh (origin=crmd)
> ----
>
> And selavi remains in the pending state. Sometimes turifel (the DC) fences
> selavi, but other times selavi stays pending forever.
>
> On the turifel node, all resources give warnings like this one:
>  warning: custom_action: Action p_drbd_ha0:0_monitor_0 on selavi is
> unrunnable (pending)
>
> On both nodes, uname -n and crm_node -n give the correct node names (selavi
> and turifel respectively).
>
> Do you think it's a configuration problem?
>
>
> Below I give information about versions and configurations.
>
> Best regards,
> Bernardo.
>
>
> -----
> Versions (git/hg compiled versions):
>
> corosync: 2.3.0.66-615d
> pacemaker: 1.1.9-61e4b8f
> cluster-glue: 1.0.11
> libqb:  0.14.4.43-bb4c3
> resource-agents: 3.9.5.98-3b051
> crmsh: 1.2.5
>
> The cluster also runs drbd, dlm and gfs2, but I think their versions are
> irrelevant here.
>
> --------
> Output of pacemaker configuration:
> ./configure --prefix=/opt/ha --without-cman \
>     --without-heartbeat --with-corosync \
>     --enable-fatal-warnings=no --with-lcrso-dir=/opt/ha/libexec/lcrso
>
> pacemaker configuration:
>   Version                  = 1.1.9 (Build: 61e4b8f)
>   Features                 = generated-manpages ascii-docs ncurses
> libqb-logging libqb-ipc lha-fencing upstart nagios  corosync-native snmp
> libesmtp
>
>   Prefix                   = /opt/ha
>   Executables              = /opt/ha/sbin
>   Man pages                = /opt/ha/share/man
>   Libraries                = /opt/ha/lib
>   Header files             = /opt/ha/include
>   Arch-independent files   = /opt/ha/share
>   State information        = /opt/ha/var
>   System configuration     = /opt/ha/etc
>   Corosync Plugins         = /opt/ha/lib
>
>   Use system LTDL          = yes
>
>   HA group name            = haclient
>   HA user name             = hacluster
>
>   CFLAGS                   = -I/opt/ha/include -I/opt/ha/include
> -I/opt/ha/include/heartbeat    -I/opt/ha/include   -I/opt/ha/include
> -ggdb  -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return
> -Wbad-function-cast -Wcast-align -Wdeclaration-after-statement
> -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security
> -Wformat-nonliteral -Wmissing-prototypes -Wmissing-declarations
> -Wnested-externs -Wno-long-long -Wno-strict-aliasing
> -Wunused-but-set-variable -Wpointer-arith -Wstrict-prototypes
> -Wwrite-strings
>   Libraries                = -lgnutls -lcorosync_common -lplumb -lpils
> -lqb -lbz2 -lxslt -lxml2 -lc -luuid -lpam -lrt -ldl  -lglib-2.0   -lltdl
> -L/opt/ha/lib -lqb -ldl -lrt -lpthread
>   Stack Libraries          =   -L/opt/ha/lib -lqb -ldl -lrt -lpthread
> -L/opt/ha/lib -lcpg   -L/opt/ha/lib -lcfg   -L/opt/ha/lib -lcmap
> -L/opt/ha/lib -lquorum
>
> ----
> Corosync config:
>
> totem {
>         version: 2
>         crypto_cipher: none
>         crypto_hash: none
>         cluster_name: fiestaha
>         interface {
>                 ringnumber: 0
>                 ttl: 1
>                 bindnetaddr: 10.9.93.0
>                 mcastaddr: 226.94.1.1
>                 mcastport: 5405
>         }
> }
> logging {
>         fileline: off
>         to_stderr: yes
>         to_logfile: no
>         to_syslog: yes
>         syslog_facility: local7
>         debug: off
>         timestamp: on
>         logger_subsys {
>                 subsys: QUORUM
>                 debug: off
>         }
> }
> quorum {
>         provider: corosync_votequorum
>         expected_votes: 2
>         two_node: 1
>         wait_for_all: 0
> }
>
> --
> APSL
> *Bernardo Cabezas Serra*
> *Responsable Sistemas*
> Camí Vell de Bunyola 37, esc. A, local 7
> 07009 Polígono de Son Castelló, Palma
> Mail: bcabezas at apsl.net
> Skype: bernat.cabezas
> Tel: 971439771
>
>


-- 
This is my life, and I will live it for as long as God wills