[Pacemaker] Multiple thread after rebooting server: the node doesn't go online

Giovanni Di Milia gdimilia at cfa.harvard.edu
Thu Nov 12 23:21:41 UTC 2009


I set up a cluster of two CentOS 5.4 x86_64 servers with Pacemaker
1.0.6 and Corosync 1.1.2.

I installed only the x86_64 packages (yum install pacemaker also tries
to install the 32-bit ones).

I configured a shared cluster IP (a public IP) and a cluster
website.

Everything works fine if I stop corosync on one of the two
servers (the services move from one machine to the other without
problems), but if I reboot one server, it cannot come back online in
the cluster once it is up again.
I also noticed that several corosync threads are running after the
reboot, and if I kill all of them and then start corosync again,
everything works fine again.
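
For anyone who wants to repeat the workaround, the check and cleanup described above could be sketched roughly as follows. This is an illustration, not the exact commands that were used: the process name "corosync", the init script path /etc/init.d/corosync, and the helper names count_procs and clean_restart are all assumptions.

```shell
#!/bin/sh
# Sketch only: the process name and the init script location are
# assumptions about a stock CentOS 5 install, not taken from the report.

# Print how many processes with exactly this name are running.
count_procs() {
    pgrep -x "$1" | wc -l | tr -d ' '
}

# Kill any leftover processes with this name, then start the service.
clean_restart() {
    name=$1
    if [ "$(count_procs "$name")" -gt 0 ]; then
        pkill -x "$name"        # ask politely first (SIGTERM)
        sleep 2
        pkill -9 -x "$name"     # force anything that is still alive
    fi
    "/etc/init.d/$name" start
}

# Usage on the rebooted node (as root):
#   count_procs corosync
#   clean_restart corosync
```

Nothing runs until one of the functions is called, so the file can be sourced safely on a live node before deciding whether to kill anything.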

I don't know what is happening, and I have not been able to reproduce
the problem on virtual servers.

Thanks,
Giovanni



The corosync configuration is the following:

##############################################
# Please read the corosync.conf.5 manual page
compatibility: whitetank

aisexec {
	# Run as root - this is necessary to be able to manage resources with Pacemaker
	user:	root
	group:	root
}

service {
	# Load the Pacemaker Cluster Resource Manager
	ver:       0
	name:      pacemaker
	use_mgmtd: yes
	use_logd:  yes
}

totem {
	version: 2

	# How long before declaring a token lost (ms)
	token:          5000

	# How many token retransmits before forming a new configuration
	token_retransmits_before_loss_const: 10

	# How long to wait for join messages in the membership protocol (ms)
	join:           1000

	# How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
	consensus:      2500

	# Turn off the virtual synchrony filter
	vsftype:        none

	# Number of messages that may be sent by one processor on receipt of the token
	max_messages:   20

	# Stagger sending the node join messages by 1..send_join ms
	send_join: 45

	# Limit generated nodeids to 31-bits (positive signed integers)
	clear_node_high_bit: yes

	# Disable encryption
	secauth:	off

	# How many threads to use for encryption/decryption
	threads:   	0

	# Optionally assign a fixed node id (integer)
	# nodeid:         1234

	interface {
		ringnumber: 0

		# The following values need to be set based on your environment
		bindnetaddr: XXX.XXX.XXX.0 # here I put the right IP for my configuration
		mcastaddr: 226.94.1.1
		mcastport: 4000
	}
}

logging {
	fileline: off
	to_stderr: yes
	to_logfile: yes
	to_syslog: yes
	logfile: /tmp/corosync.log
	debug: off
	timestamp: on
	logger_subsys {
		subsys: AMF
		debug: off
	}
}

amf {
	mode: disabled
}

##################################################





