[Pacemaker] Installation problems
Erich Weiler
weiler at soe.ucsc.edu
Sun Mar 7 16:33:41 UTC 2010
Hi Y'all,
I'm having some issues getting things running on a stock CentOS 5.4
install, and I was hoping someone could point me in the right direction...
Through the epel and clusterlabs repos that are referenced in the wiki,
I installed:
corosync-1.2.0-1.el5
openais-1.1.0-1.el5
pacemaker-1.0.7-4.el5
(and all dependencies, via yum)
and yum reported that everything installed fine. I then created
/etc/corosync/corosync.conf as follows:
-----
# Please read the corosync.conf.5 manual page
compatibility: whitetank
aisexec {
    user: root
    group: root
}
totem {
    version: 2
    # How long before declaring a token lost (ms)
    token: 5000
    # How many token retransmits before forming a new configuration
    token_retransmits_before_loss_const: 20
    # How long to wait for join messages in the membership protocol (ms)
    join: 1000
    # How long to wait for consensus to be achieved before starting
    # a new round of membership configuration (ms)
    consensus: 7500
    # Turn off the virtual synchrony filter
    vsftype: none
    # Number of messages that may be sent by one processor on
    # receipt of the token
    max_messages: 20
    # Disable encryption
    secauth: off
    # How many threads to use for encryption/decryption
    threads: 0
    # Limit generated nodeids to 31-bits (positive signed integers)
    clear_node_high_bit: yes
    # Optionally assign a fixed node id (integer)
    # nodeid: 1234
    interface {
        ringnumber: 0
        bindnetaddr: 10.1.0.255
        mcastaddr: 226.94.1.90
        mcastport: 4000
    }
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
amf {
    mode: disabled
}
service {
    # Load the Pacemaker Cluster Resource Manager
    name: pacemaker
    ver: 0
}
-----
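A side note on the clear_node_high_bit and nodeid settings above: when no
nodeid is set, corosync derives one from the 32-bit IPv4 address of the ring
interface. The sketch below is not corosync code. It is a small Python
illustration, assuming the four address bytes are read as a little-endian
integer (as on an x86 host); the helper name derive_nodeid is hypothetical.
For the interface address 10.1.1.84 that the log reports, it yields the same
node id that appears in the log.

```python
import socket
import struct


def derive_nodeid(ipv4, clear_high_bit=True):
    """Hypothetical helper: node id from an IPv4 address (illustration only)."""
    packed = socket.inet_aton(ipv4)            # 4 bytes in network order
    nodeid = struct.unpack("<I", packed)[0]    # assumption: little-endian host
    if clear_high_bit:
        nodeid &= 0x7FFFFFFF                   # mirrors clear_node_high_bit: yes
    return nodeid


print(derive_nodeid("10.1.1.84"))  # 1409351946
```

With clear_high_bit left on, an address whose last octet is 128 or higher
still maps to a positive 31-bit value.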
Then I tried:
# /etc/init.d/corosync start
Starting Corosync Cluster Engine (corosync): [ OK ]
but then when I run crm_mon, it hangs here:
"Attempting connection to the cluster...."
and nothing happens. A 'ps' shows corosync in a weird state:
[root@server ~]# ps -afe | grep coro
root 12942 1 0 08:20 ? 00:00:00 corosync
root 12947 12942 0 08:20 ? 00:00:00 [corosync] <defunct>
root 12955 12858 0 08:20 pts/0 00:00:00 grep coro
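To make the "weird state" easier to spot in longer process listings, here is
a minimal Python sketch that picks out defunct entries and their parents. It
assumes the usual `ps -ef` column order (UID PID PPID C STIME TTY TIME CMD);
the function name find_defunct is made up for this example.

```python
from typing import List, Tuple


def find_defunct(ps_output: str) -> List[Tuple[int, int]]:
    """Return (pid, ppid) pairs for lines whose command ends in <defunct>."""
    found = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header and anything too short to be a process line.
        if len(fields) < 8 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        if fields[-1] == "<defunct>":
            found.append((int(fields[1]), int(fields[2])))
    return found


sample = """\
root 12942 1 0 08:20 ? 00:00:00 corosync
root 12947 12942 0 08:20 ? 00:00:00 [corosync] <defunct>
root 12955 12858 0 08:20 pts/0 00:00:00 grep coro
"""
print(find_defunct(sample))  # [(12947, 12942)]
```

Here the defunct process 12947 is a child of the main corosync process 12942.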
I also tried starting corosync via '/etc/init.d/openais start' after
changing the line in the /etc/init.d/openais script:
export COROSYNC_DEFAULT_CONFIG_IFACE="openaisserviceenableexperimental:corosync_parser"
and it seems to start, but crm_mon still can't connect and I still get
"Attempting connection to the cluster...." and corosync is in a defunct
state. Has anyone else had this problem? Are the rpms from
epel/clusterlabs not jibing with each other in some way, perhaps?
Here is a clip from /var/log/corosync.log:
Mar 07 08:20:04 corosync [MAIN ] Corosync Cluster Engine ('1.2.0'): started and ready to provide service.
Mar 07 08:20:04 corosync [MAIN ] Corosync built-in features: nss rdma
Mar 07 08:20:04 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Mar 07 08:20:04 corosync [TOTEM ] Initializing transport (UDP/IP).
Mar 07 08:20:04 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Mar 07 08:20:04 corosync [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Mar 07 08:20:04 corosync [TOTEM ] The network interface [10.1.1.84] is now up.
Mar 07 08:20:04 corosync [pcmk ] info: process_ais_conf: Reading configure
Mar 07 08:20:04 corosync [pcmk ] info: config_find_init: Local handle: 5650605097994944514 for logging
Mar 07 08:20:04 corosync [pcmk ] info: config_find_next: Processing additional logging options...
Mar 07 08:20:04 corosync [pcmk ] info: get_config_opt: Found 'off' for option: debug
Mar 07 08:20:04 corosync [pcmk ] info: get_config_opt: Defaulting to 'off' for option: to_file
Mar 07 08:20:04 corosync [pcmk ] info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
Mar 07 08:20:04 corosync [pcmk ] info: config_find_init: Local handle: 2730409743423111171 for service
Mar 07 08:20:04 corosync [pcmk ] info: config_find_next: Processing additional service options...
Mar 07 08:20:04 corosync [pcmk ] info: get_config_opt: Defaulting to 'pcmk' for option: clustername
Mar 07 08:20:04 corosync [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_logd
Mar 07 08:20:04 corosync [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_mgmtd
Mar 07 08:20:04 corosync [pcmk ] info: pcmk_startup: CRM: Initialized
Mar 07 08:20:04 corosync [pcmk ] Logging: Initialized pcmk_startup
Mar 07 08:20:04 corosync [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Mar 07 08:20:04 corosync [pcmk ] ERROR: pcmk_startup: Child 12947 spawned to record non-fatal assertion failure line 544: pwentry != NULL
Mar 07 08:20:04 corosync [pcmk ] ERROR: pcmk_startup: Cluster user hacluster does not exist
Mar 07 08:20:04 corosync [SERV ] Service engine loaded: Pacemaker Cluster Manager 1.0.7
Mar 07 08:20:04 corosync [SERV ] Service engine loaded: corosync extended virtual synchrony service
Mar 07 08:20:04 corosync [SERV ] Service engine loaded: corosync configuration service
Mar 07 08:20:04 corosync [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Mar 07 08:20:04 corosync [SERV ] Service engine loaded: corosync cluster config database access v1.01
Mar 07 08:20:04 corosync [SERV ] Service engine loaded: corosync profile loading service
Mar 07 08:20:04 corosync [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Mar 07 08:20:04 corosync [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 44: memb=0, new=0, lost=0
Mar 07 08:20:04 corosync [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 44: memb=1, new=1, lost=0
Mar 07 08:20:04 corosync [pcmk ] info: update_member: Creating entry for node 1409351946 born on 44
Mar 07 08:20:04 corosync [pcmk ] info: update_member: Node 1409351946/unknown is now: member
Mar 07 08:20:04 corosync [pcmk ] info: pcmk_peer_update: NEW: .pending. 1409351946
Mar 07 08:20:05 corosync [pcmk ] info: pcmk_peer_update: MEMB: .pending. 1409351946
Mar 07 08:20:05 corosync [pcmk ] info: pcmk_update_nodeid: Local node id: 1409351946
Mar 07 08:20:05 corosync [pcmk ] info: update_member: Node (null) now has 1 quorum votes (was 0)
Mar 07 08:20:05 corosync [pcmk ] info: send_member_notification: Sending membership update 44 to 0 children
Mar 07 08:20:05 corosync [pcmk ] info: update_member: Node (null) now has process list: 00000000000000000000000000000002 (2)
Mar 07 08:20:05 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 07 08:20:05 corosync [pcmk ] info: update_member: 0xec71ac0 Node 1409351946 now known as (was: (null))
Mar 07 08:20:05 corosync [pcmk ] info: send_member_notification: Sending membership update 44 to 0 children
Mar 07 08:20:05 corosync [MAIN ] Completed service synchronization, ready to provide service.
Mar 07 08:22:59 corosync [SERV ] Unloading all Corosync service engines.
Mar 07 08:22:59 corosync [pcmk ] notice: pcmk_shutdown: Shuting down Pacemaker
Mar 07 08:22:59 corosync [pcmk ] notice: pcmk_shutdown: crmd confirmed stopped
Mar 07 08:22:59 corosync [pcmk ] notice: pcmk_shutdown: pengine confirmed stopped
Mar 07 08:22:59 corosync [pcmk ] notice: pcmk_shutdown: attrd confirmed stopped
Mar 07 08:22:59 corosync [pcmk ] notice: pcmk_shutdown: lrmd confirmed stopped
Mar 07 08:22:59 corosync [pcmk ] notice: pcmk_shutdown: cib confirmed stopped
Mar 07 08:22:59 corosync [pcmk ] notice: pcmk_shutdown: stonithd confirmed stopped
Mar 07 08:22:59 corosync [pcmk ] notice: pcmk_shutdown: Shutdown complete
Mar 07 08:22:59 corosync [SERV ] Service engine unloaded: Pacemaker Cluster Manager 1.0.7
Mar 07 08:22:59 corosync [SERV ] Service engine unloaded: corosync extended virtual synchrony service
Mar 07 08:22:59 corosync [SERV ] Service engine unloaded: corosync configuration service
Mar 07 08:22:59 corosync [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
Mar 07 08:22:59 corosync [SERV ] Service engine unloaded: corosync cluster config database access v1.01
Mar 07 08:22:59 corosync [SERV ] Service engine unloaded: corosync profile loading service
Mar 07 08:22:59 corosync [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
Mar 07 08:22:59 corosync [MAIN ] Corosync Cluster Engine exiting with status -1 at main.c:158.
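The two ERROR lines in the log complain that the cluster user hacluster does
not exist, and the failed assertion (pwentry != NULL) reads like the result
of a passwd lookup; that reading is an inference from the message, not
something checked against the Pacemaker source. A minimal Python sketch to
see whether the account resolves on this host could look like the following.
The helper name user_exists is made up for this example.

```python
import pwd


def user_exists(name):
    """True if `name` resolves through the system's passwd database."""
    try:
        pwd.getpwnam(name)   # raises KeyError when there is no such entry
        return True
    except KeyError:
        return False


if __name__ == "__main__":
    print("hacluster present:", user_exists("hacluster"))
```

If this reports False, the account named in the log really is missing on the
node, which would match the startup error.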
Any hints welcome!!
TIA,
erich