[Pacemaker] lenny + clvm + pacemaker/openais...
Alain St-Denis
alain.st-denis at ec.gc.ca
Thu May 28 17:19:34 UTC 2009
Andrew Beekhof wrote:
> You might want to check out Martin's packages.
> If I understood correctly, he's built the version of clvm used by SUSE
> (which we know works) against 0.80.5
>
> Look for his email with the subject "lvm2-clvm RPMs in opensuse.org
> package repo?"
Thanks!
I installed Martin's packages. Here's what I have:
pacemaker-openais       1.0.3+svn20090522-2~bpo50+1
clvm-openais            2.02.44-4~bpo50+1
libopenais-legacy-2     0.80.5+svn20090522-2~bpo50+1
openais-legacy          0.80.5+svn20090522-2~bpo50+1
heartbeat-common        2.99.2+sles11r9-3~bpo50+1
libheartbeat2           2.99.2+sles11r9-3~bpo50+1
Now, soon after I start clvmd, aisexec dies with a segfault (in
openais_conn_private_data_get). On my 3-node test cluster, I start openais
on all nodes, then start clvmd on one of them. Not long after, aisexec
dies on the other nodes. Here are the last messages logged by aisexec:
May 28 16:19:04.924914 [TOTEM] entering GATHER state from 11.
May 28 16:19:05.079052 [TOTEM] Saving state aru 20 high seq received 20
May 28 16:19:05.079094 [TOTEM] Storing new sequence id for ring 298
May 28 16:19:05.079155 [TOTEM] entering COMMIT state.
May 28 16:19:05.079500 [TOTEM] entering RECOVERY state.
May 28 16:19:05.079558 [TOTEM] position [0] member 142.135.16.107:
May 28 16:19:05.079571 [TOTEM] previous ring seq 660 rep 142.135.16.107
May 28 16:19:05.079578 [TOTEM] aru a high delivered a received flag 1
May 28 16:19:05.079587 [TOTEM] position [1] member 142.135.16.109:
May 28 16:19:05.079594 [TOTEM] previous ring seq 660 rep 142.135.16.109
May 28 16:19:05.079612 [TOTEM] aru 20 high delivered 20 received flag 1
May 28 16:19:05.079627 [TOTEM] Did not need to originate any messages in recovery.
May 28 16:19:05.080669 [CLM ] CLM CONFIGURATION CHANGE
May 28 16:19:05.080711 [CLM ] New Configuration:
May 28 16:19:05.080724 [CLM ] r(0) ip(142.135.16.109)
May 28 16:19:05.080733 [CLM ] Members Left:
May 28 16:19:05.080774 [CLM ] Members Joined:
May 28 16:19:05.080790 [crm ] notice: pcmk_peer_update: Transitional membership event on ring 664: memb=1, new=0, lost=0
May 28 16:19:05.080805 [crm ] info: pcmk_peer_update: memb: lab09 1829799822
May 28 16:19:05.080843 [CLM ] CLM CONFIGURATION CHANGE
May 28 16:19:05.080855 [CLM ] New Configuration:
May 28 16:19:05.080865 [CLM ] r(0) ip(142.135.16.107)
May 28 16:19:05.080901 [CLM ] r(0) ip(142.135.16.109)
May 28 16:19:05.080914 [CLM ] Members Left:
May 28 16:19:05.080923 [CLM ] Members Joined:
May 28 16:19:05.080938 [CLM ] r(0) ip(142.135.16.107)
May 28 16:19:05.080972 [crm ] notice: pcmk_peer_update: Stable membership event on ring 664: memb=2, new=1, lost=0
May 28 16:19:05.080985 [MAIN ] info: update_member: Node 1796245390/lab07 is now: member
May 28 16:19:05.081001 [crm ] info: pcmk_peer_update: NEW: lab07 1796245390
May 28 16:19:05.081036 [crm ] info: pcmk_peer_update: MEMB: lab07 1796245390
May 28 16:19:05.081044 [crm ] info: pcmk_peer_update: MEMB: lab09 1829799822
May 28 16:19:05.081063 [crm ] info: send_member_notification: Sending membership update 664 to 2 children
May 28 16:19:05.081118 [SYNC ] This node is within the primary component and will provide service.
May 28 16:19:05.081144 [TOTEM] entering OPERATIONAL state.
May 28 16:19:05.082382 [MAIN ] info: update_member: 0x7f1188002510 Node 1796245390 (lab07) born on: 664
May 28 16:19:05.082416 [crm ] info: send_member_notification: Sending membership update 664 to 2 children
May 28 16:19:05.082757 [CLM ] got nodejoin message 142.135.16.107
May 28 16:19:05.082832 [CLM ] got nodejoin message 142.135.16.109
May 28 16:19:05.087292 [CPG ] got joinlist message from node 1829799822
Then it crashes. Martin (or anybody), have you seen this? I attached my
openais.conf file. Maybe I'm doing something stupid in there?
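
For anyone comparing their own logs against the excerpt above, here is a minimal Python sketch that pulls the TOTEM state transitions out of aisexec output. The regular expression only assumes the line layout seen in this excerpt (timestamp, bracketed subsystem tag, "entering X state"); the function names are made up for illustration and are not part of openais. On the lines above it shows the ring going GATHER, COMMIT, RECOVERY, OPERATIONAL, which suggests the crash comes after membership has formed rather than during it.

```python
import re

# Matches lines shaped like the ones in the excerpt, for example:
#   May 28 16:19:05.079155 [TOTEM] entering COMMIT state.
#   May 28 16:19:04.924914 [TOTEM] entering GATHER state from 11.
TOTEM_STATE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:.]+)\s+\[TOTEM\s*\]\s+"
    r"entering\s+(?P<state>[A-Z]+)\s+state"
)


def totem_transitions(lines):
    """Return (timestamp, state) pairs for each TOTEM state change found."""
    found = []
    for line in lines:
        match = TOTEM_STATE.match(line)
        if match:
            found.append((match.group("stamp"), match.group("state")))
    return found


def reached_operational(lines):
    """True if the last TOTEM transition in the log is OPERATIONAL."""
    states = [state for _, state in totem_transitions(lines)]
    return bool(states) and states[-1] == "OPERATIONAL"


# A few lines copied from the log excerpt in this message.
SAMPLE = [
    "May 28 16:19:04.924914 [TOTEM] entering GATHER state from 11.",
    "May 28 16:19:05.079155 [TOTEM] entering COMMIT state.",
    "May 28 16:19:05.079500 [TOTEM] entering RECOVERY state.",
    "May 28 16:19:05.080669 [CLM ] CLM CONFIGURATION CHANGE",
    "May 28 16:19:05.081144 [TOTEM] entering OPERATIONAL state.",
    "May 28 16:19:05.087292 [CPG ] got joinlist message from node 1829799822",
]

if __name__ == "__main__":
    for stamp, state in totem_transitions(SAMPLE):
        print(stamp, state)
    print("reached OPERATIONAL:", reached_operational(SAMPLE))
```

Feeding it the full log (for instance `totem_transitions(open(path))`, with `path` being wherever aisexec logs on your system) gives the same kind of summary.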
Alain
--
Alain St-Denis
Supercomputing, Systems and Storage / Superinformatique, systèmes et stockage,
High Performance Computing Support / Soutien aux calculs en haute performance
Chief Information Officer Branch / Direction Générale du dirigeant principal
de l'information
Environment Canada / Environnement Canada
Tel: +1 514 421 4697
-------------- next part --------------
# Please read the openais.conf.5 manual page
aisexec {
    # Run as root - this is necessary to be able to manage resources with Pacemaker
    user: root
    group: root
}
service {
    # Load the Pacemaker Cluster Resource Manager
    name: pacemaker
    ver: 0
}
totem {
    version: 2
    # How long before declaring a token lost (ms)
    token: 10000
    # How many token retransmits before forming a new configuration
    token_retransmits_before_loss_const: 20
    # How long to wait for join messages in the membership protocol (ms)
    join: 60
    # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
    consensus: 4800
    # Turn off the virtual synchrony filter
    vsftype: none
    # Number of messages that may be sent by one processor on receipt of the token
    max_messages: 20
    # Limit generated nodeids to 31-bits (positive signed integers)
    clear_node_high_bit: yes
    # Disable encryption
    secauth: off
    # How many threads to use for encryption/decryption
    threads: 0
    # Optionally assign a fixed node id (integer)
    # nodeid: 1234
    interface {
        ringnumber: 0
        # The following values need to be set based on your environment
        bindnetaddr: 142.135.16.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}
logging {
    debug: on
    fileline: off
    to_syslog: yes
    to_stderr: yes
    syslog_facility: daemon
    timestamp: on
}
amf {
    mode: disabled
}
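
Since no nodeid is set in this configuration, the node ids that appear in the log (1829799822 for lab09, 1796245390 for lab07) come from the ring 0 IPv4 addresses. The Python sketch below reproduces both values by reading the four address bytes as a little-endian 32-bit integer and then masking off the top bit, as clear_node_high_bit: yes asks for. The little-endian byte order is inferred from these two log values only and is an assumption for other platforms; the function name is hypothetical.

```python
import socket
import struct


def nodeid_from_ipv4(ip, clear_high_bit=True):
    """Derive a node id from a dotted-quad IPv4 address.

    Assumption: the address bytes are read as a little-endian 32-bit
    integer, which matches the two node ids seen in the log above.
    """
    raw = struct.unpack("<I", socket.inet_aton(ip))[0]
    if clear_high_bit:
        # Mirror "clear_node_high_bit: yes": keep the id within 31 bits.
        raw &= 0x7FFFFFFF
    return raw


if __name__ == "__main__":
    print(nodeid_from_ipv4("142.135.16.109"))  # 1829799822 (lab09 in the log)
    print(nodeid_from_ipv4("142.135.16.107"))  # 1796245390 (lab07 in the log)
```

With these addresses the last octet is below 128, so the high bit is already clear and the mask changes nothing; it would only matter for addresses whose last octet is 128 or higher.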