[Pacemaker] stonithd dumps core since 1.0.0
Roderick van Domburg
r.s.a.vandomburg at nedforce.nl
Tue Oct 14 13:15:21 UTC 2008
Hello everyone,
We have been running cman+gfs2 and heartbeat+pacemaker simultaneously
on our systems. This worked great until we updated to heartbeat-2.99.2
and pacemaker-1.0.0 yesterday. Since then, stonithd crashes on a fatal
assertion on is_openais_cluster() inside is_heartbeat_cluster().
Previously we ran heartbeat-2.99.1 and pacemaker-0.7.3 successfully.
I'll post this to the linux-ha list too.
/var/log/messages:
Oct 14 14:49:55 node1 logd: [1455]: info: logd started with default configuration.
Oct 14 14:49:55 node1 logd: [1463]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Oct 14 14:49:55 node1 logd: [1455]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Oct 14 14:49:55 node1 heartbeat: [1479]: info: Enabling logging daemon
Oct 14 14:49:55 node1 heartbeat: [1479]: info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
Oct 14 14:49:55 node1 heartbeat: [1479]: info: ******************
Oct 14 14:49:55 node1 heartbeat: [1479]: info: Configuration validated. Starting heartbeat 2.99.2
Oct 14 14:49:55 node1 heartbeat: [1480]: info: heartbeat: version 2.99.2
Oct 14 14:49:55 node1 heartbeat: [1480]: info: Heartbeat generation: 1219055953
Oct 14 14:49:55 node1 heartbeat: [1480]: info: glib: UDP multicast heartbeat started for group 239.0.0.45 port 694 interface eth0 (ttl=1 loop=0)
Oct 14 14:49:55 node1 heartbeat: [1480]: info: G_main_add_TriggerHandler: Added signal manual handler
Oct 14 14:49:55 node1 heartbeat: [1480]: info: G_main_add_TriggerHandler: Added signal manual handler
Oct 14 14:49:55 node1 heartbeat: [1480]: notice: Using watchdog device: /dev/watchdog
Oct 14 14:49:55 node1 heartbeat: [1480]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Oct 14 14:49:55 node1 heartbeat: [1480]: info: Local status now set to: 'up'
Oct 14 14:50:55 node1 heartbeat: [1480]: WARN: node node2: is dead
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Comm_now_up(): updating status to active
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Local status now set to: 'active'
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/ccm" (498,496)
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/cib" (498,496)
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/lrmd -r" (0,0)
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/stonithd" (0,0)
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/attrd" (498,496)
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/crmd" (498,496)
Oct 14 14:50:55 node1 heartbeat: [1489]: info: Starting "/usr/lib64/heartbeat/ccm" as uid 498 gid 496 (pid 1489)
Oct 14 14:50:55 node1 heartbeat: [1492]: info: Starting "/usr/lib64/heartbeat/stonithd" as uid 0 gid 0 (pid 1492)
Oct 14 14:50:55 node1 heartbeat: [1491]: info: Starting "/usr/lib64/heartbeat/lrmd -r" as uid 0 gid 0 (pid 1491)
Oct 14 14:50:55 node1 heartbeat: [1493]: info: Starting "/usr/lib64/heartbeat/attrd" as uid 498 gid 496 (pid 1493)
Oct 14 14:50:55 node1 heartbeat: [1490]: info: Starting "/usr/lib64/heartbeat/cib" as uid 498 gid 496 (pid 1490)
Oct 14 14:50:55 node1 heartbeat: [1494]: info: Starting "/usr/lib64/heartbeat/crmd" as uid 498 gid 496 (pid 1494)
Oct 14 14:50:55 node1 lrmd: [1491]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Oct 14 14:50:55 node1 stonithd: [1492]: info: G_main_add_SignalHandler: Added signal handler for signal 10
Oct 14 14:50:55 node1 stonithd: [1492]: info: G_main_add_SignalHandler: Added signal handler for signal 12
Oct 14 14:50:55 node1 cib: [1490]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Oct 14 14:50:55 node1 cib: [1490]: info: G_main_add_TriggerHandler: Added signal manual handler
Oct 14 14:50:55 node1 cib: [1490]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Oct 14 14:50:55 node1 attrd: [1493]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Oct 14 14:50:55 node1 attrd: [1493]: info: main: Starting up....
Oct 14 14:50:55 node1 attrd: [1493]: ERROR: main: HA Signon failed
Oct 14 14:50:55 node1 attrd: [1493]: ERROR: main: Aborting startup
Oct 14 14:50:55 node1 heartbeat: [1480]: WARN: Managed /usr/lib64/heartbeat/attrd process 1493 exited with return code 100.
Oct 14 14:50:55 node1 ccm: [1489]: info: Hostname: node1
Oct 14 14:50:55 node1 crmd: [1494]: info: main: CRM Hg Version: node: 9a6c6d1dd87154b11fdf9ff7fadf5fd33500bca4
Oct 14 14:50:55 node1 crmd: [1494]: info: crmd_init: Starting crmd
Oct 14 14:50:55 node1 crmd: [1494]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Oct 14 14:50:55 node1 crmd: [1494]: info: G_main_add_TriggerHandler: Added signal manual handler
Oct 14 14:50:55 node1 crmd: [1494]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Oct 14 14:50:55 node1 stonithd: [1492]: ERROR: crm_abort: is_heartbeat_cluster: Triggered fatal assert at utils.c:1626 : is_openais_cluster()
Oct 14 14:50:55 node1 cib: [1490]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
Oct 14 14:50:55 node1 lrmd: [1491]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Oct 14 14:50:55 node1 lrmd: [1491]: info: G_main_add_SignalHandler: Added signal handler for signal 10
Oct 14 14:50:55 node1 lrmd: [1491]: info: G_main_add_SignalHandler: Added signal handler for signal 12
Oct 14 14:50:55 node1 lrmd: [1491]: info: Started.
Oct 14 14:50:55 node1 heartbeat: [1480]: WARN: Managed /usr/lib64/heartbeat/stonithd process 1492 killed by signal 6 [SIGABRT - Abort].
Oct 14 14:50:55 node1 heartbeat: [1480]: ERROR: Managed /usr/lib64/heartbeat/stonithd process 1492 dumped core
Oct 14 14:50:55 node1 heartbeat: [1480]: ERROR: Respawning client "/usr/lib64/heartbeat/stonithd":
Oct 14 14:50:55 node1 heartbeat: [1480]: info: Starting child client "/usr/lib64/heartbeat/stonithd" (0,0)
Oct 14 14:50:56 node1 cib: [1490]: info: startCib: CIB Initialization completed successfully
Oct 14 14:50:56 node1 cib: [1490]: CRIT: cib_init: Cannot sign in to the cluster... terminating
Oct 14 14:50:56 node1 heartbeat: [1480]: WARN: Managed /usr/lib64/heartbeat/cib process 1490 exited with return code 100.
Oct 14 14:50:56 node1 heartbeat: [1480]: EMERG: Rebooting system. Reason: /usr/lib64/heartbeat/cib
Oct 14 14:50:56 node1 crmd: [1494]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
Oct 14 14:50:56 node1 crmd: [1494]: info: crmd_init: Starting crmd's mainloop
Oct 14 14:50:56 node1 heartbeat: [1495]: info: Starting "/usr/lib64/heartbeat/stonithd" as uid 0 gid 0 (pid 1495)
Oct 14 14:50:56 node1 stonithd: [1495]: info: G_main_add_SignalHandler: Added signal handler for signal 10
Oct 14 14:50:56 node1 stonithd: [1495]: info: G_main_add_SignalHandler: Added signal handler for signal 12
Oct 14 14:50:56 node1 stonithd: [1495]: ERROR: crm_abort: is_heartbeat_cluster: Triggered fatal assert at utils.c:1626 : is_openais_cluster()
Oct 14 14:50:57 node1 kernel: md: stopping all md devices.
Oct 14 14:51:17 node1 syslogd 1.4.1: restart.
This happens regardless of whether cman and openais are running.
I have attached the core dump.
Version information:
- CentOS 5.2 x86_64 (2.6.18-92.1.13.el5xen)
- heartbeat-common.x86_64 2.99.2-21.1
- heartbeat-resources.x86_64 2.99.2-21.1
- heartbeat.x86_64 2.99.2-21.1
- libheartbeat2.x86_64 2.99.2-21.1
- pacemaker.x86_64 1.0.0-1.6
- libpacemaker3.x86_64 1.0.0-1.6
- openais.x86_64 0.80.3-19.1
- cman.x86_64 2.0.84-2.el5_2.1
ha.cf:
autojoin none
mcast eth0 239.0.0.45 694 1 0
warntime 15
deadtime 60
initdead 60
keepalive 3
node node1
node node2
crm on
watchdog /dev/watchdog
use_logd on
openais.conf:
totem {
    version: 2
    secauth: on
    threads: 1
    heartbeat_failures_allowed: 3
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.3.1
        mcastaddr: 239.0.0.45
        mcastport: 5405
    }
}
logging {
    debug: off
    timestamp: on
}
amf {
    mode: disabled
}
I have tried switching either one to another IP address, but to no avail.
Any insights into this behavior?
Kind regards,
Roderick
-------------- next part --------------
A non-text attachment was scrubbed...
Name: core.1492
Type: application/octet-stream
Size: 724992 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20081014/17418b93/attachment-0001.obj>