[Pacemaker] heartbeat stop hangs sometimes

Markus M. adrock0905 at alice.de
Mon Feb 22 12:00:29 UTC 2010


Hello,

Sometimes "heartbeat stop" seems to hang (latest packages from 
clusterlabs.org, RHEL5 x86_64, 2-node cluster with only one node running).

The last lines from ha-debug are like this:

Feb 22 12:52:48 dbprod21 ccm: [24053]: info: client (pid=24058) removed from ccm
Feb 22 12:52:48 dbprod21 crmd: [24058]: info: do_ha_control: Disconnected from Heartbeat
Feb 22 12:52:48 dbprod21 crmd: [24058]: info: do_cib_control: Disconnecting CIB
Feb 22 12:52:48 dbprod21 cib: [24054]: info: cib_process_readwrite: We are now in R/O mode
Feb 22 12:52:48 dbprod21 crmd: [24058]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
Feb 22 12:52:48 dbprod21 cib: [24054]: WARN: send_ipc_message: IPC Channel to 24058 is not connected
Feb 22 12:52:48 dbprod21 crmd: [24058]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
Feb 22 12:52:48 dbprod21 cib: [24054]: WARN: send_via_callback_channel: Delivery of reply to client 24058/d9c9c281-4f38-46d8-b83e-54135f6c75e9 failed
Feb 22 12:52:48 dbprod21 crmd: [24058]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
Feb 22 12:52:48 dbprod21 cib: [24054]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
Feb 22 12:52:48 dbprod21 crmd: [24058]: info: do_exit: [crmd] stopped (0)
Feb 22 12:52:48 dbprod21 heartbeat: [24040]: info: killing /usr/lib64/heartbeat/attrd process group 24057 with signal 15
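
The last log line reports SIGTERM being sent to the attrd process group 24057. A minimal sketch for checking whether that process has actually exited is given here; `is_alive` is a hypothetical helper name, not part of heartbeat, and the check assumes it is run as root, since `kill -0` also fails for processes the caller may not signal.

```shell
# Return success if a process with the given PID still exists.
# kill -0 delivers no signal; it only tests whether the PID could be signalled.
# is_alive is a hypothetical helper, not part of heartbeat or Pacemaker.
is_alive() {
    kill -0 "$1" 2>/dev/null
}
```

For example, `is_alive 24057 && echo "attrd still running"` would print the message as long as PID 24057 exists.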

# ps -efw | grep heart

root     24040     1  0 12:49 ?        00:00:00 heartbeat: master control process
root     24043 24040  0 12:49 ?        00:00:00 heartbeat: FIFO reader
root     24044 24040  0 12:49 ?        00:00:00 heartbeat: write: ucast eth0
root     24045 24040  0 12:49 ?        00:00:00 heartbeat: read: ucast eth0
root     24046 24040  0 12:49 ?        00:00:00 heartbeat: write: ucast eth0
root     24047 24040  0 12:49 ?        00:00:00 heartbeat: read: ucast eth0
root     24048 24040  0 12:49 ?        00:00:00 heartbeat: write: serial /dev/ttyS0
root     24049 24040  0 12:49 ?        00:00:00 heartbeat: read: serial /dev/ttyS0
101      24053 24040  0 12:50 ?        00:00:00 /usr/lib64/heartbeat/ccm
101      24054 24040  0 12:50 ?        00:00:00 /usr/lib64/heartbeat/cib
root     24055 24040  0 12:50 ?        00:00:00 /usr/lib64/heartbeat/lrmd -r
root     24056 24040  0 12:50 ?        00:00:00 /usr/lib64/heartbeat/stonithd
101      24057 24040  0 12:50 ?        00:00:00 /usr/lib64/heartbeat/attrd
root     24366 22245  0 12:52 pts/2    00:00:00 /bin/sh /etc/init.d/heartbeat stop
root     24377 24366  0 12:52 pts/2    00:00:00 heartbeat 
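
To narrow the listing above to the processes still parented by the master control process (PID 24040), a filter along these lines could be used. It is only a sketch: `children_of` is a hypothetical helper name, and it assumes the `ps -ef` column layout shown above (UID, PID, PPID, C, STIME, TTY, TIME, CMD).

```shell
# Print "PID CMD" for every process whose parent PID matches the argument.
# Reads `ps -ef` style output on stdin; assumes PID is column 2, PPID is
# column 3, and the command starts at column 8.
# children_of is a hypothetical helper, not part of heartbeat or Pacemaker.
children_of() {
    awk -v ppid="$1" '$3 == ppid {
        cmd = $8
        for (i = 9; i <= NF; i++) cmd = cmd " " $i
        print $2, cmd
    }'
}
```

Running `ps -ef | children_of 24040` while the stop is hanging would then list only the remaining children, without the unrelated matches that `grep heart` picks up.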


What could be causing this behaviour? Of course I can kill the 
processes manually, but I would rather not have to.

Regards
Markus



