[Pacemaker] crm_cluster_connect: Triggered fatal assert at cluster.c:65 : hb_conn != NULL
Nikita Michalko
michalko.system at a-i-p.com
Mon Jul 18 16:17:02 CET 2011
Hi all!
I have successfully configured a 2-node cluster and it is running. While
testing different scenarios I ran into this error.
Situation:
The 1st node was running; the 2nd was rebooted, and heartbeat was started only
on the 1st node. That was OK: all resources were running on the 1st node.
Then, on the 2nd node, I removed all files in /var/lib/heartbeat/crm/ and in
/var/lib//pengine/.
After starting heartbeat/Pacemaker on the 2nd node, I get the following
errors:
--- SNIP ---
Jul 18 15:54:25 pollux cib: [16884]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
Jul 18 15:54:25 pollux cib: [16884]: WARN: validate_cib_digest: No on-disk
digest present
Jul 18 15:54:25 pollux cib: [16884]: info: validate_with_relaxng: Creating RNG
parser context
Jul 18 15:54:25 pollux cib: [16884]: info: startCib: CIB Initialization
completed successfully
Jul 18 15:54:25 pollux cib: [16884]: info: crm_cluster_connect: Connecting to
cluster infrastructure: heartbeat
Jul 18 15:54:25 pollux cib: [16884]: ERROR: crm_abort: crm_cluster_connect:
Triggered fatal assert at cluster.c:65 : hb_conn != NULL
Jul 18 15:54:25 pollux heartbeat: [16824]: WARN: Managed
/usr/lib64/heartbeat/cib process 16884 killed by signal 6 [SIGABRT - Abort].
Jul 18 15:54:25 pollux heartbeat: [16824]: ERROR: Managed
/usr/lib64/heartbeat/cib process 16884 dumped core
Jul 18 15:54:25 pollux heartbeat: [16824]: ERROR: Client
/usr/lib64/heartbeat/cib "respawning too fast"
Jul 18 15:54:26 pollux crmd: [16850]: info: crm_timer_popped: Wait Timer
(I_NULL) just popped! (2000ms)
Jul 18 15:54:27 pollux crmd: [16850]: info: do_cib_control: Could not connect
to the CIB service: connection failed
Jul 18 15:54:27 pollux crmd: [16850]: WARN: do_cib_control: Couldn't complete
CIB registration 5 times... pause and retry
Jul 18 15:54:29 pollux crmd: [16850]: info: crm_timer_popped: Wait Timer
(I_NULL) just popped! (2000ms)
...
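For anyone digging through a longer log, the failing entries can be picked out
with a small filter. The sketch below is only an illustration: it assumes the
syslog layout shown in the excerpt above (timestamp, host, daemon, [pid],
severity), and the names LINE_RE and find_problems are made up for this
example; they are not part of Pacemaker or heartbeat.

```python
import re

# Assumed layout, taken from the excerpt above:
#   "Jul 18 15:54:25 pollux cib: [16884]: ERROR: message..."
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s(?P<daemon>[\w-]+):\s"
    r"\[(?P<pid>\d+)\]:\s(?P<level>[A-Za-z]+):\s(?P<msg>.*)$"
)


def find_problems(text, levels=("ERROR", "WARN")):
    """Return (daemon, pid, level, message) for entries at the given levels."""
    entries = []
    for raw in text.splitlines():
        m = LINE_RE.match(raw)
        if m:
            entries.append([m["daemon"], int(m["pid"]), m["level"], m["msg"]])
        elif entries and raw.strip():
            # The mail client wrapped long lines; glue continuations back on.
            entries[-1][3] += " " + raw.strip()
    return [tuple(e) for e in entries if e[2] in levels]


# Two entries copied from the excerpt above, including the mail line wrapping.
sample = """\
Jul 18 15:54:25 pollux cib: [16884]: info: crm_cluster_connect: Connecting to
cluster infrastructure: heartbeat
Jul 18 15:54:25 pollux cib: [16884]: ERROR: crm_abort: crm_cluster_connect:
Triggered fatal assert at cluster.c:65 : hb_conn != NULL
"""

for entry in find_problems(sample):
    print(entry)
```

Any non-matching line is treated as a continuation of the previous entry, so
markers such as "--- SNIP ---" or "..." should be stripped before feeding a
pasted excerpt to it.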
crm_verify -V -x /var/lib/heartbeat/crm/cib.xml -> OK!
After stopping Pacemaker/heartbeat on the 1st node and removing all relevant
Pacemaker/heartbeat files there, the 1st node behaves the same way. Creating a
new configuration with crm configure shows these errors:
Signon to CIB failed: connection failed
Init failed, could not perform requested operations
ERROR: cannot parse xml: no element found: line 1, column 0
Versions:
pacemaker : 1.1.5 (Build: c86cb93c5a57c1f507a21be69d24fd28dee85397)
cluster-glue : 1.0.7 (Build: 6fa74ce2ed7ef6df41be2b634cd4aa89c318a8dc)
resource-agents: 1.0.4 (Build: 7a11934b142d1daf42a04fbaa0391a3ac47cee4c)
heartbeat: 3.0.5
What am I doing wrong?
Configuration attached...
TIA!
Nikita Michalko
-------------- next part --------------
A non-text attachment was scrubbed...
Name: NM_cib.xml
Type: application/xml
Size: 13010 bytes
Desc: not available
URL: <http://oss.clusterlabs.org/pipermail/pacemaker/attachments/20110718/63c2d62d/attachment.wsdl>