[Pacemaker] crm_cluster_connect: Triggered fatal assert at cluster.c:65 : hb_conn != NULL

Andrew Beekhof andrew at beekhof.net
Mon Jul 18 23:58:41 UTC 2011


On Tue, Jul 19, 2011 at 1:17 AM, Nikita Michalko
<michalko.system at a-i-p.com> wrote:
> Hi all!
>
> I have successfully configured a 2-node cluster and it is running. While
> testing different scenarios I ran into this error.
> Situation:
> The 1st node was running; the 2nd was rebooted and heartbeat was started only
> on the 1st node. That was OK: all resources were running on the 1st node.
> Then, on the 2nd node, I removed all files in /var/lib/heartbeat/crm/ and in
> /var/lib/pengine/.
> After starting heartbeat/PM on the 2nd node, I am getting the following
> errors:
> --- SNIP ---
> Jul 18 15:54:25 pollux cib: [16884]: info: retrieveCib: Reading cluster
> configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
> /var/lib/heartbeat/crm/cib.xml.sig)
> Jul 18 15:54:25 pollux cib: [16884]: WARN: validate_cib_digest: No on-disk
> digest present
> Jul 18 15:54:25 pollux cib: [16884]: info: validate_with_relaxng: Creating RNG
> parser context
> Jul 18 15:54:25 pollux cib: [16884]: info: startCib: CIB Initialization
> completed successfully
> Jul 18 15:54:25 pollux cib: [16884]: info: crm_cluster_connect: Connecting to
> cluster infrastructure: heartbeat
> Jul 18 15:54:25 pollux cib: [16884]: ERROR: crm_abort: crm_cluster_connect:
> Triggered fatal assert at cluster.c:65 : hb_conn != NULL
> Jul 18 15:54:25 pollux heartbeat: [16824]: WARN: Managed
> /usr/lib64/heartbeat/cib process 16884 killed by signal 6 [SIGABRT - Abort].
> Jul 18 15:54:25 pollux heartbeat: [16824]: ERROR: Managed
> /usr/lib64/heartbeat/cib process 16884 dumped core
> Jul 18 15:54:25 pollux heartbeat: [16824]: ERROR: Client
> /usr/lib64/heartbeat/cib "respawning too fast"
> Jul 18 15:54:26 pollux crmd: [16850]: info: crm_timer_popped: Wait Timer
> (I_NULL) just popped! (2000ms)
> Jul 18 15:54:27 pollux crmd: [16850]: info: do_cib_control: Could not connect
> to the CIB service: connection failed
> Jul 18 15:54:27 pollux crmd: [16850]: WARN: do_cib_control: Couldn't complete
> CIB registration 5 times... pause and retry
> Jul 18 15:54:29 pollux crmd: [16850]: info: crm_timer_popped: Wait Timer
> (I_NULL) just popped! (2000ms)
> ...
> crm_verify -V -x /var/lib/heartbeat/crm/cib.xml -> OK!
> After stopping PM/HA on the 1st node and removing all relevant PM/HA
> files there, the 1st node shows the same behaviour. Creating a new
> configuration with crm configure fails with these errors:
> Signon to CIB failed: connection failed
> Init failed, could not perform requested operations
> ERROR: cannot parse xml: no element found: line 1, column 0
>
> Versions:
>
> pacemaker :     1.1.5 (Build: c86cb93c5a57c1f507a21be69d24fd28dee85397)

Mercurial has no record of this changeset.
Where did you get the packages from?

> cluster-glue :     1.0.7 (Build: 6fa74ce2ed7ef6df41be2b634cd4aa89c318a8dc)
> resource-agents: 1.0.4 (Build: 7a11934b142d1daf42a04fbaa0391a3ac47cee4c)
> heartbeat:        3.0.5
>
> What am I doing wrong?
> Configuration attached...
>
>
> TIA!
> Nikita Michalko
>
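
For anyone searching their own logs for the same failure, here is a minimal Python sketch that pulls the function name, source location and failed expression out of a crm_abort line. The regular expression is inferred only from the single ERROR line quoted above, and the name parse_fatal_assert is made up for illustration; other Pacemaker versions may format the message differently. It expects the log line unwrapped (the quoted lines above were wrapped by the mail client).

```python
import re

# Pattern inferred from the one crm_abort line in the quoted log (an assumption,
# not a documented log format).
ASSERT_RE = re.compile(
    r"crm_abort: (?P<function>\w+): Triggered fatal assert at "
    r"(?P<file>[^:\s]+):(?P<line>\d+) : (?P<expr>.*)"
)


def parse_fatal_assert(log_line):
    """Return details of a fatal assert found in one log line, or None."""
    match = ASSERT_RE.search(log_line)
    if match is None:
        return None
    return {
        "function": match.group("function"),
        "file": match.group("file"),
        "line": int(match.group("line")),
        "expr": match.group("expr").strip(),
    }


if __name__ == "__main__":
    sample = (
        "Jul 18 15:54:25 pollux cib: [16884]: ERROR: crm_abort: "
        "crm_cluster_connect: Triggered fatal assert at cluster.c:65 : "
        "hb_conn != NULL"
    )
    print(parse_fatal_assert(sample))
```

Lines that do not contain a fatal assert return None, so the function can be applied to every line of a log file and the None results filtered out.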