[Pacemaker] Pacemaker won't start after node was fenced

Andrew Beekhof andrew at beekhof.net
Mon Feb 23 19:42:55 EST 2015


> On 27 Jan 2015, at 5:23 pm, Jake Smith <jsmith at argotec.com> wrote:
> 
> Had a failover of my active/passive cluster and now the passive node will not rejoin the cluster.
>  
> 2 nodes running Ubuntu 12.04
> corosync 1.4.2-2, openais 1.1.4-4, pacemaker 1.1.6-2ubuntu3
>  
> Corosync ring membership is fine on both rings.
>  
> Tried stopping corosync/pacemaker, clearing /var/lib/heartbeat/crm/, and then restarting on the passive node, without success.
> Tried rebooting passive node (again – it was successfully fenced)
> Tried updating pacemaker to latest in distro (1.1.6-2ubuntu3.3) then went back on passive node
> Tried putting the active node in maintenance mode, stopping pacemaker and corosync on both nodes, then restarting on both nodes.  Corosync came back fine as before, but now I have the same problem on both nodes: pacemaker does not start successfully.  Both now show exactly the same error - attrd: [24883]: ERROR: main: HA Signon failed.
>  
> Log:
> Jan 27 01:09:59 Condor crmd: [24885]: info: crmd_init: Starting crmd
> Jan 27 01:09:59 Condor cib: [24881]: info: validate_with_relaxng: Creating RNG parser context
> Jan 27 01:09:59 Condor lrmd: [24882]: info: enabling coredumps
> Jan 27 01:09:59 Condor lrmd: [24882]: info: Started.
> Jan 27 01:09:59 Condor corosync[24778]:   [IPC   ] Invalid IPC credentials.

This seems to be the root of the errors: corosync is rejecting the IPC connections, so attrd and cib cannot sign on.
Pacemaker 1.1.6 is fairly old. Could you consider updating?
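
To check that each daemon failure really does follow a credential rejection, something like the sketch below could be run over the log. It is only an illustration: the function name and the sample list are made up for this example (the sample lines are copied from the log in this thread), and it assumes the syslog layout shown here, where the daemon name is the fifth whitespace-separated field.

```python
# Hypothetical helper, not part of Pacemaker or corosync.
def failures_after_ipc_rejection(lines):
    """Pair each 'Invalid IPC credentials' line with the next ERROR/CRIT line."""
    pairs = []
    pending = False  # True once a rejection has been seen and not yet matched
    for line in lines:
        if "Invalid IPC credentials" in line:
            pending = True
        elif pending and (" ERROR: " in line or " CRIT: " in line):
            # Assumed layout: "Mon DD HH:MM:SS host daemon: [pid]: LEVEL: ..."
            daemon = line.split()[4].rstrip(":")
            pairs.append((daemon, line))
            pending = False
    return pairs


SAMPLE = [
    "Jan 27 01:09:59 Condor corosync[24778]:   [IPC   ] Invalid IPC credentials.",
    "Jan 27 01:09:59 Condor attrd: [24883]: ERROR: main: HA Signon failed",
    "Jan 27 01:09:59 Condor corosync[24778]:   [IPC   ] Invalid IPC credentials.",
    "Jan 27 01:09:59 Condor cib: [24881]: info: init_ais_connection_classic: Connection to our AIS plugin (9) failed: unknown (100)",
    "Jan 27 01:09:59 Condor cib: [24881]: CRIT: cib_init: Cannot sign in to the cluster... terminating",
]

for daemon, line in failures_after_ipc_rejection(SAMPLE):
    print(daemon, "failed after an IPC rejection:", line)
```

On the sample lines this pairs the first rejection with attrd and the second with cib, which matches the order of the child exits reported by pacemakerd.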

> Jan 27 01:09:59 Condor attrd: [24883]: ERROR: main: HA Signon failed
> Jan 27 01:09:59 Condor attrd: [24883]: ERROR: main: Aborting startup
> Jan 27 01:09:59 Condor pacemakerd: [24877]: ERROR: pcmk_child_exit: Child process attrd exited (pid=24883, rc=100)
> Jan 27 01:09:59 Condor pacemakerd: [24877]: notice: pcmk_child_exit: Child process attrd no longer wishes to be respawned
> Jan 27 01:09:59 Condor pacemakerd: [24877]: info: update_node_processes: Node Condor now has process list: 00000000000000000000000000110312 (was 00000000000000000000000000111312)
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: init_ais_connection_classic: AIS connection established
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: get_ais_nodeid: Server details: id=167837962 uname=Condor cname=pcmk
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: init_ais_connection_once: Connection to 'classic openais (with plugin)': established
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: crm_new_peer: Node Condor now has id: 167837962
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: crm_new_peer: Node 167837962 is now known as Condor
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: main: Starting stonith-ng mainloop
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: crm_update_peer: Node Condor: id=167837962 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000110312 (new)
> Jan 27 01:09:59 Condor cib: [24881]: info: startCib: CIB Initialization completed successfully
> Jan 27 01:09:59 Condor cib: [24881]: info: get_cluster_type: Cluster type is: 'openais'
> Jan 27 01:09:59 Condor cib: [24881]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Jan 27 01:09:59 Condor cib: [24881]: info: init_ais_connection_classic: Creating connection to our Corosync plugin
> Jan 27 01:09:59 Condor corosync[24778]:   [IPC   ] Invalid IPC credentials.
> Jan 27 01:09:59 Condor cib: [24881]: info: init_ais_connection_classic: Connection to our AIS plugin (9) failed: unknown (100)
> Jan 27 01:09:59 Condor cib: [24881]: CRIT: cib_init: Cannot sign in to the cluster... terminating
> Jan 27 01:09:59 Condor pacemakerd: [24877]: ERROR: pcmk_child_exit: Child process cib exited (pid=24881, rc=100)
> Jan 27 01:09:59 Condor pacemakerd: [24877]: notice: pcmk_child_exit: Child process cib no longer wishes to be respawned
> Jan 27 01:09:59 Condor pacemakerd: [24877]: info: update_node_processes: Node Condor now has process list: 00000000000000000000000000110212 (was 00000000000000000000000000110312)
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: crm_update_peer: Node Condor: id=167837962 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000110212 (new)
> Jan 27 01:10:00 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Jan 27 01:10:00 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
> Jan 27 01:10:00 Condor crmd: [24885]: info: crmd_init: Starting crmd's mainloop
> Jan 27 01:10:01 Condor CRON[24888]: (root) CMD (/etc/init.d/watchdog -e >/dev/null 2>&1)
> Jan 27 01:10:02 Condor crmd: [24885]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
> Jan 27 01:10:03 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Jan 27 01:10:03 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry
> Jan 27 01:10:05 Condor crmd: [24885]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
> Jan 27 01:10:06 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Jan 27 01:10:06 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 3 times... pause and retry
> Jan 27 01:10:08 Condor crmd: [24885]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
> Jan 27 01:10:09 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Jan 27 01:10:09 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 4 times... pause and retry
> Jan 27 01:10:11 Condor crmd: [24885]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
> Jan 27 01:10:12 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Jan 27 01:10:12 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 5 times... pause and retry
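
The two "process list" lines in the log above can be compared mechanically. The sketch below assumes the value is a hexadecimal bitmask with one flag per child process; that is an assumption based on how the values change in this log, not something taken from the Pacemaker source, and the function name is made up for this example.

```python
# Hypothetical helper: which flags were set before and cleared afterwards?
def dropped_bits(before, after):
    """Return the bits present in `before` but absent in `after`, as a hex string."""
    return hex(int(before, 16) & ~int(after, 16))


# Values copied from the update_node_processes lines in the log.
print(dropped_bits("00000000000000000000000000111312",
                   "00000000000000000000000000110312"))  # 0x1000
print(dropped_bits("00000000000000000000000000110312",
                   "00000000000000000000000000110212"))  # 0x100
```

In the log, the first change is reported right after attrd exits and the second right after cib exits, so under the bitmask assumption those two flags would correspond to the two daemons that could not sign on.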
>  
> Jacob A. Smith
> IT Manager
> Argotec, LLC
> 
>  




