[Pacemaker] Failed in restart of Corosync.

Steven Dake sdake at redhat.com
Mon Oct 19 00:24:58 EDT 2009


This bug has been reported and we are working on a solution.

Regards
-steve

On Mon, 2009-10-19 at 11:05 +0900, renayama19661014 at ybb.ne.jp wrote:
> Hi,
> 
> I understand that the combination of Corosync and Pacemaker is not yet official.
> However, I am posting this because I thought it was important to report the problem.
> 
> I started Corosync with the following combination (on Redhat 5.4, x86):
> 
> * corosync trunk 2530
> * Cluster-Resource-Agents-6d652f7cf9d8
> * Reusable-Cluster-Components-4edc8f99701c
> * Pacemaker-1-0-de2a3778ace7
> 
> Next, I stopped the corosync service.
> However, the Pacemaker processes did not stop cleanly, so I killed them with kill -9.
> 
> ------------------------------------------------------------------------------------
> [root@rh54-1 ~]# service corosync stop
> Stopping Corosync Cluster Engine (corosync):               [  OK  ]
> Waiting for services to unload:                            [  OK  ]
> [root@rh54-1 ~]# ps -ef |grep coro
> root      5263  4617  0 10:54 pts/0    00:00:00 grep coro
> [root@rh54-1 ~]# ps -ef |grep heartbeat
> root      4882     1  0 10:52 ?        00:00:00 /usr/lib/heartbeat/stonithd
> 500       4883     1  0 10:52 ?        00:00:00 /usr/lib/heartbeat/cib
> root      4884     1  0 10:52 ?        00:00:00 /usr/lib/heartbeat/lrmd
> 500       4885     1  0 10:52 ?        00:00:00 /usr/lib/heartbeat/attrd
> 500       4886     1  0 10:52 ?        00:00:00 /usr/lib/heartbeat/pengine
> 500       4887     1  0 10:52 ?        00:00:00 /usr/lib/heartbeat/crmd
> root      5278  4617  0 10:54 pts/0    00:00:00 grep heartbeat
> [root@rh54-1 ~]# kill -9 4882 4883 4884 4885 4886 4887
> [root@rh54-1 ~]# ps -ef |grep heartbeat
> root      5310  4617  0 10:54 pts/0    00:00:00 grep heartbeat
> 
> ------------------------------------------------------------------------------------
> 
> I then started Corosync again.
> However, the Pacemaker cib process seems unable to communicate with Corosync.
> 
> 
> ------------------------------------------------------------------------------------
> Oct 19 10:55:29 rh54-1 cib: [5354]: info: startCib: CIB Initialization completed successfully
> Oct 19 10:55:29 rh54-1 cib: [5354]: info: crm_cluster_connect: Connecting to OpenAIS
> Oct 19 10:55:29 rh54-1 cib: [5354]: info: init_ais_connection: Creating connection to our AIS plugin
> Oct 19 10:55:30 rh54-1 mgmtd: [5359]: info: login to cib live: 1, ret:-10
> Oct 19 10:55:30 rh54-1 crmd: [5358]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Oct 19 10:55:30 rh54-1 crmd: [5358]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
> Oct 19 10:55:30 rh54-1 crmd: [5358]: info: crmd_init: Starting crmd's mainloop
> Oct 19 10:55:31 rh54-1 mgmtd: [5359]: info: login to cib live: 2, ret:-10
> Oct 19 10:55:32 rh54-1 mgmtd: [5359]: info: login to cib live: 3, ret:-10
> Oct 19 10:55:32 rh54-1 crmd: [5358]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
> Oct 19 10:55:33 rh54-1 mgmtd: [5359]: info: login to cib live: 4, ret:-10
> Oct 19 10:55:33 rh54-1 crmd: [5358]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Oct 19 10:55:33 rh54-1 crmd: [5358]: WARN: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry
> 
> ------------------------------------------------------------------------------------
> 
> As a result, Pacemaker never starts, no matter how long it waits.
> 
> As for the cause, Corosync seems to be failing in poll(?) for some reason.
> However, the cause may instead lie in the failure of the first stop.
> 
> ------------------------------------------------------------------------------------
> [root@rh54-1 ~]# ps -ef |grep coro
> root      5348     1  0 10:55 ?        00:00:00 /usr/sbin/corosync
> root      5400  4617  0 10:56 pts/0    00:00:00 grep coro
> [root@rh54-1 ~]# strace -p 5348
> Process 5348 attached - interrupt to quit
> futex(0x805c8c0, FUTEX_WAIT_PRIVATE, 2, NULL
> ------------------------------------------------------------------------------------
> 
> Is there a way to avoid this problem?
> Can I work around it by deleting some file?
> 
> * I hope that the combination of Corosync and Pacemaker will be ready for practical use soon.
> 
> Best Regards,
> Hideo Yamauchi.
> _______________________________________________
> Pacemaker mailing list
> Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
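
For the manual cleanup step in the quoted report, a gentler variant of the
kill -9 might look like the sketch below. It is only an illustration: the
function name term_then_kill is hypothetical and not part of Corosync or
Pacemaker, trying SIGTERM before SIGKILL is an assumption rather than
anything confirmed in this thread, and it has not been checked against the
restart failure described above.

```shell
#!/bin/sh
# Hypothetical helper, not part of Corosync or Pacemaker.
# term_then_kill PID [TIMEOUT]: ask a process to exit with SIGTERM, wait up
# to TIMEOUT seconds (default 10), and send SIGKILL only if it is still alive.
term_then_kill() {
    pid=$1
    timeout=${2:-10}
    kill -0 "$pid" 2>/dev/null || return 0    # process is already gone
    kill -TERM "$pid" 2>/dev/null
    while [ "$timeout" -gt 0 ]; do
        sleep 1
        kill -0 "$pid" 2>/dev/null || return 0    # exited after SIGTERM
        timeout=$((timeout - 1))
    done
    kill -KILL "$pid" 2>/dev/null             # last resort
}
```

It could be called once per leftover PID from the ps listing, for example
"term_then_kill 4887 10", or in a loop over the PIDs that pgrep -x reports
for a daemon name such as crmd.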