[Pacemaker] Failed in restart of Corosync.
Steven Dake
sdake at redhat.com
Mon Oct 19 04:24:58 UTC 2009
This bug has been reported and we are working on a solution.
Regards
-steve
On Mon, 2009-10-19 at 11:05 +0900, renayama19661014 at ybb.ne.jp wrote:
> Hi,
>
> I understand that the combination of Corosync and Pacemaker is not official yet.
> However, I am posting this because I thought it was important to report the problem.
>
> I started Corosync with the following combination (on Red Hat 5.4, x86):
>
> * corosync trunk 2530
> * Cluster-Resource-Agents-6d652f7cf9d8
> * Reusable-Cluster-Components-4edc8f99701c
> * Pacemaker-1-0-de2a3778ace7
>
> Next, I stopped the corosync service.
> However, the Pacemaker processes did not stop cleanly, so I killed them with kill -9.
>
> ------------------------------------------------------------------------------------
> [root@rh54-1 ~]# service corosync stop
> Stopping Corosync Cluster Engine (corosync): [ OK ]
> Waiting for services to unload: [ OK ]
> [root@rh54-1 ~]# ps -ef |grep coro
> root 5263 4617 0 10:54 pts/0 00:00:00 grep coro
> [root@rh54-1 ~]# ps -ef |grep heartbeat
> root 4882 1 0 10:52 ? 00:00:00 /usr/lib/heartbeat/stonithd
> 500 4883 1 0 10:52 ? 00:00:00 /usr/lib/heartbeat/cib
> root 4884 1 0 10:52 ? 00:00:00 /usr/lib/heartbeat/lrmd
> 500 4885 1 0 10:52 ? 00:00:00 /usr/lib/heartbeat/attrd
> 500 4886 1 0 10:52 ? 00:00:00 /usr/lib/heartbeat/pengine
> 500 4887 1 0 10:52 ? 00:00:00 /usr/lib/heartbeat/crmd
> root 5278 4617 0 10:54 pts/0 00:00:00 grep heartbeat
> [root@rh54-1 ~]# kill -9 4882 4883 4884 4885 4886 4887
> [root@rh54-1 ~]# ps -ef |grep heartbeat
> root 5310 4617 0 10:54 pts/0 00:00:00 grep heartbeat
>
> ------------------------------------------------------------------------------------
>
> I then started Corosync again.
> However, the cib process of Pacemaker seems unable to communicate with Corosync.
>
>
> ------------------------------------------------------------------------------------
> Oct 19 10:55:29 rh54-1 cib: [5354]: info: startCib: CIB Initialization completed successfully
> Oct 19 10:55:29 rh54-1 cib: [5354]: info: crm_cluster_connect: Connecting to OpenAIS
> Oct 19 10:55:29 rh54-1 cib: [5354]: info: init_ais_connection: Creating connection to our AIS plugin
> Oct 19 10:55:30 rh54-1 mgmtd: [5359]: info: login to cib live: 1, ret:-10
> Oct 19 10:55:30 rh54-1 crmd: [5358]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Oct 19 10:55:30 rh54-1 crmd: [5358]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
> Oct 19 10:55:30 rh54-1 crmd: [5358]: info: crmd_init: Starting crmd's mainloop
> Oct 19 10:55:31 rh54-1 mgmtd: [5359]: info: login to cib live: 2, ret:-10
> Oct 19 10:55:32 rh54-1 mgmtd: [5359]: info: login to cib live: 3, ret:-10
> Oct 19 10:55:32 rh54-1 crmd: [5358]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
> Oct 19 10:55:33 rh54-1 mgmtd: [5359]: info: login to cib live: 4, ret:-10
> Oct 19 10:55:33 rh54-1 crmd: [5358]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Oct 19 10:55:33 rh54-1 crmd: [5358]: WARN: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry
>
> ------------------------------------------------------------------------------------
>
> Because of this, Pacemaker never finishes starting, no matter how long I wait.
>
> As for the cause, Corosync seems to be failing in poll(?) for some reason.
> However, the cause may instead be the failure of the first stop.
>
> ------------------------------------------------------------------------------------
> [root@rh54-1 ~]# ps -ef |grep coro
> root 5348 1 0 10:55 ? 00:00:00 /usr/sbin/corosync
> root 5400 4617 0 10:56 pts/0 00:00:00 grep coro
> [root@rh54-1 ~]# strace -p 5348
> Process 5348 attached - interrupt to quit
> futex(0x805c8c0, FUTEX_WAIT_PRIVATE, 2, NULL
> ------------------------------------------------------------------------------------
>
> Is there a way to avoid this problem?
> Can I work around it by deleting some file?
>
> * I hope that the combination of Corosync and Pacemaker becomes ready for practical use soon.
>
> Best Regards,
> Hideo Yamauchi.
> _______________________________________________
> Pacemaker mailing list
> Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker