[Pacemaker] Failed in restart of Corosync.
renayama19661014 at ybb.ne.jp
Mon Oct 19 05:12:52 UTC 2009
Hi Steven,
All right.
Thank you.
Best Regards,
Hideo Yamauchi.
--- Steven Dake <sdake at redhat.com> wrote:
> This bug is reported and we are working on a solution.
>
> Regards
> -steve
>
> On Mon, 2009-10-19 at 11:05 +0900, renayama19661014 at ybb.ne.jp wrote:
> > Hi,
> >
> > I understand that the combination of Corosync and Pacemaker is not official.
> > However, I am posting this because I thought it was important to report the problem.
> >
> > I started Corosync with the following combination (on Red Hat 5.4, x86):
> >
> > * corosync trunk 2530
> > * Cluster-Resource-Agents-6d652f7cf9d8
> > * Reusable-Cluster-Components-4edc8f99701c
> > * Pacemaker-1-0-de2a3778ace7
> >
> > Next, I stopped the corosync service.
> > However, the Pacemaker processes did not stop cleanly, so I killed them.
> >
> > ------------------------------------------------------------------------------------
> > [root@rh54-1 ~]# service corosync stop
> > Stopping Corosync Cluster Engine (corosync): [ OK ]
> > Waiting for services to unload: [ OK ]
> > [root@rh54-1 ~]# ps -ef |grep coro
> > root 5263 4617 0 10:54 pts/0 00:00:00 grep coro
> > [root@rh54-1 ~]# ps -ef |grep heartbeat
> > root 4882 1 0 10:52 ? 00:00:00 /usr/lib/heartbeat/stonithd
> > 500 4883 1 0 10:52 ? 00:00:00 /usr/lib/heartbeat/cib
> > root 4884 1 0 10:52 ? 00:00:00 /usr/lib/heartbeat/lrmd
> > 500 4885 1 0 10:52 ? 00:00:00 /usr/lib/heartbeat/attrd
> > 500 4886 1 0 10:52 ? 00:00:00 /usr/lib/heartbeat/pengine
> > 500 4887 1 0 10:52 ? 00:00:00 /usr/lib/heartbeat/crmd
> > root 5278 4617 0 10:54 pts/0 00:00:00 grep heartbeat
> > [root@rh54-1 ~]# kill -9 4882 4883 4884 4885 4886 4887
> > [root@rh54-1 ~]# ps -ef |grep heartbeat
> > root 5310 4617 0 10:54 pts/0 00:00:00 grep heartbeat
> >
> > ------------------------------------------------------------------------------------
> >
> > I started Corosync again.
> > However, the Pacemaker cib process does not seem to be able to communicate with Corosync.
> >
> >
> > ------------------------------------------------------------------------------------
> > Oct 19 10:55:29 rh54-1 cib: [5354]: info: startCib: CIB Initialization completed successfully
> > Oct 19 10:55:29 rh54-1 cib: [5354]: info: crm_cluster_connect: Connecting to OpenAIS
> > Oct 19 10:55:29 rh54-1 cib: [5354]: info: init_ais_connection: Creating connection to our AIS plugin
> > Oct 19 10:55:30 rh54-1 mgmtd: [5359]: info: login to cib live: 1, ret:-10
> > Oct 19 10:55:30 rh54-1 crmd: [5358]: info: do_cib_control: Could not connect to the CIB service: connection failed
> > Oct 19 10:55:30 rh54-1 crmd: [5358]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
> > Oct 19 10:55:30 rh54-1 crmd: [5358]: info: crmd_init: Starting crmd's mainloop
> > Oct 19 10:55:31 rh54-1 mgmtd: [5359]: info: login to cib live: 2, ret:-10
> > Oct 19 10:55:32 rh54-1 mgmtd: [5359]: info: login to cib live: 3, ret:-10
> > Oct 19 10:55:32 rh54-1 crmd: [5358]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
> > Oct 19 10:55:33 rh54-1 mgmtd: [5359]: info: login to cib live: 4, ret:-10
> > Oct 19 10:55:33 rh54-1 crmd: [5358]: info: do_cib_control: Could not connect to the CIB service: connection failed
> > Oct 19 10:55:33 rh54-1 crmd: [5358]: WARN: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry
> >
> > ------------------------------------------------------------------------------------
> >
> > Because of this, Pacemaker never starts, no matter how long it waits.
> >
> > As for the problem, Corosync seems to be failing in poll(?) for some reason.
> > However, the cause may lie in the failure of the first stop.
> >
> > ------------------------------------------------------------------------------------
> > [root@rh54-1 ~]# ps -ef |grep coro
> > root 5348 1 0 10:55 ? 00:00:00 /usr/sbin/corosync
> > root 5400 4617 0 10:56 pts/0 00:00:00 grep coro
> > [root@rh54-1 ~]# strace -p 5348
> > Process 5348 attached - interrupt to quit
> > futex(0x805c8c0, FUTEX_WAIT_PRIVATE, 2, NULL
> > ------------------------------------------------------------------------------------
> >
> > Is there a way to avoid this behavior?
> > Can I work around the problem by deleting some file?
> >
> > * I hope that the combination of Corosync and Pacemaker becomes ready for practical use soon.
> >
> > Best Regards,
> > Hideo Yamauchi.
> > _______________________________________________
> > Pacemaker mailing list
> > Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker