[Pacemaker] Strange behavior when starting up Corosync in a single node setup

Mon Oct 25 07:26:53 UTC 2010

-------- Original Message --------
> Date: Thu, 21 Oct 2010 18:46:32 +0200
> From: "Stephan-Frank Henry" <Frank dot Henry at gmx dot net>
> > Andrew Beekhof
> > Mon, 13 Sep 2010 06:25:48 -0700
> > 
> > Looks like corosync can't talk to itself, i.e. it never sees the
> > multicast messages it sends out.
> > This would result in the pacemaker errors you're seeing.
> > 
> > Almost always this is a firewall issue :-)
> > Perhaps try disabling it completely?
>
<snip>
>
> Strangest of all, if I stop (and kill) corosync and restart it via init.d
> manually, it works fine.
> Even without any of the changes mentioned below.

FYI:
I have worked around this issue for now.
The biggest 'problem' was the blocking call that starts corosync; I got around it with a couple of scripts, one of which runs in the background after a sleep.

* I have removed drbd from the runlevels (it gets started by corosync anyway):
  update-rc.d -f drbd remove
* I have added two scripts:
  * ha_delay: a simple script that waits 45s and then just calls /etc/init.d/corosync start
  * ha: a wrapper around corosync that passes through all calls other than start and redirects start to the ha_delay script in the background,
    i.e. /etc/init.d/ha start &
* I have replaced the rcS.d symlink to corosync with one pointing to ha.
  The execution order could be left as is, since it has little influence.
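
For anyone who wants a starting point, the two scripts described in the list could look roughly like the sketch below. This is an illustration only, not the scripts actually in use here. Folding both into one file, the function names, and the COROSYNC_INIT and HA_START_DELAY variables are assumptions; only the 45s delay and the /etc/init.d/corosync path come from the description.

```shell
#!/bin/sh
# Hypothetical sketch of the "ha" wrapper plus the "ha_delay" helper.
# Both variables are illustrative knobs, not part of corosync itself.
COROSYNC_INIT="${COROSYNC_INIT:-/etc/init.d/corosync}"
HA_START_DELAY="${HA_START_DELAY:-45}"

# ha_delay: wait for the configured number of seconds, then start corosync.
ha_delay() {
    sleep "$HA_START_DELAY"
    "$COROSYNC_INIT" start
}

# ha: forward every action except "start" straight to the corosync init
# script; "start" is handed to ha_delay in the background so that the
# boot sequence is not blocked.
ha() {
    case "$1" in
        start)
            ha_delay &
            ;;
        *)
            "$COROSYNC_INIT" "$@"
            ;;
    esac
}

# Dispatch only when an action was given on the command line.
if [ "$#" -gt 0 ]; then
    ha "$@"
fi
```

Installed as /etc/init.d/ha, "ha stop" or "ha status" would go directly to corosync, while "ha start" returns immediately and corosync comes up after the delay.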

The scripts are really simple and if anyone wishes, I can post them.

Thanks to everyone for all the support!

Frank