[Pacemaker] Multiple thread after rebooting server: the node doesn't go online

Andrew Beekhof andrew at beekhof.net
Fri Nov 20 07:20:35 UTC 2009


On Thu, Nov 19, 2009 at 9:40 PM, Giovanni Di Milia
<gdimilia at cfa.harvard.edu> wrote:
>
> On Nov 19, 2009, at 3:03 PM, Andrew Beekhof wrote:
>
> Another problem has appeared:
> after the reboot of one server I often have a cluster partition and both
> servers elect themselves DC.
> Even if the partition doesn't appear just after the reboot of one server
> (i.e. serverA), if I try to restart corosync on the other server (i.e.
> serverB), the partition appears.
> Then if I also restart corosync on the first server (serverA) everything
> works fine again.
> But if I restart corosync on the second server (serverB) nothing changes and
> the partition appears again.
> It seems to me that there is still something wrong with the first run of
> corosync just after the server reboot.
>
> I've found that it starts a bit too early by default.
> Various systems seem to like messing with the network stack (xen is
> one but there are others) which confuses corosync.
>
> I wrote a shell script that "manually starts" corosync 5 minutes after the
> server starts and in this case the problem appears every time!
> It's driving me crazy, because I can see that my script starts a while after
> the server is up and I'm pretty sure everything is running!
> On the other hand, if I start corosync manually just after the server is up,
> everything works fine!

I wonder if there is something in the environment.
Perhaps have your script dump the output of
   env | sort
to a file and compare it to the logged-in case.
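
Something along these lines, as a rough sketch only: the function names
(dump_env, compare_env) and the /tmp paths are placeholders made up for
illustration, not anything corosync or pacemaker provides.

```shell
#!/bin/sh
# Sketch: capture the environment in both situations so they can be compared.
# dump_env / compare_env and the /tmp paths below are hypothetical names.

# Write the current environment, sorted, to the file given as $1.
# Sorting makes the two dumps line up when diffed.
dump_env() {
    env | sort > "$1"
}

# Show the lines that differ between two dumps.
# diff exits non-zero when the files differ; that is expected here.
compare_env() {
    diff "$1" "$2"
}

# Intended use (assumed workflow):
#   in the delayed-start script, just before starting corosync:
#       dump_env /tmp/env.boot
#   from a login shell where starting corosync works:
#       dump_env /tmp/env.login
#   then:
#       compare_env /tmp/env.boot /tmp/env.login
```

Any variable that shows up in only one of the two dumps (PATH, locale
settings and so on) is a candidate for the difference in behaviour.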

>
> You're not getting addresses from a DHCP server, are you?
> That's another common cause, since there can be a significant delay in
> obtaining the address - which again messes with corosync.
>
> Absolutely not!
> I have two servers with static public IPs.
> I also added the two servers to the /etc/hosts file: in general I followed
> all the guidelines I found in the documentation.
>
> I didn't configure any fencing method, because I think that my configuration
> is really simple and I don't need it.
>
> Do you need your data though?
>
> Do you mean it's better to configure a fencing method anyway?

yes



