[Pacemaker] Impossible to add a 4th node to a cluster
Guillaume Chanaud
guillaume.chanaud at connecting-nature.com
Thu Oct 28 16:47:27 UTC 2010
On 28/10/2010 18:30, Guillaume Chanaud wrote:
> On 28/10/2010 17:55, Pavlos Parissis wrote:
>> On 28 October 2010 16:09, Guillaume Chanaud
>> <guillaume.chanaud at connecting-nature.com> wrote:
>>> Hello,
>>>
>>> I have a cluster of two master/slave DRBD servers (filer1 and filer2)
>>> running in a VLAN (the machines are dedicated servers).
>>> I added a third node (server1, a "blank node" for the moment) to the
>>> cluster without problems.
>>> When I add a 4th node (server2, which is a "mirror" of server1) to the
>>> cluster, this node starts as standalone. Here is the message.log:
>>>
>>> Oct 28 15:59:27 ns209045 corosync[16543]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>>> Oct 28 15:59:28 ns209045 corosync[16543]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 945392: memb=1, new=0, lost=0
>>> Oct 28 15:59:28 ns209045 corosync[16543]: [pcmk ] info: pcmk_peer_update: memb: server2 16820416
>>> Oct 28 15:59:28 ns209045 corosync[16543]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 945392: memb=1, new=0, lost=0
>>> Oct 28 15:59:28 ns209045 corosync[16543]: [pcmk ] info: pcmk_peer_update: MEMB: server2 16820416
>>> Oct 28 15:59:28 ns209045 corosync[16543]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>>> Oct 28 15:59:29 ns209045 corosync[16543]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 945416: memb=1, new=0, lost=0
>>> Oct 28 15:59:29 ns209045 corosync[16543]: [pcmk ] info: pcmk_peer_update: memb: server2 16820416
>>> Oct 28 15:59:29 ns209045 corosync[16543]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 945416: memb=1, new=0, lost=0
>>> Oct 28 15:59:29 ns209045 corosync[16543]: [pcmk ] info: pcmk_peer_update: MEMB: server2 16820416
>>>
>>> [...] These messages repeat many, many times
>>>
>>> Now, if I stop server1 and start server2, server2 starts correctly
>>> and is added to the cluster. But when I then try to start server1,
>>> the same thing happens (the roles are inverted, but the result is
>>> the same: when one of the serverX nodes is running, the other
>>> can't start).
>>>
>>> My corosync.conf is configured for broadcast, not multicast. I have
>>> lots of problems with multicast because many bridged VMs on the VLAN
>>> don't see the multicast packets, or don't join the multicast group
>>> correctly.
>>>
>>> Any hints on this?
>> Are the corosync and auth files the same on server2?
>>
>
> Yes, of course :D (copied by scp). As I said, server1 can join when
> server2 is offline, and server2 can join when server1 is offline, but
> if one is online, the other can't join and logs the messages above in
> a loop.
>
> In fact I have lots of problems with corosync/pacemaker: multicast and
> broadcast between physical and virtual servers, lots of different
> failures everywhere, and the error logs are always different depending
> on what I try.
>
> The strange thing is that filer1, filer2, server1 and server2 are
> all running the same distro (Gentoo) with the same tools and are on the
> same VLAN (which works fine for lots of services like NFS).
Another thing I've just seen:
when one of server1/server2 connects to the cluster, the logs on all
nodes start to fill with these messages:
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 28 18:46:01 filer2 corosync[10928]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 513 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 513 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 513 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 513 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 572 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 572 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 573 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1162480: memb=3, new=0, lost=0
Oct 28 18:46:01 filer2 corosync[10928]: [pcmk ] info: pcmk_peer_update: memb: server1 16820416
Oct 28 18:46:01 filer2 corosync[10928]: [pcmk ] info: pcmk_peer_update: memb: filer1 83929280
Oct 28 18:46:01 filer2 corosync[10928]: [pcmk ] info: pcmk_peer_update: memb: filer2 100706496
Oct 28 18:46:01 filer2 corosync[10928]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1162480: memb=3, new=0, lost=0
Oct 28 18:46:01 filer2 corosync[10928]: [pcmk ] info: pcmk_peer_update: MEMB: server1 16820416
Oct 28 18:46:01 filer2 corosync[10928]: [pcmk ] info: pcmk_peer_update: MEMB: filer1 83929280
Oct 28 18:46:01 filer2 corosync[10928]: [pcmk ] info: pcmk_peer_update: MEMB: filer2 100706496
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 28 18:46:01 filer2 corosync[10928]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 632 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 632 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 632 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 692 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 692 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 751 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]: [TOTEM ] Process pause detected for 751 ms, flushing membership messages.
This flood of messages does not happen when filer1 and filer2 are the
only nodes in the cluster.
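
One detail in the log excerpts in this thread may be worth checking: server2
(in the first excerpt) and server1 (in the second) are both reported with the
node ID 16820416. The following is only a minimal sketch for inspecting that.
It assumes corosync derived each node ID from the ring 0 IPv4 address (the
documented default when no nodeid is set in the totem section) and that the
hosts are little-endian; the helper names and the trimmed sample lines are
made up for illustration.

```python
import re
import socket
import struct
from collections import defaultdict

# Matches the member lines logged by pcmk_peer_update, for example:
#   "pcmk_peer_update: MEMB: server1 16820416"
MEMBER_RE = re.compile(r"pcmk_peer_update:\s+memb:\s+(\S+)\s+(\d+)", re.IGNORECASE)


def nodeid_to_ipv4(nodeid: int) -> str:
    """Decode a node ID, ASSUMING it is an IPv4 address stored as a
    little-endian 32-bit integer (an assumption, not confirmed by the thread)."""
    return socket.inet_ntoa(struct.pack("<I", nodeid))


def members_by_nodeid(log_text: str) -> dict:
    """Map every node ID found in the log text to the set of node names using it."""
    seen = defaultdict(set)
    for name, nodeid in MEMBER_RE.findall(log_text):
        seen[int(nodeid)].add(name)
    return dict(seen)


# Member lines taken from the two log excerpts in this thread
# (timestamp and process prefixes trimmed for brevity).
SAMPLE = """
pcmk_peer_update: MEMB: server2 16820416
pcmk_peer_update: MEMB: server1 16820416
pcmk_peer_update: MEMB: filer1 83929280
pcmk_peer_update: MEMB: filer2 100706496
"""

if __name__ == "__main__":
    for nodeid, names in sorted(members_by_nodeid(SAMPLE).items()):
        flag = "  <-- shared by more than one node" if len(names) > 1 else ""
        print(f"{nodeid} -> {nodeid_to_ipv4(nodeid)}: {', '.join(sorted(names))}{flag}")
```

Under those assumptions 16820416 decodes to 192.168.0.1 for both server1 and
server2. If the two machines really share an address on the cluster interface,
or an explicit nodeid was copied along with corosync.conf, that could be one
possible explanation for only one of them joining at a time. This is a guess
from the log output, not a confirmed diagnosis.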
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker