[Pacemaker] Impossible to add a 4th node to a cluster

Guillaume Chanaud guillaume.chanaud at connecting-nature.com
Thu Oct 28 12:47:27 EDT 2010


  On 28/10/2010 18:30, Guillaume Chanaud wrote:
>  On 28/10/2010 17:55, Pavlos Parissis wrote:
>> On 28 October 2010 16:09, Guillaume Chanaud
>> <guillaume.chanaud at connecting-nature.com>  wrote:
>>>   Hello,
>>>
>>> I have a cluster of two master/slave DRBD servers (filer1 and filer2)
>>> running in a VLAN (the machines are dedicated servers).
>>> I added a third node to the cluster (server1, a "blank node" for the
>>> moment) and that worked correctly.
>>> When I add a 4th node to the cluster (server2, which is a "mirror" of
>>> server1), this node starts as standalone. Here is the message.log:
>>>
>>> Oct 28 15:59:27 ns209045 corosync[16543]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
>>> Oct 28 15:59:28 ns209045 corosync[16543]:   [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 945392: memb=1, new=0, lost=0
>>> Oct 28 15:59:28 ns209045 corosync[16543]:   [pcmk  ] info: pcmk_peer_update: memb: server2 16820416
>>> Oct 28 15:59:28 ns209045 corosync[16543]:   [pcmk  ] notice: pcmk_peer_update: Stable membership event on ring 945392: memb=1, new=0, lost=0
>>> Oct 28 15:59:28 ns209045 corosync[16543]:   [pcmk  ] info: pcmk_peer_update: MEMB: server2 16820416
>>> Oct 28 15:59:28 ns209045 corosync[16543]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
>>> Oct 28 15:59:29 ns209045 corosync[16543]:   [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 945416: memb=1, new=0, lost=0
>>> Oct 28 15:59:29 ns209045 corosync[16543]:   [pcmk  ] info: pcmk_peer_update: memb: server2 16820416
>>> Oct 28 15:59:29 ns209045 corosync[16543]:   [pcmk  ] notice: pcmk_peer_update: Stable membership event on ring 945416: memb=1, new=0, lost=0
>>> Oct 28 15:59:29 ns209045 corosync[16543]:   [pcmk  ] info: pcmk_peer_update: MEMB: server2 16820416
>>>
>>> [...] These messages repeat many, many times.
>>>
>>> Now, if I stop server1 and start server2, server2 starts correctly
>>> and is added to the cluster. But when I then try to start server1,
>>> the same thing happens. The roles are inverted but the result is the
>>> same: when one of the serverX nodes is running, the other can't join.
>>>
>>> My corosync.conf is configured for broadcast, not multicast. I have
>>> lots of problems with multicast because many bridged VMs on the VLAN
>>> don't see the multicast packets, or don't join the multicast group
>>> correctly.
>>>
>>> Any hints on this?
>> Are the corosync and auth files the same on server2?
>>
>
> Yes, of course :D (copied with scp). As I said, server1 can join when
> server2 is offline, and server2 can join when server1 is offline, but
> if one is online, the other can't join and logs the messages above in
> a loop.
>
> In fact I have lots and lots of problems with corosync/pacemaker:
> multicast/broadcast between physical and virtual servers, lots of
> different shit everywhere, and the error logs are always different
> depending on what I try.
>
> The strange thing is that filer1, filer2, server1 and server2 are
> all running the same distro (Gentoo) with the same tools and are on
> the same VLAN (which works for lots of services, like NFS).
Another thing I've just seen:
when one of server1/server2 connects to the cluster, the log starts to
fill with these messages on all nodes:

Oct 28 18:46:01 filer2 corosync[10928]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 28 18:46:01 filer2 corosync[10928]:   [MAIN  ] Completed service synchronization, ready to provide service.
Oct 28 18:46:01 filer2 corosync[10928]:   [TOTEM ] Process pause detected for 513 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]:   [TOTEM ] Process pause detected for 513 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]:   [TOTEM ] Process pause detected for 513 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]:   [TOTEM ] Process pause detected for 513 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]:   [TOTEM ] Process pause detected for 572 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]:   [TOTEM ] Process pause detected for 572 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]:   [TOTEM ] Process pause detected for 573 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]:   [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 1162480: memb=3, new=0, lost=0
Oct 28 18:46:01 filer2 corosync[10928]:   [pcmk  ] info: pcmk_peer_update: memb: server1 16820416
Oct 28 18:46:01 filer2 corosync[10928]:   [pcmk  ] info: pcmk_peer_update: memb: filer1 83929280
Oct 28 18:46:01 filer2 corosync[10928]:   [pcmk  ] info: pcmk_peer_update: memb: filer2 100706496
Oct 28 18:46:01 filer2 corosync[10928]:   [pcmk  ] notice: pcmk_peer_update: Stable membership event on ring 1162480: memb=3, new=0, lost=0
Oct 28 18:46:01 filer2 corosync[10928]:   [pcmk  ] info: pcmk_peer_update: MEMB: server1 16820416
Oct 28 18:46:01 filer2 corosync[10928]:   [pcmk  ] info: pcmk_peer_update: MEMB: filer1 83929280
Oct 28 18:46:01 filer2 corosync[10928]:   [pcmk  ] info: pcmk_peer_update: MEMB: filer2 100706496
Oct 28 18:46:01 filer2 corosync[10928]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 28 18:46:01 filer2 corosync[10928]:   [MAIN  ] Completed service synchronization, ready to provide service.
Oct 28 18:46:01 filer2 corosync[10928]:   [TOTEM ] Process pause detected for 632 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]:   [TOTEM ] Process pause detected for 632 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]:   [TOTEM ] Process pause detected for 632 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]:   [TOTEM ] Process pause detected for 692 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]:   [TOTEM ] Process pause detected for 692 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]:   [TOTEM ] Process pause detected for 751 ms, flushing membership messages.
Oct 28 18:46:01 filer2 corosync[10928]:   [TOTEM ] Process pause detected for 751 ms, flushing membership messages.

This does not happen when filer1 and filer2 are the only nodes of the
cluster.
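
One detail visible in both log excerpts: server2 (first log) and server1
(second log) are reported with the same node id, 16820416, while filer1 and
filer2 have distinct ids. A rough Python sketch for checking this follows.
It rests on two assumptions that are not confirmed anywhere in this thread:
that no explicit nodeid is set in corosync.conf, and that corosync then
derives the id from the ring interface's IPv4 address stored as a
little-endian 32-bit integer. The function names are made up for
illustration.

```python
import ipaddress
import struct
from collections import defaultdict

# Node ids copied from the pcmk_peer_update lines in the two log excerpts.
LOGGED_NODE_IDS = [
    ("server2", 16820416),
    ("server1", 16820416),
    ("filer1", 83929280),
    ("filer2", 100706496),
]


def nodeid_to_ipv4(nodeid):
    """Decode a node id as a little-endian IPv4 address (assumed encoding)."""
    return str(ipaddress.IPv4Address(struct.pack("<I", nodeid)))


def find_duplicate_ids(nodes):
    """Return {nodeid: [names]} for every id reported by more than one node."""
    by_id = defaultdict(list)
    for name, nodeid in nodes:
        by_id[nodeid].append(name)
    return {nodeid: names for nodeid, names in by_id.items() if len(names) > 1}


if __name__ == "__main__":
    for name, nodeid in LOGGED_NODE_IDS:
        print(f"{name:8s} {nodeid:>10d} -> {nodeid_to_ipv4(nodeid)}")
    print("duplicates:", find_duplicate_ids(LOGGED_NODE_IDS))
```

Under those assumptions, 16820416 decodes to 192.168.0.1, and filer1 and
filer2 decode to 192.168.0.5 and 192.168.0.6. If server1 and server2 really
do present the same id, that might explain why only one of them can be a
member at a time, but this is a guess to verify against the actual
interface addresses and corosync.conf on each node.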