[Pacemaker] Problems after updating from debian squeeze to wheezy
Arnold Krille
arnold at arnoldarts.de
Mon Jul 29 18:46:38 CEST 2013
Hi all,
I have a little problem here and would like to get some help:
I have (had?) a working three-node cluster of two active nodes (nebel1
and nebel2) and one standby-node (nebel3) running debian squeeze +
backports. That is pacemaker 1.1.7-1~bpo60+1 and corosync
1.4.2-1~bpo60+1.
Now I updated the standby-node nebel3 to debian wheezy, which in itself
went without problems. As the upstream versions of neither pacemaker
nor corosync changed, I expected the updated nebel3 to rejoin the
original cluster. Little did I know... nebel3 now has pacemaker 1.1.7-1
and corosync 1.4.2-3, and it seems something in the update broke it.
/etc/corosync/corosync.conf is still the same on all nodes.
I suspect the problem is somewhere in corosync as nebel1 and nebel2
only see each other:
$ ssh root@nebel2 -- corosync-objctl |grep member
runtime.totem.pg.mrp.srp.members.33648138.ip=r(0) ip(10.110.1.2) r(1) ip(10.112.0.2)
runtime.totem.pg.mrp.srp.members.33648138.join_count=1
runtime.totem.pg.mrp.srp.members.33648138.status=joined
runtime.totem.pg.mrp.srp.members.16870922.ip=r(0) ip(10.110.1.1) r(1) ip(10.112.0.1)
runtime.totem.pg.mrp.srp.members.16870922.join_count=1
runtime.totem.pg.mrp.srp.members.16870922.status=joined
runtime.totem.pg.mrp.srp.members.50425354.ip=r(0) ip(10.110.1.3) r(1) ip(10.112.0.3)
runtime.totem.pg.mrp.srp.members.50425354.join_count=39
runtime.totem.pg.mrp.srp.members.50425354.status=left
nebel3 on the other hand:
$ ssh root@nebel3 -- corosync-objctl |grep member
runtime.totem.pg.mrp.srp.members.50425354.ip=r(0) ip(10.110.1.3) r(1) ip(10.112.0.3)
runtime.totem.pg.mrp.srp.members.50425354.join_count=1
runtime.totem.pg.mrp.srp.members.50425354.status=joined
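To compare the two views side by side, the member entries can be
condensed to one line per node ID. This is only a minimal sketch that
assumes the key layout shown in the output above; summarize_members is
a hypothetical helper name, not part of corosync:

```shell
# Hypothetical helper: condense "corosync-objctl | grep member" output
# to one line per node ID, showing status and join_count.
summarize_members() {
    awk -F'[.=]' '
        # fields: runtime totem pg mrp srp members <id> <key> <value>
        $6 == "members" && $8 == "status"     { status[$7] = $9 }
        $6 == "members" && $8 == "join_count" { joins[$7] = $9 }
        END {
            for (id in status)
                printf "%s status=%s join_count=%s\n", id, status[id], joins[id]
        }
    ' | sort -n
}
```

It would be used as
"ssh root@nebel2 -- corosync-objctl | grep member | summarize_members".
Lines that do not carry a status or join_count key (such as the ip
entries) are ignored.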
Both nebel2 and nebel3 think the communication-rings are free of
faults:
$ ssh root@nebel2 -- corosync-cfgtool -s
Printing ring status.
Local node ID 33648138
RING ID 0
id = 10.110.1.2
status = ring 0 active with no faults
RING ID 1
id = 10.112.0.2
status = ring 1 active with no faults
$ ssh root@nebel3 -- corosync-cfgtool -s
Printing ring status.
Local node ID 50425354
RING ID 0
id = 10.110.1.3
status = ring 0 active with no faults
RING ID 1
id = 10.112.0.3
status = ring 1 active with no faults
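For reading the member lists: on these hosts the node IDs look like the
ring 0 address read as a little-endian 32-bit integer. That is an
observation from the output above (it matches all three nodes), not
something I have checked against the corosync documentation. A small
sketch, with nodeid_to_ip being a hypothetical helper name:

```shell
# Hypothetical helper: read a node ID as a little-endian 32-bit integer
# and print the IPv4 address it encodes (lowest byte first).
nodeid_to_ip() {
    id=$1
    printf '%d.%d.%d.%d\n' \
        $(( id & 255 )) $(( (id >> 8) & 255 )) \
        $(( (id >> 16) & 255 )) $(( (id >> 24) & 255 ))
}

nodeid_to_ip 33648138   # 10.110.1.2
```

By the same arithmetic 16870922 is 10.110.1.1 (nebel1) and 50425354 is
10.110.1.3 (nebel3).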
From every node I can ping all the participating nodes via all their
connections and IPs.
The corosync.log on nebel2 doesn't mention nebel3 anymore after it left
the cluster for the reboot following the update. Likewise, the
corosync.log on nebel3 no longer mentions nebel2 or nebel1.
So, what did I miss during the update? How can I get nebel3 to join
back into the original cluster instead of forming its own 1-out-of-3
cluster (with the same resources defined)?
Any help is highly appreciated!
- Arnold