[Pacemaker] never ending election

Tue Aug 5 11:51:09 UTC 2008

On Sun, Aug 3, 2008 at 11:18, David Riccitelli <david at interact.it> wrote:
> Hello there,
> Can somebody help me with this problem?
> I have 2 identical nodes, node #1 and node #2. Nodes are installed with
> CentOS 5 and the current version of heartbeat (2.1.3) and pacemaker (0.6.5).
> Each node has 2 network ports bonded together (mode 1). Bonding is
> configured and working fine.
> The nodes have one resource configured, and I must say everything works
> fine. All the tests I'm running show perfect failovers, except one:
>  1. node #1 has the resource, node #2 is waiting,
>  2. I remove both network cables from node #1,
>  3. node #2 doesn't sense node #1 anymore and believes it is dead,
>  4. node #2 brings up the resource,
>  5. then I reconnect node #1 to the network - I believe the nodes should see
> each other and one of the two should release the resource,
>  6. node #1 and node #2 see each other and start counting election votes,
> but the election never finishes and the resource stays active on both nodes
> at the same time:
> logs (same on both nodes - this pattern repeats forever, until heartbeat is
> manually stopped on one of the nodes):

Is there any chance you could add "debug 1" to ha.cf and retest?
It seems that the log messages that would shed light on this (the ones
that indicate why each side felt they "win") are debug ones :(
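
For reference, a rough sketch of what that could look like in ha.cf. Only
the "debug 1" line is what I'm asking for; the file path and the debugfile
line are assumptions about a typical setup, so adjust or drop them to match
yours (for example, if you log through logd or syslog instead):

    # /etc/ha.d/ha.cf (assumed default location)
    debug 1
    # Assumption: send debug output to its own file
    debugfile /var/log/ha-debug

Heartbeat needs to be restarted on both nodes for the change to take effect
before repeating the cable-pull test.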