[Pacemaker] never ending election

Andrew Beekhof beekhof at gmail.com
Mon Aug 4 11:19:02 EDT 2008


On Mon, Aug 4, 2008 at 16:53, David Riccitelli <david at interact.it> wrote:
> The log for the second node is located here:
>  https://share.acrobat.com/adc/document.do?docid=144a8a57-4c6a-46d9-bfc4-cad7dd31fc02
> I don't have the one for the first node at the moment.

Unfortunately I need both: each node thinks it should win the
election, and I need to figure out which one is right (and thus where
the bug is).
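
While collecting the logs, a rough filter like the one below can pull the
election-related crmd lines out of each node's log so the two can be
compared side by side. It is only a sketch: the regular expression assumes
the syslog layout of the lines quoted in this thread, and `election_lines`
is a made-up helper name, not part of heartbeat or pacemaker.

```python
import re

# Assumed syslog-style layout, based on the lines quoted in this thread:
#   "Aug  1 12:19:51 rmefp-srv02x crmd: [20793]: info: do_election_check: ..."
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s"
    r"(?P<host>\S+)\s(?P<daemon>\w+):\s\[(?P<pid>\d+)\]:\s"
    r"(?P<level>\w+):\s(?P<message>.*)$"
)


def election_lines(lines):
    """Yield (stamp, host, message) for crmd lines that mention an election."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue  # wrapped or non-syslog lines are skipped
        if match["daemon"] == "crmd" and "election" in match["message"].lower():
            yield match["stamp"], match["host"], match["message"]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python election_filter.py node2.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for stamp, host, message in election_lines(handle):
            print(stamp, host, message)
```

Running it once per node and lining the two outputs up by timestamp should
show which node keeps restarting the vote.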

> The log starts with this line:
>  Aug  1 11:50:09 rmefp-srv02x heartbeat: [20780]: WARN: node rmefp-srv01x: is dead
> which is when I removed the two network cables from the first node.
> The last meaningful line, I believe, is this:
>  Aug  1 12:19:51 rmefp-srv02x crmd: [20793]: info: do_election_check: Still waiting on 2 non-votes (2 total)
> as the following line was logged when I forced a restart of the heartbeat
> service (on the second node):
>  Aug  1 12:19:55 rmefp-srv02x heartbeat: [6404]: info: No log entry found in ha.cf -- use logd
>
>
> Best regards,
> David Riccitelli
>
> On 04/Aug/08, at 13:11, Andrew Beekhof wrote:
>
> Hard to say what's going on based on this log fragment.
> Can you put the full logs from both nodes somewhere?
>
> On Sun, Aug 3, 2008 at 11:18, David Riccitelli <david at interact.it> wrote:
>
> Hello there,
>
> Can somebody help me with this problem?
>
> I have 2 identical nodes, node #1 and node #2. Both nodes run CentOS 5
> with the current version of heartbeat (2.1.3) and pacemaker (0.6.5).
> Each node has 2 network ports bonded together (mode 1). Bonding is
> configured and working fine.
>
> The nodes have one resource configured, and I must say everything works
> fine. All the tests I'm running show perfect failovers, except one:
>
> 1. node #1 has the resource, node #2 is waiting,
> 2. I remove both network cables from node #1,
> 3. node #2 no longer sees node #1 and believes it is dead,
> 4. node #2 brings up the resource,
> 5. I reconnect node #1 to the network - I expect the nodes to see
>    each other and one of the two to release the resource,
> 6. node #1 and node #2 see each other and start counting election votes,
>    but this goes on indefinitely, and the resource is active on both
>    nodes at the same time.
>
> Logs (same on both nodes - this pattern repeats forever, until heartbeat is
> manually stopped on one of the nodes):
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker at clusterlabs.org
> http://list.clusterlabs.org/mailman/listinfo/pacemaker
>



