[Pacemaker] token lost - need clarification

Michael Schwartzkopff ms at sys4.de
Tue Dec 17 03:28:51 EST 2013


On Tuesday, 17 December 2013, 09:17:31, marco at nucleus.it wrote:
> Hi all,
> I set up a 2-node cluster with a crossover cable between the two nodes,
> without STONITH; I know this is not the best way, but this is the
> scenario I needed at that time.
> 
> I know the releases are old:
> corosync-1.2.7-1.2
> libcorosync-1.2.7-1.2
> pacemaker-1.0.10-1.4
> libpacemaker3-1.0.10-1.4
> 
> Everything was OK for some days/months, but a few days ago, without any
> network interruption (no messages related to the ethernet modules, no
> errors in the network statistics, no notifications from the nagios ping
> checks), something went wrong between the two nodes.
> 
> From what I can understand from the attached logs:
> Token Timeout (10000 ms) retransmit timeout (980 ms)
> token hold (774 ms) retransmits before loss (10 retrans)
> 
> 
> the 2 nodes lost a token and tried to resolve the situation, but
> node1 thinks node2 is up:
> 
> Dec  7 05:01:41 node1 pengine: [1138]: info: determine_online_status:
> Node node2 is online
> Dec  7 05:01:41 node1 pengine: [1138]: info:
> determine_online_status: Node node1 is online
> 
> and then marks it as lost:
> 
> Dec  7 05:01:54 node1 corosync[1128]:   [pcmk  ] info:
> ais_mark_unseen_peer_dead: Node node2 was not seen in the previous
> transition
> Dec  7 05:01:54 node1 corosync[1128]:   [pcmk  ] info: update_member:
> Node 33559980/node2 is now: lost
> 
> while node2 thinks node1 is gone:
> 
> Dec  7 05:01:34 node2 corosync[6356]:   [pcmk  ] info:
> ais_mark_unseen_peer_dead: Node node1 was not seen in the previous
> transition
> Dec  7 05:01:34 node2 corosync[6356]:   [pcmk  ] info: update_member:
> Node 16782764/node1 is now: lost
> 
> Then they went into split brain.
> Any suggestion as to why node1 still saw node2 at first, while node2
> immediately declared node1 lost?

This depends on who initiates the round. Both nodes recognized the failure
within 20 seconds. This is OK, especially if you allow 10 seconds for a
token timeout.
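
To make the timing concrete, here is a minimal Python sketch (not part of
corosync or Pacemaker) that computes the gap between the two "is now: lost"
log entries quoted above and compares it with the 10000 ms token timeout.
The function names are purely illustrative, and the year 2013 is an
assumption, since syslog timestamps omit it.

```python
from datetime import datetime

# Timestamps copied from the "is now: lost" log lines quoted above.
NODE2_DECLARES_NODE1_LOST = "Dec  7 05:01:34"
NODE1_DECLARES_NODE2_LOST = "Dec  7 05:01:54"
TOKEN_TIMEOUT_MS = 10000  # "Token Timeout (10000 ms)" from the quoted log


def parse_syslog_time(stamp: str, year: int = 2013) -> datetime:
    """Parse a syslog-style timestamp; the year is assumed, syslog omits it."""
    month, day, clock = stamp.split()
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def detection_gap_seconds(first: str, second: str) -> float:
    """Absolute time difference in seconds between two log timestamps."""
    delta = parse_syslog_time(second) - parse_syslog_time(first)
    return abs(delta.total_seconds())


gap = detection_gap_seconds(NODE2_DECLARES_NODE1_LOST, NODE1_DECLARES_NODE2_LOST)
token_timeouts = gap / (TOKEN_TIMEOUT_MS / 1000)
print(f"gap = {gap:.0f} s, i.e. {token_timeouts:.1f} token timeouts")
# prints: gap = 20 s, i.e. 2.0 token timeouts
```

So the two nodes disagree about membership for roughly two token timeouts,
which is in line with the configured 10 second value.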

Kind regards,

Michael Schwartzkopff

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Registered office: München, District Court (Amtsgericht) München: HRB 199263
Executive board: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
Chairman of the supervisory board: Florian Kirstein