[Pacemaker] token lost - need clarification

marco at nucleus.it
Tue Dec 17 03:17:31 EST 2013


Hi all,
I set up a two-node cluster with a crossover cable between the two nodes
and without STONITH; I know this is not the best setup, but it was the
scenario I needed at the time.

I know the releases are old:
corosync-1.2.7-1.2
libcorosync-1.2.7-1.2
pacemaker-1.0.10-1.4
libpacemaker3-1.0.10-1.4

Everything worked fine for quite some time, but a few days ago something
went wrong between the two nodes, without any network interruption (no
messages about the Ethernet modules, no errors in the network statistics,
and no alerts from the Nagios ping checks).

From what I can understand from the attached logs, the totem timings are:
Token Timeout (10000 ms) retransmit timeout (980 ms)
token hold (774 ms) retransmits before loss (10 retrans)
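As a sanity check on those numbers: if I read corosync 1.x correctly, when the retransmit and hold timeouts are not set explicitly in corosync.conf they are derived from the token timeout. The sketch below reproduces that arithmetic; the two formulas and the kernel tick value HZ = 100 are my assumptions, not something taken from the attached corosync.conf.

```python
# Sketch, assuming corosync 1.x derives these values as
#   retransmit = token / (retransmits_before_loss + 0.2)
#   hold       = retransmit * 0.8 - 1000 / HZ
# when token_retransmit and hold are not set in corosync.conf.
# HZ = 100 is an assumed kernel tick rate.

HZ = 100  # assumption


def derived_timeouts(token_ms, retransmits_before_loss, hz=HZ):
    """Return (retransmit_ms, hold_ms) derived from the token timeout."""
    retransmit_ms = int(token_ms / (retransmits_before_loss + 0.2))
    hold_ms = int(retransmit_ms * 0.8 - 1000 / hz)
    return retransmit_ms, hold_ms


retransmit_ms, hold_ms = derived_timeouts(10000, 10)
print(retransmit_ms, hold_ms)  # 980 774, the values reported in the log
print(10 * retransmit_ms)      # 9800: ten retransmits fit inside the 10000 ms token timeout
```

If these assumptions hold, the logged values are simply the defaults for a 10000 ms token, and the "+ 0.2" presumably leaves a little slack so that all ten retransmit intervals expire just before the token is declared lost.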


The two nodes lost the token and tried to recover, but
node1 thinks node2 is up:

Dec  7 05:01:41 node1 pengine: [1138]: info: determine_online_status: Node node2 is online
Dec  7 05:01:41 node1 pengine: [1138]: info: determine_online_status: Node node1 is online

and then declares it lost:

Dec  7 05:01:54 node1 corosync[1128]:   [pcmk  ] info: ais_mark_unseen_peer_dead: Node node2 was not seen in the previous transition
Dec  7 05:01:54 node1 corosync[1128]:   [pcmk  ] info: update_member: Node 33559980/node2 is now: lost

while node2 thinks node1 is gone:

Dec  7 05:01:34 node2 corosync[6356]:   [pcmk  ] info: ais_mark_unseen_peer_dead: Node node1 was not seen in the previous transition
Dec  7 05:01:34 node2 corosync[6356]:   [pcmk  ] info: update_member: Node 16782764/node1 is now: lost

Then they went into split brain.
Any idea why node1 still saw node2 at first, while node2 immediately
declared node1 lost?


Second question :)
Do you know if there is any documentation about the OPERATIONAL, GATHER
and COMMIT states, to better understand corosync?


Thanks

Attachments:
- corosync.conf (1228 bytes): <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20131217/ae605834/attachment-0002.obj>
- node1.log (22772 bytes): <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20131217/ae605834/attachment-0004.bin>
- node2.log (47054 bytes): <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20131217/ae605834/attachment-0005.bin>

