[Pacemaker] TOTEM: Process pause detected? Leading to STONITH...
Sebastian Kaps
sebastian.kaps at imail.de
Thu Aug 4 12:46:00 UTC 2011
Hello,
here's another problem we're having:
Jul 31 03:51:02 node01 corosync[5870]: [TOTEM ] Process pause detected for 11149 ms, flushing membership messages.
Jul 31 03:51:11 node01 corosync[5870]: [CLM ] CLM CONFIGURATION CHANGE
Jul 31 03:51:11 node01 corosync[5870]: [CLM ] New Configuration:
Jul 31 03:51:11 node01 corosync[5870]: [CLM ] r(0) ip(192.168.1.1) r(1) ip(x.y.z.3)
Jul 31 03:51:11 node01 corosync[5870]: [CLM ] Members Left:
Jul 31 03:51:11 node01 corosync[5870]: [CLM ] r(0) ip(192.168.1.2) r(1) ip(x.y.z.1)
Jul 31 03:51:11 node01 corosync[5870]: [CLM ] Members Joined:
Jul 31 03:51:11 node01 corosync[5870]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 9708: memb=1, new=0, lost=1
Jul 31 03:51:11 node01 corosync[5870]: [pcmk ] info: pcmk_peer_update: memb: node01 16885952
Jul 31 03:51:11 node01 corosync[5870]: [pcmk ] info: pcmk_peer_update: lost: node02 33663168
Jul 31 03:51:11 node01 corosync[5870]: [CLM ] CLM CONFIGURATION CHANGE
Jul 31 03:51:11 node01 corosync[5870]: [CLM ] New Configuration:
Jul 31 03:51:11 node01 corosync[5870]: [CLM ] r(0) ip(192.168.1.1) r(1) ip(x.y.z.3)
Jul 31 03:51:11 node01 corosync[5870]: [CLM ] Members Left:
Jul 31 03:51:11 node01 corosync[5870]: [CLM ] Members Joined:
Jul 31 03:51:11 node01 crmd: [5912]: notice: ais_dispatch_message: Membership 9708: quorum lost
node01 gets STONITH'd shortly after that. Nothing in the logs indicates
that this was about to happen: for at least half an hour beforehand,
there is only the normal status-message noise from monitor ops etc.
Meanwhile, node02 logged the following:
Jul 31 03:51:01 node02 corosync[5810]: [TOTEM ] A processor failed, forming new configuration.
Jul 31 03:51:11 node02 corosync[5810]: [CLM ] CLM CONFIGURATION CHANGE
Jul 31 03:51:11 node02 corosync[5810]: [CLM ] New Configuration:
Jul 31 03:51:11 node02 corosync[5810]: [CLM ] r(0) ip(192.168.1.2) r(1) ip(x.y.z.1)
Jul 31 03:51:11 node02 corosync[5810]: [CLM ] Members Left:
Jul 31 03:51:11 node02 corosync[5810]: [CLM ] r(0) ip(192.168.1.1) r(1) ip(x.y.z.3)
Jul 31 03:51:11 node02 corosync[5810]: [CLM ] Members Joined:
Jul 31 03:51:11 node02 corosync[5810]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 9708: memb=1, new=0, lost=1
Jul 31 03:51:11 node02 corosync[5810]: [pcmk ] info: pcmk_peer_update: memb: node02 33663168
Jul 31 03:51:11 node02 corosync[5810]: [pcmk ] info: pcmk_peer_update: lost: node01 16885952
Jul 31 03:51:11 node02 corosync[5810]: [CLM ] CLM CONFIGURATION CHANGE
Jul 31 03:51:11 node02 corosync[5810]: [CLM ] New Configuration:
Jul 31 03:51:11 node02 corosync[5810]: [CLM ] r(0) ip(192.168.1.2) r(1) ip(x.y.z.1)
Jul 31 03:51:11 node02 corosync[5810]: [CLM ] Members Left:
Jul 31 03:51:11 node02 corosync[5810]: [CLM ] Members Joined:
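A side note on the numeric node IDs in the pcmk_peer_update lines: they
appear to be the ring 0 addresses read as 32-bit integers in
little-endian byte order. That reading is an assumption on my part that
happens to match both nodes, not something taken from the corosync
documentation. A minimal Python sketch (the helper name nodeid_to_ip is
made up for illustration):

```python
import ipaddress

def nodeid_to_ip(nodeid):
    """Read a 32-bit node ID as an IPv4 address in little-endian byte order.

    Assumption: the ID was derived from the ring 0 address; this matches
    the two IDs in the logs above but is not confirmed elsewhere.
    """
    return str(ipaddress.IPv4Address(nodeid.to_bytes(4, "little")))

for name, nodeid in [("node01", 16885952), ("node02", 33663168)]:
    print(name, nodeid, nodeid_to_ip(nodeid))
# node01 16885952 192.168.1.1
# node02 33663168 192.168.1.2
```

So "lost: node02 33663168" on node01 and "lost: node01 16885952" on
node02 each refer to the peer's ring 0 address.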
What does "Process pause detected" mean?
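To check whether the same message shows up at other times amid the
monitor-op noise, the logs can be scanned for it. The following is only
a rough Python sketch: it assumes one log entry per line (as in the
syslog file itself, not as wrapped in this mail), and the names PAUSE_RE
and find_pauses are made up for illustration.

```python
import re

# Matches corosync lines such as:
#   "... corosync[5870]: [TOTEM ] Process pause detected for 11149 ms, ..."
PAUSE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+corosync\[\d+\]:\s+"
    r"\[TOTEM\s*\]\s+Process pause detected for (?P<ms>\d+) ms"
)

def find_pauses(lines):
    """Return (timestamp, host, pause_ms) for every pause message found."""
    pauses = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            pauses.append((match["stamp"], match["host"], int(match["ms"])))
    return pauses

sample = [
    "Jul 31 03:51:02 node01 corosync[5870]: [TOTEM ] Process pause detected "
    "for 11149 ms, flushing membership messages.",
    "Jul 31 03:51:11 node01 corosync[5870]: [CLM ] CLM CONFIGURATION CHANGE",
]
print(find_pauses(sample))
# [('Jul 31 03:51:02', 'node01', 11149)]
```

Passing an open log file instead of the sample list works the same way,
since the function only iterates over lines.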
Quoting from my other recent post regarding the backup ring being
marked faulty sporadically:
|We're running a two-node cluster with redundant rings.
|Ring 0 is a 10 Gbit direct connection; ring 1 consists of two 1 Gbit
|interfaces that are bonded in active-backup mode and routed through
|two independent switches for each node. The ring 1 network is our
|"normal" 1G LAN and is only meant to be used if the direct 10G
|connection fails.
|
|Corosync Cluster Engine, version '1.3.1'
|Copyright (c) 2006-2009 Red Hat, Inc.
|
|It's the version that comes with SLES11-SP1-HA.
Thanks in advance!
--
Sebastian