[ClusterLabs] Is "Process pause detected" triggered too easily?
Jean-Marc Saffroy
saffroy at gmail.com
Tue Sep 26 14:41:38 EDT 2017
Hello,
As the subject line suggests, I am wondering why I see so many of these
log lines (many means about 10 times per minute, usually several in the
same second):
Sep 26 19:56:24 [950] vm0 corosync notice [TOTEM ] Process pause detected for 2555 ms, flushing membership messages.
Sep 26 19:56:24 [950] vm0 corosync notice [TOTEM ] Process pause detected for 2558 ms, flushing membership messages.
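To put numbers on "about 10 times per minute", here is a rough sketch for summarizing these messages from a log file. It assumes each log entry is on a single line and starts with a "Mon DD HH:MM:SS" timestamp, as in the lines quoted above; the function names are made up for illustration.

```shell
# Count "Process pause detected" messages per minute.
# Assumes chronological, one-entry-per-line input whose first three
# fields are month, day and HH:MM:SS (an assumption about the log format).
pause_counts_per_minute() {
    awk '/Process pause detected/ {
            key = $1 " " $2 " " substr($3, 1, 5)   # e.g. "Sep 26 19:56"
            if (key != prev) {
                if (prev != "") print count, prev
                prev = key
                count = 0
            }
            count++
         }
         END { if (prev != "") print count, prev }' "$1"
}

# Print only the reported pause durations (in ms), one per line.
pause_durations_ms() {
    sed -n 's/.*Process pause detected for \([0-9][0-9]*\) ms.*/\1/p' "$1"
}
```

With the logging settings from the attached config, this would be called as `pause_counts_per_minute /var/log/corosync/corosync.log`.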
Let me add some context:
- this is observed in 3 small VMs on my laptop
- the OS is CentOS 7.3, corosync is 2.4.0-9.el7_4.2
- these VMs only run corosync, nothing else
- the VM host (my laptop) is idle 60-80% of the time
- VMs are qemu-kvm guests, connected with tap interfaces
- AND the messages only appear when, on one of the VMs, I stop and start
corosync in a tight loop, like this:
[root@vm2 ~]# while :; do echo $(date) stop; systemctl stop corosync; echo $(date) start; systemctl start corosync; done
Tue Sep 26 19:50:19 CEST 2017 stop
Tue Sep 26 19:50:21 CEST 2017 start
Tue Sep 26 19:50:21 CEST 2017 stop
Tue Sep 26 19:50:22 CEST 2017 start
...
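For anyone who wants to reproduce this without an endless loop, a bounded variant could look like the sketch below. The helper name `restart_loop` and its iteration-count argument are not part of my original test, just an illustration; it has to run as root on one of the nodes.

```shell
# Stop and start corosync a fixed number of times, printing a timestamp
# before each step so the output can be matched against the corosync log.
# restart_loop and its default of 10 iterations are illustrative choices.
restart_loop() {
    count=${1:-10}
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$(date) stop"
        systemctl stop corosync
        echo "$(date) start"
        systemctl start corosync
        i=$((i + 1))
    done
}
```

For example, `restart_loop 50` would do 50 stop/start cycles and then return.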
I understand that this kind of test is stressful (and quite artificial), but
I'm still surprised to see these particular messages, because it seems to
me a bit unlikely that the corosync process is not properly scheduled for
seconds at a time so frequently (several times per minute).
So I wonder if maybe there could be other explanations?
Also, it looks like the side effect is that corosync drops important
messages (I think "join" messages?), and I fear that this can lead to
bigger issues with DLM (which is why I'm looking into this in the first
place).
In case that's helpful, attached are 10 minutes of corosync log and the
config file I'm using (it has 5 nodes declared, but I reproduce even with
just 3 nodes).
Thanks in advance for any suggestion!
Cheers,
JM
--
saffroy at gmail.com
-------------- next part --------------
# Please read the corosync.conf.5 manual page
totem {
    config_version: 20170925231703
    version: 2
    transport: udpu
    # How long before declaring a token lost (ms)
    token: 3000
    # How many token retransmits before forming a new configuration
    token_retransmits_before_loss_const: 10
    # How long to wait for join messages in the membership protocol (ms)
    join: 100
    #send_join: 60
    # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
    consensus: 3600
    # Turn off the virtual synchrony filter
    vsftype: none
    # Number of messages that may be sent by one processor on receipt of the token
    max_messages: 20
    # Limit generated nodeids to 31-bits (positive signed integers)
    clear_node_high_bit: yes
    # Disable encryption
    secauth: off
    # How many threads to use for encryption/decryption
    threads: 0
    # Optionally assign a fixed node id (integer)
    # nodeid: 1234
    # This specifies the mode of redundant ring, which may be none, active, or passive.
    rrp_mode: none
    interface {
        # The following values need to be set based on your environment
        ringnumber: 0
        bindnetaddr: 172.16.0.33
        #broadcast: yes
        #mcastaddr: 226.94.1.1
        #mcastport: 5405
    }
    cluster_name: dlm
}
amf {
    mode: disabled
}
quorum {
    # Quorum for the Pacemaker Cluster Resource Manager
    provider: corosync_votequorum
    #expected_votes: 2
    quorum_votes: 0
    votes: 0
}
aisexec {
    user: root
    group: root
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    syslog_facility: daemon
    debug: on
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: on
        tags: enter|leave|trace1|trace2|trace3|trace4|trace6
    }
}
nodelist {
    node {
        # vm0
        ring0_addr: 172.16.0.33
        quorum_votes: 1
        nodeid: 1
    }
    node {
        # vm1
        ring0_addr: 172.16.1.33
        quorum_votes: 1
        nodeid: 2
    }
    node {
        # vm2
        ring0_addr: 172.16.2.33
        quorum_votes: 1
        nodeid: 3
    }
    node {
        # vm3
        ring0_addr: 172.16.3.33
        quorum_votes: 0
        nodeid: 4
    }
    node {
        # vm4
        ring0_addr: 172.16.4.33
        quorum_votes: 0
        nodeid: 5
    }
}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: corosync.log.xz
Type: application/x-xz
Size: 186708 bytes
Desc:
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20170926/fad420aa/attachment-0002.xz>