[Pacemaker] lots of timeout/rexmit messages after failed stonith and manual reboot

Johan Verrept Johan.Verrept at able.be
Mon Oct 5 09:55:32 EDT 2009


Hi,

    While I was playing with the RA, the stonith failed at a certain
point (it didn't find the host in gethosts) and I rebooted the other
node manually. The result was a flood of messages in my logs:

15:53:10 SYSLOG warning heartbeat [2748]: WARN: Gmain_timeout_dispatch:
Dispatch function for retransmit request was delayed 2750 ms (> 1000 ms)
before being called (GSource: 0x959e298)
15:53:10 SYSLOG info heartbeat [2748]: info: Gmain_timeout_dispatch:
started at 429631770 should have started at 429631495
15:53:10 SYSLOG warning heartbeat [2748]: WARN: Gmain_timeout_dispatch:
Dispatch function for retransmit request took too long to execute: 20 ms
(> 10 ms) (GSource: 0x959e298)
15:53:10 SYSLOG warning heartbeat [2748]: WARN: Gmain_timeout_dispatch:
Dispatch function for retransmit request was delayed 2740 ms (> 1000 ms)
before being called (GSource: 0x959e300)
15:53:10 SYSLOG info heartbeat [2748]: info: Gmain_timeout_dispatch:
started at 429631772 should have started at 429631498
15:53:10 SYSLOG warning heartbeat [2748]: WARN: Gmain_timeout_dispatch:
Dispatch function for retransmit request took too long to execute: 30 ms
(> 10 ms) (GSource: 0x959e300)
15:53:10 SYSLOG warning heartbeat [2748]: WARN: Gmain_timeout_dispatch:
Dispatch function for retransmit request was delayed 2750 ms (> 1000 ms)
before being called (GSource: 0x959e368)
15:53:10 SYSLOG info heartbeat [2748]: info: Gmain_timeout_dispatch:
started at 429631775 should have started at 429631500
15:53:10 SYSLOG warning heartbeat [2748]: WARN: Gmain_timeout_dispatch:
Dispatch function for retransmit request took too long to execute: 30 ms
(> 10 ms) (GSource: 0x959e368)
15:53:10 SYSLOG warning heartbeat [2748]: WARN: Gmain_timeout_dispatch:
Dispatch function for retransmit request was delayed 2750 ms (> 1000 ms)
before being called (GSource: 0x959e3d0)
15:53:10 SYSLOG info heartbeat [2748]: info: Gmain_timeout_dispatch:
started at 429631778 should have started at 429631503
15:53:10 SYSLOG warning heartbeat [2748]: WARN: Gmain_timeout_dispatch:
Dispatch function for retransmit request took too long to execute: 20 ms
(> 10 ms) (GSource: 0x959e3d0)


with the rebooted node reporting:

15:53:10 SYSLOG warning heartbeat [2721]: WARN: Rexmit of seq 251
requested. 131 is max.
15:53:10 SYSLOG warning heartbeat [2721]: WARN: Rexmit of seq 242
requested. 131 is max.
15:53:10 SYSLOG warning heartbeat [2721]: WARN: Rexmit of seq 252
requested. 131 is max.
15:53:10 SYSLOG warning heartbeat [2721]: WARN: Rexmit of seq 251
requested. 131 is max.
15:53:10 SYSLOG warning heartbeat [2721]: WARN: Rexmit of seq 314
requested. 131 is max.
15:53:10 SYSLOG warning heartbeat [2721]: WARN: Rexmit of seq 252
requested. 131 is max.

I got about 100 of these per second.

What happened? How do I clean up something like this without rebooting
my cluster? 

	J.




