[Pacemaker] lots of timeout/rexmit messages after failed stonith and manual reboot

Tue Oct 6 13:53:38 UTC 2009

Possibly the node is really busy and Heartbeat isn't getting enough CPU time...
But I'm just guessing; I don't use Heartbeat much these days.
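For what it's worth, the quoted timestamps are at least consistent with the timers simply firing late. Each "started at" value is 274 or 275 units past its "should have started at" value. That matches the logged 2740-2750 ms delays if the units are 10 ms clock ticks. The 100 Hz tick rate is my assumption, not something the log states. Here is a minimal Python sketch that does the arithmetic. It also checks that every Rexmit request on the rebooted node asks for a sequence number above the stated max of 131. The helper names are hypothetical, and the log text is copied from the quoted message with the line wrapping removed.

```python
import re

# "started at" lines paired with the "was delayed N ms" value logged for the
# same GSource, copied from the quoted heartbeat log.
SAMPLES = [
    ("started at 429631770 should have started at 429631495", 2750),
    ("started at 429631772 should have started at 429631498", 2740),
    ("started at 429631775 should have started at 429631500", 2750),
    ("started at 429631778 should have started at 429631503", 2750),
]

# Rexmit warnings from the rebooted node, copied from the quoted log.
REXMIT_LINES = [
    "Rexmit of seq 251 requested. 131 is max.",
    "Rexmit of seq 242 requested. 131 is max.",
    "Rexmit of seq 252 requested. 131 is max.",
    "Rexmit of seq 251 requested. 131 is max.",
    "Rexmit of seq 314 requested. 131 is max.",
    "Rexmit of seq 252 requested. 131 is max.",
]

# ASSUMPTION: the counters are clock ticks at 100 Hz, i.e. 10 ms per tick.
MS_PER_TICK = 10

PAIR_RE = re.compile(r"started at (\d+) should have started at (\d+)")
REXMIT_RE = re.compile(r"Rexmit of seq (\d+) requested\. (\d+) is max\.")


def delay_ticks(line):
    """Hypothetical helper: how many ticks late the dispatch actually ran."""
    m = PAIR_RE.search(line)
    if m is None:
        raise ValueError("not a 'started at' line: " + line)
    actual, expected = int(m.group(1)), int(m.group(2))
    return actual - expected


def rexmit_gap(line):
    """Hypothetical helper: requested seq minus the max seq the node reports."""
    m = REXMIT_RE.search(line)
    if m is None:
        raise ValueError("not a Rexmit line: " + line)
    requested, max_seq = int(m.group(1)), int(m.group(2))
    return requested - max_seq


if __name__ == "__main__":
    for line, logged_ms in SAMPLES:
        ticks = delay_ticks(line)
        print(f"{ticks} ticks = {ticks * MS_PER_TICK} ms (log says {logged_ms} ms)")
    for line in REXMIT_LINES:
        print(f"{line} -> {rexmit_gap(line)} above max")
```

This only restates what the logs already say. It confirms that the delays are uniform and that the requests are beyond the reported max, but it doesn't show *why* Heartbeat was held up.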

On Mon, Oct 5, 2009 at 3:55 PM, Johan Verrept <Johan.Verrept at able.be> wrote:
>
> Hi,
>
>    while I was playing with the RA, at one point the stonith operation
> failed (it didn't find the host in gethosts), so I rebooted the other
> node manually. The result was a whole bunch of messages in my logs:
>
> 15:53:10 SYSLOG warning heartbeat [2748]: WARN: Gmain_timeout_dispatch:
> Dispatch function for retransmit request was delayed 2750 ms (> 1000 ms)
> before being called (GSource: 0x959e298)
> 15:53:10 SYSLOG info heartbeat [2748]: info: Gmain_timeout_dispatch:
> started at 429631770 should have started at 429631495
> 15:53:10 SYSLOG warning heartbeat [2748]: WARN: Gmain_timeout_dispatch:
> Dispatch function for retransmit request took too long to execute: 20 ms
> (> 10 ms) (GSource: 0x959e298)
> 15:53:10 SYSLOG warning heartbeat [2748]: WARN: Gmain_timeout_dispatch:
> Dispatch function for retransmit request was delayed 2740 ms (> 1000 ms)
> before being called (GSource: 0x959e300)
> 15:53:10 SYSLOG info heartbeat [2748]: info: Gmain_timeout_dispatch:
> started at 429631772 should have started at 429631498
> 15:53:10 SYSLOG warning heartbeat [2748]: WARN: Gmain_timeout_dispatch:
> Dispatch function for retransmit request took too long to execute: 30 ms
> (> 10 ms) (GSource: 0x959e300)
> 15:53:10 SYSLOG warning heartbeat [2748]: WARN: Gmain_timeout_dispatch:
> Dispatch function for retransmit request was delayed 2750 ms (> 1000 ms)
> before being called (GSource: 0x959e368)
> 15:53:10 SYSLOG info heartbeat [2748]: info: Gmain_timeout_dispatch:
> started at 429631775 should have started at 429631500
> 15:53:10 SYSLOG warning heartbeat [2748]: WARN: Gmain_timeout_dispatch:
> Dispatch function for retransmit request took too long to execute: 30 ms
> (> 10 ms) (GSource: 0x959e368)
> 15:53:10 SYSLOG warning heartbeat [2748]: WARN: Gmain_timeout_dispatch:
> Dispatch function for retransmit request was delayed 2750 ms (> 1000 ms)
> before being called (GSource: 0x959e3d0)
> 15:53:10 SYSLOG info heartbeat [2748]: info: Gmain_timeout_dispatch:
> started at 429631778 should have started at 429631503
> 15:53:10 SYSLOG warning heartbeat [2748]: WARN: Gmain_timeout_dispatch:
> Dispatch function for retransmit request took too long to execute: 20 ms
> (> 10 ms) (GSource: 0x959e3d0)
>
>
> with the rebooted node reporting:
>
> 15:53:10 SYSLOG warning heartbeat [2721]: WARN: Rexmit of seq 251
> requested. 131 is max.
> 15:53:10 SYSLOG warning heartbeat [2721]: WARN: Rexmit of seq 242
> requested. 131 is max.
> 15:53:10 SYSLOG warning heartbeat [2721]: WARN: Rexmit of seq 252
> requested. 131 is max.
> 15:53:10 SYSLOG warning heartbeat [2721]: WARN: Rexmit of seq 251
> requested. 131 is max.
> 15:53:10 SYSLOG warning heartbeat [2721]: WARN: Rexmit of seq 314
> requested. 131 is max.
> 15:53:10 SYSLOG warning heartbeat [2721]: WARN: Rexmit of seq 252
> requested. 131 is max.
>
> I got about 100 of these per second.
>
> What happened? How do I clean up something like this without rebooting
> my cluster?
>
>        J.
>
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>