[Pacemaker] TOTEM: Process pause detected? Leading to STONITH...
Vladislav Bogdanov
bubble at hoster-ok.com
Fri Aug 12 10:19:26 UTC 2011
...
>> I would really like someone that has these process pause problems to
>> test a patch I have posted to see if it rectifies the situation. Our
>> significant QE team at Red Hat doesn't see these problems and I can't
>> generate them in engineering. It is possible your device drivers are
>> taking spinlocks for extended periods or some other kernel problem is
>> occurring.
>>
>> If you feel up to the task of building your own corosync, try out this
>> patch:
>>
>> http://marc.info/?l=openais&m=130989380207300&w=2
I have not seen any corosync pauses since I applied it (right after it was
posted). Although I was on vacation for two weeks, the rest of the time I
have been testing the cluster under really high CPU load (frankly speaking,
I lowered the load a lot through optimizations) and have not caught any
pause (yet). One more thing I did was update the igb driver and return its
buffers to the original 256 (bearing in mind that I originally hit the pause
problem after I increased those buffers to 4096). I do not know whether
that has any influence.
> I'd love to test this, but it'll take a few weeks.
> The machines are already in production and we don't have comparable test machines.
> I'm currently (actually ;) having a few days off, and when I'm back at the office,
> I'll update the Corosync version to v1.4.1 (because of the retransmit list
> problem) -- does the patch cleanly apply to v1.4.1?
yes
Best,
Vladislav