[Pacemaker] server lockup failures
Bernd Schubert
bs_lists at aakef.fastmail.fm
Wed Oct 28 12:44:42 UTC 2009
On Wednesday 28 October 2009, Andrew Beekhof wrote:
> On Wed, Oct 28, 2009 at 1:05 PM, Bernd Schubert
> <bs_lists at aakef.fastmail.fm> wrote:
> > Hello,
> >
> > I think there is a severe server failure Pacemaker doesn't detect.
> > Overnight a Lustre server failed in shrink_icache_memory(), probably
> > while holding dcache_lock. This is a global filesystem lock, and when
> > a filesystem fails while it is held, any IO on this system just
> > hangs.
>
> And the FS in question was / so Pacemaker basically hung?
I couldn't log in any more, but my guess is 'yes, it hung'. It was not the
root (/) FS, though. If any FS crashes while it holds dcache_lock, every other
filesystem hangs as well. There is nothing we can do about that short of
rewriting the Linux VFS ;) My question is just what we can do to get Pacemaker
fixed so that it stoniths that node.
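
To illustrate the kind of check that could expose such a hang, here is a
minimal sketch, not anything Pacemaker ships. It assumes that a stat() issued
from a child process blocks when filesystem IO on the node is stuck. The name
fs_responds and the 5 second timeout are made up for this example. The probe
runs in a child so that the caller itself never blocks in the kernel, and the
caller deliberately does not wait for the child after a timeout, since a
process stuck in the kernel may not be killable.

```python
import subprocess
import sys


def fs_responds(path, timeout_s=5.0):
    """Return True if a stat() of `path`, run in a child process,
    completes successfully within `timeout_s` seconds.

    Hypothetical helper for illustration only.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Exit status 0 means the stat() call returned without error.
        return child.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a child blocked in the kernel may ignore
        # SIGKILL, so we do not wait for it to exit.
        child.kill()
        return False


if __name__ == "__main__":
    # A monitor could report failure when the probe does not come back.
    sys.exit(0 if fs_responds("/") else 1)
```

A probe like this only helps while the node can still start new processes.
If the whole node is wedged, the failure has to be noticed from outside, for
example by the peer node or by a hardware watchdog.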
Cheers,
Bernd
--
Bernd Schubert
DataDirect Networks