[Pacemaker] server lockup failures

Andrew Beekhof andrew at beekhof.net
Thu Oct 29 03:58:13 EDT 2009


On Thu, Oct 29, 2009 at 12:51 AM, Bernd Schubert
<bernd.schubert at fastmail.fm> wrote:
> On Wednesday 28 October 2009, Andrew Beekhof wrote:
>> On Wed, Oct 28, 2009 at 2:44 PM, Bernd Schubert
>> <bs_lists at aakef.fastmail.fm> wrote:
>> > On Wednesday 28 October 2009, Andrew Beekhof wrote:
>> >> On Wed, Oct 28, 2009 at 1:05 PM, Bernd Schubert
>> >> <bs_lists at aakef.fastmail.fm> wrote:
>> >> > Hello,
>> >> >
>> >> > I think there is a severe server failure Pacemaker doesn't detect.
>> >> > Overnight a Lustre server failed in shrink_icache_memory(), probably
>> >> > while holding dcache_lock. This is a global filesystem lock, and
>> >> > when a filesystem fails while it is held, any IO on the system
>> >> > just hangs.
>> >>
>> >> And the FS in question was / so Pacemaker basically hung?
>> >
>> > I couldn't log in any more, but my guess is 'yes, it hung'. But no, it
>> > was not the root (/) FS. If any FS crashes while it holds dcache_lock,
>> > any other filesystem will hang as well.
>>
>> ooohhhhh
>>
>> > There is nothing we can do about that except
>> > rewriting the Linux VFS ;) My question is just what we can do to get
>> > Pacemaker fixed so that it stoniths that node.
>>
>> Hmmm.  Was this an openais or heartbeat based cluster?
>> If all the processes hung I'd have expected it to drop out of the
>> membership list and get shot by the new DC...
>
> Heartbeat based, I still haven't had the time to look into openais.

I guess heartbeat wasn't hung then... otherwise it would have stopped
sending "I'm here" packets (and dropped out of the membership list).
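
To illustrate the point, here is a rough sketch of timeout-based
membership. It is not heartbeat's actual code; the Membership class, its
method names and the 30 second deadtime are all made up for illustration.
A peer is only considered dead once its "I'm here" packets stop arriving,
so a node that keeps sending them stays in the membership list even if
everything else on it is stuck.

```python
class Membership:
    """Toy membership tracker (hypothetical, not heartbeat's implementation)."""

    def __init__(self, deadtime=30.0):
        # deadtime: seconds of silence before a peer is considered dead
        # (assumed value, chosen only for this example)
        self.deadtime = deadtime
        self.last_seen = {}

    def heartbeat(self, node, now):
        # Record the arrival time of an "I'm here" packet from a peer.
        self.last_seen[node] = now

    def dead_nodes(self, now):
        # Peers that have been silent for longer than deadtime.
        # These are the candidates for being shot by the DC.
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.deadtime
        )


if __name__ == "__main__":
    members = Membership(deadtime=30.0)
    members.heartbeat("node-a", now=0.0)
    members.heartbeat("node-b", now=0.0)
    # node-a keeps sending packets even though its filesystems are hung
    members.heartbeat("node-a", now=25.0)
    print(members.dead_nodes(now=40.0))
```

In this sketch node-a is never reported as dead as long as its packets
keep arriving, which would match what was seen here if the heartbeat
process itself was not blocked.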

> But I can
> test on my virtual machines over the next few days. Since I have a good
> idea how to lock a node using dcache_lock, it should also be easily
> reproducible for me :)

That would be handy.
I'd be interested to know what each Pacemaker process was up to at the time.

Oh, were you logging to a file or syslog?  That might have some impact.



