[Pacemaker] OCFS2 fencing regulated by Pacemaker?

Darren.Mansell at opengi.co.uk Darren.Mansell at opengi.co.uk
Thu Feb 11 10:35:53 EST 2010


Once again, I apologise for the top-posting. I wish I could use a real
mail client but nothing apart from Outlook works properly with Exchange
:(.

Anyway - yes, we've had a really hard time with our 3-node SAN-based
cluster. We implemented OCFS2 on top of a shared disk using o2cb and
dlm clones. It seemed to work in the test environment, but since going
live it's been a real nightmare. It seems if you even breathe on it, it
will start a shootout, but as it's now a production system I can't do
much about it.

Some mornings we arrive and see that all 3 servers got STONITHd
overnight, but we can't see any reason why. We would disable STONITH to
see what state the cluster gets into before fencing, but as it stands
the worst that happens is 10 minutes of service unavailability, which
is a lot better than 12 hours.

To complicate matters further, the apps we are using on the cluster /
shared storage are Tomcat-based and allegedly don't work too well with
other file locking mechanisms. This is developer hearsay, though; I
can't substantiate it. The only lead I have is that the dlm seems to
lose quorum and set the fencing ops off. The logs never seem to tie up,
though, so it's very difficult to fault-find.

With all this in mind, I haven't been able to file any bugs or make
support requests to Novell, as I don't know exactly what is causing
the issue. At the moment, if we leave well alone it performs well. If I
had to reboot a node, I would expect the others to get fenced
afterwards.

Thanks for the help
Darren

-----Original Message-----
From: Dejan Muhamedagic [mailto:dejanmm at fastmail.fm] 
Sent: 11 February 2010 14:12
To: pacemaker at oss.clusterlabs.org; mail at sandervanvugt.nl
Subject: Re: [Pacemaker] OCFS2 fencing regulated by Pacemaker?

Hi,

On Thu, Feb 11, 2010 at 01:16:20PM +0100, Sander van Vugt wrote:
> On Thu, 2010-02-11 at 13:03 +0100, Dejan Muhamedagic wrote:
> > Hi,
> > 
> > On Thu, Feb 11, 2010 at 10:11:33AM -0000, Darren.Mansell at opengi.co.uk wrote:
> > > Hello.
> > > 
> > > Yes, we get the same kind of thing. SLES11 HAE 64-bit.
> > 
> > Is there a bugzilla for this?
> > 
> Nope. Before filing a bug, I'd first like to be as sure as possible
> that it really is a bug and not a problem behind the keyboard.

If you have strong doubts, closing a bugzilla is easy :) BTW,
this was actually meant for Darren, as it seemed like he was
having a really hard time dealing with his cluster.

> BTW: I don't see where on bugzilla.novell.com I should enter a bug for
> something that is in the SLES HAE (and the Bugzilla FAQ didn't help
> me). 

Use "SUSE Linux Enterprise High Availability Extension" for the
product line.

Thanks,

Dejan

> Thanks,
> Sander
> 
> 
> 
> _______________________________________________
> Pacemaker mailing list
> Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
