[Pacemaker] Lustre and Multiple Mount Protection
Dejan Muhamedagic
dejanmm at fastmail.fm
Mon Jan 4 13:20:35 UTC 2010
Hi,
On Thu, Dec 31, 2009 at 05:15:47PM +0100, Bernd Schubert wrote:
> Hello Dejan,
>
> On Wednesday 30 December 2009, Dejan Muhamedagic wrote:
> > Hi,
> >
> > On Wed, Dec 30, 2009 at 01:31:27PM +0100, Bernd Schubert wrote:
> > > Hello Dejan,
> > >
> > > On Thursday 24 December 2009, Dejan Muhamedagic wrote:
> > >
> > > No, without Multiple Mount Protection (MMP) the start action would *not*
> > > fail
> >
> > Never tried it, but that seems to be unexpected.
>
> Why? You have a shared block device and the fail-over node knows nothing about
> the other node that has the filesystem mounted. So without any protection it
> would happily mount the device. The only exceptions are filesystems that were
> designed from the very beginning for shared block devices: OCFS2 and GFS.
All the time I thought that fsck was running on the same node
where the RA tries to mount the file system.
> > > on the fail-over node, so it would be possible to get data corruption.
> > > Lustre internally uses a modified ext3/ext4 and neither ext3 nor ext4
> > > would protect you against that. That is why Sun wrote the MMP
> > > extension...
> > >
> > > > file a bugzilla if the RA does something unexpected.
> > >
> > > The Filesystem agent behaves correctly, but Lustre must not claim the
> > > device is unmounted when it is not. One of these bugs will be fixed in
> > > the next Lustre release and another one I still need to analyze.
> > > That is why one should use a specific agent for Lustre, which performs
> > > Lustre-specific checks of whether the filesystem is really unmounted.
> >
> > I don't know what those checks would look like, but perhaps it's
> > still better to build them into the existing Filesystem. There is
> > already support for many different file systems and, iirc, some
> > of that code is quite involved.
>
> The checks are separate functions and those functions could easily check for
> Lustre. However, the existing Filesystem agent is already terribly complex,
> since it tries to work for all filesystems.
Most of the code is not about supporting various filesystems
(apart from ocfs2).
> I prefer simple code that everyone
> easily understands, even if it sometimes adds duplicate code.
I'm all for simple code, but in this case much of what the RA has to
deal with concerns filesystems in general. If that part has to
be duplicated in the new RA, and most probably that will be the
case, then I'd be against it. Another option could be to create
modules for the more involved filesystems.
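
As a rough illustration of the module idea (a sketch only, not code from
the Filesystem RA, which is a shell script; Python is used here purely for
brevity): the generic logic stays in one place, and a filesystem type can
register an extra "really unmounted" check. All names below (is_mounted,
register, really_unmounted, LUSTRE_BUSY) are hypothetical, and the Lustre
hook is only a placeholder, since the actual Lustre checks are not spelled
out in this thread.

```python
from typing import Callable, Dict, Set

def is_mounted(device: str, mounts_text: str) -> bool:
    """Generic check: is `device` the first field of any line in
    /proc/mounts-style text (device mountpoint fstype options ...)?"""
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == device:
            return True
    return False

# Hypothetical registry: filesystem type -> extra check that returns
# True only if the device is really free according to that filesystem.
EXTRA_CHECKS: Dict[str, Callable[[str], bool]] = {}

def register(fstype: str):
    """Decorator that plugs a per-filesystem check into the registry."""
    def wrap(fn: Callable[[str], bool]) -> Callable[[str], bool]:
        EXTRA_CHECKS[fstype] = fn
        return fn
    return wrap

# Placeholder state standing in for whatever Lustre itself would report.
LUSTRE_BUSY: Set[str] = set()

@register("lustre")
def lustre_really_free(device: str) -> bool:
    # Assumption: a real module would query Lustre here; this stub
    # only consults the placeholder set above.
    return device not in LUSTRE_BUSY

def really_unmounted(device: str, fstype: str, mounts_text: str) -> bool:
    """Generic mount-table check first, then the per-filesystem hook."""
    if is_mounted(device, mounts_text):
        return False
    extra = EXTRA_CHECKS.get(fstype)
    return extra(device) if extra else True
```

With something like this, the common code is not duplicated and only the
filesystem-specific part lives in its own module.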
Cheers,
Dejan
> Cheers,
> Bernd
>