[Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

renayama19661014 at ybb.ne.jp
Mon May 20 00:31:10 UTC 2013


Hi Vladislav,

> For just this, the patch is unneeded. It only matters when the pengine
> files in tmpfs are symlinks pointing to stable storage. Without the patch,
> pengine would try to rewrite the file the symlink points to, directly on
> stable storage. With the patch, pengine will remove the symlink (and just
> the symlink) and open a new file on tmpfs for writing. Thus, it will not
> block if stable storage is inaccessible (in my case because of
> connectivity problems, in yours because of a backing storage outage).
> 
> If you decide to go with tmpfs *and* use the same synchronization method
> as I do, then you'd need to prepare a similar patch for 1.0: just add
> unlink() before pengine writes its data (I suspect the code differs
> between 1.0 and 1.1.10; even in 1.1.6 it was different from current master).
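
The difference between writing through a symlink and unlinking first can be reproduced outside Pacemaker with a small sketch like the one below. It is only an illustration of the POSIX behaviour described above, not Pacemaker code: write_replacing() and is_symlink() are hypothetical helper names, and the real patch only adds a single unlink() call before write_xml_file().

```c
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <sys/stat.h>

/* Hypothetical helper (not part of Pacemaker): write `data` to `path`.
 * If `unlink_first` is non-zero, the directory entry at `path` is removed
 * before opening, so an existing symlink is dropped instead of followed.
 * Returns 0 on success, -1 on failure. */
int write_replacing(const char *path, const char *data, int unlink_first)
{
    FILE *fp;

    if (unlink_first && unlink(path) != 0 && errno != ENOENT) {
        return -1;              /* could not remove the old entry */
    }

    fp = fopen(path, "w");      /* follows a symlink if one is still there */
    if (fp == NULL) {
        return -1;
    }
    if (fputs(data, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* Returns 1 if `path` itself is a symlink, 0 if it is not, -1 on error. */
int is_symlink(const char *path)
{
    struct stat st;

    if (lstat(path, &st) != 0) {
        return -1;
    }
    return S_ISLNK(st.st_mode) ? 1 : 0;
}
```

Calling write_replacing(link, data, 0) on a symlink rewrites the file the link points to, so the write lands on the link target's filesystem. Calling it with unlink_first set to 1 removes only the link and creates a new regular file in the link's own directory, which is what keeps the write on tmpfs.
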

Thank you for the detailed explanation.
First, I will check the behavior with tmpfs only.
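
For the tmpfs-only test, a check along these lines could confirm that a directory really is on tmpfs before the test starts. This is a Linux-only sketch under my own assumptions: is_on_tmpfs() is a hypothetical helper name, not a Pacemaker function, and the directory to check (for example /var/lib/pacemaker, as mentioned earlier in this thread) depends on the installation.

```c
#include <sys/vfs.h>      /* statfs(), Linux-specific */
#include <linux/magic.h>  /* TMPFS_MAGIC */

/* Hypothetical helper: returns 1 if `path` is on tmpfs, 0 if it is on
 * some other filesystem, and -1 if statfs() fails (e.g. the path does
 * not exist). */
int is_on_tmpfs(const char *path)
{
    struct statfs sfs;

    if (statfs(path, &sfs) != 0) {
        return -1;
    }
    return sfs.f_type == TMPFS_MAGIC ? 1 : 0;
}
```

A return value of 1 for the pengine state directory would mean that pengine writes go to RAM and should not wait on the shared disk.
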

Many Thanks!
Hideo Yamauchi.

--- On Fri, 2013/5/17, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:

> Hi Hideo-san,
> 
> 17.05.2013 10:29, renayama19661014 at ybb.ne.jp wrote:
> > Hi Vladislav,
> > 
> > Thank you for the advice.
> > 
> > I will try the patch you showed.
> > 
> > We use Pacemaker 1.0, but I will apply the patch there because the code is similar.
> > 
> > If I have questions about the setup, I will ask you by email.
> >  * First, I intend to test using only tmpfs.
> 
> For just this, the patch is unneeded. It only matters when the pengine
> files in tmpfs are symlinks pointing to stable storage. Without the patch,
> pengine would try to rewrite the file the symlink points to, directly on
> stable storage. With the patch, pengine will remove the symlink (and just
> the symlink) and open a new file on tmpfs for writing. Thus, it will not
> block if stable storage is inaccessible (in my case because of
> connectivity problems, in yours because of a backing storage outage).
> 
> If you decide to go with tmpfs *and* use the same synchronization method
> as I do, then you'd need to prepare a similar patch for 1.0: just add
> unlink() before pengine writes its data (I suspect the code differs
> between 1.0 and 1.1.10; even in 1.1.6 it was different from current master).
> 
> > 
> >> P.S. Andrew, is this patch ok to apply?
> > 
> > To Andrew...
> >   Does the write_xml patch in your repository need to be applied before I test Vladislav's patch?
> > 
> > Many Thanks!
> > Hideo Yamauchi.
> > 
> > 
> > 
> > 
> > --- On Fri, 2013/5/17, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
> > 
> >> Hi Hideo-san,
> >>
> >> You may try the following patch (with trick below)
> >>
> >> From 2c4418d11c491658e33c149f63e6a2f2316ef310 Mon Sep 17 00:00:00 2001
> >> From: Vladislav Bogdanov <bubble at hoster-ok.com>
> >> Date: Fri, 17 May 2013 05:58:34 +0000
> >> Subject: [PATCH] Feature: PE: Unlink pengine output files before writing.
> >>  This should help guys who store them to tmpfs and then copy to a stable storage
> >>  on (inotify) events with symlink creation in the original place to survive when
> >>  stable storage is not accessible.
> >>
> >> ---
> >>  pengine/pengine.c |    1 +
> >>  1 files changed, 1 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/pengine/pengine.c b/pengine/pengine.c
> >> index c7e1c68..99a81c6 100644
> >> --- a/pengine/pengine.c
> >> +++ b/pengine/pengine.c
> >> @@ -184,6 +184,7 @@ process_pe_message(xmlNode * msg, xmlNode * xml_data, crm_client_t * sender)
> >>          }
> >>  
> >>          if (is_repoke == FALSE && series_wrap != 0) {
> >> +            unlink(filename);
> >>              write_xml_file(xml_data, filename, HAVE_BZLIB_H);
> >>              write_last_sequence(PE_STATE_DIR, series[series_id].name, seq + 1, series_wrap);
> >>          } else {
> >> -- 
> >> 1.7.1
> >>
> >> You just need to ensure that /var/lib/pacemaker is on tmpfs. Then you may watch the directories there
> >> with inotify or similar and move (or copy) files to stable storage (RAM is not of infinite size).
> >> In my case that is CIFS, and I use lsyncd to synchronize those directories. If you are interested, I can
> >> provide you with the relevant lsyncd configuration. Frankly speaking, there is no great need to create symlinks
> >> in tmpfs to stable storage, as pacemaker does not use existing pengine files (except the sequence files). The sequence
> >> files and cib.xml are the only exceptions which you may want to exist in both places (and you may want to copy
> >> them from stable storage to tmpfs before pacemaker starts), and you can just move everything else away from
> >> tmpfs once it is written. In this case you do not need this patch.
> >>
> >> Best,
> >> Vladislav
> >>
> >> P.S. Andrew, is this patch ok to apply?
> >>
> >> 17.05.2013 03:27, renayama19661014 at ybb.ne.jp wrote:
> >>> Hi Andrew,
> >>> Hi Vladislav,
> >>>
> >>> I will test whether this fix is effective for this problem.
> >>>   * https://github.com/beekhof/pacemaker/commit/eb6264bf2db395779e65dadf1c626e050a388c59
> >>>
> >>> Best Regards,
> >>> Hideo Yamauchi.
> >>>
> >>> --- On Thu, 2013/5/16, Andrew Beekhof <andrew at beekhof.net> wrote:
> >>>
> >>>>
> >>>> On 16/05/2013, at 3:49 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
> >>>>
> >>>>> 16.05.2013 02:46, Andrew Beekhof wrote:
> >>>>>>
> >>>>>> On 15/05/2013, at 6:44 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
> >>>>>>
> >>>>>>> 15.05.2013 11:18, Andrew Beekhof wrote:
> >>>>>>>>
> >>>>>>>> On 15/05/2013, at 5:31 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
> >>>>>>>>
> >>>>>>>>> 15.05.2013 10:25, Andrew Beekhof wrote:
> >>>>>>>>>>
> >>>>>>>>>> On 15/05/2013, at 3:50 PM, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> 15.05.2013 08:23, Andrew Beekhof wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 15/05/2013, at 3:11 PM, renayama19661014 at ybb.ne.jp wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Andrew,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thank you for comments.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The guest is located on the shared disk.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> What is on the shared disk?  The whole OS or app-specific data (i.e. nothing pacemaker needs directly)?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The shared disk holds the whole OS and all the data.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Oh. I can imagine that being problematic.
> >>>>>>>>>>>> Pacemaker really isn't designed to function without disk access.
> >>>>>>>>>>>>
> >>>>>>>>>>>> You might be able to get away with it if you turn off saving PE files to disk though.
> >>>>>>>>>>>
> >>>>>>>>>>> I store CIB and PE files to tmpfs, and sync them to remote storage
> >>>>>>>>>>> (CIFS) with lsyncd level 1 config (I may share it on request). It copies
> >>>>>>>>>>> critical data like cib.xml, and moves everything else, symlinking it to
> >>>>>>>>>>> original place. The same technique may apply here, but with local fs
> >>>>>>>>>>> instead of cifs.
> >>>>>>>>>>>
> >>>>>>>>>>> Btw, the following patch is needed for that, otherwise pacemaker
> >>>>>>>>>>> overwrites remote files instead of creating new ones on tmpfs:
> >>>>>>>>>>>
> >>>>>>>>>>> --- a/lib/common/xml.c  2011-02-11 11:42:37.000000000 +0100
> >>>>>>>>>>> +++ b/lib/common/xml.c  2011-02-24 15:07:48.541870829 +0100
> >>>>>>>>>>> @@ -529,6 +529,8 @@ write_file(const char *string, const char *filename)
> >>>>>>>>>>>        return -1;
> >>>>>>>>>>>    }
> >>>>>>>>>>>
> >>>>>>>>>>> +    unlink(filename);
> >>>>>>>>>>
> >>>>>>>>>> Seems like it should be safe to include for normal operation.
> >>>>>>>>>
> >>>>>>>>> Exactly.
> >>>>>>>>
> >>>>>>>> Small flaw in that logic... write_file() is not used anywhere.
> >>>>>>>
> >>>>>>> Heh, thanks for spotting this.
> >>>>>>>
> >>>>>>> I recall write_file() was used for pengine, but some other function for
> >>>>>>> CIB. You probably optimized that but forgot to remove the unused function,
> >>>>>>> which is why I was sure the patch was still valid. And I ran tests (CIFS
> >>>>>>> storage outage simulation) only after the initial patch, not in recent years,
> >>>>>>> which is why I didn't notice the regression - storage uses pacemaker too ;) .
> >>>>>>>
> >>>>>>> This should go to write_xml_file() (And probably to other places just
> >>>>>>> before fopen(..., "w"), f.e. series).
> >>>>>>
> >>>>>> I've consolidated the code, however adding the unlink() would break things for anyone intentionally symlinking cib.xml from somewhere else (like a git repo).
> >>>>>> So I'm not so sure I should make the unlink() change :(
> >>>>>
> >>>>> Agree.
> >>>>> I originally made it specific to pengine files.
> >>>>> What do you prefer, simple wrapper in xml.c (f.e.
> >>>>> unlink_and_write_xml_file()) or just add unlink() call to pengine before
> >>>>> it calls write_xml_file()?
> >>>>
> >>>> The last one :)
> >>>> _______________________________________________
> >>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>>>
> >>>> Project Home: http://www.clusterlabs.org
> >>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>>> Bugs: http://bugs.clusterlabs.org
> >>>>



