[Pacemaker] Problems with corosync while forking processes during node startup.
Lars Marowsky-Bree
lmb at suse.com
Thu Feb 28 18:23:00 UTC 2013
On 2013-02-25T11:42:40, Andrew Beekhof <andrew at beekhof.net> wrote:
> > Or we fix the corosync problem with forking from a multi-threaded
> > program. ;-)
> That was essentially my point: Steve and I have already tried, for
> quite a long time too.
> I know some people think I just like changing things for the fun of
> it, but this is actually not true.
Hence the ";-)".
(I have opinions on threads in C.)
> > We've never really had customers report problems with this
> > either, but I'm not sure why that is, honestly. I know the problem
> > theoretically exists, but it has never hit us.
> I never hit this on openSUSE-based distros either, or if I did it
> was extremely rare.
> But on Fedora it was so regular as to make the cluster unusable.
>
> I don't know what makes one of them so special. Maybe it's just some
> compile flags.
I think this may be the crash hidden by setting "timestamp: off" in
corosync.conf. If that's turned on, corosync doesn't like me much at
all.
Don't get me wrong. I'd love to migrate forward and drop the plugin code
if there was an on-wire compatible way of doing so. I can't authorize
breaking rolling upgrades.
(Supporting both, and switching from one to the other live when the last
node has both online, was an option I briefly considered. But then I
woke up screaming at night and decided it perhaps wasn't a good idea.)
Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde