[Pacemaker] Frustrating fun with Pacemaker / CentOS / Apache
Paul Graydon
paul at ehawaii.gov
Tue Feb 16 17:21:58 EST 2010
On 2/16/2010 10:48 AM, Andrew Beekhof wrote:
>> The first error doesn't concern me particularly, it's a known Apache bug
>> relating to the proxy module that doesn't actually break anything. It's the
>> binding errors that are bothering me and presumably what is stopping
>> pacemaker from starting the service successfully. What's really odd about
>> that error is I can run "/etc/init.d/httpd start" quite happily myself and
>> it works. There is absolutely nothing sitting listening on port 80 at all
>> for it to struggle with. Occasionally it seems to start it, but I've no idea
>> why it succeeds then when it fails the large majority of the time.
>> Really wild stab in the dark, but is there a chance pacemaker is attempting
>> to start the httpd process multiple times?
>>
> Unlikely; usually it's caused by LSB services being told to start at boot time.
>
That was one of the earliest thoughts I had. Sorry, I meant to put this
in my first message:
# chkconfig --list httpd
httpd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
I've been stepping through things as logically as I can to see if I can
figure out why this failover is so inconsistent. I know there has to be
some logical thread somewhere underneath it all that I'm missing! :)
I killed off every corosync, heartbeat and httpd process I could find,
then started up corosync and watched what happened. I had to run "resource
cleanup web-cluster" multiple times before failover-apache would
start up successfully. As far as I can figure out, cleanup web-cluster
should just make it repeat starting the httpd process exactly
the same way each time, so there should be no difference between one
cleanup and the next, but somehow there is!
So far I've definitely established that in some circumstances
corosync/pacemaker successfully starts apache, then decides it has
somehow failed, yet leaves the httpd process running. It looks like it's
possibly running "/etc/init.d/httpd stop", but that doesn't seem to clear
off the running httpd instance.
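One thing that may be worth ruling out here (an assumption, not a diagnosis) is whether the init script returns the exit codes the LSB spec expects, since a cluster manager judges start, stop and status purely by those codes. A minimal sketch of such a check follows; lsb_check is a hypothetical helper name, not something shipped with Pacemaker, and it really does start and stop the service, so it should only be run while the cluster is not managing the resource.

```shell
#!/bin/sh
# Sketch only: run an init script through the sequence of actions a
# cluster manager relies on and compare each exit code with the value
# the LSB spec expects. Begin with the service stopped.
# lsb_check is a hypothetical helper name (not part of Pacemaker).

lsb_check() {
    script=$1
    failures=0
    # Each entry is "action:expected_exit_code", run in this order.
    # "status" on a stopped service is expected to return 3.
    for step in start:0 status:0 start:0 stop:0 status:3 stop:0; do
        action=${step%%:*}
        expected=${step##*:}
        "$script" "$action" >/dev/null 2>&1
        actual=$?
        if [ "$actual" -eq "$expected" ]; then
            echo "ok   $action (exit $actual)"
        else
            echo "FAIL $action (exit $actual, expected $expected)"
            failures=$((failures + 1))
        fi
    done
    # Succeed only if every step returned the expected code.
    [ "$failures" -eq 0 ]
}

# Example call (path taken from the message above):
#   lsb_check /etc/init.d/httpd
```

Any FAIL line, particularly on one of the status steps, would suggest the script is reporting a state that does not match what is actually running.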
I'll keep plugging away at this, and will add a marker to the logs so
I've got a clear section to pass on. There has got to be something weird
interfering somewhere. I'll reply with details as soon as I can.
>> After a while trying to restart the resource group starts throwing up:
>> "Error performing operation: Required data for this CIB API call not found"
>> with no obvious way to clear that message (nor documentation to that effect
>> that I can find?)
>>
> That's not good. Can you show us the logs for some context?
>
I'll try to get it to crop up again and capture the logs.
>
>> however pacemaker isn't migrating the IP address until after it tries to
>> start apache. IP address migration happens successfully every single time,
>> never a hassle there.
>>
>> The documentation does seem to make a large number of assumptions about what
>> users do or don't know about pacemaker style clustering, and it's been far
>> from a simple process to implement what should be a straightforward 2 node
>> failover.
>>
> Did you try the "cluster from scratch" doc?
>
Sure, it's the first time things really started to make any form of
sense, but I'd already struggled through all the other starting and
installation documents by the time I'd reached that one, which do things
a lot differently.
Maybe I'm just being fussy, but it would be awesome if there was just
/one/ from scratch / starter guide, which really isn't the case. I
(naively?) followed the wiki through in the order it pushes you:
http://clusterlabs.org/wiki/Main_Page logically leads you to ->
http://clusterlabs.org/wiki/Get_Pacemaker which then leads you on to ->
http://clusterlabs.org/wiki/Install and from there to ->
http://clusterlabs.org/wiki/Initial_Configuration and finally I ended up
at http://clusterlabs.org/wiki/Example_configurations
I never even came anywhere close to seeing the documentation list until
I'd got the cluster half set up, let alone a link to "cluster from
scratch" :) It would be good to put that document front and center
in Install or similar so people can see it, or even overhaul the lot
with something along the same lines but as distro-agnostic as possible?
I've never seen anyone complain about being too molly-coddled by
documentation :)
Frankly I'm slightly confused about what I've got set up. "Stack: openais".
Really? Did that get installed by corosync and I didn't notice? Is
corosync openais? The FAQ lumps them together so I presume it is, but I
haven't installed any openais package like was mentioned in the "cluster
from scratch" doc. "rpm -qa | grep -i ais" comes up with zip so I'm
pretty sure it hasn't come in as a separate dependency by surprise!
I'm sorry, this probably comes across fairly harshly and it's not my
intention, but after a week of grappling with something that should be
so straightforward, keeping on hitting inconsistencies and differences
in approach in the different pages without any explanation why, what the
benefits of each method are etc. etc. just leaves me irritable! Maybe
it's stubbornness but I know pacemaker is used in major environments and
I'm confident it's exactly what we need for our set up.
Whilst I remember, one glaring inconsistency between man pages,
documentation etc. is the bind address in corosync. Some places say to use
the network address, i.e. ending in .0; others seem to be talking about
setting it to the IP address of the server it's on. Both seem to
work, but I've no idea what it should be or what the implications of
setting it wrong are. I'm inclined to trust "man corosync.conf", which
tells you to use the .0 network address, over the documentation and
examples that don't!
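For what it's worth, the ".0" form is just the interface address ANDed with its netmask, so it only ends in .0 for masks like 255.255.255.0. A small sketch of that calculation, assuming IPv4 dotted-quad input; network_addr is a hypothetical helper name and the addresses in the comments are purely illustrative:

```shell
#!/bin/sh
# Sketch only: derive the network address for an interface by ANDing
# each octet of the IPv4 address with the matching octet of the netmask.
# network_addr is a hypothetical helper name.

network_addr() {
    ip=$1
    mask=$2
    old_ifs=$IFS
    IFS=.
    # Split both dotted quads into $1..$4 (address) and $5..$8 (mask).
    set -- $ip $mask
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# Example calls (illustrative addresses):
#   network_addr 192.168.5.92 255.255.255.0
#   network_addr 192.168.5.92 255.255.255.192
```

With a 255.255.255.0 mask the last octet comes out as 0, which matches the ".0" advice; with a narrower mask it will not, which may explain some of the differences between examples.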
--
Paul Graydon
Senior Systems Administrator
Hawaii Information Consortium
Internet Portal Partner with the Aloha state
808-695-4619 office
808-695-4618 fax
paul at ehawaii.gov