[Pacemaker] Frustrating fun with Pacemaker / CentOS / Apache

Tue Feb 16 18:04:46 EST 2010

On Wed, Feb 17, 2010 at 12:21 AM, Paul Graydon <paul at ehawaii.gov> wrote:

>  On 2/16/2010 10:48 AM, Andrew Beekhof wrote:
>
> The first error doesn't concern me particularly, it's a known Apache bug
> relating to the proxy module that doesn't actually break anything.  It's the
> binding errors that are bothering me and presumably what is stopping
> pacemaker from starting the service successfully.  What's really odd about
> that error is I can run "/etc/init.d/httpd start" quite happily myself and
> it works.  There is absolutely nothing sitting listening on port 80 at all
> for it to struggle with.  Occasionally it does seem to start it, but I've no
> idea why it succeeds then when it fails the large majority of the time.
>  Really wild stab in the dark, but is there a chance pacemaker is attempting
> to start the httpd process multiple times?
>
>
>  Unlikely, usually it's caused by LSB services being told to start at boot time.
>
>
>
> That was one of the earliest thoughts I had, sorry I meant to put this in
> my first message:
>
> # chkconfig --list httpd
> httpd           0:off   1:off   2:off   3:off   4:off   5:off   6:off
>
I ran into the same problem. It is strongly recommended to use an OCF
resource agent written specifically for your service rather than an LSB
init script. Regardless of your stack (OpenAIS or Heartbeat), errors will
pop up when stopping httpd if you are using LSB.
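
If you do stay with the LSB script for now, it may be worth checking that it
returns the exit codes the cluster expects, since a non-compliant init script
is a common reason for a resource being reported as failed. A rough sketch of
such a check follows. The function names are made up for illustration, the
expected codes follow the LSB init script conventions (0 for success, 3 for
"status" on a stopped service), and it really does start and stop the
service, so only run it on a test node:

```shell
#!/bin/sh
# Sketch: run the basic LSB exit-code checks against an init script.
# WARNING: this really starts and stops the service.

# Run a command, discard its output, and compare its exit code to $1.
expect_rc() {
    want="$1"; shift
    "$@" >/dev/null 2>&1
    got=$?
    if [ "$got" -eq "$want" ]; then
        echo "ok:   $* -> $got"
    else
        echo "FAIL: $* -> $got (expected $want)"
        return 1
    fi
}

# Walk through start/status/stop in the order that exposes bad exit codes.
lsb_check() {
    script="$1"
    failed=0
    expect_rc 0 "$script" start  || failed=1
    expect_rc 0 "$script" status || failed=1   # running must report 0
    expect_rc 0 "$script" start  || failed=1   # start while already running
    expect_rc 0 "$script" stop   || failed=1
    expect_rc 3 "$script" status || failed=1   # stopped must report 3
    expect_rc 0 "$script" stop   || failed=1   # stop while already stopped
    return $failed
}
```

Usage would be something like "lsb_check /etc/init.d/httpd" as root; any FAIL
line points at a code the cluster is likely to misread.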

Example configuration:
# crm configure primitive WebServer ocf:heartbeat:apache \
    params configfile=/etc/httpd/conf/httpd.conf \
    statusurl="http://127.0.0.1/server-status" \
    op monitor interval=30s

For the above example to work correctly, FIRST you have to do some editing
in httpd.conf:

1- Bind the Listen directive to localhost (for testing):
Listen 127.0.0.1:80

2- Enable viewing server status for localhost:
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>

3- Enable extended status:
ExtendedStatus On

Otherwise, Pacemaker will request the status page from your server, never
find it, and conclude that the server is not responding. It then tries to
kill it, and you end up with some zombie processes.
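
The three edits above can be checked mechanically before handing Apache over
to the cluster. The following is only a sketch: the function name and the
messages are made up, and the Listen check is deliberately strict so that it
mirrors the localhost-only testing setup from step 1.

```shell
#!/bin/sh
# Sketch: check that an httpd.conf contains the three settings listed above.
# Prints one "missing: ..." line per absent setting; returns 0 if all exist.
check_status_config() {
    conf="$1"
    rc=0
    # 1. Listen bound to the loopback address on port 80.
    grep -Eq '^[[:space:]]*Listen[[:space:]]+127\.0\.0\.1:80[[:space:]]*$' "$conf" \
        || { echo "missing: Listen 127.0.0.1:80"; rc=1; }
    # 2. A <Location /server-status> block.
    grep -Eq '^[[:space:]]*<Location[[:space:]]+/server-status>' "$conf" \
        || { echo "missing: <Location /server-status>"; rc=1; }
    # 3. ExtendedStatus On (matched case-insensitively).
    grep -Eqi '^[[:space:]]*ExtendedStatus[[:space:]]+On[[:space:]]*$' "$conf" \
        || { echo "missing: ExtendedStatus On"; rc=1; }
    return $rc
}
```

For example, "check_status_config /etc/httpd/conf/httpd.conf" should print
nothing once all three edits are in place. With httpd running, fetching
http://127.0.0.1/server-status from the node itself (with curl or a browser)
is a further sanity check.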

One more thing: to minimize the hassle, try testing with only these two
resources enabled:
- the failover IP and
- the Apache server
Then enable the other resources one by one, testing your configuration after
each.
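
A two-resource test setup along those lines might look like the sketch below.
It only prints crm configuration text and does not touch the cluster. The
name "failover-ip", the group name, the /24 netmask and the address passed in
are placeholders, not values taken from anyone's real setup.

```shell
#!/bin/sh
# Sketch: print a minimal crm configuration (failover IP + Apache only).
# Resource names, group name and netmask are illustrative assumptions.
print_minimal_config() {
    ip="$1"   # the failover address the cluster should manage
    cat <<EOF
primitive failover-ip ocf:heartbeat:IPaddr2 \\
    params ip=$ip cidr_netmask=24 \\
    op monitor interval=30s
primitive WebServer ocf:heartbeat:apache \\
    params configfile=/etc/httpd/conf/httpd.conf \\
    statusurl="http://127.0.0.1/server-status" \\
    op monitor interval=30s
group web-cluster failover-ip WebServer
EOF
}
```

Something like "print_minimal_config 192.168.0.100 > minimal.crm" writes the
text to a file for review, which should then be loadable with
"crm configure load update minimal.crm". A group starts its members in the
order listed, so the IP address comes up before Apache.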

Let me know if you encounter problems; I can send you my entire 2-node
configuration.

> I've been stepping through things as logically as I can to see if I can
> figure out why this failover is so inconsistent.  I know there has to be
> some logical thread somewhere underneath it all that I'm missing! :)
>
> Killed off every corosync, heartbeat and httpd process I could find.
> Started up corosync and watched what happened, had to do "resource cleanup
> web-cluster" multiple times to manage to get failover-apache to start up
> successfully.  As far as I can figure out cleanup web-cluster should just be
> getting it to repeat starting the httpd process exactly the same way each
> time so there should be no difference between one cleanup and the next, but
> somehow there is!
> So far I've definitely established in some circumstances corosync/pacemaker
> is successfully starting apache, but is deciding it's failed somehow, but
> leaving the httpd process running.  Looks like it's possibly running
> "/etc/init.d/httpd stop" but that doesn't seem to clear off the running
> httpd instance.
>
> I'll keep plugging away at this, will add in a marker into the logs so I've
> got a clear section to pass on.  Got to be something weird interfering
> somewhere.  I'll reply back with details as soon as I can.
>
>
>  After a while trying to restart the resource group starts throwing up:
> "Error performing operation: Required data for this CIB API call not found"
> with no obvious way to clear that message (nor documentation to that effect
> that I can find?)
>
>
>  That's not good, can you show us the logs for some context?
>
>
>  I'll try and get it to crop up again and run it.
>
>
>  however pacemaker isn't migrating the IP address until after it tries to
> start apache.  IP address migration happens successfully every single time,
> never a hassle there.
>
> The documentation does seem to make a large number of assumptions about what
> users do or don't know about pacemaker style clustering, and it's been far
> from a simple process to implement what should be a straightforward 2 node
> failover.
>
>
>  Did you try the "cluster from scratch" doc?
>
>
>
> Sure, it's the first time things really started to make any form of sense,
> but I'd already struggled through all the other starting and installation
> documents by the time I'd reached that one, which do things a lot
> differently.
> Maybe I'm just being fussy, but it would be awesome if there was just *one*
> from scratch / starter guide, which really isn't the case.  I (naively?)
> followed the wiki through in the order it pushes you:
>
> http://clusterlabs.org/wiki/Main_Page logically leads you to ->
> http://clusterlabs.org/wiki/Get_Pacemaker  which then leads you on to ->
> http://clusterlabs.org/wiki/Install and from there to ->
> http://clusterlabs.org/wiki/Initial_Configuration and finally I ended up
> at http://clusterlabs.org/wiki/Example_configurations
>
> I never even came anywhere close to seeing the documentation list until I'd
> got the cluster half set up, let alone see a link to "cluster from scratch"
> :)  It would be good to put that document up front and center in Install or
> similar so people can see it, or even overhaul the lot with something along
> the same lines but as distro-agnostic as possible?
> I've never seen anyone complain about being too molly-coddled by
> documentation :)
>
> Frankly I'm slightly confused about what I've got set up.  "Stack: openais".
> Really?  Did that get installed by corosync and I didn't notice?  Is
> corosync openais?  The FAQ lumps them together so I presume it is, but I
> haven't installed any openais package like was mentioned in the "cluster
> from scratch" doc.  "rpm -qa | grep -i ais" comes up with zip so I'm pretty
> sure it hasn't come in as a separate dependency by surprise!
>
> I'm sorry, this probably comes across fairly harshly and it's not my
> intention, but after a week of grappling with something that should be so
> straightforward, keeping on hitting inconsistencies and differences in
> approach in the different pages without any explanation why, what the
> benefits of each method are etc. etc. just leaves me irritable!  Maybe it's
> stubbornness but I know pacemaker is used in major environments and I'm
> confident it's exactly what we need for our set up.
>
> Whilst I remember, one glaring inconsistency between man pages,
> documentation etc. is the bind address in corosync.  Some places say use the
> network address, i.e. end it in .0, others seem to be talking about setting
> that to be the IP address of the server it's on.  Both seem to work, but
> I've no idea what it should be and what the implications of it being set
> wrong are.   I'm inclined to trust "man corosync.conf" which tells you to
> use the .0 network address, over the documentation and examples that don't!
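
For what it's worth, the network address that "man corosync.conf" asks for is
just the interface address AND-ed with the netmask, octet by octet. A minimal
sketch in plain sh follows; the function name is made up and the addresses
are example values only:

```shell
#!/bin/sh
# Sketch: derive the network address for bindnetaddr from an IPv4 address
# and a dotted netmask by AND-ing each pair of octets.
network_address() {
    ip="$1"; mask="$2"
    old_ifs="$IFS"; IFS=.
    set -- $ip $mask      # split on dots: $1..$4 = address, $5..$8 = mask
    IFS="$old_ifs"
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

network_address 192.168.5.92 255.255.255.0   # prints 192.168.5.0
```

With a netmask that is not a whole-octet one the result no longer ends in .0,
for example 10.1.2.200 with 255.255.255.192 gives 10.1.2.192.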
>
> --
> Paul Graydon
> Senior Systems Administrator
> Hawaii Information Consortium
> Internet Portal Partner with the Aloha state
> 808-695-4619 office
> 808-695-4618 fax
> paul at ehawaii.gov

-- 
All the best,
Angie