[Pacemaker] Frustrating fun with Pacemaker / CentOS / Apache

Tue Feb 16 22:21:58 UTC 2010

On 2/16/2010 10:48 AM, Andrew Beekhof wrote:
>> The first error doesn't concern me particularly, it's a known Apache bug
>> relating to the proxy module that doesn't actually break anything.  It's the
>> binding errors that are bothering me and presumably what is stopping
>> pacemaker from starting the service successfully.  What's really odd about
>> that error is I can run "/etc/init.d/httpd start" quite happily myself and
>> it works.  There is absolutely nothing sitting listening on port 80 at all
>> for it to struggle with.  Occasionally it seems to start it, but I've no idea
>> why it succeeds then when it fails the large majority of the time.
>>   Really wild stab in the dark, but is there a chance pacemaker is attempting
>> to start the httpd process multiple times?
>>      
> Unlikely, usually it's caused by LSB services being told to start at boot time.
>    

That was one of the earliest thoughts I had.  Sorry, I meant to put this 
in my first message:

# chkconfig --list httpd
httpd           0:off   1:off   2:off   3:off   4:off   5:off   6:off

I've been stepping through things as logically as I can to see if I can 
figure out why this failover is so inconsistent.  I know there has to be 
some logical thread somewhere underneath it all that I'm missing! :)

I killed off every corosync, heartbeat and httpd process I could find, 
started up corosync and watched what happened.  I had to run "resource 
cleanup web-cluster" multiple times before failover-apache would start 
up successfully.  As far as I can figure out, a cleanup of web-cluster 
should just make it repeat the httpd start in exactly the same way each 
time, so there should be no difference between one cleanup and the 
next, but somehow there is!
So far I've definitely established that in some circumstances 
corosync/pacemaker successfully starts apache, then somehow decides the 
start has failed, yet leaves the httpd process running.  It looks like 
it's possibly running "/etc/init.d/httpd stop", but that doesn't seem to 
clear off the running httpd instance.
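
One thing that could be ruled out at this point is the init script 
itself.  Pacemaker drives an LSB resource purely through the exit codes 
of start, stop and status, so a script that returns something 
unexpected can make a good start look like a failure.  Below is a 
minimal sketch of that check, assuming the usual LSB expectations (0 
for start and stop, even when repeated; 0 for status while running; 3 
for status while stopped).  The names check_lsb and expect_rc are 
hypothetical helpers, not Pacemaker tools.

```shell
# Sketch only: check_lsb and expect_rc are hypothetical helper names.
# WARNING: this really starts and stops the service it is pointed at.

expect_rc() {
    # usage: expect_rc <expected-exit-code> <script> <action>
    want=$1; script=$2; action=$3
    got=0
    "$script" "$action" >/dev/null 2>&1 || got=$?
    if [ "$got" -eq "$want" ]; then
        echo "ok   $action -> $got"
    else
        echo "FAIL $action -> $got (expected $want)"
        return 1
    fi
}

check_lsb() {
    # usage: check_lsb <path-to-init-script>
    script=$1
    bad=0
    expect_rc 0 "$script" start  || bad=1   # start a stopped service
    expect_rc 0 "$script" status || bad=1   # running: status must be 0
    expect_rc 0 "$script" start  || bad=1   # start when already running
    expect_rc 0 "$script" stop   || bad=1   # stop a running service
    expect_rc 3 "$script" status || bad=1   # stopped: status must be 3
    expect_rc 0 "$script" stop   || bad=1   # stop when already stopped
    return $bad
}
```

It would be run as root with the cluster stopped on that node, e.g. 
"check_lsb /etc/init.d/httpd".  Any FAIL line points at a step where the 
cluster and the script could disagree about whether httpd is running.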

I'll keep plugging away at this, and will add a marker to the logs so 
I've got a clear section to pass on.  There's got to be something weird 
interfering somewhere.  I'll reply with details as soon as I can.

>> After a while trying to restart the resource group starts throwing up:
>> "Error performing operation: Required data for this CIB API call not found"
>> with no obvious way to clear that message (nor documentation to that effect
>> that I can find?)
>>      
> That's not good, can you show us the logs for some context?
>    
I'll try to get it to crop up again and capture the logs.
>
>> however pacemaker isn't migrating the IP address until after it tries to
>> start apache.  IP address migration happens successfully every single time,
>> never a hassle there.
>>
>> The documentation does seem to make a large number of assumptions about what
>> users do or don't know about pacemaker style clustering, and it's been far
>> from a simple process to implement what should be a straightforward 2 node
>> failover.
>>      
> Did you try the "cluster from scratch" doc?
>    

Sure, and it's the first time things really started to make any form of 
sense, but by the time I reached it I'd already struggled through all 
the other starting and installation documents, which do things a lot 
differently.
Maybe I'm just being fussy, but it would be awesome if there was just 
/one/ from scratch / starter guide, which really isn't the case.  I 
(naively?) followed the wiki through in the order it pushes you:

http://clusterlabs.org/wiki/Main_Page logically leads you to -> 
http://clusterlabs.org/wiki/Get_Pacemaker  which then leads you on to -> 
http://clusterlabs.org/wiki/Install and from there to -> 
http://clusterlabs.org/wiki/Initial_Configuration and finally I ended up 
at http://clusterlabs.org/wiki/Example_configurations

I never came anywhere close to seeing the documentation list until 
I'd got the cluster half set up, let alone a link to "cluster from 
scratch" :)  It would be good to put that document front and center 
in Install or similar so people can see it, or even overhaul the lot 
with something along the same lines but as distro-agnostic as possible?
I've never seen anyone complain about being too molly-coddled by 
documentation :)

Frankly I'm slightly confused about what I've got set up.  "Stack: 
openais".  Really?  Did that get installed by corosync without my 
noticing?  Is corosync openais?  The FAQ lumps them together so I 
presume it is, but I haven't installed any openais package as was 
mentioned in the "cluster from scratch" doc.  "rpm -qa | grep -i ais" 
comes up with zip, so I'm pretty sure it hasn't come in as a separate 
dependency by surprise!

I'm sorry, this probably comes across fairly harshly and that's not my 
intention.  After a week of grappling with something that should be so 
straightforward, repeatedly hitting inconsistencies and differences in 
approach between the different pages, with no explanation of why or of 
what the benefits of each method are, I'm left irritable!  Maybe it's 
stubbornness, but I know pacemaker is used in major environments and 
I'm confident it's exactly what we need for our setup.

Whilst I remember, one glaring inconsistency between the man pages, 
documentation etc. is the bind address in corosync.  Some places say to 
use the network address, i.e. end it in .0; others seem to be talking 
about setting it to the IP address of the server it's on.  Both seem to 
work, but I've no idea which it should be or what the implications of 
setting it wrong are.  I'm inclined to trust "man corosync.conf", which 
tells you to use the .0 network address, over the documentation and 
examples that don't!
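
For what it's worth, the "network address" form is just the interface 
address ANDed with its netmask, so it only ends in .0 when the mask 
ends on an octet boundary (a /24, for example).  A small sketch of the 
arithmetic follows; network_addr is a hypothetical helper, not part of 
corosync, and the addresses in the usage note are example values rather 
than anything from this cluster.

```shell
# Sketch only: network_addr is a hypothetical helper, not a corosync tool.
network_addr() {
    # usage: network_addr <ipv4-address> <netmask>
    ip=$1; mask=$2
    old_ifs=$IFS
    IFS=.
    # Split both dotted quads into $1..$4 (address) and $5..$8 (mask).
    set -- $ip $mask
    IFS=$old_ifs
    # Bitwise AND each address octet with the matching mask octet.
    echo "$(($1 & $5)).$(($2 & $6)).$(($3 & $7)).$(($4 & $8))"
}
```

For example, "network_addr 192.168.5.92 255.255.255.0" prints 
192.168.5.0, while the same address with netmask 255.255.255.192 gives 
192.168.5.64, which is a network address that does not end in .0.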

-- 
Paul Graydon
Senior Systems Administrator
Hawaii Information Consortium
Internet Portal Partner with the Aloha state
808-695-4619 office
808-695-4618 fax
paul at ehawaii.gov
