[Pacemaker] Error starting Apache on 2 nodes cluster

Thu Nov 19 00:39:22 UTC 2009

Angie,

I can't tell exactly what you've provided. Can you post your CRM configuration (the output of 'crm configure show')? While you're at it, also provide the output of 'crm_verify -LV' and 'crm_mon -fo1'.

This looks suspicious though:

Nov 19 01:25:08 test2 crmd: [24251]: info: process_lrm_event: LRM operation WebServer_monitor_60000 (call=483, rc=-2, cib-update=0, confirmed=true) Cancelled unknown exec error

Personally I'd start with the OCF RA and leave lsb:httpd alone. From the above error message, something inside lsb:httpd is returning -2, which is not a supported return code.
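
For reference when reading the rc= values in the logs and in crm_mon, here is a small sketch that maps the standard OCF exit codes to their symbolic names. The function name ocf_rc_name is made up for illustration and is not part of Pacemaker or the resource-agents package; the code values are the ones defined for OCF resource agents.

```shell
#!/bin/sh
# Illustrative helper (not shipped with Pacemaker): print the symbolic
# name of a standard OCF resource agent exit code.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        8) echo "OCF_RUNNING_MASTER" ;;
        9) echo "OCF_FAILED_MASTER" ;;
        # Anything else, including -2, is outside the standard set.
        *) echo "not a standard OCF exit code" ;;
    esac
}
```

For example, 'ocf_rc_name 1' prints OCF_ERR_GENERIC, which is what crm_mon reports as "unknown error".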

Depending on how confident you are with shell scripts, you might find it helpful to eliminate Pacemaker from the equation and call the Resource Agent script yourself to debug problems manually, like so...

Disable your resource so Pacemaker doesn't interfere:

crm_resource -r WebSite -m -p target-role -v stopped

Then move into the RA directory and set a necessary environment variable:

cd /usr/lib/ocf/resource.d/heartbeat
export OCF_ROOT=/usr/lib/ocf

Start testing the apache RA, setting the only mandatory environment variable for ocf:heartbeat:apache:

export OCF_RESKEY_configfile=/path/to/your/main/apache/config
./apache start
echo $?

That should echo "0" for success. Judging by your logs, you can start Apache but the monitor is failing:

./apache monitor
echo $?

If that doesn't echo "0", you might get a helpful error message explaining what's wrong. You might have to read through the apache script itself to figure out why it's failing. Finally test the 'stop' operation:

./apache stop
echo $?

Should echo "0" as well. If this all works for you, but the resource in Pacemaker is still not working, then it's probably something in your CIB (like a bad attribute), as you've just done pretty much exactly what Pacemaker will do.
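
If you end up repeating this a lot, the start/monitor/stop sequence above could be wrapped in a small function along these lines. This is only a sketch: test_ra is a name I've made up, and it assumes OCF_ROOT and any OCF_RESKEY_* variables are already exported as shown earlier. It runs every action even if one fails, so that 'stop' still gets a chance to clean up.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): run each given action against a
# resource agent script, print its exit code, and return 1 if any action
# returned non-zero.
test_ra() {
    ra="$1"
    shift
    failed=0
    for action in "$@"; do
        "$ra" "$action"
        rc=$?
        echo "$action: rc=$rc"
        if [ "$rc" -ne 0 ]; then
            failed=1
        fi
    done
    return "$failed"
}

# Example usage, left commented out so sourcing this file does nothing:
# export OCF_ROOT=/usr/lib/ocf
# export OCF_RESKEY_configfile=/path/to/your/main/apache/config
# test_ra /usr/lib/ocf/resource.d/heartbeat/apache start monitor stop
```

Any action that prints a non-zero rc is the one to dig into in the RA script.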

Let us know how you go.

Luke Bigum
Systems Administrator
(p) 1300 661 668
(f) 1300 661 540
(e) lbigum at iseek.com.au
http://www.iseek.com.au
Level 1, 100 Ipswich Road Woolloongabba QLD 4102

From: Angie T. Muhammad [mailto:angie.tawfik at gmail.com]
Sent: Thursday 19 November 2009 9:57 AM
To: pacemaker at oss.clusterlabs.org
Subject: [Pacemaker] Error starting Apache on 2 nodes cluster

Hello
I'm a pacemaker and openais beginner.
I followed the 'Clusters from Scratch' document and successfully created and monitored the 'ClusterIP' and 'LoadBalancer' resources.

But whenever I try to start Apache:
# crm configure primitive WebSite ocf:heartbeat:apache params configfile=/etc/httpd/conf/httpd.conf op monitor interval=1min

whether using (ocf:heartbeat:apache) or (lsb:httpd), I get the following errors when watching crm_mon:

============
Last updated: Thu Nov 19 01:38:33 2009
Stack: openais
Current DC: test1.localdomain - partition with quorum
Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ test1.localdomain test2.localdomain ]

ClusterIP       (ocf::heartbeat:IPaddr2):       Started test1.localdomain
LoadBalancer    (lsb:haproxy):  Started test1.localdomain

Failed actions:
    WebSite_start_0 (node=test1.localdomain, call=9, rc=1, status=complete): unknown error
    WebSite_start_0 (node=test2.localdomain, call=5, rc=1, status=complete): unknown error
/************************************************************************************************************/

Knowing that I am using:
CentOS 5.4..
openais-0.80.5-15.1
pacemaker-1.0.5-4.1
# chkconfig httpd off
server-status is not enabled in my httpd.conf ...

I always check apache processes before configuring my crm using:

# ps aux | grep httpd
/* to make sure there are no zombie processes */

# /etc/init.d/httpd status
/* to guarantee it's stopped and nothing is locked */

Last but not least, I am attaching the last 100 lines of /var/log/messages from the 2nd node to help you help me.
I have been stuck in this loop for four days now and I have no idea why the crm can't start Apache, though when I start it manually everything runs smoothly!

Thank you in advance
--
All the best,
Angie