[Pacemaker] Debian Unstable (sid) Problem with Pacemaker/Corosync Apache HA-Load Balanced cluster

Miltiadis Koutsokeras m.koutsokeras at biovista.com
Sun Oct 2 15:19:55 UTC 2011


Hi Nick,

Here is the output of the "crm configure show":

node node-0
node node-1
primitive Apache2 ocf:heartbeat:apache \
     params configfile="/etc/apache2/apache2.conf" \
     op monitor interval="1min" \
     meta target-role="Started"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
     params ip="192.168.0.100" cidr_netmask="32" \
     op monitor interval="30s" \
     meta target-role="Started"
colocation Apache2-ClusterIP-colocation inf: Apache2 ClusterIP
order Apache2-after-ClusterIP inf: ClusterIP Apache2
property $id="cib-bootstrap-options" \
     dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
     cluster-infrastructure="openais" \
     expected-quorum-votes="2" \
     stonith-enabled="false" \
     no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
     resource-stickiness="100"
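
Since the cluster logs only report rc=1, one way to see the agent's own error output is to call the resource agent by hand with the same parameter as in the Apache2 primitive above. This is only a minimal sketch: the helper name run_ra is hypothetical, and /usr/lib/ocf as OCF_ROOT is an assumption about where the agents are installed on these nodes.

```shell
#!/bin/sh
# run_ra: hypothetical helper that calls an OCF resource agent directly
# and prints the exit code it returned.

# Assumption: agents live under /usr/lib/ocf; adjust if they do not.
OCF_ROOT="${OCF_ROOT:-/usr/lib/ocf}"
export OCF_ROOT

# Agents read their parameters from OCF_RESKEY_<name> environment
# variables; this mirrors configfile= from the Apache2 primitive.
OCF_RESKEY_configfile="/etc/apache2/apache2.conf"
export OCF_RESKEY_configfile

run_ra() {
    # $1 = path to the agent, $2 = action (monitor, start, stop, ...)
    agent="$1"
    action="$2"
    rc=0
    "$agent" "$action" || rc=$?
    echo "$action rc=$rc"
    return "$rc"
}

# Example (as root on a node):
#   run_ra "$OCF_ROOT/resource.d/heartbeat/apache" monitor
```

Anything the agent prints to stderr during such a run should be more specific than the "unknown error" shown by the cluster.
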

If you need anything else, please feel free to ask.

On 10/01/2011 02:50 PM, Nick Khamis wrote:
> Can you post your crm configuration, please?
>
> Nick.
>
> On Sat, Oct 1, 2011 at 6:32 AM, Miltiadis Koutsokeras
> <m.koutsokeras at biovista.com>  wrote:
>> Hello everyone,
>>
>> My goal is to build a round-robin balanced, HA Apache web server cluster.
>> The main purpose is to balance HTTP requests evenly between the nodes and
>> have one machine pick up all requests if and ONLY if the others are not
>> available at the moment. The cluster will be accessible only from the
>> internal network. Any advice on this will be highly appreciated (resources
>> to use, services to install and configure, etc.). After walking through the
>> ClusterLabs documentation, I think the proper deployment is an
>> active/active Pacemaker-managed cluster.
>>
>> I'm trying to follow the "Cluster from scratch" article in order to build a
>> 2-node cluster on an experimental setup:
>>
>> 2 GNU/Linux Debian Unstable (sid) virtual machines (kernel 3.0.0-1-686-pae,
>> Apache/2.2.21 (Debian)) on the same LAN.
>>
>> node-0 IP: 192.168.0.101
>> node-1 IP: 192.168.0.102
>> Desired Cluster Virtual IP: 192.168.0.100
>>
>> The two nodes are set up to communicate with proper SSH keys, and this works
>> flawlessly. They can also reach each other by short name:
>>
>> root@node-0:~# ssh node-1 -- hostname
>> node-1
>>
>> root@node-1:~# ssh node-0 -- hostname
>> node-0
>>
>> My problem is that although I've reached the part where the ClusterIP
>> resource is set up properly, the Apache resource does not get started on
>> either node. The logs do not have a message explaining the failure in
>> detail, even with debug messages enabled. All related messages report
>> unknown errors while trying to start the service, and after a while the
>> cluster manager gives up. From the messages it seems that the manager is
>> getting unexpected exit codes from the Apache resource. The server-status
>> URL is accessible from 127.0.0.1 on both nodes.
>>
>> root@node-0:~# crm_mon -1
>> ============
>> Last updated: Fri Sep 30 14:04:55 2011
>> Stack: openais
>> Current DC: node-1 - partition with quorum
>> Version: 1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
>> 2 Nodes configured, 2 expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ node-1 node-0 ]
>>
>>   ClusterIP    (ocf::heartbeat:IPaddr2):    Started node-1
>>
>> Failed actions:
>>     Apache2_monitor_0 (node=node-0, call=3, rc=1, status=complete): unknown error
>>     Apache2_start_0 (node=node-0, call=5, rc=1, status=complete): unknown error
>>     Apache2_monitor_0 (node=node-1, call=8, rc=1, status=complete): unknown error
>>     Apache2_start_0 (node=node-1, call=10, rc=1, status=complete): unknown error
>>
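
For reading the rc values above: as far as I understand, they follow the OCF resource agent return code convention, and crm_mon prints rc=1 as "unknown error". A small lookup sketch for the standard codes (the function name ocf_rc_name is hypothetical, it is not part of any cluster tool):

```shell
#!/bin/sh
# ocf_rc_name: hypothetical helper that maps a numeric OCF return code
# to its symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo UNKNOWN ;;
    esac
}

# Example:
#   ocf_rc_name 7    # prints OCF_NOT_RUNNING
```

The initial probe (Apache2_monitor_0) expects 7 (not running) for a resource that has not been started yet, so rc=1 from the probe means the agent itself failed, not that Apache was simply found stopped.
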
>> Let's check the logs for this resource:
>>
>> root@node-0:~# grep ERROR.*Apache2 /var/log/corosync/corosync.log
>> (Nothing)
>>
>> root@node-0:~# grep WARN.*Apache2 /var/log/corosync/corosync.log
>> Sep 30 14:04:23 node-0 lrmd: [2555]: WARN: Managed Apache2:monitor process 2802 exited with return code 1.
>> Sep 30 14:04:30 node-0 lrmd: [2555]: WARN: Managed Apache2:start process 2942 exited with return code 1.
>>
>> root@node-1:~# grep ERROR.*Apache2 /var/log/corosync/corosync.log
>> Sep 30 14:04:23 node-1 pengine: [1676]: ERROR: native_create_actions: Resource Apache2 (ocf::apache) is active on 2 nodes attempting recovery
>>
>> root@node-1:~# grep WARN.*Apache2 /var/log/corosync/corosync.log
>> Sep 30 14:04:23 node-1 lrmd: [1674]: WARN: Managed Apache2:monitor process 3006 exited with return code 1.
>> Sep 30 14:04:23 node-1 crmd: [1677]: WARN: status_from_rc: Action 5 (Apache2_monitor_0) on node-1 failed (target: 7 vs. rc: 1): Error
>> Sep 30 14:04:23 node-1 crmd: [1677]: WARN: status_from_rc: Action 7 (Apache2_monitor_0) on node-0 failed (target: 7 vs. rc: 1): Error
>> Sep 30 14:04:23 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
>> Sep 30 14:04:23 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
>> Sep 30 14:04:30 node-1 crmd: [1677]: WARN: status_from_rc: Action 10 (Apache2_start_0) on node-0 failed (target: 0 vs. rc: 1): Error
>> Sep 30 14:04:30 node-1 crmd: [1677]: WARN: update_failcount: Updating failcount for Apache2 on node-0 after failed start: rc=1 (update=INFINITY, time=1317380670)
>> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
>> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
>> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
>> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
>> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
>> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
>> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
>> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
>> Sep 30 14:04:36 node-1 lrmd: [1674]: WARN: Managed Apache2:start process 3146 exited with return code 1.
>> Sep 30 14:04:36 node-1 crmd: [1677]: WARN: status_from_rc: Action 9 (Apache2_start_0) on node-1 failed (target: 0 vs. rc: 1): Error
>> Sep 30 14:04:36 node-1 crmd: [1677]: WARN: update_failcount: Updating failcount for Apache2 on node-1 after failed start: rc=1 (update=INFINITY, time=1317380676)
>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
>> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
>> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
>> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
>> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
>> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
>> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
>> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
>> Sep 30 14:13:52 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
>> Sep 30 14:13:52 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
>> Sep 30 14:13:52 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
>> Sep 30 14:13:52 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
>>
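
Most of the WARN output above is the policy engine repeating the same two failures. To condense it, the lrmd lines can be reduced to one count per resource action and return code. A rough sketch, assuming the lrmd message format shown in the grep output above (the function name summarize_lrmd_failures is hypothetical):

```shell
#!/bin/sh
# summarize_lrmd_failures: hypothetical helper that reads log text on
# stdin and prints "<count> <resource>:<action> rc=<code>" for every
# lrmd "Managed ... exited with return code N." warning.
summarize_lrmd_failures() {
    sed -n 's/.*lrmd: .*WARN: Managed \([^ ]*\) process [0-9]* exited with return code \([0-9]*\)\..*/\1 rc=\2/p' |
        sort |
        uniq -c |
        awk '{ print $1, $2, $3 }'
}

# Example (on a node, reading the real log file rather than a mail copy):
#   summarize_lrmd_failures < /var/log/corosync/corosync.log
```

This only counts what lrmd logged on the node whose log file is read, so it has to be run on each node separately.
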
>> Any suggestions?
>>
>> File /etc/corosync/corosync.conf (only the changes are shown here; see the
>> attachment for the full file):
>>
>> # Please read the openais.conf.5 manual page
>>
>> totem {
>>
>> ... (Default)
>>
>>      interface {
>>         # The following values need to be set based on your environment
>>         ringnumber: 0
>>         bindnetaddr: 192.168.0.0
>>         mcastaddr: 226.94.1.1
>>         mcastport: 5405
>>     }
>> }
>>
>> ... (Default)
>>
>> service {
>>      # Load the Pacemaker Cluster Resource Manager
>>      ver:       1
>>      name:      pacemaker
>> }
>>
>> ... (Default)
>>
>> logging {
>>         fileline: off
>>         to_stderr: no
>>         to_logfile: yes
>>         logfile: /var/log/corosync/corosync.log
>>         to_syslog: no
>>         syslog_facility: daemon
>>         debug: on
>>         timestamp: on
>>         logger_subsys {
>>                 subsys: AMF
>>                 debug: off
>>                 tags: enter|leave|trace1|trace2|trace3|trace4|trace6
>>         }
>> }
>>
>> --
>> Koutsokeras Miltiadis M.Sc.
>> Software Engineer
>> Biovista Inc.
>>
>> US Offices
>> 2421 Ivy Road
>> Charlottesville, VA 22903
>> USA
>> T: +1.434.971.1141
>> F: +1.434.971.1144
>>
>> European Offices
>> 34 Rodopoleos Street
>> Ellinikon, Athens 16777
>> GREECE
>> T: +30.210.9629848
>> F: +30.210.9647606
>>
>> www.biovista.com
>>
>> Biovista is a privately held biotechnology company that finds novel uses for
>> existing drugs, and profiles their side effects using their mechanism of
>> action. Biovista develops its own pipeline of drugs in CNS, oncology,
>> auto-immune and rare diseases. Biovista is collaborating with
>> biopharmaceutical companies on indication expansion and de-risking of their
>> portfolios and with the FDA on adverse event prediction.
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs:
>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>
>>







