[Pacemaker] Single Node Cluster and Resource Management
ant1spamz-pacemaker at yahoo.com
Tue Dec 7 12:24:40 UTC 2010
Hi there,
I have a requirement to build a single-node cluster, primarily for resource
monitoring on the local node, so that network load balancing from my front-end
load balancers works correctly and the node in question fails out when its
public interface, its private interface, or both fail (a typical OR truth table).

My NLB has the following setup:
2 front-end LBs with a failover IP between them, direct routing to my nodes'
public interface, and monitoring on the private interface.

My nodes:
one public interface and one private interface each. This is how things are and
I can't change it.
=================
setup
===========
pingd to my LB1 - 192.168.0.68
pingd to my LB2 (represents a "public" ping destination) - 192.168.0.69
a location constraint that fails apache if either one of the pings times out

On startup everything is OK: apache launches along with my 2 pingd resources,
and the fail constraint works as well.
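
To make the intended behaviour explicit, here is a rough sketch of the OR truth table in Python. It is only an illustration of the requirement, not anything Pacemaker runs; the function name fails_out and its two arguments are made up for this example.

```python
from itertools import product


def fails_out(public_ok: bool, private_ok: bool) -> bool:
    """Hypothetical helper: the node must fail out when either ping target is lost."""
    return (not public_ok) or (not private_ok)


# Print the full truth table for the two interfaces.
for public_ok, private_ok in product([True, False], repeat=2):
    print(f"public_ok={public_ok!s:5} private_ok={private_ok!s:5} "
          f"fail_out={fails_out(public_ok, private_ok)}")
```

Only the row where both pings succeed should leave apache running.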
================
the problem
=============
Now when I simulate a network failure (iptables -s web1.testcluster -j DROP),
apache is correctly failed. When pingd re-establishes the connection, the
apache constraint should be reversed and apache simply started again.

How do I achieve the automatic resource restart?
Could the monitor operation on my apache resource be in conflict with pingd?
Am I simply missing a "recovery" constraint to start the service?
Is my location constraint not done correctly?
Other?

Possibly related: "Resource apache cannot run anywhere" (I have no idea what
this means).

ICMP is OK (see the pings after the status output).
Last updated: Tue Dec 7 07:09:11 2010
Stack: Heartbeat
Current DC: web1.testcluster (ae391b6f-176d-43bc-93b4-8104ff3414c8) - partition
with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
1 Nodes configured, unknown expected votes
3 Resources configured.
============
Online: [ web1.testcluster ]
pingdnet1 (ocf::pacemaker:pingd): Started web1.testcluster
pingdnet2 (ocf::pacemaker:pingd): Started web1.testcluster
crm(live)# Ctrl-C, leaving
[root@web1 ~]# date
Tue Dec 7 07:13:11 EST 2010
[root@web1 ~]# ping 192.168.0.69
PING 192.168.0.69 (192.168.0.69) 56(84) bytes of data.
64 bytes from 192.168.0.69: icmp_seq=1 ttl=64 time=0.104 ms
--- 192.168.0.69 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.104/0.104/0.104/0.000 ms
[root@web1 ~]# ping 192.168.0.68
PING 192.168.0.68 (192.168.0.68) 56(84) bytes of data.
64 bytes from 192.168.0.68: icmp_seq=1 ttl=64 time=0.151 ms
--- 192.168.0.68 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.151/0.151/0.151/0.000 ms
================
conf
====
primitive pingdnet1 ocf:pacemaker:pingd \
        params host_list=192.168.0.69 name=pingdnet1 \
        op monitor interval=15s timeout=5s
primitive pingdnet2 ocf:pacemaker:pingd \
        params host_list=192.168.0.68 name=pingdnet2 \
        op monitor interval=15s timeout=5s
primitive apache lsb:httpd \
        op monitor interval=15s
location apache-ping-constraint apache \
        rule -inf: not_defined pingdnet1 or pingdnet1 lte 0
location apache-ping-constraint2 apache \
        rule -inf: not_defined pingdnet2 or pingdnet2 lte 0
order ping-then-apache inf: pingdnet1 pingdnet2 apache
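
As a way of reasoning about the two location rules, the following Python sketch models how I understand them to combine. It is a simplified model and not Pacemaker's policy engine: the function names are hypothetical, and the attribute values (1 when the single ping target answers, 0 when it does not) are an assumption about what pingd publishes with one host and the default multiplier.

```python
NEG_INF = float("-inf")


def rule_matches(attrs, name):
    """True when `not_defined <name> or <name> lte 0` holds for a node."""
    if name not in attrs:             # not_defined <name>
        return True
    return int(attrs[name]) <= 0      # <name> lte 0


def apache_score(attrs):
    """Return -inf if either location rule matches, otherwise 0."""
    for name in ("pingdnet1", "pingdnet2"):
        if rule_matches(attrs, name):
            return NEG_INF
    return 0


def can_run_anywhere(nodes):
    """apache is placeable only if at least one node is not scored -inf."""
    return any(apache_score(attrs) > NEG_INF for attrs in nodes.values())


# Assumed attribute values for the single node in this cluster.
healthy = {"web1.testcluster": {"pingdnet1": "1", "pingdnet2": "1"}}
one_down = {"web1.testcluster": {"pingdnet1": "0", "pingdnet2": "1"}}
unset = {"web1.testcluster": {"pingdnet2": "1"}}

print(can_run_anywhere(healthy))   # True
print(can_run_anywhere(one_down))  # False
print(can_run_anywhere(unset))     # False
```

In this model, with only one node, a single attribute that is missing or still at 0 is enough to leave apache with nowhere to run, which would match the "cannot run anywhere" message if the attributes have not recovered.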
===============================================
logs to help
======================
Dec 7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout)
Starting httpd:
Dec 7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout) [
Dec 7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout) OK
Dec 7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout) ]
Dec 7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout)
Dec 7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout)
Dec 7 06:39:41 web1 lrmd: [2471]: info: Managed apache:start process 10650
exited with return code 0.
Dec 7 06:39:41 web1 crmd: [2474]: info: process_lrm_event: LRM operation
apache_start_0 (call=25, rc=0, cib-update=196, confirmed=true) ok
Dec 7 06:39:41 web1 crmd: [2474]: info: match_graph_event: Action
apache_start_0 (14) confirmed on web1.testcluster (rc=0)
Dec 7 06:39:41 web1 crmd: [2474]: info: te_rsc_command: Initiating action 15:
monitor apache_monitor_15000 on web1.testcluster (local)
Dec 7 06:39:41 web1 crmd: [2474]: info: do_lrm_rsc_op: Performing
key=15:34:0:02fb0ab7-1384-4125-b14a-0ab5b4e9d1e8 op=apache_monitor_15000 )
Dec 7 06:39:41 web1 lrmd: [2471]: info: rsc:apache:26: monitor
Dec 7 06:39:41 web1 crmd: [2474]: info: te_pseudo_action: Pseudo action 3 fired
and confirmed
Dec 7 06:39:41 web1 lrmd: [2471]: info: Managed apache:monitor process 10666
exited with return code 0.
Dec 7 06:39:41 web1 crmd: [2474]: info: process_lrm_event: LRM operation
apache_monitor_15000 (call=26, rc=0, cib-update=197, confirmed=false) ok
Dec 7 06:39:41 web1 crmd: [2474]: info: match_graph_event: Action
apache_monitor_15000 (15) confirmed on web1.testcluster (rc=0)
Dec 7 06:56:08 web1 pengine: [2487]: notice: native_print:
pingdnet1(ocf::pacemaker:pingd):Started web1.testcluster
Dec 7 06:56:08 web1 pengine: [2487]: notice: native_print:
pingdnet2(ocf::pacemaker:pingd):Started web1.testcluster
Dec 7 06:56:08 web1 pengine: [2487]: notice: native_print:
apache(lsb:httpd):Stopped
Dec 7 06:56:08 web1 pengine: [2487]: info: native_color: Resource apache cannot
run anywhere
Dec 7 06:56:08 web1 pengine: [2487]: notice: LogActions: Leave resource
pingdnet1(Started web1.testcluster)
Dec 7 06:56:08 web1 pengine: [2487]: notice: LogActions: Leave resource
pingdnet2(Started web1.testcluster)
Dec 7 06:56:08 web1 pengine: [2487]: notice: LogActions: Leave resource
apache(Stopped)
Dec 7 07:10:15 web1 pingd: [9653]: info: ping_read: Retrying...
Dec 7 07:10:16 web1 pingd: [9521]: info: ping_read: Retrying...
Dec 7 07:10:47 web1 last message repeated 31 times
Dec 7 07:11:08 web1 last message repeated 21 times
Dec 7 07:11:08 web1 last message repeated 21 times
Dec 7 07:11:08 web1 crmd: [2474]: info: crm_timer_popped: PEngine Recheck Timer
(I_PE_CALC) just popped!
Dec 7 07:11:08 web1 crmd: [2474]: info: do_state_transition: State transition
S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
origin=crm_timer_popped ]
Dec 7 07:11:08 web1 crmd: [2474]: info: do_state_transition: Progressed to
state S_POLICY_ENGINE after C_TIMER_POPPED
Dec 7 07:11:08 web1 crmd: [2474]: info: do_state_transition: All 1 cluster
nodes are eligible to run resources.
Dec 7 07:11:08 web1 crmd: [2474]: info: do_pe_invoke: Query 201: Requesting the
current CIB: S_POLICY_ENGINE
Dec 7 07:11:08 web1 crmd: [2474]: info: do_pe_invoke_callback: Invoking the PE:
query=201, ref=pe_calc-dc-1291723868-103, seq=1, quorate=1
Dec 7 07:11:08 web1 pengine: [2487]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Dec 7 07:11:08 web1 pengine: [2487]: info: unpack_config: Node scores: 'red' =
-INFINITY, 'yellow' = 0, 'green' = 0
Dec 7 07:11:08 web1 pengine: [2487]: info: determine_online_status: Node
web1.testcluster is online
Dec 7 07:11:08 web1 pengine: [2487]: notice: native_print:
pingdnet1(ocf::pacemaker:pingd):Started web1.testcluster
Dec 7 07:11:08 web1 pengine: [2487]: notice: native_print:
pingdnet2(ocf::pacemaker:pingd):Started web1.testcluster
Dec 7 07:11:08 web1 pengine: [2487]: notice: native_print:
apache(lsb:httpd):Stopped
Dec 7 07:11:08 web1 pengine: [2487]: info: native_color: Resource apache cannot
run anywhere
Dec 7 07:11:08 web1 pengine: [2487]: notice: LogActions: Leave resource
pingdnet1(Started web1.testcluster)
Dec 7 07:11:08 web1 pengine: [2487]: notice: LogActions: Leave resource
pingdnet2(Started web1.testcluster)
Dec 7 07:11:08 web1 pengine: [2487]: notice: LogActions: Leave resource
apache(Stopped)
Dec 7 07:11:08 web1 crmd: [2474]: info: do_state_transition: State transition
S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
origin=handle_response ]
Dec 7 07:11:08 web1 crmd: [2474]: info: unpack_graph: Unpacked transition 37: 0
actions in 0 synapses
Dec 7 07:11:08 web1 crmd: [2474]: info: do_te_invoke: Processing graph 37
(ref=pe_calc-dc-1291723868-103) derived from /var/lib/pengine/pe-input-555.bz2
Dec 7 07:11:08 web1 crmd: [2474]: info: run_graph:
====================================================
Dec 7 07:11:08 web1 crmd: [2474]: notice: run_graph: Transition 37 (Complete=0,
Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-555.bz2): Complete
Dec 7 07:11:08 web1 crmd: [2474]: info: te_graph_trigger: Transition 37 is now
complete
Dec 7 07:11:08 web1 crmd: [2474]: info: notify_crmd: Transition 37 status: done
- <null>
Dec 7 07:11:08 web1 crmd: [2474]: info: do_state_transition: State transition
S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL
origin=notify_crmd ]
Dec 7 07:11:08 web1 crmd: [2474]: info: do_state_transition: Starting PEngine
Recheck Timer
Dec 7 07:11:08 web1 pengine: [2487]: info: process_pe_message: Transition 37:
PEngine Input stored in: /var/lib/pengine/pe-input-555.bz2
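
To see at which times the policy engine decided apache could not be placed, the log can be filtered for that message. A minimal sketch in Python, assuming the syslog format shown above with each entry on a single line (the lines are only wrapped in this mail); the function name unplaceable is hypothetical.

```python
import re

# Matches pengine entries such as:
# "Dec 7 07:11:08 web1 pengine: [2487]: info: native_color: Resource apache cannot run anywhere"
PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+pengine:.*"
    r"Resource (?P<rsc>\S+) cannot run anywhere"
)


def unplaceable(lines):
    """Yield (timestamp, resource) for each 'cannot run anywhere' entry."""
    for line in lines:
        match = PATTERN.search(line)
        if match:
            yield match.group("stamp"), match.group("rsc")


sample = [
    "Dec 7 07:11:08 web1 pengine: [2487]: info: native_color: "
    "Resource apache cannot run anywhere",
    "Dec 7 07:11:08 web1 pengine: [2487]: notice: LogActions: "
    "Leave resource apache(Stopped)",
]
for stamp, rsc in unplaceable(sample):
    print(stamp, rsc)
```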