[Pacemaker] Single Node Cluster and Resource Management

Andrew Beekhof andrew at beekhof.net
Thu Dec 9 10:53:49 UTC 2010


On Tue, Dec 7, 2010 at 1:24 PM,  <ant1spamz-pacemaker at yahoo.com> wrote:
> Hi there,
> I have a requirement to build a single-node cluster, primarily for resource
> monitoring on the local node, so that network load balancing from my front
> end load balancers works correctly and the node in question is failed out
> when my public interface, my private interface, or both fail (a typical OR
> truth table).
> My NLB has the following setup:
> 2 front end LBs with a failover IP between them and direct routing to my
> nodes' public interface, with monitoring on the private interface.
> My nodes:
> one public interface and one private interface. This is how things are and
> I can't change it.
> =================
> setup
> ===========
> pingd to my LB1 - 192.168.0.68
> pingd to my LB2 (represents a "public" ping destination) - 192.168.0.69
> location constraint to fail apache if either one of the pings times out
> Now on startup everything is OK: apache launches along with my 2 pingd
> resources, and the fail constraint works as well.
> ================
> the problem
> =============
> Now when I simulate a network failure (iptables -s web1.testcluster -j DROP)
> apache is correctly failed. When pingd re-establishes the connection, the
> apache constraint must be reversed and apache simply started again.
> How do I achieve the automatic resource restart?

It will happen automatically whenever the pingd resource redetects network
connectivity, which can take up to 15s given your monitor interval.
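
To make the recovery logic concrete, here is a small Python model of the two
location rules from the quoted configuration. It is only an illustration, not
Pacemaker code: the function names `rule_bans` and `apache_allowed` and the
plain dict of node attributes are made up for this sketch. It assumes each
pingd resource publishes a node attribute named after its `name` parameter,
holding the count of reachable ping targets.

```python
def rule_bans(attrs, name):
    """Return True when `-inf: not_defined NAME or NAME lte 0` matches."""
    value = attrs.get(name)
    return value is None or int(value) <= 0


def apache_allowed(attrs):
    """apache may run only if neither location rule matches."""
    return not (rule_bans(attrs, "pingdnet1") or rule_bans(attrs, "pingdnet2"))


if __name__ == "__main__":
    # Walk through the OR truth table, then a case with a missing attribute.
    cases = [
        {"pingdnet1": "1", "pingdnet2": "1"},
        {"pingdnet1": "0", "pingdnet2": "1"},
        {"pingdnet1": "1", "pingdnet2": "0"},
        {"pingdnet1": "0", "pingdnet2": "0"},
        {"pingdnet2": "1"},  # pingdnet1 not defined at all
    ]
    for attrs in cases:
        print(attrs, "->", "may run" if apache_allowed(attrs) else "banned")
```

In this model the ban is not sticky: as soon as both attributes are back
above 0, neither rule matches and apache becomes eligible to start again.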

> Could my monitor constraint on the apache resource be in conflict with
> pingd?

It shouldn't be. Did you wait long enough for connectivity to return and
for the next monitor op to run?
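
One way to check whether the attributes have actually recovered is to look at
the status section of the CIB. The Python sketch below pulls the pingd
attribute values out of such XML. The helper name `ping_attributes` is
hypothetical, and the `SAMPLE` document is hand-written to resemble the usual
transient-attributes layout, so treat its exact shape as an assumption rather
than real output from this cluster.

```python
import xml.etree.ElementTree as ET

# Hand-written sample (an assumption about the layout, not captured output).
SAMPLE = """<status>
  <node_state id="web1" uname="web1.testcluster">
    <transient_attributes id="web1">
      <instance_attributes id="status-web1">
        <nvpair id="status-web1-pingdnet1" name="pingdnet1" value="1"/>
        <nvpair id="status-web1-pingdnet2" name="pingdnet2" value="0"/>
      </instance_attributes>
    </transient_attributes>
  </node_state>
</status>"""


def ping_attributes(status_xml, names):
    """Collect the values of the named node attributes from status XML."""
    root = ET.fromstring(status_xml)
    found = {}
    for nvpair in root.iter("nvpair"):
        if nvpair.get("name") in names:
            found[nvpair.get("name")] = nvpair.get("value")
    return found


if __name__ == "__main__":
    print(ping_attributes(SAMPLE, {"pingdnet1", "pingdnet2"}))
```

On a live node the input could come from something like
`cibadmin -Q -o status`. If an attribute is still 0 or missing after
connectivity is back, the location rule will keep apache stopped.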

> am I simply missing a "recovery" constraint to start the service?
> is my location constraint not correctly done?
> Other?
> possibly this: Resource apache cannot run anywhere (what this means I have
> no idea)
> icmp is ok
> Last updated: Tue Dec  7 07:09:11 2010
> Stack: Heartbeat
> Current DC: web1.testcluster (ae391b6f-176d-43bc-93b4-8104ff3414c8) -
> partition with quorum
> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 1 Nodes configured, unknown expected votes
> 3 Resources configured.
> ============
> Online: [ web1.testcluster ]
>  pingdnet1 (ocf::pacemaker:pingd): Started web1.testcluster
>  pingdnet2 (ocf::pacemaker:pingd): Started web1.testcluster
> crm(live)# Ctrl-C, leaving
> [root at web1 ~]# date
> Tue Dec  7 07:13:11 EST 2010
> [root at web1 ~]# ping 192.168.0.69
> PING 192.168.0.69 (192.168.0.69) 56(84) bytes of data.
> 64 bytes from 192.168.0.69: icmp_seq=1 ttl=64 time=0.104 ms
> --- 192.168.0.69 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.104/0.104/0.104/0.000 ms
> [root at web1 ~]# ping 192.168.0.68
> PING 192.168.0.68 (192.168.0.68) 56(84) bytes of data.
> 64 bytes from 192.168.0.68: icmp_seq=1 ttl=64 time=0.151 ms
> --- 192.168.0.68 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.151/0.151/0.151/0.000 ms
> ================
> conf
> ====
> primitive pingdnet1 ocf:pacemaker:pingd \
>     params host_list=192.168.0.69 name=pingdnet1 \
>     op monitor interval=15s timeout=5s
> primitive pingdnet2 ocf:pacemaker:pingd \
>     params host_list=192.168.0.68 name=pingdnet2 \
>     op monitor interval=15s timeout=5s
> primitive apache lsb:httpd \
>     op monitor interval=15s
> location apache-ping-constraint apache \
>     rule -inf: not_defined pingdnet1 or pingdnet1 lte 0
> location apache-ping-constraint2 apache \
>     rule -inf: not_defined pingdnet2 or pingdnet2 lte 0
> order ping-then-apache inf: pingdnet1 pingdnet2 apache
> ===============================================
> logs to help
> ======================
> Dec  7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout)
> Starting httpd:
> Dec  7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout) [
> Dec  7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout)
> OK
> Dec  7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout) ]
> Dec  7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout)
> Dec  7 06:39:41 web1 lrmd: [2471]: info: RA output: (apache:start:stdout)
> Dec  7 06:39:41 web1 lrmd: [2471]: info: Managed apache:start process 10650
> exited with return code 0.
> Dec  7 06:39:41 web1 crmd: [2474]: info: process_lrm_event: LRM operation
> apache_start_0 (call=25, rc=0, cib-update=196, confirmed=true) ok
> Dec  7 06:39:41 web1 crmd: [2474]: info: match_graph_event: Action
> apache_start_0 (14) confirmed on web1.testcluster (rc=0)
> Dec  7 06:39:41 web1 crmd: [2474]: info: te_rsc_command: Initiating action
> 15: monitor apache_monitor_15000 on web1.testcluster (local)
> Dec  7 06:39:41 web1 crmd: [2474]: info: do_lrm_rsc_op: Performing
> key=15:34:0:02fb0ab7-1384-4125-b14a-0ab5b4e9d1e8 op=apache_monitor_15000 )
> Dec  7 06:39:41 web1 lrmd: [2471]: info: rsc:apache:26: monitor
> Dec  7 06:39:41 web1 crmd: [2474]: info: te_pseudo_action: Pseudo action 3
> fired and confirmed
> Dec  7 06:39:41 web1 lrmd: [2471]: info: Managed apache:monitor process
> 10666 exited with return code 0.
> Dec  7 06:39:41 web1 crmd: [2474]: info: process_lrm_event: LRM operation
> apache_monitor_15000 (call=26, rc=0, cib-update=197, confirmed=false) ok
> Dec  7 06:39:41 web1 crmd: [2474]: info: match_graph_event: Action
> apache_monitor_15000 (15) confirmed on web1.testcluster (rc=0)
>
> Dec  7 06:56:08 web1 pengine: [2487]: notice: native_print: pingdnet1
> (ocf::pacemaker:pingd): Started web1.testcluster
> Dec  7 06:56:08 web1 pengine: [2487]: notice: native_print: pingdnet2
> (ocf::pacemaker:pingd): Started web1.testcluster
> Dec  7 06:56:08 web1 pengine: [2487]: notice: native_print: apache
> (lsb:httpd): Stopped
> Dec  7 06:56:08 web1 pengine: [2487]: info: native_color: Resource apache
> cannot run anywhere
> Dec  7 06:56:08 web1 pengine: [2487]: notice: LogActions: Leave resource
> pingdnet1 (Started web1.testcluster)
> Dec  7 06:56:08 web1 pengine: [2487]: notice: LogActions: Leave resource
> pingdnet2 (Started web1.testcluster)
> Dec  7 06:56:08 web1 pengine: [2487]: notice: LogActions: Leave resource
> apache (Stopped)
>
> Dec  7 07:10:15 web1 pingd: [9653]: info: ping_read: Retrying...
> Dec  7 07:10:16 web1 pingd: [9521]: info: ping_read: Retrying...
> Dec  7 07:10:47 web1 last message repeated 31 times
> Dec  7 07:11:08 web1 last message repeated 21 times
> Dec  7 07:11:08 web1 last message repeated 21 times
> Dec  7 07:11:08 web1 crmd: [2474]: info: crm_timer_popped: PEngine Recheck
> Timer (I_PE_CALC) just popped!
> Dec  7 07:11:08 web1 crmd: [2474]: info: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
> origin=crm_timer_popped ]
> Dec  7 07:11:08 web1 crmd: [2474]: info: do_state_transition: Progressed to
> state S_POLICY_ENGINE after C_TIMER_POPPED
> Dec  7 07:11:08 web1 crmd: [2474]: info: do_state_transition: All 1 cluster
> nodes are eligible to run resources.
> Dec  7 07:11:08 web1 crmd: [2474]: info: do_pe_invoke: Query 201: Requesting
> the current CIB: S_POLICY_ENGINE
> Dec  7 07:11:08 web1 crmd: [2474]: info: do_pe_invoke_callback: Invoking the
> PE: query=201, ref=pe_calc-dc-1291723868-103, seq=1, quorate=1
> Dec  7 07:11:08 web1 pengine: [2487]: notice: unpack_config: On loss of CCM
> Quorum: Ignore
> Dec  7 07:11:08 web1 pengine: [2487]: info: unpack_config: Node scores:
> 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> Dec  7 07:11:08 web1 pengine: [2487]: info: determine_online_status: Node
> web1.testcluster is online
> Dec  7 07:11:08 web1 pengine: [2487]: notice: native_print: pingdnet1
> (ocf::pacemaker:pingd): Started web1.testcluster
> Dec  7 07:11:08 web1 pengine: [2487]: notice: native_print: pingdnet2
> (ocf::pacemaker:pingd): Started web1.testcluster
> Dec  7 07:11:08 web1 pengine: [2487]: notice: native_print: apache
> (lsb:httpd): Stopped
> Dec  7 07:11:08 web1 pengine: [2487]: info: native_color: Resource apache
> cannot run anywhere
> Dec  7 07:11:08 web1 pengine: [2487]: notice: LogActions: Leave resource
> pingdnet1 (Started web1.testcluster)
> Dec  7 07:11:08 web1 pengine: [2487]: notice: LogActions: Leave resource
> pingdnet2 (Started web1.testcluster)
> Dec  7 07:11:08 web1 pengine: [2487]: notice: LogActions: Leave resource
> apache (Stopped)
> Dec  7 07:11:08 web1 crmd: [2474]: info: do_state_transition: State
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
> Dec  7 07:11:08 web1 crmd: [2474]: info: unpack_graph: Unpacked transition
> 37: 0 actions in 0 synapses
> Dec  7 07:11:08 web1 crmd: [2474]: info: do_te_invoke: Processing graph 37
> (ref=pe_calc-dc-1291723868-103) derived from
> /var/lib/pengine/pe-input-555.bz2
> Dec  7 07:11:08 web1 crmd: [2474]: info: run_graph:
> ====================================================
> Dec  7 07:11:08 web1 crmd: [2474]: notice: run_graph: Transition 37
> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pengine/pe-input-555.bz2): Complete
> Dec  7 07:11:08 web1 crmd: [2474]: info: te_graph_trigger: Transition 37 is
> now complete
> Dec  7 07:11:08 web1 crmd: [2474]: info: notify_crmd: Transition 37 status:
> done - <null>
> Dec  7 07:11:08 web1 crmd: [2474]: info: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Dec  7 07:11:08 web1 crmd: [2474]: info: do_state_transition: Starting
> PEngine Recheck Timer
> Dec  7 07:11:08 web1 pengine: [2487]: info: process_pe_message: Transition
> 37: PEngine Input stored in: /var/lib/pengine/pe-input-555.bz2
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>



