[Pacemaker] ping directive configuration

Tue Feb 1 16:55:09 UTC 2011

Hi again :-)

I think my main problem is my location configuration: when I bring down eth0
on node1 and look at crm_mon -f, the count on node2 never increases.

Could anyone help me out with the pingd / location constraints required for a
group of resources to fail over from node1 to node2 if node1 can no
longer ping the default gateway?

Thanks again
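
For reference, one way the constraints for a group could be laid out is
sketched below. This is only a sketch under assumptions, not a tested
configuration: the group name webgroup and the constraint id
webgroup-on-connected-node are made up for illustration, the primitives are
the ones from the configuration quoted further down, and the rule assumes
MYPING publishes its result under the default attribute name pingd because
no name parameter is set on it.

```
# Hypothetical sketch; "webgroup" and the constraint id are illustrative names.
# The group keeps failoverip and crhweb on the same node and starts them in
# the listed order, so it would stand in for the separate colocation and
# order constraints between the two.
group webgroup failoverip crhweb
# Keep the whole group off any node where the pingd attribute is missing or 0.
location webgroup-on-connected-node webgroup \
        rule -inf: not_defined pingd or pingd lte 0
```

Note that losing connectivity changes a node attribute rather than recording
a resource failure, so a move triggered by a rule like this would not
necessarily show up as an increased fail count.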

On 1 February 2011 13:08, paul harford <harfordmeister at gmail.com> wrote:

> Hi Nikita
> Sorry, I forgot: I have 2 Ethernet interfaces. eth1 is for the heartbeat and
> eth0 is for the public IP, and the virtual IP for apache is 10.100.1.100
>
> Thanks
> Paul
>
>
> On 1 February 2011 12:04, Nikita Michalko <michalko.system at a-i-p.com> wrote:
>
>> Hi Paul!
>>
>> Can you show me your ha.cf?
>> How many network interfaces do you use for this cluster?
>> If only one, it is the typical split-brain situation after cable pull down!
>>
>> Nikita
>>
>>
>> On Tuesday, 1 February 2011 12:05, paul harford wrote:
>> > Hi Nikita
>> > I reverted to an earlier snapshot and started again. I now have pingd
>> > running, but when I remove eth0 the resource does not fail over.
>> >
>> > I can see in the ha-log that the ping detects the network is gone, but it
>> > does not move the resource. Can anyone see the error in my config?
>> >
>> >
>> > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" node1 \
>> >         attributes standby="off"
>> > node $id="59440607-2a5c-450e-84fa-94bf69742671" node2 \
>> >         attributes standby="off"
>> > primitive MYPING ocf:pacemaker:pingd \
>> >         params host_list="10.100.0.254" multiplier="1000" \
>> >         op monitor interval="15s" timeout="20s" \
>> >         op start interval="0" timeout="90s" \
>> >         op stop interval="0" timeout="100s"
>> > primitive crhweb ocf:heartbeat:apache \
>> >         params configfile="/etc/httpd/conf/httpd.conf" \
>> >         op monitor interval="60s" \
>> >         meta target-role="Started"
>> > primitive failoverip ocf:heartbeat:IPaddr \
>> >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
>> >         op monitor interval="30s"
>> > clone MYPINGCLONE MYPING \
>> >         meta globally-unique="false"
>> > location web_location crhweb \
>> >         rule $id="web_location-rule" -inf: not_defined pingd or pingd lte 0
>> > colocation crhweb-with-failoverip inf: crhweb failoverip
>> > order crhweb-after-failoverip inf: MYPINGCLONE failoverip crhweb
>> > property $id="cib-bootstrap-options" \
>> >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>> >         cluster-infrastructure="Heartbeat" \
>> >         stonith-enabled="false" \
>> >         no-quorum-policy="ignore"
>> > rsc_defaults $id="rsc-options" \
>> >         resource-stickiness="100"
>> >
>> >
>> > HA_LOG
>> >
>> > Jan 28 11:17:42 node1 heartbeat: [2872]: ERROR: glib: Error sending packet: Network is unreachable
>> > Jan 28 11:17:42 node1 heartbeat: [2872]: info: glib: euid=0 egid=0
>> > Jan 28 11:17:42 node1 heartbeat: [2872]: ERROR: write_child: write failure on ping 10.100.0.254.: Network is unreachable
>> > Jan 28 11:17:43 node1 pingd: [6004]: WARN: ping_write: Wrote -1 of 39 chars: Network is unreachable (101
>> >
>> > On 1 February 2011 09:35, paul harford <harfordmeister at gmail.com> wrote:
>> > > Hi Nikita
>> > > Many thanks for your assistance. I made the changes you pointed out, but
>> > > now my 2 nodes just keep rebooting. Did I enter something incorrectly in
>> > > the pingd directive?
>> > >
>> > > Paul
>> > >
>> > >
>> > > I can see these errors in the messages log, and my configuration is below.
>> > >
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: clone_print:  Clone Set: connected
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: short_print: Stopped: [ pingd:0 pingd:1 ]
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: info: rsc_merge_weights: failoverip: Rolling back scores from crhweb
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: info: native_color: Resource crhweb cannot run anywhere
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: RecurringOp:  Start recurring monitor (10s) for pingd:0 on crhnode2
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: RecurringOp:  Start recurring monitor (10s) for pingd:1 on crhnode1
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Leave resource failoverip (Started crhnode1)
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Stop resource crhweb      (crhnode1)
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Start pingd:0     (crhnode2)
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Start pingd:1     (crhnode1)
>> > > Feb  1 09:01:06 crhnode2 crmd: [3742]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: info: process_pe_message: Transition 59: PEngine Input stored in: /var/lib/pengine/pe-input-82.bz2
>> > > Feb  1 09:01:06 crhnode2 crmd: [3742]: info: unpack_graph: Unpacked transition 59: 14 actions in 14 synapses
>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: info: process_pe_message: Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.
>> > >
>> > >
>> > >
>> > > here is my current configuration
>> > >
>> > >
>> > > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \
>> > >         attributes standby="off"
>> > > node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \
>> > >         attributes standby="off"
>> > > primitive crhweb ocf:heartbeat:apache \
>> > >         params configfile="/etc/httpd/conf/httpd.conf" \
>> > >         op monitor interval="60s" \
>> > >         meta target-role="Started"
>> > > primitive failoverip ocf:heartbeat:IPaddr \
>> > >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
>> > >         op monitor interval="30s" \
>> > >         meta target-role="Started"
>> > > primitive pingd ocf:pacemaker:pingd \
>> > >         params dampen="5s" host_list="10.100.0.254" multiplier="1000" name="pingval" \
>> > >         operations $id="pingd-operations" \
>> > >         op monitor interval="10s" timeout="20s" \
>> > >         op monitor interval="90s" timeout="25s" start \
>> > >         op monitor interval="100s" timeout="25s" stop
>> > > clone connected pingd \
>> > >         meta globally-unique="false" target-role="started"
>> > > location cli-prefer-crhweb crhweb \
>> > >         rule $id="cli-prefer-rule-crhweb" inf: #uname eq crhnode1
>> > > location crhweb_on_connected_node crhweb \
>> > >         rule $id="crhweb_on_connected_node-rule" -inf: not_defined pingval or pingval lte 0
>> > > location prefer-crhnode1 crhweb 50: crhnode1
>> > > colocation crhweb-with-failoverip inf: crhweb failoverip
>> > > order crhweb-after-failoverip inf: pingd failoverip crhweb
>> > >
>> > > property $id="cib-bootstrap-options" \
>> > >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>> > >         cluster-infrastructure="Heartbeat" \
>> > >         stonith-enabled="false" \
>> > >         no-quorum-policy="ignore"
>> > >
>> > > On 1 February 2011 07:21, Nikita Michalko <michalko.system at a-i-p.com> wrote:
>> > >> Hi Paul,
>> > >>
>> > >> see below!
>> > >>
>> > >> On Monday, 31 January 2011 19:55, paul harford wrote:
>> > >> > Hi guys,
>> > >> > I'm having some issues with a ping directive. My current config is
>> > >> > below; basically I want the web resource to fail over to the second
>> > >> > node if the ping can no longer contact the default gateway.
>> > >> >
>> > >> > so here goes
>> > >> >
>> > >> > crm configure primitive ping ocf:pacemaker:ping params dampen=5s
>> > >> > host_list=(default GateWay) multplier=1000 name=pingval operations
>> > >> > $id=ping-operations op moinitor interval=10s timeout=15s
>> > >>
>> > >>  - this is surely wrong: "moinitor" ?
>> > >>  - no such primitive (ping) below ...
>> > >>
>> > >> HTH
>> > >>
>> > >> Nikita Michalko
>> > >>
>> > >> > and
>> > >> >
>> > >> > crm configure clone connected ping meta globally-unique=false
>> > >> > target-role=started
>> > >> >
>> > >> > and
>> > >> >
>> > >> > location web_on_connected_node cweb rule
>> > >> > $id=web_on_connected_node-rule -inf: not_defined pingval or pingval
>> > >> > lte 0
>> > >> >
>> > >> >
>> > >> > Does anyone see any issues with the above configuration? I want
>> > >> > to check first, as the last time I tried it, it wouldn't work and my
>> > >> > resources would not fail over or start.
>> > >> >
>> > >> >
>> > >> >
>> > >> >
>> > >> > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \
>> > >> >         attributes standby="off"
>> > >> > node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \
>> > >> >         attributes standby="off"
>> > >> > primitive cweb ocf:heartbeat:apache \
>> > >> >         params configfile="/etc/httpd/conf/httpd.conf" \
>> > >> >         op monitor interval="60s" \
>> > >> >         meta target-role="Started"
>> > >> > primitive failoverip ocf:heartbeat:IPaddr \
>> > >> >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
>> > >> >         op monitor interval="30s" \
>> > >> >         meta target-role="Started"
>> > >> > location cli-prefer-cweb cweb \
>> > >> >         rule $id="cli-prefer-rule-crhweb" inf: #uname eq crhnode1
>> > >> > location prefer-crhnode1 crhweb 50: crhnode1
>> > >> > colocation cweb-with-failoverip inf: cweb failoverip
>> > >> > order crhweb-after-failoverip inf: failoverip cweb
>> > >> > property $id="cib-bootstrap-options" \
>> > >> >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>> > >> >         cluster-infrastructure="Heartbeat" \
>> > >> >         stonith-enabled="false" \
>> > >> >         no-quorum-policy="ignore"
>> > >> > rsc_defaults $id="rsc-options" \
>> > >> >         resource-stickiness="100"
>> > >>
>> > >> _______________________________________________
>> > >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> > >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> > >>
>> > >> Project Home: http://www.clusterlabs.org
>> > >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > >> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>
>
>