[Pacemaker] a question on the `ping` RA
Riccardo Murri
riccardo.murri at gmail.com
Thu May 29 11:19:53 UTC 2014
Hello,
we have set up a cluster of 10 nodes to serve a Lustre filesystem to a
computational cluster, with Pacemaker+Corosync to handle failover
between hosts. Each host is connected to an Ethernet network and an
InfiniBand network, and we set up a `ping` resource to check that
storage nodes can see compute nodes over the InfiniBand network. The
intention is that a storage node which cannot communicate with
compute nodes over IB should hand over its resources to another
storage node.
Here's the relevant section from `crm configure show`::
primitive ping ocf:pacemaker:ping \
    params name=ping dampen=5s multiplier=10 host_list="lustre-mds1 ibr01c01b01n01 ...(24 hosts omitted)..." \
    op start timeout=120 interval=0 \
    op monitor timeout=60 interval=10 \
    op stop timeout=20 interval=0
clone ping_clone ping \
    meta globally-unique=false clone-node-max=1 is-managed=true target-role=Started
# Bind OST locations to hosts that can actually support them.
location mdt-location mdt \
    [...]
    rule $id="mdt_only_if_ping_works" -INFINITY: not_defined ping or ping number:lte 0
In our understanding of the `ping` RA, this would add a score from 0
to 520, depending on how many compute nodes a storage node can ping.
Since the resource stickiness is 2000, resources would only move if
the `ping` RA failed completely and the host was totally cut off from
the IB network.
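To make that reading concrete, here is a minimal Python sketch of how we think the attribute value and the rule shown above interact. It is only a model of our understanding, not Pacemaker's implementation: the function names are made up for illustration, the count of 52 test hosts is simply what a maximum of 520 implies with multiplier=10, `math.inf` stands in for Pacemaker's INFINITY score, and the elided part of the location constraint is not modelled.

```python
import math

MULTIPLIER = 10  # `multiplier` from the ping primitive in the configuration


def ping_attribute(reachable_hosts, multiplier=MULTIPLIER):
    """Value we expect the RA to store in the `ping` node attribute."""
    return reachable_hosts * multiplier


def mdt_rule_score(ping_value):
    """Model of the "mdt_only_if_ping_works" rule only.

    The node is banned (-INFINITY) when the attribute is undefined
    (None here) or <= 0; otherwise this rule contributes nothing.
    """
    if ping_value is None or ping_value <= 0:
        return -math.inf
    return 0


# Assumption: 52 test hosts, inferred from the 0..520 range quoted above.
for reachable in (52, 1, 0, None):
    value = None if reachable is None else ping_attribute(reachable)
    print(reachable, value, mdt_rule_score(value))
```

In this model a node is banned only when the attribute is missing or has dropped to zero; any positive value leaves the rule inactive, which is why we expected resources to stay where they are.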
However, last night we had a case of resources moving back and
forth between two storage nodes; the only traces left in the logs are
that `ping` failed everywhere, plus some trouble reports from Corosync
(which we cannot explain and which could be the real cause)::
May 28 00:29:19 lustre-mds1 ping(ping)[8147]: ERROR: Unexpected
result for 'ping -n -q -W 5 -c 3 iblustre-mds1' 2: ping: unknown host
iblustre-mds1
May 28 00:29:22 lustre-mds1 corosync[23879]: [TOTEM ]
Incrementing problem counter for seqid 11125389 iface 10.129.93.10 to
[9 of 10]
May 28 00:29:25 lustre-mds1 corosync[23879]: [TOTEM ]
Incrementing problem counter for seqid 11126239 iface 10.129.93.10 to
[10 of 10]
May 28 00:29:25 lustre-mds1 corosync[23879]: [TOTEM ] Marking
seqid 11126239 ringid 0 interface 10.129.93.10 FAULTY
May 28 00:29:26 lustre-mds1 corosync[23879]: [TOTEM ]
Automatically recovered ring 0
May 28 00:29:27 lustre-mds1 lrmd[23906]: warning:
child_timeout_callback: ping_monitor_10000 process (PID 8147) timed
out
May 28 00:29:27 lustre-mds1 lrmd[23906]: warning:
operation_finished: ping_monitor_10000:8147 - timed out after 60000ms
May 28 00:29:27 lustre-mds1 crmd[23909]: error:
process_lrm_event: Operation ping_monitor_10000: Timed Out
(node=lustre-mds1.ften.es.hpcn.uzh.ch, call=267, timeout=60000ms)
May 28 00:29:27 lustre-mds1 corosync[23879]: [TOTEM ]
Incrementing problem counter for seqid 11126319 iface 10.129.93.10 to
[1 of 10]
May 28 00:29:27 lustre-mds1 crmd[23909]: warning:
update_failcount: Updating failcount for ping on
lustre-mds1.ften.es.hpcn.uzh.ch after failed monitor: rc=1
(update=value++, time=1401229767)
[...]
May 28 00:30:03 lustre-mds1 crmd[23909]: warning:
update_failcount: Updating failcount for ping on
lustre-oss1.ften.es.hpcn.uzh.ch after failed monitor: rc=1
(update=value++, time=1401229803)
May 28 00:30:03 lustre-mds1 crmd[23909]: notice: run_graph:
Transition 472 (Complete=7, Pending=0, Fired=0, Skipped=1,
Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2770.bz2):
Stopped
May 28 00:30:03 lustre-mds1 pengine[23908]: warning:
unpack_rsc_op_failure: Processing failed op monitor for ping:0 on
lustre-oss4.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: warning:
unpack_rsc_op_failure: Processing failed op monitor for ping:1 on
lustre-oss5.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: warning:
unpack_rsc_op_failure: Processing failed op monitor for ping:2 on
lustre-oss6.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: warning:
unpack_rsc_op_failure: Processing failed op monitor for ping:3 on
lustre-oss7.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: warning:
unpack_rsc_op_failure: Processing failed op monitor for ping:4 on
lustre-oss8.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: warning:
unpack_rsc_op_failure: Processing failed op monitor for ping:5 on
lustre-mds1.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: warning:
unpack_rsc_op_failure: Processing failed op monitor for ping:6 on
lustre-mds2.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: warning:
unpack_rsc_op_failure: Processing failed op monitor for ping:7 on
lustre-oss1.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: warning:
unpack_rsc_op_failure: Processing failed op monitor for ping:8 on
lustre-oss2.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: warning:
unpack_rsc_op_failure: Processing failed op monitor for ping:9 on
lustre-oss3.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: notice: LogActions:
Restart mdt#011(Started lustre-mds1.ften.es.hpcn.uzh.ch)
May 28 00:30:03 lustre-mds1 pengine[23908]: notice: LogActions:
Move mgt#011(Started lustre-mds2.ften.es.hpcn.uzh.ch ->
lustre-mds1.ften.es.hpcn.uzh.ch)
May 28 00:30:03 lustre-mds1 pengine[23908]: notice: LogActions:
Restart ost00#011(Started lustre-oss1.ften.es.hpcn.uzh.ch)
May 28 00:30:03 lustre-mds1 pengine[23908]: notice: LogActions:
Restart ost01#011(Started lustre-oss3.ften.es.hpcn.uzh.ch)
[...]
So, questions:
- is this the way one is supposed to use the `ping` RA, i.e., to
compute a score based on the number of reachable test nodes?
- or rather does the `ping` RA trigger failure events when even one of
the nodes cannot be pinged?
- could the ping failure have triggered the resource restart above?
- any hints on how to debug the issue further?
Thank you for any help!
Kind regards,
Riccardo
--
Riccardo Murri
http://www.gc3.uzh.ch/people/rm
Grid Computing Competence Centre
University of Zurich
Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
Tel: +41 44 635 4222
Fax: +41 44 635 6888