[Pacemaker] a question on the `ping` RA

Andrew Beekhof andrew at beekhof.net
Thu May 29 20:38:08 EDT 2014


On 29 May 2014, at 9:19 pm, Riccardo Murri <riccardo.murri at gmail.com> wrote:

> Hello,
> 
> we have set up a cluster of 10 nodes to serve a Lustre filesystem to a
> computational cluster, with Pacemaker+Corosync to handle failover
> between hosts.  Each host is connected to an Ethernet network and an
> InfiniBand network, and we set up a `ping` resource to ensure that
> storage nodes can see compute nodes over the InfiniBand network.  The
> intention is that, if a storage node cannot communicate with the
> compute nodes over IB, it hands over its resources to another storage
> node.
> 
> Here's the relevant section from `crm configure show`::
> 
>    primitive ping ocf:pacemaker:ping \
>            params name=ping dampen=5s multiplier=10 host_list="lustre-mds1 ibr01c01b01n01 ...(24 hosts omitted)..." \
>            op start timeout=120 interval=0 \
>            op monitor timeout=60 interval=10 \
>            op stop timeout=20 interval=0
>    clone ping_clone ping \
>            meta globally-unique=false clone-node-max=1 is-managed=true target-role=Started
>    # Bind OST locations to hosts that can actually support them.
>    location mdt-location mdt \
>            [...]
>            rule $id="mdt_only_if_ping_works" -INFINITY: not_defined ping or ping number:lte 0
> 
> In our understanding of the `ping` RA, this would add a score from 0
> to 520, depending on how many compute nodes a storage node can ping.

I'd expect the max to be 10 * 26 = 260.
For 520 you'd need the multiplier to be 20.
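
If it helps, the agent's monitor boils down to something like the sketch
below (paraphrased from memory rather than the actual resource agent code;
the host list is abbreviated and the attrd_updater flags may differ slightly
between versions):

    #!/bin/sh
    # Rough sketch of what ocf:pacemaker:ping does on each monitor
    # (see /usr/lib/ocf/resource.d/pacemaker/ping for the real thing).
    host_list="lustre-mds1 ibr01c01b01n01"   # ...plus the 24 hosts omitted above
    multiplier=10
    active=0
    for host in $host_list; do
        # unreachable (or unresolvable) hosts simply don't count
        if ping -n -q -W 5 -c 3 "$host" >/dev/null 2>&1; then
            active=$((active + 1))
        fi
    done
    # the node attribute your location rule tests ("ping") becomes
    # multiplier * reachable-hosts, i.e. 0..260 for 26 hosts and multiplier=10
    attrd_updater -n ping -U $((active * multiplier)) -d 5s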

> 
> Since the resource stickiness is 2000, resources would only move if
> the `ping` RA failed completely and the host was totally cut off from
> the IB network.
> 
> However, last night we had a case of resources moving back and forth
> between two storage nodes; the only trace left in the logs is that
> `ping` failed everywhere, plus some trouble reports from Corosync
> (which we cannot explain and which could be the real cause)::
> 
>    May 28 00:29:19 lustre-mds1 ping(ping)[8147]: ERROR: Unexpected
> result for 'ping -n -q -W 5 -c 3  iblustre-mds1' 2: ping: unknown host
> iblustre-mds1

It couldn't find itself?  DNS issue?
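
If those iblustre-* names are supposed to resolve via DNS, it might be worth
taking DNS out of the picture entirely.  Something along these lines on each
storage node (the address below is made up; substitute the real IB address):

    # does the name resolve at all right now?
    getent hosts iblustre-mds1 || echo "no answer for iblustre-mds1"

    # pin the IB names locally so the ping RA doesn't depend on DNS
    # (10.130.93.10 is only an example address)
    echo "10.130.93.10  iblustre-mds1" >> /etc/hosts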

>    May 28 00:29:22 lustre-mds1 corosync[23879]:   [TOTEM ] Incrementing problem counter for seqid 11125389 iface 10.129.93.10 to [9 of 10]
>    May 28 00:29:25 lustre-mds1 corosync[23879]:   [TOTEM ] Incrementing problem counter for seqid 11126239 iface 10.129.93.10 to [10 of 10]
>    May 28 00:29:25 lustre-mds1 corosync[23879]:   [TOTEM ] Marking seqid 11126239 ringid 0 interface 10.129.93.10 FAULTY
>    May 28 00:29:26 lustre-mds1 corosync[23879]:   [TOTEM ]
> Automatically recovered ring 0
>    May 28 00:29:27 lustre-mds1 lrmd[23906]:  warning:
> child_timeout_callback: ping_monitor_10000 process (PID 8147) timed
> out
>    May 28 00:29:27 lustre-mds1 lrmd[23906]:  warning:
> operation_finished: ping_monitor_10000:8147 - timed out after 60000ms
>    May 28 00:29:27 lustre-mds1 crmd[23909]:    error:
> process_lrm_event: Operation ping_monitor_10000: Timed Out
> (node=lustre-mds1.ften.es.hpcn.uzh.ch, call=267, timeout=60000ms)
>    May 28 00:29:27 lustre-mds1 corosync[23879]:   [TOTEM ]
> Incrementing problem counter for seqid 11126319 iface 10.129.93.10 to
> [1 of 10]
>    May 28 00:29:27 lustre-mds1 crmd[23909]:  warning:
> update_failcount: Updating failcount for ping on
> lustre-mds1.ften.es.hpcn.uzh.ch after failed monitor: rc=1
> (update=value++, time=1401229767)
>    [...]
>    May 28 00:30:03 lustre-mds1 crmd[23909]:  warning:
> update_failcount: Updating failcount for ping on
> lustre-oss1.ften.es.hpcn.uzh.ch after failed monitor: rc=1
> (update=value++, time=1401229803)
>    May 28 00:30:03 lustre-mds1 crmd[23909]:   notice: run_graph:
> Transition 472 (Complete=7, Pending=0, Fired=0, Skipped=1,
> Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2770.bz2):
> Stopped
>    May 28 00:30:03 lustre-mds1 pengine[23908]:  warning:
> unpack_rsc_op_failure: Processing failed op monitor for ping:0 on
> lustre-oss4.ften.es.hpcn.uzh.ch: unknown error (1)
>    May 28 00:30:03 lustre-mds1 pengine[23908]:  warning:
> unpack_rsc_op_failure: Processing failed op monitor for ping:1 on
> lustre-oss5.ften.es.hpcn.uzh.ch: unknown error (1)
>    May 28 00:30:03 lustre-mds1 pengine[23908]:  warning:
> unpack_rsc_op_failure: Processing failed op monitor for ping:2 on
> lustre-oss6.ften.es.hpcn.uzh.ch: unknown error (1)
>    May 28 00:30:03 lustre-mds1 pengine[23908]:  warning:
> unpack_rsc_op_failure: Processing failed op monitor for ping:3 on
> lustre-oss7.ften.es.hpcn.uzh.ch: unknown error (1)
>    May 28 00:30:03 lustre-mds1 pengine[23908]:  warning:
> unpack_rsc_op_failure: Processing failed op monitor for ping:4 on
> lustre-oss8.ften.es.hpcn.uzh.ch: unknown error (1)
>    May 28 00:30:03 lustre-mds1 pengine[23908]:  warning:
> unpack_rsc_op_failure: Processing failed op monitor for ping:5 on
> lustre-mds1.ften.es.hpcn.uzh.ch: unknown error (1)
>    May 28 00:30:03 lustre-mds1 pengine[23908]:  warning:
> unpack_rsc_op_failure: Processing failed op monitor for ping:6 on
> lustre-mds2.ften.es.hpcn.uzh.ch: unknown error (1)
>    May 28 00:30:03 lustre-mds1 pengine[23908]:  warning:
> unpack_rsc_op_failure: Processing failed op monitor for ping:7 on
> lustre-oss1.ften.es.hpcn.uzh.ch: unknown error (1)
>    May 28 00:30:03 lustre-mds1 pengine[23908]:  warning:
> unpack_rsc_op_failure: Processing failed op monitor for ping:8 on
> lustre-oss2.ften.es.hpcn.uzh.ch: unknown error (1)
>    May 28 00:30:03 lustre-mds1 pengine[23908]:  warning:
> unpack_rsc_op_failure: Processing failed op monitor for ping:9 on
> lustre-oss3.ften.es.hpcn.uzh.ch: unknown error (1)
>    May 28 00:30:03 lustre-mds1 pengine[23908]:   notice: LogActions:
> Restart mdt#011(Started lustre-mds1.ften.es.hpcn.uzh.ch)
>    May 28 00:30:03 lustre-mds1 pengine[23908]:   notice: LogActions:
> Move    mgt#011(Started lustre-mds2.ften.es.hpcn.uzh.ch ->
> lustre-mds1.ften.es.hpcn.uzh.ch)
>    May 28 00:30:03 lustre-mds1 pengine[23908]:   notice: LogActions:
> Restart ost00#011(Started lustre-oss1.ften.es.hpcn.uzh.ch)
>    May 28 00:30:03 lustre-mds1 pengine[23908]:   notice: LogActions:
> Restart ost01#011(Started lustre-oss3.ften.es.hpcn.uzh.ch)
>    [...]
> 
> So, questions:
> 
> - is this the way one is supposed to use the `ping` RA, i.e., to
>  compute a score based on the number of reachable test nodes?

yep

> 
> - or rather does the `ping` RA trigger failure events when even one of
>  the nodes cannot be pinged?

Both.  It always triggers events when something changes, and it's up to the policy engine to look at your constraints and decide whether things should be moved.
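
For what it's worth, if you ever want that score to act as a graded
preference (prefer the node that can reach the most compute nodes) instead
of only the -INFINITY veto, the usual pattern is a rule like this one
(sketch only; the constraint id is made up, resource and attribute names
taken from your config):

    location mdt-prefers-connectivity mdt \
            rule $id="mdt-prefers-connectivity-rule" ping: defined ping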

> 
> - could the ping failure have triggered the resource restart above?

yes

> 
> - any hints how to further debug the issue?

>    May 28 00:30:03 lustre-mds1 crmd[23909]:   notice: run_graph:
> Transition 472 (Complete=7, Pending=0, Fired=0, Skipped=1,
> Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2770.bz2):

^^^ Have a look at the pe-input files mentioned there.  Replay them with crm_simulate -Sx and compare them with crm_diff; that will tell you what changed and what the cluster decided to do about it.
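
Roughly like this (file names taken from your log; pe-input-2769 is just my
guess at the preceding transition, pick whichever files bracket the incident):

    # replay the transition the cluster computed at 00:30:03
    crm_simulate -S -x /var/lib/pacemaker/pengine/pe-input-2770.bz2

    # see what changed in the CIB between two consecutive transitions
    bzcat /var/lib/pacemaker/pengine/pe-input-2769.bz2 > /tmp/before.xml
    bzcat /var/lib/pacemaker/pengine/pe-input-2770.bz2 > /tmp/after.xml
    crm_diff --original /tmp/before.xml --new /tmp/after.xml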

> 
> Thank you for any help!
> 
> Kind regards,
> Riccardo
> 
> --
> Riccardo Murri
> http://www.gc3.uzh.ch/people/rm
> 
> Grid Computing Competence Centre
> University of Zurich
> Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
> Tel: +41 44 635 4222
> Fax: +41 44 635 6888
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
