<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    On 03/14/2013 03:36 PM, Arnold Krille wrote:<br>

    <span style="white-space: pre;">&gt; On Thu, 14 Mar 2013 14:06:36

      +0000 Owen Le Blanc <a class="moz-txt-link-rfc2396E" href="mailto:LeBlanc@man.ac.uk">&lt;LeBlanc@man.ac.uk&gt;</a><br>

      &gt; wrote:<br>

      &gt;&gt; I have a number of pacemaker managed clusters. We use an

      independent<br>

      &gt;&gt; heartbeat network for corosync, and we use another

      network for the<br>

      &gt;&gt; managed services. The heartbeat network is routed using

      different<br>

      &gt;&gt; hardware from the service network. We have two machine

      rooms, and<br>

      &gt;&gt; our normal pacemaker clusters have one node in each

      machine room.<br>

      &gt;&gt;<br>

      &gt;&gt; In the past I've used ocf:pacemaker:ping as part of our<br>

      &gt;&gt; configurations, but we had problems, since our network is

      busy, and<br>

      &gt;&gt; many of the routers (the most reliable things to ping)

      are configured<br>

      &gt;&gt; to ignore pings when they have too much to do otherwise.

      In this way<br>

      &gt;&gt; we often had false connectivity failures in the past, and

      services<br>

      &gt;&gt; would flop from one side to the other.<br>

      &gt;&gt;<br>

      &gt;&gt; Recently we had a power failure which affected all of the

      switches on<br>

      &gt;&gt; our service network in one machine room. This meant that

      all<br>

      &gt;&gt; services in that machine room were unavailable. Our

      pacemaker<br>

      &gt;&gt; clusters unfortunately saw this as no problem, since

      without a ping<br>

      &gt;&gt; test, they couldn't tell that the network was down.<br>

      &gt;&gt;<br>

      &gt;&gt; Has anyone done any work to measure network connectivity

      in<br>

      &gt;&gt; connection with pacemaker without using ping? I can see a

      couple of<br>

      &gt;&gt; potential ways to avoid it, but I hate to reinvent

      wheels.<br>

      &gt;<br>

      &gt; I have seen a commercial (but pacemaker-based) solution that

      seemed to<br>

      &gt; use link-detection on the hw-level to suicide the local node

      when both<br>

      &gt; links (one to the outside and one to the peer) went down.<br>

      &gt;<br>

      &gt; But I don't even know if this was done inside pacemaker, nor

      did I have<br>

      &gt; time to think about something similar for our cluster.<br>

      &gt;<br>

      &gt; I just trust that four links using two switches with

      independent power<br>

      &gt; will be safe enough...</span><br>

    I've had a node suicide when its link goes away by checking

    /sys/class/net/&lt;interface&gt;/carrier<br>

    <br>

    For example, run cat /sys/class/net/eth0/carrier and look at the

    output.<br>

    <br>

    It's 1 when the link is up, and 0 when it's down.&nbsp; You could

    presumably write a script that uses that to set node attributes

    too...<br>
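    <br>
    A minimal sketch of such a script, in Python, might look like the
    following. It is only an illustration under some assumptions: the
    attribute name link_eth0 is made up, the function names are
    hypothetical, and the attrd_updater options (--name, --update)
    should be checked against the Pacemaker version you have installed.
    A failed read is treated as "no link", since the kernel refuses to
    report carrier for an interface that is administratively down.<br>
    <pre>
```python
import os
import subprocess
import sys


def read_carrier(iface, sysfs_root="/sys/class/net"):
    """Return True if the carrier file for iface contains 1, else False."""
    path = os.path.join(sysfs_root, iface, "carrier")
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        # Missing interface, or the kernel rejected the read (this happens
        # when the interface is administratively down): treat as no link.
        return False


def attrd_command(attr_name, link_up):
    """Build the attrd_updater call that records the link state."""
    value = "1" if link_up else "0"
    # Assumption: attrd_updater accepts --name and --update on this system.
    return ["attrd_updater", "--name", attr_name, "--update", value]


if __name__ == "__main__":
    # Interface defaults to eth0; pass another name as the only argument.
    iface = sys.argv[1] if len(sys.argv) == 2 else "eth0"
    up = read_carrier(iface)
    # "link_" + iface is a hypothetical attribute name, not a convention.
    subprocess.call(attrd_command("link_" + iface, up))
```
    </pre>
    Run periodically on each node (from cron, say), this would keep a
    per-node attribute that a location rule could test, much as rules
    test the attribute set by ocf:pacemaker:ping.<br>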

    <br>

    -- <br>

    &nbsp;&nbsp;&nbsp; Alan Robertson <a class="moz-txt-link-rfc2396E" href="mailto:alanr@unix.sh">&lt;alanr@unix.sh&gt;</a> - @OSSAlanR<br>

    <br>

    "Openness is the foundation and preservative of friendship...&nbsp; Let

    me claim from you at all times your undisguised opinions." - William

    Wilberforce<br>

    <br>

  </body>

</html>