<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 03/14/2013 03:36 PM, Arnold Krille wrote:<br>
<span style="white-space: pre;">> On Thu, 14 Mar 2013 14:06:36
+0000 Owen Le Blanc <a class="moz-txt-link-rfc2396E" href="mailto:LeBlanc@man.ac.uk">&lt;LeBlanc@man.ac.uk&gt;</a><br>
> wrote:<br>
>> I have a number of pacemaker managed clusters. We use an
independent<br>
>> heartbeat network for corosync, and we use another
network for the<br>
>> managed services. The heartbeat network is routed using
different<br>
>> hardware from the service network. We have two machine
rooms, and<br>
>> our normal pacemaker clusters have one node in each
machine room.<br>
>><br>
>> In the past I've used ocf:pacemaker:ping as part of our<br>
>> configurations, but we had problems, since our network is
busy, and<br>
>> many of the routers (the most reliable things to ping)
are configured<br>
>> to ignore pings when they have too much to do otherwise.
In this way<br>
>> we often had false connectivity failures in the past, and
services<br>
>> would flop from one side to the other.<br>
>><br>
>> Recently we had a power failure which affected all of the
switches on<br>
>> our service network in one machine room. This meant that
all<br>
>> services in that machine room were unavailable. Our
pacemaker<br>
>> clusters unfortunately saw this as no problem, since
without a ping<br>
>> test, they couldn't tell that the network was down.<br>
>><br>
>> Has anyone done any work to measure network connectivity
in<br>
>> connection with pacemaker without using ping? I can see a
couple of<br>
>> potential ways to avoid it, but I hate to reinvent
wheels.<br>
><br>
> I have seen a commercial (but pacemaker-based) solution that
seemed to<br>
> use link-detection on the hw-level to suicide the local node
when both<br>
> links (one to the outside and one to the peer) went down.<br>
><br>
> But I don't even know if this was done inside pacemaker, nor
did I have<br>
> time to think about something similar for our cluster.<br>
><br>
> I just trust that four links using two switches with
independent power<br>
> will be safe enough...</span><br>
When the link goes away, I've had the node commit suicide by watching
/sys/class/net/&lt;interface&gt;/carrier<br>
<br>
For example, run cat /sys/class/net/eth0/carrier and see what it
prints.<br>
<br>
It's 1 when the link is up and 0 when it's down. You could
presumably write a script that uses that to set node attributes
too...<br>
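<br>
A minimal sketch of such a script, in Python. The attribute name
link_eth0 and the function names are made up for illustration, and it
assumes an attrd_updater that accepts the long options --name and
--update, so check the man page for your pacemaker version. Note that
reading the carrier file fails outright while the interface is
administratively down, so the sketch treats an unreadable file as "no
link".<br>

```python
#!/usr/bin/env python3
"""Publish an interface's link state as a pacemaker node attribute (sketch)."""
import os
import subprocess


def read_carrier(iface, sysfs_root="/sys/class/net"):
    """Return True if the link is up, False if down, None if unreadable."""
    path = os.path.join(sysfs_root, iface, "carrier")
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        # No such interface, or the interface is administratively down
        # (the kernel refuses the read in that case).
        return None


def attrd_command(attr_name, link_up):
    """Build the attrd_updater call; the long options are an assumption."""
    value = "1" if link_up else "0"
    return ["attrd_updater", "--name", attr_name, "--update", value]


def publish_link_state(iface, attr_name):
    """Read the carrier flag and push it to the cluster; returns exit status."""
    state = read_carrier(iface)
    # An unreadable carrier file (None) is reported as link down.
    return subprocess.call(attrd_command(attr_name, bool(state)))
```

<br>
Calling publish_link_state("eth0", "link_eth0") from cron or a small
loop would keep the attribute current, and a location rule could then
key off it much as it would off the attribute that ocf:pacemaker:ping
sets.<br>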
<br>
-- <br>
Alan Robertson <a class="moz-txt-link-rfc2396E" href="mailto:alanr@unix.sh">&lt;alanr@unix.sh&gt;</a> - @OSSAlanR<br>
<br>
"Openness is the foundation and preservative of friendship... Let
me claim from you at all times your undisguised opinions." - William
Wilberforce<br>
<br>
</body>
</html>