[Pacemaker] network check without ping
Alan Robertson
alanr at unix.sh
Thu Mar 14 22:12:25 UTC 2013
On 03/14/2013 03:36 PM, Arnold Krille wrote:
> On Thu, 14 Mar 2013 14:06:36 +0000 Owen Le Blanc <LeBlanc at man.ac.uk>
> wrote:
>> I have a number of pacemaker managed clusters. We use an independent
>> heartbeat network for corosync, and we use another network for the
>> managed services. The heartbeat network is routed using different
>> hardware from the service network. We have two machine rooms, and
>> our normal pacemaker clusters have one node in each machine room.
>>
>> In the past I've used ocf:pacemaker:ping as part of our
>> configurations, but we had problems, since our network is busy, and
>> many of the routers (the most reliable things to ping) are configured
>> to ignore pings when they have too much to do otherwise. In this way
>> we often had false connectivity failures in the past, and services
>> would flop from one side to the other.
>>
>> Recently we had a power failure which affected all of the switches on
>> our service network in one machine room. This meant that all
>> services in that machine room were unavailable. Our pacemaker
>> clusters unfortunately saw this as no problem, since without a ping
>> test, they couldn't tell that the network was down.
>>
>> Has anyone done any work to measure network connectivity in
>> connection with pacemaker without using ping? I can see a couple of
>> potential ways to avoid it, but I hate to reinvent wheels.
>
> I have seen a commercial (but pacemaker-based) solution that seemed to
> use link-detection on the hw-level to suicide the local node when both
> links (one to the outside and one to the peer) went down.
>
> But I don't even know if this was done inside pacemaker, nor did I have
> time to think about something similar for our cluster.
>
> I just trust that four links using two switches with independent power
> will be safe enough...
I've implemented node suicide when the link goes away by looking at
/sys/class/net/<interface>/carrier
For example, cat /sys/class/net/eth0/carrier and see what it shows...
It's 1 when the link is up, and 0 when it's down. You could presumably
write a script that uses that to set node attributes too...
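A minimal sketch of such a script follows. It assumes a POSIX shell and that your Pacemaker's attrd_updater accepts --name, --update and --delay; check `attrd_updater --help` for your version. The attribute name netlink, the 10-second poll interval, the 5s dampening and the SYSFS_NET override are all hypothetical. SYSFS_NET exists only so the logic can be tried against a fake directory. One caveat: on Linux, reading carrier for an interface that is administratively down returns an error ("Invalid argument") rather than 0, so the sketch treats any failed read as "no link".

```shell
#!/bin/sh
# Sketch: publish how many service-network links have carrier as a
# Pacemaker node attribute. The attribute name "netlink" and the
# SYSFS_NET override are hypothetical, not part of any Pacemaker API.

# Print 1 if the interface reports carrier, 0 otherwise. Reading
# "carrier" on an administratively down (or missing) interface fails,
# so any read error is counted as "no link".
link_up() {
    carrier=$(cat "${SYSFS_NET:-/sys/class/net}/$1/carrier" 2>/dev/null) || carrier=0
    if [ "$carrier" = "1" ]; then echo 1; else echo 0; fi
}

# Print how many of the given interfaces currently have carrier.
links_up() {
    n=0
    for iface in "$@"; do
        n=$((n + $(link_up "$iface")))
    done
    echo "$n"
}

# Poll forever and push the count into attrd. --delay dampens short
# flaps, similar in spirit to the "dampen" parameter of ocf:pacemaker:ping.
monitor_links() {
    while :; do
        attrd_updater --name netlink --update "$(links_up "$@")" --delay 5s
        sleep 10
    done
}

# Start monitoring only when interfaces are given, e.g. "linkcheck.sh eth0 eth1".
if [ "$#" -gt 0 ]; then
    monitor_links "$@"
fi
```

You could run it as, e.g., `linkcheck.sh eth0 eth1` (the script name is hypothetical) under something that keeps it alive. A location rule in the style commonly used with the ping agent, such as `rule -inf: not_defined netlink or netlink lte 0`, could then keep services off a node whose service links have all lost carrier. Unlike ping, this sees only the local link. It would catch a dead switch in one machine room, but it would not catch a failure further upstream.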
--
Alan Robertson <alanr at unix.sh> - @OSSAlanR
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce