[Pacemaker] ifdown ethX + corosync + DRBD = split-brain?

Viacheslav Dubrovskyi dubrsl at gmail.com
Thu Jul 25 10:28:29 UTC 2013


19.07.2013 14:38, Howley, Tom wrote:
> Hi,
>
> I have been doing some testing of a fairly standard pacemaker/corosync setup with DRBD (with resource-level fencing) and have noticed the following in relation to testing network failures:
>
> - Handling of all ports being blocked is OK, based on hundreds of tests.
> - Handling of cable-pulls seems OK, based on only 10 tests.
> - ifdown ethX leads to split-brain roughly 50% of the time due to two underlying issues:
>
> 1. corosync (possibly by design) handles loss of network interface differently to other network failures. I can only see this from the point of view of the logs: "[TOTEM ] The network interface is down.", which is different from cable-pull log, where I don't see that message. I'm guessing this as I don't know the code.
> 2. corosync allows a non-quorate partition, in my case a single node, to update the CIB. This behaviour has been previously confirmed in reply to previous mails on this list and it has been mentioned that there may be improvements in this area in the future. This on its own seems like a bug to me.
>
> My question is: is it possible for me to configure corosync/drbd to handle the ifdown scenario or do I simply have to tell people "do not test with ifdown", as I have seen mentioned in a few places on the web? If I do have to leave out ifdown testing, how can I be sure that I haven't missed out testing some real network failure scenario?
When you shut down an interface with ifdown, its IP address is removed.
As a result, DRBD cannot bind to that address.
A real network failure will not normally do that, so just tell people
"do not test with ifdown".
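
To see the bind failure for yourself, here is a minimal Python sketch. It is
not DRBD code; it only shows the general socket behaviour I am referring to:
binding to an address that is not configured on any local interface fails
with EADDRNOTAVAIL (assuming the Linux default net.ipv4.ip_nonlocal_bind=0).
The helper name can_bind and the address 192.0.2.1 (a documentation-only
range) are my own choices for the illustration.

```python
import errno
import socket


def can_bind(ip: str) -> bool:
    """Return True if a TCP socket can bind to `ip` on this host.

    Hypothetical helper for illustration only; DRBD does its own
    binding inside the kernel module.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, 0))  # port 0: let the kernel pick a free port
            return True
        except OSError as exc:
            # EADDRNOTAVAIL: the address is not assigned to any interface
            if exc.errno == errno.EADDRNOTAVAIL:
                return False
            raise


if __name__ == "__main__":
    # 127.0.0.1 stands in for a configured address, 192.0.2.1 for
    # an address that was removed by ifdown.
    for ip in ("127.0.0.1", "192.0.2.1"):
        state = "bindable" if can_bind(ip) else "not configured on this host"
        print(ip, state)
```

Run it with the replication address before and after ifdown ethX to compare;
after a cable pull the address should normally still be bindable.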

-- 
WBR,
Viacheslav Dubrovskyi




