[Pacemaker] create 2-node Active/Passive firewall cluster
David Lang
david at lang.hm
Thu Sep 19 14:04:23 UTC 2013
On Thu, 19 Sep 2013, Florian Crouzat wrote:
> On 19/09/2013 11:43, David Lang wrote:
>>
>> I've been running active/failover firewall clusters with heartbeat since
>> about 2000, and I have one suggestion: if you can leave all
>> the daemons running all the time, the failover process is far more
>> robust (and faster, since you don't have daemons to start). If you set
>> net.ipv4.ip_nonlocal_bind you can even have the daemons start up bound
>> to VIP addresses that don't yet exist.
>>
>> If you do not have to have the daemons bound to the VIP, the fact that
>> they are always running on the backup box gives you a quick way to check
>> if a failover would solve the problem or not by having a client connect
>> directly to the second box. The drawback is that someone may configure
>> something to point directly at a box and not at a VIP and you won't
>> detect it (without log analysis) until the box they point at actually
>> goes down.
>>
>> David Lang
>
> I never thought about that, it seems it could be interesting, especially with
> slow (start|stop)ing daemons such as squid.
Yes. If the daemons are started at boot time, you don't have to worry about some
subtle config error creeping in that prevents them from starting when you need
them.
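
As a rough illustration of the always-running approach, here is a minimal Python sketch that checks whether net.ipv4.ip_nonlocal_bind is enabled and whether a bind to a given address works. The helper names, the VIP 192.0.2.10 and port 3128 are assumptions made for the example, not values from this thread.

```python
import socket
from pathlib import Path

# Linux procfs view of the net.ipv4.ip_nonlocal_bind sysctl.
SYSCTL_PATH = "/proc/sys/net/ipv4/ip_nonlocal_bind"


def nonlocal_bind_enabled(path=SYSCTL_PATH):
    """Return True if the sysctl file exists and holds a non-zero value."""
    try:
        return Path(path).read_text().strip() not in ("", "0")
    except OSError:
        # File missing or unreadable (for example, not a Linux host).
        return False


def try_bind(address, port):
    """Try to bind a TCP socket to (address, port); return True on success."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((address, port))
            return True
        except OSError:
            return False


if __name__ == "__main__":
    vip = "192.0.2.10"  # hypothetical VIP from the documentation address range
    print("ip_nonlocal_bind enabled:", nonlocal_bind_enabled())
    print("bind to VIP succeeded:", try_bind(vip, 3128))
```

On a standby node that does not hold the VIP, the bind would be expected to fail while the sysctl is 0 and to succeed once it is set to 1.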
You can also monitor the availability of the backup firewall from your network
monitoring systems. Nothing's worse than having your primary fail, only to
discover that your backup wasn't working (especially because of something like a
bad route, which HA software that only runs on the local subnet won't detect).
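
A monitoring check of this kind can be sketched in Python as a plain TCP connect test run from outside the local subnet against the VIP and against each box directly. The function names, the addresses and the port in the usage comment are hypothetical.

```python
import socket


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable routes and timeouts.
        return False


def check_targets(targets, port, timeout=2.0):
    """Map each target name to the result of a TCP reachability check."""
    return {name: tcp_reachable(host, port, timeout)
            for name, host in targets.items()}


# Example usage (hypothetical addresses, not from this thread):
#   targets = {"vip": "192.0.2.10", "fw-a": "192.0.2.11", "fw-b": "192.0.2.12"}
#   for name, ok in check_targets(targets, 3128).items():
#       print(name, "OK" if ok else "UNREACHABLE")
```

Checking both boxes as well as the VIP is what lets the monitoring system notice a broken standby before a failover depends on it.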
> In my case, my daemons would be protected by the "passive firewall state"
> that my nodes have when they don't host resources.
Why? I know, the real answer is 'because it's the standby, and standby boxes
aren't active'. But is there really a need to do this, or is it just because?
If your systems are hardened to be a firewall, what difference does it make
whether they are exposed or 'protected by the passive firewall state'?
What do you gain by changing your firewall rules when you switch between active
and passive? And are you sure there is never an instant when your defenses are
down during this switch? (I bring up the iptables rules before bringing up the
interfaces at boot.)
If having something running on the primary and backup at the same time would
cause a conflict, then the HA software needs to manage it (a shared disk or IP
is a good example). Otherwise it should be running at all times, so that you
know it's healthy (you can monitor it) and to reduce the work needed at failover
time.
You should have both systems sending their logs to a central server, so from the
point of view of knowing what's happening, there really shouldn't be a
difference between the two systems, even if someone does deliberately hit your
'backup' box.
And speaking of primary and backup: if the boxes are identical hardware, it
really shouldn't matter which is active, so 'primary' and 'backup' are bad
names. It's best practice to regularly exercise your backup systems, so having
your HA system treat the two as equal (except when both boot at the same time,
or when recovering from split-brain, where you need to designate who wins the
tie) lets you run for an extended time on either box.
This also helps you avoid flapping, where the primary has something wrong that
slows it down so that it can't handle full load but could handle partial load.
Under load the primary fails, you fail over to the backup, the primary recovers
and looks healthy, so you fail back to the primary, which goes down again
because of the load....
I've seen this caused by something as simple as blocked cooling, where a box was
fine when idle but overheated (and therefore the CPU automatically throttled
down to slower speeds) under load.
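
The flapping scenario described above can be illustrated with a toy Python simulation. It is only a sketch under simplifying assumptions: node 'a' always fails while it carries load and always looks healthy while idle, and node 'b' never fails. It does not model how heartbeat or Pacemaker actually decide placement; in Pacemaker the sticky policy would typically be expressed through resource-stickiness, but that mapping is an assumption here.

```python
def simulate(auto_failback, ticks=10):
    """Count role switches when node 'a' fails under load but recovers when idle."""
    active = "a"
    switches = 0
    for _ in range(ticks):
        if active == "a":
            # 'a' is carrying the load, so it fails and service moves to 'b'.
            active = "b"
            switches += 1
        elif auto_failback:
            # 'a' is idle and looks healthy again, so a preferred-primary
            # policy moves service back, setting up the next failure.
            active = "a"
            switches += 1
        # Otherwise (sticky policy) service simply stays on 'b'.
    return switches


if __name__ == "__main__":
    print("switches with automatic failback:", simulate(True))   # 10
    print("switches with sticky placement:  ", simulate(False))  # 1
```

With automatic failback the service moves on every tick, while treating the two boxes as equals results in a single failover.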
Ideally you do something like schedule a failover every month or quarter from
one box to the other, and just keep running on that box until the next failover.
It does mean that you need to check which box is active when you work on them,
but you should do that anyway :-)
David Lang