[Pacemaker] Seeking suggestions for cluster configuration of HA iSCSI target and initiators

Mon Jul 16 17:14:17 UTC 2012

On 07/16/2012 12:08 PM, Phil Frost wrote:
> I'm designing a cluster to run both iSCSI targets and initiators to
> ultimately provide block devices to virtual machines. I'm considering
> the case of a target failure, and how to handle that as gracefully as
> possible. Ideally, IO may be paused until the target recovers, but VMs
> do not restart or see IO errors.
> 
> I've observed that the iscsi RA will configure the initiator to retry
> connections indefinitely if the target should fail. This is mostly good,
> except that if the initiator is in the retrying state, the monitor
> action will return an error.
> 
> The Right Thing to do in this case, I would think, would be to just
> wait. Of course the initiators can't work if the target is down, but the
> initiators will recover automatically when the target recovers. Ideally
> the cluster would wait for the target (which it also manages) to
> recover, then try again to monitor the initiators. For good measure, it
> might try monitoring the initiators a couple times, since it can take
> them a moment to reconnect.
> 
> Unfortunately, what actually happens is the monitor action on the
> initiator fails. Pacemaker then attempts to stop the initiator, and that
> also fails, because the target is still unavailable. Then the initiator
> node gets STONITHed, taking out all the hosted VMs with it.
> 
> I added a mandatory, non-symmetrical order constraint of target ->
> initiator, so at least Pacemaker will not attempt to re-start the
> initiator after a target failure. I made it asymmetrical so that restarts
> of the target do not force restarts of the initiator. However, it
> doesn't do much to help the failed-target case.
> 
> What's a good solution? Is there some way to suspend monitoring of the
> initiators if pacemaker knows the target is failed? I suppose I could
> modify the iscsi RA to return success for monitor in the case that the
> initiator is attempting to reconnect to the target, but then what if
> actually the initiator has failed, and the target is operational? What
> then about race conditions that might exist in cases where the target
> has failed, but pacemaker has not yet detected the target failure through
> a monitor operation?
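
The monitor policy asked about above (treat a reconnecting session as healthy while the target is known to be down, then give the initiator a short grace period to log back in) can be sketched in Python. This is a hypothetical sketch, not the stock iscsi RA: the session states, the `target_failed` flag, the `InitiatorMonitor` class, and the 30-second default window are all assumptions. Only the OCF exit codes (0 success, 1 generic error, 7 not running) are standard. The grace window is one possible way to bound the race where the target has failed but Pacemaker has not yet noticed.

```python
from dataclasses import dataclass
from typing import Optional

# Standard OCF resource-agent exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


@dataclass
class InitiatorMonitor:
    """Hypothetical monitor policy for an iSCSI initiator resource."""

    grace_seconds: float = 30.0  # assumed tolerance window, not an RA parameter
    reconnect_since: Optional[float] = None

    def check(self, session: str, target_failed: bool, now: float) -> int:
        """session is one of 'logged_in', 'reconnecting', 'absent' (assumed states)."""
        if session == "logged_in":
            self.reconnect_since = None
            return OCF_SUCCESS
        if session == "absent":
            self.reconnect_since = None
            return OCF_NOT_RUNNING
        if session != "reconnecting":
            raise ValueError(f"unknown session state: {session!r}")

        if target_failed:
            # The target is known to be down, so waiting is the right thing.
            # Restart the window so the initiator gets a fresh grace period
            # once the target comes back.
            self.reconnect_since = now
            return OCF_SUCCESS
        if self.reconnect_since is None:
            # Reconnecting while the target looks healthy: the target may
            # have just failed without the cluster noticing yet.
            self.reconnect_since = now
        if now - self.reconnect_since < self.grace_seconds:
            return OCF_SUCCESS
        # Target looks healthy, but the initiator has not recovered in time.
        return OCF_ERR_GENERIC


if __name__ == "__main__":
    m = InitiatorMonitor(grace_seconds=30)
    print(m.check("reconnecting", target_failed=True, now=0))    # 0: target down, wait
    print(m.check("reconnecting", target_failed=False, now=10))  # 0: within grace
    print(m.check("reconnecting", target_failed=False, now=45))  # 1: grace expired
    print(m.check("logged_in", target_failed=False, now=50))     # 0: recovered
```

A real agent would still need to learn `target_failed` from the cluster's view of the target resource, and that view lags behind reality in exactly the way the question describes; the grace window only puts an upper bound on how long a genuinely failed initiator is reported as healthy.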

I've only tested this a little, so please take it as a general
suggestion rather than strong advice.

I created a two-node cluster using Red Hat's High Availability Add-On,
with DRBD keeping the data replicated between the two "SAN" nodes and
tgtd exporting the LUNs. A virtual IP on the cluster acted as the target
IP, and DRBD ran in dual-primary mode under clustered LVM (DRBD was the
PV, and I exported the space from the LVs).

Then I built a second, five-node cluster to host KVM VMs. Its nodes
used clustered LVM as well, but this time the iSCSI LUNs were the PVs.
I carved that space up into one LV per VM and made the VMs the HA
services, again using the RH HA Add-On.
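
The layering in the two paragraphs above can be written down as a small dependency chain, which makes it easier to see that the VM cluster's only contact with the SAN cluster is the iSCSI LUN reached through the virtual IP. This is a minimal Python sketch; every resource name in it (`vm1`, `vm_vg`, `lun1`, `san_vg`, `r0`) is a hypothetical placeholder, not taken from the actual configuration.

```python
from typing import Dict, List, Optional, Tuple

# layer -> (cluster that manages it, layer it is built on).
# All names are hypothetical placeholders.
LAYERS: Dict[str, Tuple[str, Optional[str]]] = {
    "vm1":            ("vm-cluster", "lv:vm_vg/vm1"),     # HA-managed KVM guest
    "lv:vm_vg/vm1":   ("vm-cluster", "vg:vm_vg"),         # one LV per VM
    "vg:vm_vg":       ("vm-cluster", "iscsi:lun1"),       # clustered LVM, LUN as PV
    "iscsi:lun1":     ("vm-cluster", "tgtd:lun1@vip"),    # initiator logs in via the VIP
    "tgtd:lun1@vip":  ("san-cluster", "lv:san_vg/lun1"),  # tgtd exports an LV
    "lv:san_vg/lun1": ("san-cluster", "vg:san_vg"),
    "vg:san_vg":      ("san-cluster", "drbd:r0"),         # clustered LVM, DRBD as PV
    "drbd:r0":        ("san-cluster", None),              # dual-primary, replicated
}


def stack_below(layer: str) -> List[str]:
    """Return the layer and everything it depends on, top to bottom."""
    chain: List[str] = []
    current: Optional[str] = layer
    while current is not None:
        chain.append(current)
        current = LAYERS[current][1]
    return chain


def cluster_boundary(layer: str) -> Tuple[str, str]:
    """Return the (upper, lower) pair of layers where the managing cluster changes."""
    chain = stack_below(layer)
    for upper, lower in zip(chain, chain[1:]):
        if LAYERS[upper][0] != LAYERS[lower][0]:
            return upper, lower
    raise ValueError(f"{layer!r} does not cross a cluster boundary")


if __name__ == "__main__":
    for name in stack_below("vm1"):
        print(f"{LAYERS[name][0]:<12} {name}")
    print("boundary:", cluster_boundary("vm1"))
```

In this model a SAN failover only changes which SAN node answers at the virtual IP; nothing above the boundary has to change.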

In this setup, I was able to fail over the SAN without losing any VMs. I
even messed up the fencing on the SAN cluster once, so failover took more
than 30 seconds, and I still didn't lose the VMs. So, to the limited
extent I tested it, it worked excellently.

I have some very rough notes on this setup. They're not fit for public
consumption at all, but if you'd like, I'll send them to you directly.
They include the configurations, which might help as a template.

Digimer

-- 
Digimer
Papers and Projects: https://alteeve.com