[Pacemaker] [DRBD-user] Performing crm failover using crm node standby

Florian Haas florian.haas at linbit.com
Wed Jun 22 10:10:46 UTC 2011


Sorry about cross-posting -- this issue may be interesting to
subscribers to both lists.

Hi Felix,

On 2011-06-22 10:53, Felix Frank wrote:
> Hi,
> 
> during testing, I noticed that the DRBD RA will happily stop a DRBD
> that's currently being SyncSource.
> So if I want to migrate my services to the peer by issuing a "crm node
> standby", the SyncSource goes away and leaves my cluster in a bad state
> because the peer cannot become master.

Don't Do That Then.™ ;)
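
One way to act on that advice is to check the connection state before
issuing "crm node standby". The following is a minimal sketch, not part of
the resource agent: it assumes the DRBD 8.x /proc/drbd layout, where each
device line starts with the minor number followed by "cs:<state>". The
function names, the sample text, and the (incomplete) set of resync states
are illustrative assumptions.

```python
import re

# Connection states treated as "resync in progress".
# Assumption: DRBD 8.x state names; this set is a subset, not exhaustive.
SYNC_STATES = {"SyncSource", "SyncTarget", "PausedSyncS", "PausedSyncT"}

# Matches device lines such as " 0: cs:SyncSource ro:Primary/Secondary ..."
_DEVICE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)")


def syncing_minors(proc_drbd_text):
    """Return the minor numbers whose connection state is a resync state."""
    minors = []
    for line in proc_drbd_text.splitlines():
        match = _DEVICE_LINE.match(line)
        if match and match.group(2) in SYNC_STATES:
            minors.append(int(match.group(1)))
    return minors


def safe_to_standby(proc_drbd_text):
    """Hypothetical pre-check: True when no device is resynchronizing."""
    return not syncing_minors(proc_drbd_text)


# Illustrative sample only, written by hand in the /proc/drbd style.
SAMPLE = """\
version: 8.3.10 (api:88/proto:86-96)
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
"""
```

In practice the text would come from reading /proc/drbd on the node that is
about to go into standby, and the standby would be postponed while
safe_to_standby() returns False.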

> (Interesting: Even after the
> SyncSource resumes, the crm won't go ahead and promote either side - but
> that part is probably rooted rather deeply).
> 
> Would it make sense to have the stop action of ocf:linbit:drbd fail if
> the peer is known to be Inconsistent?

That would be attempting to put out a fire with gasoline. A failed stop
leads to fencing, and then you've got an inconsistent node and a dead node.

One thing that _could_ be done would be to wait, during stop, until a
running sync has completed. However, if you then hit your stop timeout,
you're in the same situation as above. And setting a timeout that would
allow for a _full_ sync to complete (because that's what you'd
ultimately have to do to avoid this issue) is not exactly a sane
approach either.
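
To make the trade-off concrete, here is a rough sketch of "wait during stop
until the sync finishes or the timeout expires". It is not how
ocf:linbit:drbd behaves; the function names, parameters, and numbers are
assumptions for illustration. With a hypothetical 1 TiB device resyncing at
100 MiB/s, a full sync takes roughly 10,486 seconds (just under three
hours), which is the scale the stop timeout would have to cover.

```python
import time


def estimated_full_sync_seconds(device_bytes, rate_bytes_per_s):
    """Rough duration of a full resync at a constant rate (an idealization)."""
    return device_bytes / rate_bytes_per_s


def wait_for_sync(is_syncing, timeout_s, poll_s=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll until is_syncing() returns False or timeout_s elapses.

    Returns True if the sync finished in time, False on timeout. A False
    result corresponds to the failed stop discussed above, which would
    lead to fencing.
    """
    deadline = clock() + timeout_s
    while is_syncing():
        if clock() >= deadline:
            return False
        sleep(poll_s)
    return True
```

The is_syncing callable is left abstract on purpose; it could be any check
of the device's connection state. The clock and sleep parameters exist only
so the loop can be exercised without real waiting.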

Interesting issue, I must confess, and evidently not something that
we've thought about before. I might be missing an obvious solution, so
if others can pitch in, please do. Otherwise, please give us a little
time to think. Thanks for highlighting this!

Cheers,
Florian
