[Pacemaker] Not unmoving colocated resources can provoke DRBD split-brain

Mon Jun 2 00:57:44 UTC 2014

On 31 May 2014, at 12:43 am, Robert Dahlem <Robert.Dahlem at gmx.net> wrote:

> Hi,
> 
> On 30.05.2014 13:20, Robert Dahlem wrote:
> 
>>> run crm_report for the period covered by these commands and attach the result:
>>> 
>>> # crm node standby korfwf01 ; sleep 10
>>> # crm node standby korfwf02 ; sleep 10
>>> # crm node online korfwf02 ; sleep 10
>>> # crm node online korfwf01 ; sleep 10
>>> # crm status
>> 
>> I filed a bug
>> 	http://bugs.clusterlabs.org/show_bug.cgi?id=5217
>> and attached the crm_report.
> 
> This seems to be some kind of a race condition: I added
> 	sleep 3
> to a central point in /usr/lib/ocf/resource.d/linbit/drbd.

Define central?

> As soon as I
> did that I could not reproduce the split brain.
> 
> Then I added some logging and repeated
> 	crm resource move ALL-ffm korfwf01
> 	crm node standby korfwf01
> 	crm node standby korfwf02
> 	crm node online korfwf02
> 	crm node online korfwf01
> 
> The following is what happens after
> 	crm node online korfwf01
> 
> Timestamp			korfwf01	korfwf02
> ========================================================
> Fri May 30 15:46:35 CEST 2014			notify
> Fri May 30 15:46:38 CEST 2014			notify
> Fri May 30 15:46:41 CEST 2014			demote
> Fri May 30 15:46:44 CEST 2014			notify
> Fri May 30 15:46:47 CEST 2014			notify
> Fri May 30 15:46:50 CEST 2014			stop
> Fri May 30 15:46:53 CEST 2014	start		start
> Fri May 30 15:46:58 CEST 2014	notify		notify
> Fri May 30 15:47:01 CEST 2014	notify		notify
> Fri May 30 15:47:04 CEST 2014	promote	
> Fri May 30 15:47:07 CEST 2014	notify		notify
> Fri May 30 15:47:10 CEST 2014	monitor		monitor
> 
> 1.) Note the parallel "start" at 15:46:53. This could very well end up
> in a race condition without "sleep 3".
> 
> 2.) Why is pacemaker doing "stop/start" at all on korfwf02?

This seems to correspond to 

May 23 13:29:31 korfwm01 pengine[5140]:   notice: LogActions: Move    stonith-korfwf02	(Started korfwm01 -> korfwf01)
May 23 13:29:31 korfwm01 pengine[5140]:   notice: LogActions: Move    ALL-ffm	(Started korfwf02 -> korfwf01)
May 23 13:29:31 korfwm01 pengine[5140]:   notice: LogActions: Demote  DRBD-ffm:0	(Master -> Slave korfwf02)
May 23 13:29:31 korfwm01 pengine[5140]:   notice: LogActions: Restart DRBD-ffm:0	(Slave korfwf02)
May 23 13:29:31 korfwm01 pengine[5140]:   notice: LogActions: Start   DRBD-ffm:1	(korfwf01)
May 23 13:29:31 korfwm01 pengine[5140]:   notice: LogActions: Promote DRBD-ffm:1	(Stopped -> Master korfwf01)
May 23 13:29:31 korfwm01 pengine[5140]:   notice: process_pe_message: Calculated Transition 843: /var/lib/pacemaker/pengine/pe-input-728.bz2

from your original tarball.

In that case, the cause is:

      <rsc_order id="ord-ALL-ffm-before-DRBD-ffm" score="INFINITY" first="ALL-ffm" then="ms-DRBD-ffm"/>

Which requires that ms-DRBD-ffm be completely stopped if ALL-ffm is stopped (which it is because its being moved to 01).
Perhaps you meant this?

      <rsc_order id="ord-ALL-ffm-before-DRBD-ffm" score="INFINITY" first="ALL-ffm" then="ms-DRBD-ffm" then-action="promote"/>

> 
> Now again, but without "sleep 3":
> 
> Timestamp			korfwf01	korfwf02
> ========================================================
> Fri May 30 16:23:24 CEST 2014			notify
> Fri May 30 16:23:26 CEST 2014			notify
> Fri May 30 16:23:26 CEST 2014	start		demote
> Fri May 30 16:23:26 CEST 2014			notify
> Fri May 30 16:23:26 CEST 2014			notify
> Fri May 30 16:23:26 CEST 2014			notify
> Fri May 30 16:23:26 CEST 2014			stop
> Fri May 30 16:23:26 CEST 2014			start
> Fri May 30 16:23:27 CEST 2014	notify		notify
> Fri May 30 16:23:28 CEST 2014	notify		notify
> Fri May 30 16:23:28 CEST 2014	promote	
> Fri May 30 16:23:28 CEST 2014	notify		notify
> Fri May 30 16:23:28 CEST 2014	monitor		monitor
> 
> Look at this excerpt from /var/log/messages:
> 
> 16:23:27 korfwf01 block drbd7: disk( Diskless -> Attaching )
> 16:23:27 korfwf01 block drbd7: disk( Attaching -> UpToDate )
> 16:23:27 korfwf01 drbd ffm: conn( StandAlone -> Unconnected )
> 16:23:27 korfwf01 drbd ffm: conn( Unconnected -> WFConnection )
> 16:23:28 korfwf01 block drbd7: role( Secondary -> Primary )
> 16:23:28 korfwf01 drbd ffm: conn( WFConnection -> WFReportParams )
> 16:23:28 korfwf01 drbd ffm: conn( WFReportParams -> NetworkFailure )
> 16:23:28 korfwf01 drbd ffm: conn( NetworkFailure -> Unconnected )
> 16:23:28 korfwf01 drbd ffm: conn( Unconnected -> WFConnection )
> 
> korfwf01 was not waiting for a connection before 16:23:27. At this time
> korfwf02 was stopped, but had the latest data. So korfwf01 came up
> before korfwf02 was up again -> split-brain!
> 
> Again: why is korfwf02 doing "stop/start" in this situation? Without
> this, korfwf02 would just do "demote" while korfwf01 would do
> "start/promote" and everything would be fine.
> 
> Kind regards,
> Robert
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 841 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20140602/f6028113/attachment-0003.sig>