[Pacemaker] Not unmoving colocated resources can provoke DRBD split-brain
Andrew Beekhof
andrew at beekhof.net
Mon Jun 2 00:57:44 UTC 2014
On 31 May 2014, at 12:43 am, Robert Dahlem <Robert.Dahlem at gmx.net> wrote:
> Hi,
>
> On 30.05.2014 13:20, Robert Dahlem wrote:
>
>>> run crm_report for the period covered by these commands and attach the result:
>>>
>>> # crm node standby korfwf01 ; sleep 10
>>> # crm node standby korfwf02 ; sleep 10
>>> # crm node online korfwf02 ; sleep 10
>>> # crm node online korfwf01 ; sleep 10
>>> # crm status
>>
>> I filed a bug
>> http://bugs.clusterlabs.org/show_bug.cgi?id=5217
>> and attached the crm_report.
>
> This seems to be some kind of a race condition: I added
> sleep 3
> to a central point in /usr/lib/ocf/resource.d/linbit/drbd.
Define central?
> As soon as I
> did that I could not reproduce the split brain.
>
> Then I added some logging and repeated
> crm resource move ALL-ffm korfwf01
> crm node standby korfwf01
> crm node standby korfwf02
> crm node online korfwf02
> crm node online korfwf01
>
> The following is what happens after
> crm node online korfwf01
>
> Timestamp korfwf01 korfwf02
> ========================================================
> Fri May 30 15:46:35 CEST 2014 notify
> Fri May 30 15:46:38 CEST 2014 notify
> Fri May 30 15:46:41 CEST 2014 demote
> Fri May 30 15:46:44 CEST 2014 notify
> Fri May 30 15:46:47 CEST 2014 notify
> Fri May 30 15:46:50 CEST 2014 stop
> Fri May 30 15:46:53 CEST 2014 start start
> Fri May 30 15:46:58 CEST 2014 notify notify
> Fri May 30 15:47:01 CEST 2014 notify notify
> Fri May 30 15:47:04 CEST 2014 promote
> Fri May 30 15:47:07 CEST 2014 notify notify
> Fri May 30 15:47:10 CEST 2014 monitor monitor
>
> 1.) Note the parallel "start" at 15:46:53. This could very well end up
> in a race condition without "sleep 3".
>
> 2.) Why is pacemaker doing "stop/start" at all on korfwf02?
This seems to correspond to
May 23 13:29:31 korfwm01 pengine[5140]: notice: LogActions: Move stonith-korfwf02 (Started korfwm01 -> korfwf01)
May 23 13:29:31 korfwm01 pengine[5140]: notice: LogActions: Move ALL-ffm (Started korfwf02 -> korfwf01)
May 23 13:29:31 korfwm01 pengine[5140]: notice: LogActions: Demote DRBD-ffm:0 (Master -> Slave korfwf02)
May 23 13:29:31 korfwm01 pengine[5140]: notice: LogActions: Restart DRBD-ffm:0 (Slave korfwf02)
May 23 13:29:31 korfwm01 pengine[5140]: notice: LogActions: Start DRBD-ffm:1 (korfwf01)
May 23 13:29:31 korfwm01 pengine[5140]: notice: LogActions: Promote DRBD-ffm:1 (Stopped -> Master korfwf01)
May 23 13:29:31 korfwm01 pengine[5140]: notice: process_pe_message: Calculated Transition 843: /var/lib/pacemaker/pengine/pe-input-728.bz2
from your original tarball.
In that case, the cause is:
<rsc_order id="ord-ALL-ffm-before-DRBD-ffm" score="INFINITY" first="ALL-ffm" then="ms-DRBD-ffm"/>
Which requires that ms-DRBD-ffm be completely stopped if ALL-ffm is stopped (which it is because its being moved to 01).
Perhaps you meant this?
<rsc_order id="ord-ALL-ffm-before-DRBD-ffm" score="INFINITY" first="ALL-ffm" then="ms-DRBD-ffm" then-action="promote"/>
>
> Now again, but without "sleep 3":
>
> Timestamp korfwf01 korfwf02
> ========================================================
> Fri May 30 16:23:24 CEST 2014 notify
> Fri May 30 16:23:26 CEST 2014 notify
> Fri May 30 16:23:26 CEST 2014 start demote
> Fri May 30 16:23:26 CEST 2014 notify
> Fri May 30 16:23:26 CEST 2014 notify
> Fri May 30 16:23:26 CEST 2014 notify
> Fri May 30 16:23:26 CEST 2014 stop
> Fri May 30 16:23:26 CEST 2014 start
> Fri May 30 16:23:27 CEST 2014 notify notify
> Fri May 30 16:23:28 CEST 2014 notify notify
> Fri May 30 16:23:28 CEST 2014 promote
> Fri May 30 16:23:28 CEST 2014 notify notify
> Fri May 30 16:23:28 CEST 2014 monitor monitor
>
> Look at this excerpt from /var/log/messages:
>
> 16:23:27 korfwf01 block drbd7: disk( Diskless -> Attaching )
> 16:23:27 korfwf01 block drbd7: disk( Attaching -> UpToDate )
> 16:23:27 korfwf01 drbd ffm: conn( StandAlone -> Unconnected )
> 16:23:27 korfwf01 drbd ffm: conn( Unconnected -> WFConnection )
> 16:23:28 korfwf01 block drbd7: role( Secondary -> Primary )
> 16:23:28 korfwf01 drbd ffm: conn( WFConnection -> WFReportParams )
> 16:23:28 korfwf01 drbd ffm: conn( WFReportParams -> NetworkFailure )
> 16:23:28 korfwf01 drbd ffm: conn( NetworkFailure -> Unconnected )
> 16:23:28 korfwf01 drbd ffm: conn( Unconnected -> WFConnection )
>
> korfwf01 was not waiting for a connection before 16:23:27. At this time
> korfwf02 was stopped, but had the latest data. So korfwf01 came up
> before korfwf02 was up again -> split-brain!
>
> Again: why is korfwf02 doing "stop/start" in this situation? Without
> this, korfwf02 would just do "demote" while korfwf01 would do
> "start/promote" and everything would be fine.
>
> Kind regards,
> Robert
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 841 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20140602/f6028113/attachment-0003.sig>
More information about the Pacemaker
mailing list