[Pacemaker] DRBD primary/primary + Pacemaker goes into split brain after crm node standby/online

Andrew Beekhof andrew at beekhof.net
Mon Jun 9 19:44:35 EDT 2014


On 10 Jun 2014, at 4:07 am, Alexis de BRUYN <alexis.mailinglist at de-bruyn.fr> wrote:

> Hi Everybody,
> 
> I have an issue with a 2-node Debian Wheezy primary/primary DRBD
> Pacemaker/Corosync configuration.
> 
> After a 'crm node standby' then a 'crm node online', the DRBD volume
> stays in a 'split brain state' (cs:StandAlone ro:Primary/Unknown).
> 
> A soft or hard reboot of one node gets rid of the split brain and/or
> doesn't create one.
> 
> I have followed http://www.drbd.org/users-guide-8.3/ and keep my tests
> as simple as possible (no activity and no filesystem on the DRBD volume).
> 
> I don't see what I am doing wrong. Could anybody help me with this please.

There could be a pacemaker bug.  
Master/slave resources are quite complex internally and have received many improvements in the years since 1.1.7.
So simply upgrading pacemaker could be the answer.

> 
> Regards,
> 
> Alexis.
> 
> Here are my config and log files:
> 
> # cat /etc/issue
> Debian GNU/Linux 7 \n \l
> 
> # uname -a
> Linux testvm2 3.2.0-4-amd64 #1 SMP Debian 3.2.57-3+deb7u2 x86_64 GNU/Linux
> 
> # dpkg -l | grep corosync
> ii  corosync                        1.4.2-3                       amd64
>       Standards-based cluster framework (daemon and modules)
> 
> # dpkg -l | grep pacemaker
> ii  pacemaker                       1.1.7-1                       amd64
>       HA cluster resource manager
> 
> # dpkg -l | grep drbd
> ii  drbd8-utils                     2:8.3.13-2                    amd64
>       RAID 1 over tcp/ip for Linux utilities
> 
> # cat /etc/drbd.d/sda4.res
> resource sda4 {
> device /dev/drbd0;
> disk /dev/sda4;
> meta-disk internal;
> 
>  startup {
>    become-primary-on both;
>  }
> 
>  handlers {
>    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>  }
> 
>  net {
>    allow-two-primaries;
>    after-sb-0pri discard-zero-changes;
>    after-sb-1pri discard-secondary;
>    after-sb-2pri disconnect;
>  }
> on testvm1 {
>  address 192.168.1.201:7788;
> }
> on testvm2 {
>  address 192.168.1.202:7788;
> }
> 
> syncer {
>  rate 100M;
>  al-extents 3389;
> }
> }
> 
> # crm configure show
> node testvm1
> node testvm2 \
> 	attributes standby="off"
> primitive prim-DRBD-data ocf:linbit:drbd \
> 	params drbd_resource="sda4" \
> 	operations $id="operations-DRBD-sda4" \
> 	op monitor interval="10" role="Master" timeout="20" \
> 	op monitor interval="20" role="Slave" timeout="20" \
> 	op start interval="0" timeout="240s" \
> 	op stop interval="0" timeout="100s"
> ms ms-DRBD-data prim-DRBD-data \
> 	meta master-max="2" clone-max="2" notify="true" target-role="Master"
> property $id="cib-bootstrap-options" \
> 	dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
> 	cluster-infrastructure="openais" \
> 	expected-quorum-votes="2" \
> 	no-quorum-policy="ignore" \
> 	stonith-enabled="false" \
> 	default-resource-stickiness="1000"
> 
> # crm status
> ============
> Last updated: Mon Jun  9 19:23:08 2014
> Last change: Mon Jun  9 19:15:50 2014 via cibadmin on testvm2
> Stack: openais
> Current DC: testvm1 - partition with quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
> 
> Online: [ testvm1 testvm2 ]
> 
> Master/Slave Set: ms-DRBD-data [prim-DRBD-data]
>     Masters: [ testvm1 testvm2 ]
> 
> 
> # cat /proc/drbd
> version: 8.3.11 (api:88/proto:86-96)
> srcversion: F937DCB2E5D83C6CCE4A6C9
> 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
>    ns:0 nr:0 dw:0 dr:88 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 
> # crm node standby testvm2
> 
> # cat /proc/drbd
> version: 8.3.11 (api:88/proto:86-96)
> srcversion: F937DCB2E5D83C6CCE4A6C9
> 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
>    ns:0 nr:0 dw:0 dr:88 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 
> # crm node online testvm2
> 
> # cat /proc/drbd
> version: 8.3.11 (api:88/proto:86-96)
> srcversion: F937DCB2E5D83C6CCE4A6C9
> 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>    ns:0 nr:0 dw:0 dr:88 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 
> # crm status
> ============
> Last updated: Mon Jun  9 19:56:07 2014
> Last change: Mon Jun  9 19:43:24 2014 via crm_attribute on testvm2
> Stack: openais
> Current DC: testvm2 - partition with quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
> 
> Online: [ testvm1 testvm2 ]
> 
> Master/Slave Set: ms-DRBD-data [prim-DRBD-data]
>     Masters: [ testvm1 testvm2 ]
> 
> 
> # cat /var/log/daemon.log
> Jun  9 19:37:03 testvm2 cib: [2168]: info: cib_stats: Processed 178
> operations (393.00us average, 0% utilization) in the last 10min
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: - <cib
> admin_epoch="0" epoch="23" num_updates="65" >
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: -   <configuration >
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: -     <nodes >
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: -       <node
> id="testvm2" >
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: -
> <instance_attributes id="nodes-testvm2" >
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: -           <nvpair
> value="off" id="nodes-testvm2-standby" />
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: -
> </instance_attributes>
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: -       </node>
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: -     </nodes>
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: -   </configuration>
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: - </cib>
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: + <cib epoch="24"
> num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2"
> crm_feature_set="3.0.6" update-origin="testvm2"
> update-client="crm_attribute" cib-last-written="Mon Jun  9 19:28:24
> 2014" have-quorum="1" dc-uuid="testvm2" >
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: +   <configuration >
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: +     <nodes >
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: +       <node
> id="testvm2" type="normal" uname="testvm2" >
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: +
> <instance_attributes id="nodes-testvm2" >
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: +           <nvpair
> id="nodes-testvm2-standby" name="standby" value="on" />
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: +
> </instance_attributes>
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: +       </node>
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: +     </nodes>
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: +   </configuration>
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib:diff: + </cib>
> Jun  9 19:42:06 testvm2 cib: [2168]: info: cib_process_request:
> Operation complete: op cib_modify for section nodes
> (origin=local/crm_attribute/4, version=0.24.1): ok (rc=0)
> Jun  9 19:42:06 testvm2 crmd: [2173]: info: abort_transition_graph:
> te_update_diff:126 - Triggered transition abort (complete=1, tag=diff,
> id=(null), magic=NA, cib=0.24.1) : Non-status change
> Jun  9 19:42:06 testvm2 crmd: [2173]: notice: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Jun  9 19:42:06 testvm2 pengine: [2172]: notice: unpack_config: On loss
> of CCM Quorum: Ignore
> Jun  9 19:42:06 testvm2 pengine: [2172]: notice: LogActions: Demote
> prim-DRBD-data:1#011(Master -> Stopped testvm2)
> Jun  9 19:42:06 testvm2 crmd: [2173]: notice: do_state_transition: State
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
> Jun  9 19:42:06 testvm2 crmd: [2173]: info: do_te_invoke: Processing
> graph 4 (ref=pe_calc-dc-1402335726-65) derived from
> /var/lib/pengine/pe-input-10.bz2
> Jun  9 19:42:06 testvm2 crmd: [2173]: info: te_rsc_command: Initiating
> action 48: notify prim-DRBD-data:0_pre_notify_demote_0 on testvm1
> Jun  9 19:42:06 testvm2 crmd: [2173]: info: te_rsc_command: Initiating
> action 50: notify prim-DRBD-data:1_pre_notify_demote_0 on testvm2 (local)
> Jun  9 19:42:06 testvm2 lrmd: [2170]: info: rsc:prim-DRBD-data:1
> notify[29] (pid 8047)
> Jun  9 19:42:06 testvm2 lrmd: [2170]: info: operation notify[29] on
> prim-DRBD-data:1 for client 2173: pid 8047 exited with return code 0
> Jun  9 19:42:06 testvm2 crmd: [2173]: info: process_lrm_event: LRM
> operation prim-DRBD-data:1_notify_0 (call=29, rc=0, cib-update=0,
> confirmed=true) ok
> Jun  9 19:42:06 testvm2 pengine: [2172]: notice: process_pe_message:
> Transition 4: PEngine Input stored in: /var/lib/pengine/pe-input-10.bz2
> Jun  9 19:42:06 testvm2 crmd: [2173]: info: te_rsc_command: Initiating
> action 11: demote prim-DRBD-data:1_demote_0 on testvm2 (local)
> Jun  9 19:42:06 testvm2 lrmd: [2170]: info: cancel_op: operation
> monitor[24] on prim-DRBD-data:1 for client 2173, its parameters:
> drbd_resource=[sda4] CRM_meta_role=[Master]
> CRM_meta_notify_stop_resource=[ ]
> CRM_meta_notify_start_resource=[prim-DRBD-data:1 ]
> CRM_meta_notify_inactive_resource=[prim-DRBD-data:1 ]
> CRM_meta_notify_master_uname=[testvm1 ] CRM_meta_timeout=[20000]
> CRM_meta_name=[monitor] CRM_meta_notify_demote_resource=[ ]
> CRM_meta_notify_promote_uname=[ ] crm_feature_set=[3.0.6]
> CRM_meta_notify=[true] CRM_meta_notify_start_uname=[testvm2 ] CRM_meta
> cancelled
> Jun  9 19:42:06 testvm2 lrmd: [2170]: info: rsc:prim-DRBD-data:1
> demote[30] (pid 8070)
> Jun  9 19:42:06 testvm2 crmd: [2173]: info: process_lrm_event: LRM
> operation prim-DRBD-data:1_monitor_10000 (call=24, status=1,
> cib-update=0, confirmed=true) Cancelled
> Jun  9 19:42:06 testvm2 lrmd: [2170]: info: RA output:
> (prim-DRBD-data:1:demote:stdout)
> Jun  9 19:42:06 testvm2 lrmd: [2170]: info: operation demote[30] on
> prim-DRBD-data:1 for client 2173: pid 8070 exited with return code 0
> Jun  9 19:42:06 testvm2 crmd: [2173]: info: process_lrm_event: LRM
> operation prim-DRBD-data:1_demote_0 (call=30, rc=0, cib-update=76,
> confirmed=true) ok
> Jun  9 19:42:06 testvm2 crmd: [2173]: info: te_rsc_command: Initiating
> action 49: notify prim-DRBD-data:0_post_notify_demote_0 on testvm1
> Jun  9 19:42:06 testvm2 crmd: [2173]: info: te_rsc_command: Initiating
> action 51: notify prim-DRBD-data:1_post_notify_demote_0 on testvm2 (local)
> Jun  9 19:42:06 testvm2 lrmd: [2170]: info: rsc:prim-DRBD-data:1
> notify[31] (pid 8104)
> Jun  9 19:42:06 testvm2 lrmd: [2170]: info: RA output:
> (prim-DRBD-data:1:notify:stdout)
> Jun  9 19:42:06 testvm2 lrmd: [2170]: info: operation notify[31] on
> prim-DRBD-data:1 for client 2173: pid 8104 exited with return code 0
> Jun  9 19:42:06 testvm2 crmd: [2173]: info: process_lrm_event: LRM
> operation prim-DRBD-data:1_notify_0 (call=31, rc=0, cib-update=0,
> confirmed=true) ok
> Jun  9 19:42:07 testvm2 crmd: [2173]: info: te_rsc_command: Initiating
> action 43: notify prim-DRBD-data:0_pre_notify_stop_0 on testvm1
> Jun  9 19:42:07 testvm2 crmd: [2173]: info: te_rsc_command: Initiating
> action 45: notify prim-DRBD-data:1_pre_notify_stop_0 on testvm2 (local)
> Jun  9 19:42:07 testvm2 lrmd: [2170]: info: rsc:prim-DRBD-data:1
> notify[32] (pid 8132)
> Jun  9 19:42:07 testvm2 lrmd: [2170]: info: operation notify[32] on
> prim-DRBD-data:1 for client 2173: pid 8132 exited with return code 0
> Jun  9 19:42:07 testvm2 crmd: [2173]: info: process_lrm_event: LRM
> operation prim-DRBD-data:1_notify_0 (call=32, rc=0, cib-update=0,
> confirmed=true) ok
> Jun  9 19:42:07 testvm2 crmd: [2173]: info: te_rsc_command: Initiating
> action 12: stop prim-DRBD-data:1_stop_0 on testvm2 (local)
> Jun  9 19:42:07 testvm2 lrmd: [2170]: info: rsc:prim-DRBD-data:1
> stop[33] (pid 8155)
> Jun  9 19:42:07 testvm2 lrmd: [2170]: info: RA output:
> (prim-DRBD-data:1:stop:stdout)
> Jun  9 19:42:07 testvm2 attrd: [2171]: notice: attrd_trigger_update:
> Sending flush op to all hosts for: master-prim-DRBD-data:1 (<null>)
> Jun  9 19:42:07 testvm2 attrd: [2171]: notice: attrd_perform_update:
> Sent delete 39: node=testvm2, attr=master-prim-DRBD-data:1, id=<n/a>,
> set=(null), section=status
> Jun  9 19:42:07 testvm2 lrmd: [2170]: info: operation stop[33] on
> prim-DRBD-data:1 for client 2173: pid 8155 exited with return code 0
> Jun  9 19:42:07 testvm2 lrmd: [2170]: info: RA output:
> (prim-DRBD-data:1:stop:stdout)
> Jun  9 19:42:07 testvm2 crmd: [2173]: info: abort_transition_graph:
> te_update_diff:194 - Triggered transition abort (complete=0,
> tag=transient_attributes, id=testvm2, magic=NA, cib=0.24.3) : Transient
> attribute: removal
> Jun  9 19:42:07 testvm2 crmd: [2173]: info: process_lrm_event: LRM
> operation prim-DRBD-data:1_stop_0 (call=33, rc=0, cib-update=77,
> confirmed=true) ok
> Jun  9 19:42:07 testvm2 attrd: [2171]: notice: attrd_perform_update:
> Sent delete -22: node=testvm2, attr=master-prim-DRBD-data:1, id=<n/a>,
> set=(null), section=status
> Jun  9 19:42:07 testvm2 crmd: [2173]: info: te_rsc_command: Initiating
> action 44: notify prim-DRBD-data:0_post_notify_stop_0 on testvm1
> Jun  9 19:42:07 testvm2 crmd: [2173]: notice: run_graph: ==== Transition
> 4 (Complete=21, Pending=0, Fired=0, Skipped=1, Incomplete=0,
> Source=/var/lib/pengine/pe-input-10.bz2): Stopped
> Jun  9 19:42:07 testvm2 crmd: [2173]: notice: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jun  9 19:42:07 testvm2 pengine: [2172]: notice: unpack_config: On loss
> of CCM Quorum: Ignore
> Jun  9 19:42:07 testvm2 crmd: [2173]: notice: do_state_transition: State
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
> Jun  9 19:42:07 testvm2 crmd: [2173]: info: do_te_invoke: Processing
> graph 5 (ref=pe_calc-dc-1402335727-78) derived from
> /var/lib/pengine/pe-input-11.bz2
> Jun  9 19:42:07 testvm2 crmd: [2173]: notice: run_graph: ==== Transition
> 5 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pengine/pe-input-11.bz2): Complete
> Jun  9 19:42:07 testvm2 crmd: [2173]: notice: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jun  9 19:42:07 testvm2 pengine: [2172]: notice: process_pe_message:
> Transition 5: PEngine Input stored in: /var/lib/pengine/pe-input-11.bz2
> 
> # cat /var/log/daemon.log
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: - <cib
> admin_epoch="0" epoch="24" num_updates="4" >
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: -   <configuration >
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: -     <nodes >
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: -       <node
> id="testvm2" >
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: -
> <instance_attributes id="nodes-testvm2" >
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: -           <nvpair
> value="on" id="nodes-testvm2-standby" />
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: -
> </instance_attributes>
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: -       </node>
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: -     </nodes>
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: -   </configuration>
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: - </cib>
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: + <cib epoch="25"
> num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2"
> crm_feature_set="3.0.6" update-origin="testvm2"
> update-client="crm_attribute" cib-last-written="Mon Jun  9 19:42:06
> 2014" have-quorum="1" dc-uuid="testvm2" >
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: +   <configuration >
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: +     <nodes >
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: +       <node
> id="testvm2" type="normal" uname="testvm2" >
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: +
> <instance_attributes id="nodes-testvm2" >
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: +           <nvpair
> id="nodes-testvm2-standby" name="standby" value="off" />
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: +
> </instance_attributes>
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: +       </node>
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: +     </nodes>
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: +   </configuration>
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib:diff: + </cib>
> Jun  9 19:43:24 testvm2 cib: [2168]: info: cib_process_request:
> Operation complete: op cib_modify for section nodes
> (origin=local/crm_attribute/4, version=0.25.1): ok (rc=0)
> Jun  9 19:43:24 testvm2 crmd: [2173]: info: abort_transition_graph:
> te_update_diff:126 - Triggered transition abort (complete=1, tag=diff,
> id=(null), magic=NA, cib=0.25.1) : Non-status change
> Jun  9 19:43:24 testvm2 crmd: [2173]: notice: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Jun  9 19:43:24 testvm2 pengine: [2172]: notice: unpack_config: On loss
> of CCM Quorum: Ignore
> Jun  9 19:43:24 testvm2 pengine: [2172]: notice: LogActions: Start
> prim-DRBD-data:1#011(testvm2)
> Jun  9 19:43:24 testvm2 crmd: [2173]: notice: do_state_transition: State
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
> Jun  9 19:43:24 testvm2 crmd: [2173]: info: do_te_invoke: Processing
> graph 6 (ref=pe_calc-dc-1402335804-79) derived from
> /var/lib/pengine/pe-input-12.bz2
> Jun  9 19:43:24 testvm2 crmd: [2173]: info: te_rsc_command: Initiating
> action 40: notify prim-DRBD-data:0_pre_notify_start_0 on testvm1
> Jun  9 19:43:24 testvm2 pengine: [2172]: notice: process_pe_message:
> Transition 6: PEngine Input stored in: /var/lib/pengine/pe-input-12.bz2
> Jun  9 19:43:24 testvm2 crmd: [2173]: info: te_rsc_command: Initiating
> action 10: start prim-DRBD-data:1_start_0 on testvm2 (local)
> Jun  9 19:43:24 testvm2 lrmd: [2170]: info: rsc:prim-DRBD-data:1
> start[34] (pid 8209)
> Jun  9 19:43:24 testvm2 lrmd: [2170]: info: RA output:
> (prim-DRBD-data:1:start:stdout)         allow-two-primaries;
> Jun  9 19:43:24 testvm2 lrmd: [2170]: info: RA output:
> (prim-DRBD-data:1:start:stdout)
> Jun  9 19:43:24 testvm2 lrmd: [2170]: info: RA output:
> (prim-DRBD-data:1:start:stdout)
> Jun  9 19:43:24 testvm2 lrmd: [2170]: info: RA output:
> (prim-DRBD-data:1:start:stdout)
> Jun  9 19:43:24 testvm2 lrmd: [2170]: info: RA output:
> (prim-DRBD-data:1:start:stdout)
> Jun  9 19:43:24 testvm2 attrd: [2171]: notice: attrd_trigger_update:
> Sending flush op to all hosts for: master-prim-DRBD-data:1 (1000)
> Jun  9 19:43:24 testvm2 attrd: [2171]: notice: attrd_perform_update:
> Sent update 43: master-prim-DRBD-data:1=1000
> Jun  9 19:43:24 testvm2 lrmd: [2170]: info: operation start[34] on
> prim-DRBD-data:1 for client 2173: pid 8209 exited with return code 0
> Jun  9 19:43:24 testvm2 lrmd: [2170]: info: RA output:
> (prim-DRBD-data:1:start:stdout)
> Jun  9 19:43:24 testvm2 crmd: [2173]: info: abort_transition_graph:
> te_update_diff:176 - Triggered transition abort (complete=0, tag=nvpair,
> id=status-testvm2-master-prim-DRBD-data.1, name=master-prim-DRBD-data:1,
> value=1000, magic=NA, cib=0.25.2) : Transient attribute: update
> Jun  9 19:43:24 testvm2 crmd: [2173]: info: process_lrm_event: LRM
> operation prim-DRBD-data:1_start_0 (call=34, rc=0, cib-update=80,
> confirmed=true) ok
> Jun  9 19:43:24 testvm2 crmd: [2173]: info: te_rsc_command: Initiating
> action 41: notify prim-DRBD-data:0_post_notify_start_0 on testvm1
> Jun  9 19:43:24 testvm2 crmd: [2173]: info: te_rsc_command: Initiating
> action 42: notify prim-DRBD-data:1_post_notify_start_0 on testvm2 (local)
> Jun  9 19:43:24 testvm2 lrmd: [2170]: info: rsc:prim-DRBD-data:1
> notify[35] (pid 8287)
> Jun  9 19:43:24 testvm2 lrmd: [2170]: info: RA output:
> (prim-DRBD-data:1:notify:stdout)
> Jun  9 19:43:24 testvm2 lrmd: [2170]: info: operation notify[35] on
> prim-DRBD-data:1 for client 2173: pid 8287 exited with return code 0
> Jun  9 19:43:24 testvm2 crmd: [2173]: info: process_lrm_event: LRM
> operation prim-DRBD-data:1_notify_0 (call=35, rc=0, cib-update=0,
> confirmed=true) ok
> Jun  9 19:43:24 testvm2 crmd: [2173]: notice: run_graph: ==== Transition
> 6 (Complete=10, Pending=0, Fired=0, Skipped=1, Incomplete=0,
> Source=/var/lib/pengine/pe-input-12.bz2): Stopped
> Jun  9 19:43:24 testvm2 crmd: [2173]: notice: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jun  9 19:43:24 testvm2 pengine: [2172]: notice: unpack_config: On loss
> of CCM Quorum: Ignore
> Jun  9 19:43:24 testvm2 pengine: [2172]: notice: LogActions: Promote
> prim-DRBD-data:1#011(Slave -> Master testvm2)
> Jun  9 19:43:24 testvm2 crmd: [2173]: notice: do_state_transition: State
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
> Jun  9 19:43:24 testvm2 crmd: [2173]: info: do_te_invoke: Processing
> graph 7 (ref=pe_calc-dc-1402335804-85) derived from
> /var/lib/pengine/pe-input-13.bz2
> Jun  9 19:43:24 testvm2 crmd: [2173]: info: te_rsc_command: Initiating
> action 49: notify prim-DRBD-data:0_pre_notify_promote_0 on testvm1
> Jun  9 19:43:24 testvm2 crmd: [2173]: info: te_rsc_command: Initiating
> action 51: notify prim-DRBD-data:1_pre_notify_promote_0 on testvm2 (local)
> Jun  9 19:43:24 testvm2 lrmd: [2170]: info: rsc:prim-DRBD-data:1
> notify[36] (pid 8317)
> Jun  9 19:43:24 testvm2 lrmd: [2170]: info: operation notify[36] on
> prim-DRBD-data:1 for client 2173: pid 8317 exited with return code 0
> Jun  9 19:43:24 testvm2 crmd: [2173]: info: process_lrm_event: LRM
> operation prim-DRBD-data:1_notify_0 (call=36, rc=0, cib-update=0,
> confirmed=true) ok
> Jun  9 19:43:24 testvm2 pengine: [2172]: notice: process_pe_message:
> Transition 7: PEngine Input stored in: /var/lib/pengine/pe-input-13.bz2
> Jun  9 19:43:24 testvm2 crmd: [2173]: info: te_rsc_command: Initiating
> action 13: promote prim-DRBD-data:1_promote_0 on testvm2 (local)
> Jun  9 19:43:24 testvm2 lrmd: [2170]: info: rsc:prim-DRBD-data:1
> promote[37] (pid 8340)
> Jun  9 19:43:24 testvm2 lrmd: [2170]: info: RA output:
> (prim-DRBD-data:1:promote:stdout)         allow-two-primaries;
> Jun  9 19:43:25 testvm2 lrmd: [2170]: info: RA output:
> (prim-DRBD-data:1:promote:stdout)
> Jun  9 19:43:25 testvm2 lrmd: [2170]: info: operation promote[37] on
> prim-DRBD-data:1 for client 2173: pid 8340 exited with return code 0
> Jun  9 19:43:25 testvm2 crmd: [2173]: info: process_lrm_event: LRM
> operation prim-DRBD-data:1_promote_0 (call=37, rc=0, cib-update=82,
> confirmed=true) ok
> Jun  9 19:43:25 testvm2 crmd: [2173]: info: te_rsc_command: Initiating
> action 50: notify prim-DRBD-data:0_post_notify_promote_0 on testvm1
> Jun  9 19:43:25 testvm2 crmd: [2173]: info: te_rsc_command: Initiating
> action 52: notify prim-DRBD-data:1_post_notify_promote_0 on testvm2 (local)
> Jun  9 19:43:25 testvm2 lrmd: [2170]: info: rsc:prim-DRBD-data:1
> notify[38] (pid 8371)
> Jun  9 19:43:25 testvm2 attrd: [2171]: notice: attrd_trigger_update:
> Sending flush op to all hosts for: master-prim-DRBD-data:1 (10000)
> Jun  9 19:43:25 testvm2 attrd: [2171]: notice: attrd_perform_update:
> Sent update 45: master-prim-DRBD-data:1=10000
> Jun  9 19:43:25 testvm2 lrmd: [2170]: info: operation notify[38] on
> prim-DRBD-data:1 for client 2173: pid 8371 exited with return code 0
> Jun  9 19:43:25 testvm2 lrmd: [2170]: info: RA output:
> (prim-DRBD-data:1:notify:stdout)
> Jun  9 19:43:25 testvm2 crmd: [2173]: info: abort_transition_graph:
> te_update_diff:176 - Triggered transition abort (complete=0, tag=nvpair,
> id=status-testvm2-master-prim-DRBD-data.1, name=master-prim-DRBD-data:1,
> value=10000, magic=NA, cib=0.25.5) : Transient attribute: update
> Jun  9 19:43:25 testvm2 crmd: [2173]: info: process_lrm_event: LRM
> operation prim-DRBD-data:1_notify_0 (call=38, rc=0, cib-update=0,
> confirmed=true) ok
> Jun  9 19:43:25 testvm2 crmd: [2173]: notice: run_graph: ==== Transition
> 7 (Complete=11, Pending=0, Fired=0, Skipped=1, Incomplete=0,
> Source=/var/lib/pengine/pe-input-13.bz2): Stopped
> Jun  9 19:43:25 testvm2 crmd: [2173]: notice: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jun  9 19:43:25 testvm2 pengine: [2172]: notice: unpack_config: On loss
> of CCM Quorum: Ignore
> Jun  9 19:43:25 testvm2 crmd: [2173]: notice: do_state_transition: State
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
> Jun  9 19:43:25 testvm2 crmd: [2173]: info: do_te_invoke: Processing
> graph 8 (ref=pe_calc-dc-1402335805-93) derived from
> /var/lib/pengine/pe-input-14.bz2
> Jun  9 19:43:25 testvm2 crmd: [2173]: notice: run_graph: ==== Transition
> 8 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pengine/pe-input-14.bz2): Complete
> Jun  9 19:43:25 testvm2 crmd: [2173]: notice: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jun  9 19:43:25 testvm2 pengine: [2172]: notice: process_pe_message:
> Transition 8: PEngine Input stored in: /var/lib/pengine/pe-input-14.bz2
> 
> 
> 
> # cat /var/log/messages
> Jun  9 19:42:06 testvm2 kernel: [ 1513.906368] block drbd0: role(
> Primary -> Secondary )
> Jun  9 19:42:06 testvm2 kernel: [ 1513.906381] block drbd0: bitmap WRITE
> of 0 pages took 0 jiffies
> Jun  9 19:42:06 testvm2 kernel: [ 1513.914433] block drbd0: 0 KB (0
> bits) marked out-of-sync by on disk bit-map.
> Jun  9 19:42:07 testvm2 kernel: [ 1514.131720] block drbd0: peer(
> Primary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate
> -> DUnknown )
> Jun  9 19:42:07 testvm2 kernel: [ 1514.141106] block drbd0: asender
> terminated
> Jun  9 19:42:07 testvm2 kernel: [ 1514.141109] block drbd0: Terminating
> drbd0_asender
> Jun  9 19:42:07 testvm2 kernel: [ 1514.141139] block drbd0: Connection
> closed
> Jun  9 19:42:07 testvm2 kernel: [ 1514.141144] block drbd0: conn(
> Disconnecting -> StandAlone )
> Jun  9 19:42:07 testvm2 kernel: [ 1514.141159] block drbd0: receiver
> terminated
> Jun  9 19:42:07 testvm2 kernel: [ 1514.141162] block drbd0: Terminating
> drbd0_receiver
> Jun  9 19:42:07 testvm2 kernel: [ 1514.141202] block drbd0: disk(
> UpToDate -> Failed )
> Jun  9 19:42:07 testvm2 kernel: [ 1514.141236] block drbd0: disk( Failed
> -> Diskless )
> Jun  9 19:42:07 testvm2 kernel: [ 1514.141482] block drbd0:
> drbd_bm_resize called with capacity == 0
> Jun  9 19:42:07 testvm2 kernel: [ 1514.141491] block drbd0: worker
> terminated
> Jun  9 19:42:07 testvm2 kernel: [ 1514.141493] block drbd0: Terminating
> drbd0_worker
> Jun  9 19:43:24 testvm2 kernel: [ 1591.733037] block drbd0: Starting
> worker thread (from drbdsetup [8256])
> Jun  9 19:43:24 testvm2 kernel: [ 1591.742919] block drbd0: disk(
> Diskless -> Attaching )
> Jun  9 19:43:24 testvm2 kernel: [ 1591.759344] block drbd0: No usable
> activity log found.
> Jun  9 19:43:24 testvm2 kernel: [ 1591.759348] block drbd0: Method to
> ensure write ordering: flush
> Jun  9 19:43:24 testvm2 kernel: [ 1591.759353] block drbd0:
> drbd_bm_resize called with capacity == 9764488
> Jun  9 19:43:24 testvm2 kernel: [ 1591.759393] block drbd0: resync
> bitmap: bits=1220561 words=19072 pages=38
> Jun  9 19:43:24 testvm2 kernel: [ 1591.759397] block drbd0: size = 4768
> MB (4882244 KB)
> Jun  9 19:43:24 testvm2 kernel: [ 1591.767962] block drbd0: bitmap READ
> of 38 pages took 2 jiffies
> Jun  9 19:43:24 testvm2 kernel: [ 1591.768102] block drbd0: recounting
> of set bits took additional 1 jiffies
> Jun  9 19:43:24 testvm2 kernel: [ 1591.768104] block drbd0: 0 KB (0
> bits) marked out-of-sync by on disk bit-map.
> Jun  9 19:43:24 testvm2 kernel: [ 1591.768107] block drbd0: disk(
> Attaching -> UpToDate )
> Jun  9 19:43:24 testvm2 kernel: [ 1591.768110] block drbd0: attached to
> UUIDs F58AAE5FEA6E466C:0000000000000000:E9AC0BE3EFC7924E:E9AB0BE3EFC7924E
> Jun  9 19:43:24 testvm2 kernel: [ 1591.781631] block drbd0: conn(
> StandAlone -> Unconnected )
> Jun  9 19:43:24 testvm2 kernel: [ 1591.781639] block drbd0: Starting
> receiver thread (from drbd0_worker [8258])
> Jun  9 19:43:24 testvm2 kernel: [ 1591.782767] block drbd0: receiver
> (re)started
> Jun  9 19:43:24 testvm2 kernel: [ 1591.782771] block drbd0: conn(
> Unconnected -> WFConnection )
> Jun  9 19:43:25 testvm2 kernel: [ 1591.993951] block drbd0: role(
> Secondary -> Primary )
> Jun  9 19:43:25 testvm2 kernel: [ 1592.008682] block drbd0: new current
> UUID 8627192EB0F0CFCD:F58AAE5FEA6E466C:E9AC0BE3EFC7924E:E9AB0BE3EFC7924E
> Jun  9 19:43:25 testvm2 kernel: [ 1592.280386] block drbd0: Handshake
> successful: Agreed network protocol version 96
> Jun  9 19:43:25 testvm2 kernel: [ 1592.280394] block drbd0: conn(
> WFConnection -> WFReportParams )
> Jun  9 19:43:25 testvm2 kernel: [ 1592.280412] block drbd0: Starting
> asender thread (from drbd0_receiver [8272])
> Jun  9 19:43:25 testvm2 kernel: [ 1592.280572] block drbd0:
> data-integrity-alg: <not-used>
> Jun  9 19:43:25 testvm2 kernel: [ 1592.280613] block drbd0:
> drbd_sync_handshake:
> Jun  9 19:43:25 testvm2 kernel: [ 1592.280616] block drbd0: self
> 8627192EB0F0CFCD:F58AAE5FEA6E466C:E9AC0BE3EFC7924E:E9AB0BE3EFC7924E
> bits:0 flags:0
> Jun  9 19:43:25 testvm2 kernel: [ 1592.280621] block drbd0: peer
> EC5512EE0B9C4DA1:F58AAE5FEA6E466D:E9AC0BE3EFC7924E:E9AB0BE3EFC7924E
> bits:0 flags:0
> Jun  9 19:43:25 testvm2 kernel: [ 1592.280624] block drbd0:
> uuid_compare()=100 by rule 90
> Jun  9 19:43:25 testvm2 kernel: [ 1592.280628] block drbd0: helper
> command: /sbin/drbdadm initial-split-brain minor-0
> Jun  9 19:43:25 testvm2 kernel: [ 1592.281830] block drbd0: helper
> command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
> Jun  9 19:43:25 testvm2 kernel: [ 1592.281844] block drbd0: helper
> command: /sbin/drbdadm split-brain minor-0
> Jun  9 19:43:25 testvm2 notify-split-brain.sh[8412]: invoked for sda4
> Jun  9 19:43:25 testvm2 notify-split-brain.sh[8412]: /usr/lib/sendmail:
> No such file or directory
> Jun  9 19:43:25 testvm2 notify-split-brain.sh[8412]: . . . message not sent.
> Jun  9 19:43:25 testvm2 kernel: [ 1592.289611] block drbd0: helper
> command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
> Jun  9 19:43:25 testvm2 kernel: [ 1592.289618] block drbd0: conn(
> WFReportParams -> Disconnecting )
> Jun  9 19:43:25 testvm2 kernel: [ 1592.289662] block drbd0: asender
> terminated
> Jun  9 19:43:25 testvm2 kernel: [ 1592.289665] block drbd0: Terminating
> drbd0_asender
> Jun  9 19:43:25 testvm2 kernel: [ 1592.289716] block drbd0: Connection
> closed
> Jun  9 19:43:25 testvm2 kernel: [ 1592.289721] block drbd0: conn(
> Disconnecting -> StandAlone )
> Jun  9 19:43:25 testvm2 kernel: [ 1592.289772] block drbd0: receiver
> terminated
> Jun  9 19:43:25 testvm2 kernel: [ 1592.289776] block drbd0: Terminating
> drbd0_receiver
> 
> 
> -- 
> Alexis de BRUYN
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 841 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20140610/a5f27617/attachment-0003.sig>


More information about the Pacemaker mailing list