[Pacemaker] Having trouble with DRBD 8.4.4 on RHEL 7 beta w/ Pacemaker 1.1.10 - calls crm-fence-peer.sh when restarting the drbd resource

Andrew Beekhof andrew at beekhof.net
Sun Feb 16 20:22:44 EST 2014


On 28 Jan 2014, at 12:36 pm, Digimer <lists at alteeve.ca> wrote:

> Hi all,
> 
>  I initially posted this to the DRBD mailing list, but it got moderated for being too large.

Perhaps compress the logs and send them as attachments next time ;-)
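
Something like this usually gets under the list's size limit, or just let crm_report collect everything for the relevant window and attach the tarball it produces (the file name and time window below are only placeholders):

  tar czf logs.tar.gz /var/log/messages
  crm_report --from "2014-01-27 20:25:00"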

> I hope it's OK to cross-post it here in the meantime.
> 
>  I'm trying to get DRBD dual-primary working with Pacemaker 1.1.10 on RHEL 7 (beta 1). It's mostly working, except for one really strange problem.
> 
>  When I start Pacemaker/Corosync, DRBD starts and is promoted to Primary on both nodes quickly and without issue. After that, if I disable the DRBD resource, both nodes stop DRBD just fine.
> 
>  The problem is when I try to re-enable the DRBD resource... one of the nodes invokes crm-fence-peer.sh, which in turn adds a constraint blocking DRBD from becoming Primary on one of the nodes (which node it is seems random; it has happened to both). This, of course, leaves the resource in a FAILED state on that node.
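> 
> For reference, the constraint the handler creates looks like this (the full XML also shows up in the logs further down; which node name it carries depends on which side fired the handler), and it can be cleared by id once things settle:
> 
> ====
> <rsc_location rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone">
>   <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">
>     <expression attribute="#uname" operation="ne" value="an-c03n01.alteeve.ca" id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>
>   </rule>
> </rsc_location>
> 
> # list constraints with their ids, then drop the fence rule by hand
> pcs constraint --full
> pcs constraint remove drbd-fence-by-handler-r0-drbd_r0_Clone
> ====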
> 
>  I tried adding: handlers { after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh"; }. With this in place, eventually (about 60 seconds later), crm-unfence-peer.sh was called and the constraint was removed. However, by then, the resource had already entered a failed state.
> 
> Here is the current config:
> 
> ====
> [root@an-c03n01 ~]# drbdadm dump
> # /etc/drbd.conf
> global {
>    usage-count yes;
> }
> 
> common {
>    net {
>        protocol           C;
>        allow-two-primaries yes;
>        after-sb-0pri    discard-zero-changes;
>        after-sb-1pri    discard-secondary;
>        after-sb-2pri    disconnect;
>    }
>    disk {
>        fencing          resource-and-stonith;
>    }
>    handlers {
>        fence-peer       /usr/lib/drbd/crm-fence-peer.sh;
>        after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
>    }
> }
> 
> # resource r0 on an-c03n01.alteeve.ca: not ignored, not stacked
> # defined at /etc/drbd.d/r0.res:3
> resource r0 {
>    on an-c03n01.alteeve.ca {
>        volume 0 {
>            device       /dev/drbd0 minor 0;
>            disk         /dev/vdb1;
>            meta-disk    internal;
>        }
>        address          ipv4 10.10.30.1:7788;
>    }
>    on an-c03n02.alteeve.ca {
>        volume 0 {
>            device       /dev/drbd0 minor 0;
>            disk         /dev/vdb1;
>            meta-disk    internal;
>        }
>        address          ipv4 10.10.30.2:7788;
>    }
>    net {
>        verify-alg       md5;
>        data-integrity-alg md5;
>    }
>    disk {
>        disk-flushes      no;
>        md-flushes        no;
>    }
> }
> ====
> 
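> For anyone reproducing this, a dual-primary master/slave set like drbd_r0_Clone is typically created with something along these lines (a sketch; not necessarily the exact commands or option values used here):
> 
> ====
> pcs resource create drbd_r0 ocf:linbit:drbd drbd_resource=r0 op monitor interval=30s
> pcs resource master drbd_r0_Clone drbd_r0 master-max=2 master-node-max=1 \
>        clone-max=2 clone-node-max=1 notify=true
> ====
> 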
> I'll walk through the steps, showing the logs from both nodes as I go.
> 
> First, I start the cluster:
> 
> ====
> [root@an-c03n01 ~]# pcs cluster start --all
> an-c03n01.alteeve.ca: Starting Cluster...
> an-c03n02.alteeve.ca: Starting Cluster...
> ====
> [root@an-c03n02 ~]# pcs status
> Cluster name: an-cluster-03
> Last updated: Mon Jan 27 20:26:38 2014
> Last change: Mon Jan 27 20:25:06 2014 via crmd on an-c03n01.alteeve.ca
> Stack: corosync
> Current DC: an-c03n02.alteeve.ca (2) - partition with quorum
> Version: 1.1.10-19.el7-368c726
> 2 Nodes configured
> 4 Resources configured
> 
> 
> Online: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
> 
> Full list of resources:
> 
> fence_n01_virsh	(stonith:fence_virsh):	Started an-c03n01.alteeve.ca
> fence_n02_virsh	(stonith:fence_virsh):	Started an-c03n02.alteeve.ca
> Master/Slave Set: drbd_r0_Clone [drbd_r0]
>     Masters: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
> 
> PCSD Status:
> an-c03n01.alteeve.ca:
>  an-c03n01.alteeve.ca: Online
> an-c03n02.alteeve.ca:
>  an-c03n02.alteeve.ca: Online
> 
> Daemon Status:
>  corosync: active/disabled
>  pacemaker: active/disabled
>  pcsd: active/enabled
> ====
> [root@an-c03n02 ~]# cat /proc/drbd
> version: 8.4.4 (api:1/proto:86-101)
> GIT-hash: 74402fecf24da8e5438171ee8c19e28627e1c98a build by root@an-c03n02.alteeve.ca, 2014-01-26 16:48:51
> 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
>    ns:0 nr:0 dw:0 dr:152 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
> ====
> 
> Startup logs from an-c03n01:
> ====
> Jan 27 20:26:09 an-c03n01 systemd: Starting Corosync Cluster Engine...
> Jan 27 20:26:09 an-c03n01 corosync[823]: [MAIN  ] Corosync Cluster Engine ('2.3.2'): started and ready to provide service.
> Jan 27 20:26:09 an-c03n01 corosync[823]: [MAIN  ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
> Jan 27 20:26:09 an-c03n01 corosync[824]: [TOTEM ] Initializing transport (UDP/IP Unicast).
> Jan 27 20:26:09 an-c03n01 corosync[824]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
> Jan 27 20:26:09 an-c03n01 corosync[824]: [TOTEM ] The network interface [10.20.30.1] is now up.
> Jan 27 20:26:09 an-c03n01 corosync[824]: [SERV  ] Service engine loaded: corosync configuration map access [0]
> Jan 27 20:26:09 an-c03n01 corosync[824]: [QB    ] server name: cmap
> Jan 27 20:26:09 an-c03n01 corosync[824]: [SERV  ] Service engine loaded: corosync configuration service [1]
> Jan 27 20:26:09 an-c03n01 corosync[824]: [QB    ] server name: cfg
> Jan 27 20:26:09 an-c03n01 corosync[824]: [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
> Jan 27 20:26:09 an-c03n01 corosync[824]: [QB    ] server name: cpg
> Jan 27 20:26:09 an-c03n01 corosync[824]: [SERV  ] Service engine loaded: corosync profile loading service [4]
> Jan 27 20:26:09 an-c03n01 corosync[824]: [QUORUM] Using quorum provider corosync_votequorum
> Jan 27 20:26:09 an-c03n01 corosync[824]: [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Jan 27 20:26:09 an-c03n01 corosync[824]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
> Jan 27 20:26:09 an-c03n01 corosync[824]: [QB    ] server name: votequorum
> Jan 27 20:26:09 an-c03n01 corosync[824]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
> Jan 27 20:26:09 an-c03n01 corosync[824]: [QB    ] server name: quorum
> Jan 27 20:26:09 an-c03n01 corosync[824]: [TOTEM ] adding new UDPU member {10.20.30.1}
> Jan 27 20:26:09 an-c03n01 corosync[824]: [TOTEM ] adding new UDPU member {10.20.30.2}
> Jan 27 20:26:09 an-c03n01 corosync[824]: [TOTEM ] A new membership (10.20.30.1:200) was formed. Members joined: 1
> Jan 27 20:26:09 an-c03n01 corosync[824]: [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Jan 27 20:26:09 an-c03n01 corosync[824]: [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Jan 27 20:26:09 an-c03n01 corosync[824]: [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Jan 27 20:26:09 an-c03n01 corosync[824]: [QUORUM] Members[1]: 1
> Jan 27 20:26:09 an-c03n01 corosync[824]: [MAIN  ] Completed service synchronization, ready to provide service.
> Jan 27 20:26:10 an-c03n01 corosync[824]: [TOTEM ] A new membership (10.20.30.1:208) was formed. Members joined: 2
> Jan 27 20:26:10 an-c03n01 corosync[824]: [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Jan 27 20:26:10 an-c03n01 corosync[824]: [QUORUM] This node is within the primary component and will provide service.
> Jan 27 20:26:10 an-c03n01 corosync[824]: [QUORUM] Members[2]: 1 2
> Jan 27 20:26:10 an-c03n01 corosync[824]: [MAIN  ] Completed service synchronization, ready to provide service.
> Jan 27 20:26:10 an-c03n01 corosync: Starting Corosync Cluster Engine (corosync): [  OK  ]
> Jan 27 20:26:10 an-c03n01 systemd: Started Corosync Cluster Engine.
> Jan 27 20:26:10 an-c03n01 systemd: Starting Pacemaker High Availability Cluster Manager...
> Jan 27 20:26:10 an-c03n01 systemd: Started Pacemaker High Availability Cluster Manager.
> Jan 27 20:26:10 an-c03n01 pacemakerd: Could not establish pacemakerd connection: Connection refused (111)
> Jan 27 20:26:10 an-c03n01 pacemakerd[839]: notice: mcp_read_config: Configured corosync to accept connections from group 189: OK (1)
> Jan 27 20:26:10 an-c03n01 pacemakerd[839]: notice: main: Starting Pacemaker 1.1.10-19.el7 (Build: 368c726):  generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc upstart systemd nagios  corosync-native
> Jan 27 20:26:10 an-c03n01 pacemakerd[839]: notice: cluster_connect_quorum: Quorum acquired
> Jan 27 20:26:10 an-c03n01 pacemakerd[839]: notice: crm_update_peer_state: pcmk_quorum_notification: Node an-c03n01.alteeve.ca[1] - state is now member (was (null))
> Jan 27 20:26:10 an-c03n01 pacemakerd[839]: notice: crm_update_peer_state: pcmk_quorum_notification: Node an-c03n02.alteeve.ca[2] - state is now member (was (null))
> Jan 27 20:26:10 an-c03n01 attrd[843]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jan 27 20:26:10 an-c03n01 crmd[845]: notice: main: CRM Git Version: 368c726
> Jan 27 20:26:10 an-c03n01 stonith-ng[841]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jan 27 20:26:10 an-c03n01 attrd[843]: notice: main: Starting mainloop...
> Jan 27 20:26:10 an-c03n01 cib[840]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jan 27 20:26:11 an-c03n01 crmd[845]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jan 27 20:26:11 an-c03n01 crmd[845]: notice: cluster_connect_quorum: Quorum acquired
> Jan 27 20:26:11 an-c03n01 stonith-ng[841]: notice: setup_cib: Watching for stonith topology changes
> Jan 27 20:26:11 an-c03n01 stonith-ng[841]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jan 27 20:26:11 an-c03n01 crmd[845]: notice: crm_update_peer_state: pcmk_quorum_notification: Node an-c03n01.alteeve.ca[1] - state is now member (was (null))
> Jan 27 20:26:11 an-c03n01 crmd[845]: notice: crm_update_peer_state: pcmk_quorum_notification: Node an-c03n02.alteeve.ca[2] - state is now member (was (null))
> Jan 27 20:26:11 an-c03n01 crmd[845]: notice: do_started: The local CRM is operational
> Jan 27 20:26:11 an-c03n01 crmd[845]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> Jan 27 20:26:12 an-c03n01 stonith-ng[841]: notice: stonith_device_register: Added 'fence_n01_virsh' to the device list (1 active devices)
> Jan 27 20:26:13 an-c03n01 stonith-ng[841]: notice: stonith_device_register: Added 'fence_n02_virsh' to the device list (2 active devices)
> 
> Jan 27 20:26:32 an-c03n01 crmd[845]: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> Jan 27 20:26:32 an-c03n01 crmd[845]: notice: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> Jan 27 20:26:32 an-c03n01 crmd[845]: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
> Jan 27 20:26:32 an-c03n01 attrd[843]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> Jan 27 20:26:33 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_monitor_0 (call=14, rc=7, cib-update=11, confirmed=true) not running
> Jan 27 20:26:33 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> Jan 27 20:26:33 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 4: probe_complete=true
> Jan 27 20:26:33 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 7: probe_complete=true
> Jan 27 20:26:34 an-c03n01 stonith-ng[841]: notice: stonith_device_register: Device 'fence_n01_virsh' already existed in device list (2 active devices)
> Jan 27 20:26:34 an-c03n01 kernel: [19496.418912] drbd r0: Starting worker thread (from drbdsetup [946])
> Jan 27 20:26:34 an-c03n01 kernel: [19496.419207] block drbd0: disk( Diskless -> Attaching )
> Jan 27 20:26:34 an-c03n01 kernel: [19496.419268] drbd r0: Method to ensure write ordering: drain
> Jan 27 20:26:34 an-c03n01 kernel: [19496.419270] block drbd0: max BIO size = 1048576
> Jan 27 20:26:34 an-c03n01 kernel: [19496.419273] block drbd0: Adjusting my ra_pages to backing device's (32 -> 1024)
> Jan 27 20:26:34 an-c03n01 kernel: [19496.419275] block drbd0: drbd_bm_resize called with capacity == 41937592
> Jan 27 20:26:34 an-c03n01 kernel: [19496.419346] block drbd0: resync bitmap: bits=5242199 words=81910 pages=160
> Jan 27 20:26:34 an-c03n01 kernel: [19496.419348] block drbd0: size = 20 GB (20968796 KB)
> Jan 27 20:26:34 an-c03n01 kernel: [19496.420788] block drbd0: bitmap READ of 160 pages took 1 jiffies
> Jan 27 20:26:34 an-c03n01 kernel: [19496.420892] block drbd0: recounting of set bits took additional 0 jiffies
> Jan 27 20:26:34 an-c03n01 kernel: [19496.420895] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Jan 27 20:26:34 an-c03n01 kernel: [19496.420900] block drbd0: disk( Attaching -> Consistent )
> Jan 27 20:26:34 an-c03n01 kernel: [19496.420904] block drbd0: attached to UUIDs AA966D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D
> Jan 27 20:26:34 an-c03n01 kernel: [19496.428933] drbd r0: conn( StandAlone -> Unconnected )
> Jan 27 20:26:34 an-c03n01 kernel: [19496.428949] drbd r0: Starting receiver thread (from drbd_w_r0 [947])
> Jan 27 20:26:34 an-c03n01 kernel: [19496.428970] drbd r0: receiver (re)started
> Jan 27 20:26:34 an-c03n01 kernel: [19496.428978] drbd r0: conn( Unconnected -> WFConnection )
> Jan 27 20:26:34 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (5)
> Jan 27 20:26:34 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 11: master-drbd_r0=5
> Jan 27 20:26:34 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_start_0 (call=16, rc=0, cib-update=12, confirmed=true) ok
> Jan 27 20:26:34 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=17, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:26:35 an-c03n01 kernel: [19496.930042] drbd r0: Handshake successful: Agreed network protocol version 101
> Jan 27 20:26:35 an-c03n01 kernel: [19496.930046] drbd r0: Agreed to support TRIM on protocol level
> Jan 27 20:26:35 an-c03n01 kernel: [19496.930093] drbd r0: conn( WFConnection -> WFReportParams )
> Jan 27 20:26:35 an-c03n01 kernel: [19496.930095] drbd r0: Starting asender thread (from drbd_r_r0 [956])
> Jan 27 20:26:35 an-c03n01 kernel: [19496.937081] block drbd0: drbd_sync_handshake:
> Jan 27 20:26:35 an-c03n01 kernel: [19496.937086] block drbd0: self AA966D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D bits:0 flags:0
> Jan 27 20:26:35 an-c03n01 kernel: [19496.937088] block drbd0: peer AA966D5345E69DAA:0000000000000000:4F366962CD263E3C:4F356962CD263E3D bits:0 flags:0
> Jan 27 20:26:35 an-c03n01 kernel: [19496.937091] block drbd0: uuid_compare()=0 by rule 40
> Jan 27 20:26:35 an-c03n01 kernel: [19496.937098] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) disk( Consistent -> UpToDate ) pdsk( DUnknown -> UpToDate )
> Jan 27 20:26:35 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation fence_n01_virsh_start_0 (call=15, rc=0, cib-update=13, confirmed=true) ok
> Jan 27 20:26:35 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=19, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:26:35 an-c03n01 kernel: [19497.258935] block drbd0: peer( Secondary -> Primary )
> Jan 27 20:26:35 an-c03n01 kernel: [19497.262592] block drbd0: role( Secondary -> Primary )
> Jan 27 20:26:35 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_promote_0 (call=20, rc=0, cib-update=14, confirmed=true) ok
> Jan 27 20:26:35 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (10000)
> Jan 27 20:26:35 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 13: master-drbd_r0=10000
> Jan 27 20:26:35 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=21, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:26:35 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 15: master-drbd_r0=10000
> Jan 27 20:26:36 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation fence_n01_virsh_monitor_60000 (call=18, rc=0, cib-update=15, confirmed=false) ok
> ====
> 
> Startup logs from an-c03n02:
> ====
> Jan 27 20:26:09 an-c03n02 systemd: Starting Corosync Cluster Engine...
> Jan 27 20:26:09 an-c03n02 corosync[21111]: [MAIN  ] Corosync Cluster Engine ('2.3.2'): started and ready to provide service.
> Jan 27 20:26:09 an-c03n02 corosync[21111]: [MAIN  ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [TOTEM ] Initializing transport (UDP/IP Unicast).
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [TOTEM ] The network interface [10.20.30.2] is now up.
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [SERV  ] Service engine loaded: corosync configuration map access [0]
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [QB    ] server name: cmap
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [SERV  ] Service engine loaded: corosync configuration service [1]
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [QB    ] server name: cfg
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [QB    ] server name: cpg
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [SERV  ] Service engine loaded: corosync profile loading service [4]
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [QUORUM] Using quorum provider corosync_votequorum
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [QB    ] server name: votequorum
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [QB    ] server name: quorum
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [TOTEM ] adding new UDPU member {10.20.30.1}
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [TOTEM ] adding new UDPU member {10.20.30.2}
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [TOTEM ] A new membership (10.20.30.2:204) was formed. Members joined: 2
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [QUORUM] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [QUORUM] Members[1]: 2
> Jan 27 20:26:09 an-c03n02 corosync[21112]: [MAIN  ] Completed service synchronization, ready to provide service.
> Jan 27 20:26:10 an-c03n02 corosync[21112]: [TOTEM ] A new membership (10.20.30.1:208) was formed. Members joined: 1
> Jan 27 20:26:10 an-c03n02 corosync[21112]: [QUORUM] This node is within the primary component and will provide service.
> Jan 27 20:26:10 an-c03n02 corosync[21112]: [QUORUM] Members[2]: 1 2
> Jan 27 20:26:10 an-c03n02 corosync[21112]: [MAIN  ] Completed service synchronization, ready to provide service.
> Jan 27 20:26:10 an-c03n02 corosync: Starting Corosync Cluster Engine (corosync): [  OK  ]
> Jan 27 20:26:10 an-c03n02 systemd: Started Corosync Cluster Engine.
> Jan 27 20:26:10 an-c03n02 systemd: Starting Pacemaker High Availability Cluster Manager...
> Jan 27 20:26:10 an-c03n02 systemd: Started Pacemaker High Availability Cluster Manager.
> Jan 27 20:26:10 an-c03n02 pacemakerd: Could not establish pacemakerd connection: Connection refused (111)
> Jan 27 20:26:10 an-c03n02 pacemakerd[21127]: notice: mcp_read_config: Configured corosync to accept connections from group 189: OK (1)
> Jan 27 20:26:10 an-c03n02 pacemakerd[21127]: notice: main: Starting Pacemaker 1.1.10-19.el7 (Build: 368c726):  generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc upstart systemd nagios  corosync-native
> Jan 27 20:26:10 an-c03n02 pacemakerd[21127]: notice: cluster_connect_quorum: Quorum acquired
> Jan 27 20:26:10 an-c03n02 pacemakerd[21127]: notice: crm_update_peer_state: pcmk_quorum_notification: Node an-c03n01.alteeve.ca[1] - state is now member (was (null))
> Jan 27 20:26:10 an-c03n02 pacemakerd[21127]: notice: crm_update_peer_state: pcmk_quorum_notification: Node an-c03n02.alteeve.ca[2] - state is now member (was (null))
> Jan 27 20:26:10 an-c03n02 stonith-ng[21129]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jan 27 20:26:10 an-c03n02 cib[21128]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jan 27 20:26:10 an-c03n02 crmd[21133]: notice: main: CRM Git Version: 368c726
> Jan 27 20:26:10 an-c03n02 attrd[21131]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jan 27 20:26:10 an-c03n02 attrd[21131]: notice: main: Starting mainloop...
> Jan 27 20:26:11 an-c03n02 stonith-ng[21129]: notice: setup_cib: Watching for stonith topology changes
> Jan 27 20:26:11 an-c03n02 crmd[21133]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
> Jan 27 20:26:11 an-c03n02 stonith-ng[21129]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jan 27 20:26:11 an-c03n02 crmd[21133]: notice: cluster_connect_quorum: Quorum acquired
> Jan 27 20:26:11 an-c03n02 crmd[21133]: notice: crm_update_peer_state: pcmk_quorum_notification: Node an-c03n01.alteeve.ca[1] - state is now member (was (null))
> Jan 27 20:26:11 an-c03n02 crmd[21133]: notice: crm_update_peer_state: pcmk_quorum_notification: Node an-c03n02.alteeve.ca[2] - state is now member (was (null))
> Jan 27 20:26:11 an-c03n02 crmd[21133]: notice: do_started: The local CRM is operational
> Jan 27 20:26:11 an-c03n02 crmd[21133]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> Jan 27 20:26:12 an-c03n02 stonith-ng[21129]: notice: stonith_device_register: Added 'fence_n01_virsh' to the device list (1 active devices)
> Jan 27 20:26:13 an-c03n02 stonith-ng[21129]: notice: stonith_device_register: Added 'fence_n02_virsh' to the device list (2 active devices)
> 
> Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
> Jan 27 20:26:32 an-c03n02 attrd[21131]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> Jan 27 20:26:32 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jan 27 20:26:32 an-c03n02 pengine[21132]: notice: LogActions: Start fence_n01_virsh	(an-c03n01.alteeve.ca)
> Jan 27 20:26:32 an-c03n02 pengine[21132]: notice: LogActions: Start fence_n02_virsh	(an-c03n02.alteeve.ca)
> Jan 27 20:26:32 an-c03n02 pengine[21132]: notice: LogActions: Start drbd_r0:0	(an-c03n01.alteeve.ca)
> Jan 27 20:26:32 an-c03n02 pengine[21132]: notice: LogActions: Start drbd_r0:1	(an-c03n02.alteeve.ca)
> Jan 27 20:26:32 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-164.bz2
> Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 8: monitor fence_n01_virsh_monitor_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 4: monitor fence_n01_virsh_monitor_0 on an-c03n01.alteeve.ca
> Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 9: monitor fence_n02_virsh_monitor_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 5: monitor fence_n02_virsh_monitor_0 on an-c03n01.alteeve.ca
> Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 6: monitor drbd_r0:0_monitor_0 on an-c03n01.alteeve.ca
> Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 10: monitor drbd_r0:1_monitor_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_monitor_0 (call=14, rc=7, cib-update=28, confirmed=true) not running
> Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 7: probe_complete probe_complete on an-c03n02.alteeve.ca (local) - no waiting
> Jan 27 20:26:33 an-c03n02 attrd[21131]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> Jan 27 20:26:33 an-c03n02 attrd[21131]: notice: attrd_perform_update: Sent update 4: probe_complete=true
> Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 3: probe_complete probe_complete on an-c03n01.alteeve.ca - no waiting
> Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 11: start fence_n01_virsh_start_0 on an-c03n01.alteeve.ca
> Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 13: start fence_n02_virsh_start_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 15: start drbd_r0:0_start_0 on an-c03n01.alteeve.ca
> Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 17: start drbd_r0:1_start_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:26:34 an-c03n02 stonith-ng[21129]: notice: stonith_device_register: Device 'fence_n02_virsh' already existed in device list (2 active devices)
> Jan 27 20:26:34 an-c03n02 kernel: [ 4904.724683] drbd r0: Starting worker thread (from drbdsetup [21238])
> Jan 27 20:26:34 an-c03n02 kernel: [ 4904.724970] block drbd0: disk( Diskless -> Attaching )
> Jan 27 20:26:34 an-c03n02 kernel: [ 4904.725081] drbd r0: Method to ensure write ordering: drain
> Jan 27 20:26:34 an-c03n02 kernel: [ 4904.725084] block drbd0: max BIO size = 1048576
> Jan 27 20:26:34 an-c03n02 kernel: [ 4904.725087] block drbd0: Adjusting my ra_pages to backing device's (32 -> 1024)
> Jan 27 20:26:34 an-c03n02 kernel: [ 4904.725090] block drbd0: drbd_bm_resize called with capacity == 41937592
> Jan 27 20:26:34 an-c03n02 kernel: [ 4904.725180] block drbd0: resync bitmap: bits=5242199 words=81910 pages=160
> Jan 27 20:26:34 an-c03n02 kernel: [ 4904.725183] block drbd0: size = 20 GB (20968796 KB)
> Jan 27 20:26:34 an-c03n02 kernel: [ 4904.727769] block drbd0: bitmap READ of 160 pages took 2 jiffies
> Jan 27 20:26:34 an-c03n02 kernel: [ 4904.727981] block drbd0: recounting of set bits took additional 0 jiffies
> Jan 27 20:26:34 an-c03n02 kernel: [ 4904.727985] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Jan 27 20:26:34 an-c03n02 kernel: [ 4904.728001] block drbd0: disk( Attaching -> Consistent )
> Jan 27 20:26:34 an-c03n02 kernel: [ 4904.728013] block drbd0: attached to UUIDs AA966D5345E69DAA:0000000000000000:4F366962CD263E3C:4F356962CD263E3D
> Jan 27 20:26:34 an-c03n02 kernel: [ 4904.738601] drbd r0: conn( StandAlone -> Unconnected )
> Jan 27 20:26:34 an-c03n02 kernel: [ 4904.738688] drbd r0: Starting receiver thread (from drbd_w_r0 [21239])
> Jan 27 20:26:34 an-c03n02 kernel: [ 4904.738709] drbd r0: receiver (re)started
> Jan 27 20:26:34 an-c03n02 kernel: [ 4904.738721] drbd r0: conn( Unconnected -> WFConnection )
> Jan 27 20:26:34 an-c03n02 attrd[21131]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (5)
> Jan 27 20:26:34 an-c03n02 attrd[21131]: notice: attrd_perform_update: Sent update 9: master-drbd_r0=5
> Jan 27 20:26:34 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_start_0 (call=16, rc=0, cib-update=29, confirmed=true) ok
> Jan 27 20:26:34 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 48: notify drbd_r0:0_post_notify_start_0 on an-c03n01.alteeve.ca
> Jan 27 20:26:34 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 49: notify drbd_r0:1_post_notify_start_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:26:34 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=17, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:26:35 an-c03n02 kernel: [ 4905.294095] drbd r0: Handshake successful: Agreed network protocol version 101
> Jan 27 20:26:35 an-c03n02 kernel: [ 4905.294099] drbd r0: Agreed to support TRIM on protocol level
> Jan 27 20:26:35 an-c03n02 kernel: [ 4905.294132] drbd r0: conn( WFConnection -> WFReportParams )
> Jan 27 20:26:35 an-c03n02 kernel: [ 4905.294134] drbd r0: Starting asender thread (from drbd_r_r0 [21248])
> Jan 27 20:26:35 an-c03n02 kernel: [ 4905.303108] block drbd0: drbd_sync_handshake:
> Jan 27 20:26:35 an-c03n02 kernel: [ 4905.303112] block drbd0: self AA966D5345E69DAA:0000000000000000:4F366962CD263E3C:4F356962CD263E3D bits:0 flags:0
> Jan 27 20:26:35 an-c03n02 kernel: [ 4905.303114] block drbd0: peer AA966D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D bits:0 flags:0
> Jan 27 20:26:35 an-c03n02 kernel: [ 4905.303115] block drbd0: uuid_compare()=0 by rule 40
> Jan 27 20:26:35 an-c03n02 kernel: [ 4905.303120] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) disk( Consistent -> UpToDate ) pdsk( DUnknown -> UpToDate )
> Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation fence_n02_virsh_start_0 (call=15, rc=0, cib-update=30, confirmed=true) ok
> Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: run_graph: Transition 0 (Complete=21, Pending=0, Fired=0, Skipped=4, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-164.bz2): Stopped
> Jan 27 20:26:35 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jan 27 20:26:35 an-c03n02 pengine[21132]: notice: LogActions: Promote drbd_r0:0	(Slave -> Master an-c03n02.alteeve.ca)
> Jan 27 20:26:35 an-c03n02 pengine[21132]: notice: LogActions: Promote drbd_r0:1	(Slave -> Master an-c03n01.alteeve.ca)
> Jan 27 20:26:35 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-165.bz2
> Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 7: monitor fence_n01_virsh_monitor_60000 on an-c03n01.alteeve.ca
> Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 10: monitor fence_n02_virsh_monitor_60000 on an-c03n02.alteeve.ca (local)
> Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 52: notify drbd_r0_pre_notify_promote_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 54: notify drbd_r0_pre_notify_promote_0 on an-c03n01.alteeve.ca
> Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=19, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 13: promote drbd_r0_promote_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 16: promote drbd_r0_promote_0 on an-c03n01.alteeve.ca
> Jan 27 20:26:35 an-c03n02 kernel: [ 4905.623345] block drbd0: role( Secondary -> Primary )
> Jan 27 20:26:35 an-c03n02 kernel: [ 4905.626560] block drbd0: peer( Secondary -> Primary )
> Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_promote_0 (call=20, rc=0, cib-update=32, confirmed=true) ok
> Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 53: notify drbd_r0_post_notify_promote_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 55: notify drbd_r0_post_notify_promote_0 on an-c03n01.alteeve.ca
> Jan 27 20:26:35 an-c03n02 attrd[21131]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (10000)
> Jan 27 20:26:35 an-c03n02 attrd[21131]: notice: attrd_perform_update: Sent update 13: master-drbd_r0=10000
> Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=21, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:26:35 an-c03n02 attrd[21131]: notice: attrd_perform_update: Sent update 15: master-drbd_r0=10000
> Jan 27 20:26:36 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation fence_n02_virsh_monitor_60000 (call=18, rc=0, cib-update=33, confirmed=false) ok
> Jan 27 20:26:36 an-c03n02 crmd[21133]: notice: run_graph: Transition 1 (Complete=14, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-165.bz2): Complete
> Jan 27 20:26:36 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jan 27 20:26:36 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-166.bz2
> Jan 27 20:26:36 an-c03n02 crmd[21133]: notice: run_graph: Transition 2 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-166.bz2): Complete
> Jan 27 20:26:36 an-c03n02 crmd[21133]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> ====
> 
> So far, everything looks good. Now I'll disable the DRBD resource:
> 
> ====
> [root@an-c03n01 ~]# pcs resource disable drbd_r0_Clone
> [root@an-c03n01 ~]# pcs constraint
> Location Constraints:
> Ordering Constraints:
> Colocation Constraints:
> ====
> [root@an-c03n02 ~]# pcs status
> Cluster name: an-cluster-03
> Last updated: Mon Jan 27 20:29:23 2014
> Last change: Mon Jan 27 20:29:10 2014 via crm_resource on an-c03n01.alteeve.ca
> Stack: corosync
> Current DC: an-c03n02.alteeve.ca (2) - partition with quorum
> Version: 1.1.10-19.el7-368c726
> 2 Nodes configured
> 4 Resources configured
> 
> 
> Online: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
> 
> Full list of resources:
> 
> fence_n01_virsh	(stonith:fence_virsh):	Started an-c03n01.alteeve.ca
> fence_n02_virsh	(stonith:fence_virsh):	Started an-c03n02.alteeve.ca
> Master/Slave Set: drbd_r0_Clone [drbd_r0]
>     Stopped: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
> 
> PCSD Status:
> an-c03n01.alteeve.ca:
>  an-c03n01.alteeve.ca: Online
> an-c03n02.alteeve.ca:
>  an-c03n02.alteeve.ca: Online
> 
> Daemon Status:
>  corosync: active/disabled
>  pacemaker: active/disabled
>  pcsd: active/enabled
> ====
> 
> Disable logs from an-c03n01:
> ====
> Jan 27 20:29:10 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=22, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:29:10 an-c03n01 kernel: [19652.354342] block drbd0: role( Primary -> Secondary )
> Jan 27 20:29:10 an-c03n01 kernel: [19652.354362] block drbd0: bitmap WRITE of 0 pages took 0 jiffies
> Jan 27 20:29:10 an-c03n01 kernel: [19652.354364] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Jan 27 20:29:10 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_demote_0 (call=23, rc=0, cib-update=16, confirmed=true) ok
> Jan 27 20:29:10 an-c03n01 kernel: [19652.363096] block drbd0: peer( Primary -> Secondary )
> Jan 27 20:29:10 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=24, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:29:10 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=25, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:29:10 an-c03n01 kernel: [19652.471517] drbd r0: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
> Jan 27 20:29:10 an-c03n01 kernel: [19652.471539] drbd r0: asender terminated
> Jan 27 20:29:10 an-c03n01 kernel: [19652.471542] drbd r0: Terminating drbd_a_r0
> Jan 27 20:29:10 an-c03n01 kernel: [19652.472011] drbd r0: conn( TearDown -> Disconnecting )
> Jan 27 20:29:10 an-c03n01 kernel: [19652.472332] drbd r0: Connection closed
> Jan 27 20:29:10 an-c03n01 kernel: [19652.472339] drbd r0: conn( Disconnecting -> StandAlone )
> Jan 27 20:29:10 an-c03n01 kernel: [19652.472340] drbd r0: receiver terminated
> Jan 27 20:29:10 an-c03n01 kernel: [19652.472351] drbd r0: Terminating drbd_r_r0
> Jan 27 20:29:10 an-c03n01 kernel: [19652.472377] block drbd0: disk( UpToDate -> Failed )
> Jan 27 20:29:10 an-c03n01 kernel: [19652.482181] block drbd0: bitmap WRITE of 0 pages took 0 jiffies
> Jan 27 20:29:10 an-c03n01 kernel: [19652.482186] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Jan 27 20:29:10 an-c03n01 kernel: [19652.482208] block drbd0: disk( Failed -> Diskless )
> Jan 27 20:29:10 an-c03n01 kernel: [19652.482288] block drbd0: drbd_bm_resize called with capacity == 0
> Jan 27 20:29:10 an-c03n01 kernel: [19652.482327] drbd r0: Terminating drbd_w_r0
> Jan 27 20:29:10 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (<null>)
> Jan 27 20:29:10 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent delete 17: node=1, attr=master-drbd_r0, id=<n/a>, set=(null), section=status
> Jan 27 20:29:10 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_stop_0 (call=26, rc=0, cib-update=17, confirmed=true) ok
> Jan 27 20:29:10 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent delete 19: node=1, attr=master-drbd_r0, id=<n/a>, set=(null), section=status
> ====
> 
> Disable logs from an-c03n02:
> ====
> Jan 27 20:29:10 an-c03n02 cib[21128]: notice: cib:diff: Diff: --- 0.139.23
> Jan 27 20:29:10 an-c03n02 cib[21128]: notice: cib:diff: Diff: +++ 0.140.1 ae30c6348ea7b6da2cce70635f3b0a29
> Jan 27 20:29:10 an-c03n02 cib[21128]: notice: cib:diff: -- <cib admin_epoch="0" epoch="139" num_updates="23"/>
> Jan 27 20:29:10 an-c03n02 cib[21128]: notice: cib:diff: ++ <nvpair id="drbd_r0_Clone-meta_attributes-target-role" name="target-role" value="Stopped"/>
> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Jan 27 20:29:10 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jan 27 20:29:10 an-c03n02 pengine[21132]: notice: LogActions: Demote drbd_r0:0	(Master -> Stopped an-c03n02.alteeve.ca)
> Jan 27 20:29:10 an-c03n02 pengine[21132]: notice: LogActions: Demote drbd_r0:1	(Master -> Stopped an-c03n01.alteeve.ca)
> Jan 27 20:29:10 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-167.bz2
> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 46: notify drbd_r0_pre_notify_demote_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 48: notify drbd_r0_pre_notify_demote_0 on an-c03n01.alteeve.ca
> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=22, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 11: demote drbd_r0_demote_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 13: demote drbd_r0_demote_0 on an-c03n01.alteeve.ca
> Jan 27 20:29:10 an-c03n02 kernel: [ 5060.718998] block drbd0: role( Primary -> Secondary )
> Jan 27 20:29:10 an-c03n02 kernel: [ 5060.719041] block drbd0: bitmap WRITE of 0 pages took 0 jiffies
> Jan 27 20:29:10 an-c03n02 kernel: [ 5060.719043] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Jan 27 20:29:10 an-c03n02 kernel: [ 5060.727041] block drbd0: peer( Primary -> Secondary )
> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_demote_0 (call=23, rc=0, cib-update=36, confirmed=true) ok
> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 47: notify drbd_r0_post_notify_demote_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 49: notify drbd_r0_post_notify_demote_0 on an-c03n01.alteeve.ca
> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=24, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 44: notify drbd_r0_pre_notify_stop_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 45: notify drbd_r0_pre_notify_stop_0 on an-c03n01.alteeve.ca
> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=25, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 12: stop drbd_r0_stop_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 14: stop drbd_r0_stop_0 on an-c03n01.alteeve.ca
> Jan 27 20:29:10 an-c03n02 kernel: [ 5060.835968] drbd r0: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
> Jan 27 20:29:10 an-c03n02 kernel: [ 5060.835976] drbd r0: asender terminated
> Jan 27 20:29:10 an-c03n02 kernel: [ 5060.835977] drbd r0: Terminating drbd_a_r0
> Jan 27 20:29:10 an-c03n02 kernel: [ 5060.836358] drbd r0: Connection closed
> Jan 27 20:29:10 an-c03n02 kernel: [ 5060.836368] drbd r0: conn( Disconnecting -> StandAlone )
> Jan 27 20:29:10 an-c03n02 kernel: [ 5060.836369] drbd r0: receiver terminated
> Jan 27 20:29:10 an-c03n02 kernel: [ 5060.836371] drbd r0: Terminating drbd_r_r0
> Jan 27 20:29:10 an-c03n02 kernel: [ 5060.836435] block drbd0: disk( UpToDate -> Failed )
> Jan 27 20:29:10 an-c03n02 kernel: [ 5060.846158] block drbd0: bitmap WRITE of 0 pages took 0 jiffies
> Jan 27 20:29:10 an-c03n02 kernel: [ 5060.846161] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Jan 27 20:29:10 an-c03n02 kernel: [ 5060.846165] block drbd0: disk( Failed -> Diskless )
> Jan 27 20:29:10 an-c03n02 kernel: [ 5060.846249] block drbd0: drbd_bm_resize called with capacity == 0
> Jan 27 20:29:10 an-c03n02 kernel: [ 5060.846269] drbd r0: Terminating drbd_w_r0
> Jan 27 20:29:10 an-c03n02 attrd[21131]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (<null>)
> Jan 27 20:29:10 an-c03n02 attrd[21131]: notice: attrd_perform_update: Sent delete 19: node=2, attr=master-drbd_r0, id=<n/a>, set=(null), section=status
> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_stop_0 (call=26, rc=0, cib-update=37, confirmed=true) ok
> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: run_graph: Transition 3 (Complete=22, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-167.bz2): Stopped
> Jan 27 20:29:10 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jan 27 20:29:10 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-168.bz2
> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: run_graph: Transition 4 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-168.bz2): Complete
> Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> ====
> 
> 
> Still looking good. Now here is where things go sideways...
> 
> ====
> [root@an-c03n01 ~]# pcs resource enable drbd_r0_Clone
> ====
> [root@an-c03n02 ~]# pcs status
> Cluster name: an-cluster-03
> Last updated: Mon Jan 27 20:32:52 2014
> Last change: Mon Jan 27 20:32:05 2014 via cibadmin on an-c03n01.alteeve.ca
> Stack: corosync
> Current DC: an-c03n02.alteeve.ca (2) - partition with quorum
> Version: 1.1.10-19.el7-368c726
> 2 Nodes configured
> 4 Resources configured
> 
> 
> Online: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
> 
> Full list of resources:
> 
> fence_n01_virsh	(stonith:fence_virsh):	Started an-c03n01.alteeve.ca
> fence_n02_virsh	(stonith:fence_virsh):	Started an-c03n02.alteeve.ca
> Master/Slave Set: drbd_r0_Clone [drbd_r0]
>     Masters: [ an-c03n02.alteeve.ca ]
>     Slaves: [ an-c03n01.alteeve.ca ]
> 
> Failed actions:
>    drbd_r0_promote_0 on an-c03n01.alteeve.ca 'unknown error' (1): call=30, status=complete, last-rc-change='Mon Jan 27 20:32:05 2014', queued=15187ms, exec=0ms
> 
> 
> PCSD Status:
> an-c03n01.alteeve.ca:
>  an-c03n01.alteeve.ca: Online
> an-c03n02.alteeve.ca:
>  an-c03n02.alteeve.ca: Online
> 
> Daemon Status:
>  corosync: active/disabled
>  pacemaker: active/disabled
>  pcsd: active/enabled
> ====
> 
> Enable logs from an-c03n01:
> ====
> Jan 27 20:32:05 an-c03n01 kernel: [19827.078454] drbd r0: Starting worker thread (from drbdsetup [1337])
> Jan 27 20:32:05 an-c03n01 kernel: [19827.078587] block drbd0: disk( Diskless -> Attaching )
> Jan 27 20:32:05 an-c03n01 kernel: [19827.078655] drbd r0: Method to ensure write ordering: drain
> Jan 27 20:32:05 an-c03n01 kernel: [19827.078657] block drbd0: max BIO size = 1048576
> Jan 27 20:32:05 an-c03n01 kernel: [19827.078661] block drbd0: Adjusting my ra_pages to backing device's (32 -> 1024)
> Jan 27 20:32:05 an-c03n01 kernel: [19827.078664] block drbd0: drbd_bm_resize called with capacity == 41937592
> Jan 27 20:32:05 an-c03n01 kernel: [19827.078732] block drbd0: resync bitmap: bits=5242199 words=81910 pages=160
> Jan 27 20:32:05 an-c03n01 kernel: [19827.078734] block drbd0: size = 20 GB (20968796 KB)
> Jan 27 20:32:05 an-c03n01 kernel: [19827.080475] block drbd0: bitmap READ of 160 pages took 2 jiffies
> Jan 27 20:32:05 an-c03n01 kernel: [19827.080566] block drbd0: recounting of set bits took additional 0 jiffies
> Jan 27 20:32:05 an-c03n01 kernel: [19827.080568] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Jan 27 20:32:05 an-c03n01 kernel: [19827.080575] block drbd0: disk( Attaching -> Consistent )
> Jan 27 20:32:05 an-c03n01 kernel: [19827.080577] block drbd0: attached to UUIDs AA966D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D
> Jan 27 20:32:05 an-c03n01 kernel: [19827.086606] drbd r0: conn( StandAlone -> Unconnected )
> Jan 27 20:32:05 an-c03n01 kernel: [19827.086663] drbd r0: Starting receiver thread (from drbd_w_r0 [1338])
> Jan 27 20:32:05 an-c03n01 kernel: [19827.086677] drbd r0: receiver (re)started
> Jan 27 20:32:05 an-c03n01 kernel: [19827.086682] drbd r0: conn( Unconnected -> WFConnection )
> Jan 27 20:32:05 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (5)
> Jan 27 20:32:05 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 23: master-drbd_r0=5
> Jan 27 20:32:05 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_start_0 (call=27, rc=0, cib-update=18, confirmed=true) ok
> Jan 27 20:32:05 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=28, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:32:05 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=29, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:32:05 an-c03n01 kernel: [19827.235110] drbd r0: helper command: /sbin/drbdadm fence-peer r0
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: invoked for r0
> Jan 27 20:32:05 an-c03n01 crmd[845]: notice: handle_request: Current ping state: S_NOT_DC
> Jan 27 20:32:05 an-c03n01 cibadmin[1469]: notice: crm_log_args: Invoked: cibadmin -C -o constraints -X <rsc_location rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone">
>  <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">
>    <expression attribute="#uname" operation="ne" value="an-c03n01.alteeve.ca" id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>
>  </rule>
> </rsc_location>
> Jan 27 20:32:05 an-c03n01 stonith-ng[841]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: Call cib_create failed (-76): Name not unique on network
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: <failed>
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: <failed_update id="drbd-fence-by-handler-r0-drbd_r0_Clone" object_type="rsc_location" operation="cib_create" reason="Name not unique on network">
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: <rsc_location rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone">
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: <expression attribute="#uname" operation="ne" value="an-c03n01.alteeve.ca" id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: </rule>
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: </rsc_location>
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: </failed_update>
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: </failed>
> Jan 27 20:32:05 an-c03n01 kernel: [19827.302587] drbd r0: helper command: /sbin/drbdadm fence-peer r0 exit code 1 (0x100)
> Jan 27 20:32:05 an-c03n01 kernel: [19827.302590] drbd r0: fence-peer helper broken, returned 1
> Jan 27 20:32:05 an-c03n01 kernel: [19827.302607] drbd r0: helper command: /sbin/drbdadm fence-peer r0
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: WARNING DATA INTEGRITY at RISK: could not place the fencing constraint!
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1484]: invoked for r0
> Jan 27 20:32:05 an-c03n01 stonith-ng[841]: notice: stonith_device_register: Device 'fence_n01_virsh' already existed in device list (2 active devices)
> Jan 27 20:32:05 an-c03n01 kernel: [19827.328528] drbd r0: helper command: /sbin/drbdadm fence-peer r0 exit code 1 (0x100)
> Jan 27 20:32:05 an-c03n01 kernel: [19827.328532] drbd r0: fence-peer helper broken, returned 1
> Jan 27 20:32:05 an-c03n01 kernel: [19827.328553] drbd r0: helper command: /sbin/drbdadm fence-peer r0
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1513]: invoked for r0
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1484]: WARNING constraint <expression attribute="#uname" <expression operation="ne" <expression value="an-c03n02.alteeve.ca" <rsc_location rsc="drbd_r0_Clone" <rule role="Master" <rule score="-INFINITY" already exists
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1484]: WARNING DATA INTEGRITY at RISK: could not place the fencing constraint!
> Jan 27 20:32:05 an-c03n01 kernel: [19827.359166] drbd r0: helper command: /sbin/drbdadm fence-peer r0 exit code 1 (0x100)
> Jan 27 20:32:05 an-c03n01 kernel: [19827.359170] drbd r0: fence-peer helper broken, returned 1
> Jan 27 20:32:05 an-c03n01 kernel: [19827.359193] drbd r0: helper command: /sbin/drbdadm fence-peer r0
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1513]: WARNING constraint <expression attribute="#uname" <expression operation="ne" <expression value="an-c03n02.alteeve.ca" <rsc_location rsc="drbd_r0_Clone" <rule role="Master" <rule score="-INFINITY" already exists
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1513]: WARNING DATA INTEGRITY at RISK: could not place the fencing constraint!
> Jan 27 20:32:05 an-c03n01 stonith-ng[841]: notice: stonith_device_register: Added 'fence_n02_virsh' to the device list (2 active devices)
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1541]: invoked for r0
> Jan 27 20:32:05 an-c03n01 kernel: [19827.379932] drbd r0: helper command: /sbin/drbdadm fence-peer r0 exit code 1 (0x100)
> Jan 27 20:32:05 an-c03n01 kernel: [19827.379935] drbd r0: fence-peer helper broken, returned 1
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1541]: WARNING constraint <expression attribute="#uname" <expression operation="ne" <expression value="an-c03n02.alteeve.ca" <rsc_location rsc="drbd_r0_Clone" <rule role="Master" <rule score="-INFINITY" already exists
> Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1541]: WARNING DATA INTEGRITY at RISK: could not place the fencing constraint!
> Jan 27 20:32:05 an-c03n01 drbd(drbd_r0)[1408]: ERROR: r0: Called drbdadm -c /etc/drbd.conf primary r0
> Jan 27 20:32:05 an-c03n01 drbd(drbd_r0)[1408]: ERROR: r0: Exit code 17
> Jan 27 20:32:05 an-c03n01 drbd(drbd_r0)[1408]: ERROR: r0: Command output:
> Jan 27 20:32:05 an-c03n01 drbd(drbd_r0)[1408]: CRIT: Refusing to be promoted to Primary without UpToDate data
> Jan 27 20:32:05 an-c03n01 drbd(drbd_r0)[1408]: WARNING: promotion failed; sleep 15 # to prevent tight recovery loop
> Jan 27 20:32:05 an-c03n01 kernel: [19827.597081] drbd r0: Handshake successful: Agreed network protocol version 101
> Jan 27 20:32:05 an-c03n01 kernel: [19827.597084] drbd r0: Agreed to support TRIM on protocol level
> Jan 27 20:32:05 an-c03n01 kernel: [19827.597142] drbd r0: conn( WFConnection -> WFReportParams )
> Jan 27 20:32:05 an-c03n01 kernel: [19827.597145] drbd r0: Starting asender thread (from drbd_r_r0 [1347])
> Jan 27 20:32:05 an-c03n01 kernel: [19827.606053] block drbd0: drbd_sync_handshake:
> Jan 27 20:32:05 an-c03n01 kernel: [19827.606057] block drbd0: self AA966D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D bits:0 flags:0
> Jan 27 20:32:05 an-c03n01 kernel: [19827.606058] block drbd0: peer 853E72BBF0C9260D:AA966D5345E69DAA:4F366962CD263E3C:4F356962CD263E3D bits:0 flags:0
> Jan 27 20:32:05 an-c03n01 kernel: [19827.606060] block drbd0: uuid_compare()=-1 by rule 50
> Jan 27 20:32:05 an-c03n01 kernel: [19827.606065] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( Consistent -> Outdated ) pdsk( DUnknown -> UpToDate )
> Jan 27 20:32:05 an-c03n01 kernel: [19827.606296] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Jan 27 20:32:05 an-c03n01 kernel: [19827.606388] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Jan 27 20:32:05 an-c03n01 kernel: [19827.606391] block drbd0: conn( WFBitMapT -> WFSyncUUID )
> Jan 27 20:32:05 an-c03n01 kernel: [19827.607961] block drbd0: updated sync uuid AA976D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D
> Jan 27 20:32:05 an-c03n01 kernel: [19827.608137] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
> Jan 27 20:32:05 an-c03n01 kernel: [19827.609229] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
> Jan 27 20:32:05 an-c03n01 kernel: [19827.609243] block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
> Jan 27 20:32:05 an-c03n01 kernel: [19827.609251] block drbd0: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
> Jan 27 20:32:05 an-c03n01 kernel: [19827.610184] block drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
> Jan 27 20:32:05 an-c03n01 kernel: [19827.610188] block drbd0: updated UUIDs 853E72BBF0C9260C:0000000000000000:AA976D5345E69DAA:AA966D5345E69DAA
> Jan 27 20:32:05 an-c03n01 kernel: [19827.610191] block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
> Jan 27 20:32:05 an-c03n01 kernel: [19827.610627] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
> Jan 27 20:32:05 an-c03n01 crm-unfence-peer.sh[1589]: invoked for r0
> Jan 27 20:32:05 an-c03n01 cibadmin[1603]: notice: crm_log_args: Invoked: cibadmin -D -X <rsc_location rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone"/>
> Jan 27 20:32:05 an-c03n01 stonith-ng[841]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jan 27 20:32:05 an-c03n01 kernel: [19827.637304] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
> Jan 27 20:32:05 an-c03n01 stonith-ng[841]: notice: stonith_device_register: Device 'fence_n01_virsh' already existed in device list (2 active devices)
> Jan 27 20:32:05 an-c03n01 stonith-ng[841]: notice: stonith_device_register: Added 'fence_n02_virsh' to the device list (2 active devices)
> Jan 27 20:32:20 an-c03n01 lrmd[842]: notice: operation_finished: drbd_r0_promote_0:1408:stderr [ 0: State change failed: (-2) Need access to UpToDate data ]
> Jan 27 20:32:20 an-c03n01 lrmd[842]: notice: operation_finished: drbd_r0_promote_0:1408:stderr [ Command 'drbdsetup primary 0' terminated with exit code 17 ]
> Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_promote_0 (call=30, rc=1, cib-update=19, confirmed=true) unknown error
> Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: an-c03n01.alteeve.ca-drbd_r0_promote_0:30 [ \n ]
> Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_cs_dispatch: Update relayed from an-c03n02.alteeve.ca
> Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-drbd_r0 (1)
> Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 28: fail-count-drbd_r0=1
> Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_cs_dispatch: Update relayed from an-c03n02.alteeve.ca
> Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-drbd_r0 (1390872740)
> Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 31: last-failure-drbd_r0=1390872740
> Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_cs_dispatch: Update relayed from an-c03n02.alteeve.ca
> Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-drbd_r0 (2)
> Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 34: fail-count-drbd_r0=2
> Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_cs_dispatch: Update relayed from an-c03n02.alteeve.ca
> Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-drbd_r0 (1390872740)
> Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 37: last-failure-drbd_r0=1390872740
> Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (10000)
> Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 39: master-drbd_r0=10000
> Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=31, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=32, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_demote_0 (call=33, rc=0, cib-update=20, confirmed=true) ok
> Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=34, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=35, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:32:20 an-c03n01 kernel: [19842.604453] drbd r0: Requested state change failed by peer: Refusing to be Primary while peer is not outdated (-7)
> Jan 27 20:32:20 an-c03n01 kernel: [19842.605419] drbd r0: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown )
> Jan 27 20:32:20 an-c03n01 kernel: [19842.605458] drbd r0: asender terminated
> Jan 27 20:32:20 an-c03n01 kernel: [19842.605460] drbd r0: Terminating drbd_a_r0
> Jan 27 20:32:20 an-c03n01 kernel: [19842.605841] drbd r0: Connection closed
> Jan 27 20:32:20 an-c03n01 kernel: [19842.605849] drbd r0: conn( Disconnecting -> StandAlone )
> Jan 27 20:32:20 an-c03n01 kernel: [19842.605850] drbd r0: receiver terminated
> Jan 27 20:32:20 an-c03n01 kernel: [19842.605860] drbd r0: Terminating drbd_r_r0
> Jan 27 20:32:20 an-c03n01 kernel: [19842.605885] block drbd0: disk( Outdated -> Failed )
> Jan 27 20:32:20 an-c03n01 kernel: [19842.617080] block drbd0: bitmap WRITE of 0 pages took 0 jiffies
> Jan 27 20:32:20 an-c03n01 kernel: [19842.617085] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Jan 27 20:32:20 an-c03n01 kernel: [19842.617103] block drbd0: disk( Failed -> Diskless )
> Jan 27 20:32:20 an-c03n01 kernel: [19842.617174] block drbd0: drbd_bm_resize called with capacity == 0
> Jan 27 20:32:20 an-c03n01 kernel: [19842.617202] drbd r0: Terminating drbd_w_r0
> Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (<null>)
> Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_stop_0 (call=36, rc=0, cib-update=21, confirmed=true) ok
> Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent delete 43: node=1, attr=master-drbd_r0, id=<n/a>, set=(null), section=status
> Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent delete 45: node=1, attr=master-drbd_r0, id=<n/a>, set=(null), section=status
> Jan 27 20:32:21 an-c03n01 kernel: [19842.840388] drbd r0: Starting worker thread (from drbdsetup [1818])
> Jan 27 20:32:21 an-c03n01 kernel: [19842.840614] block drbd0: disk( Diskless -> Attaching )
> Jan 27 20:32:21 an-c03n01 kernel: [19842.840687] drbd r0: Method to ensure write ordering: drain
> Jan 27 20:32:21 an-c03n01 kernel: [19842.840689] block drbd0: max BIO size = 1048576
> Jan 27 20:32:21 an-c03n01 kernel: [19842.840692] block drbd0: Adjusting my ra_pages to backing device's (32 -> 1024)
> Jan 27 20:32:21 an-c03n01 kernel: [19842.840694] block drbd0: drbd_bm_resize called with capacity == 41937592
> Jan 27 20:32:21 an-c03n01 kernel: [19842.840770] block drbd0: resync bitmap: bits=5242199 words=81910 pages=160
> Jan 27 20:32:21 an-c03n01 kernel: [19842.840772] block drbd0: size = 20 GB (20968796 KB)
> Jan 27 20:32:21 an-c03n01 kernel: [19842.850197] block drbd0: bitmap READ of 160 pages took 10 jiffies
> Jan 27 20:32:21 an-c03n01 kernel: [19842.850288] block drbd0: recounting of set bits took additional 0 jiffies
> Jan 27 20:32:21 an-c03n01 kernel: [19842.850290] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Jan 27 20:32:21 an-c03n01 kernel: [19842.850295] block drbd0: disk( Attaching -> Outdated )
> Jan 27 20:32:21 an-c03n01 kernel: [19842.850297] block drbd0: attached to UUIDs 853E72BBF0C9260C:0000000000000000:AA976D5345E69DAA:AA966D5345E69DAA
> Jan 27 20:32:21 an-c03n01 kernel: [19842.856274] drbd r0: conn( StandAlone -> Unconnected )
> Jan 27 20:32:21 an-c03n01 kernel: [19842.856311] drbd r0: Starting receiver thread (from drbd_w_r0 [1819])
> Jan 27 20:32:21 an-c03n01 kernel: [19842.856332] drbd r0: receiver (re)started
> Jan 27 20:32:21 an-c03n01 kernel: [19842.856340] drbd r0: conn( Unconnected -> WFConnection )
> Jan 27 20:32:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_start_0 (call=37, rc=0, cib-update=22, confirmed=true) ok
> Jan 27 20:32:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=38, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:32:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_monitor_60000 (call=39, rc=0, cib-update=23, confirmed=false) ok
> Jan 27 20:32:21 an-c03n01 kernel: [19843.356430] drbd r0: Handshake successful: Agreed network protocol version 101
> Jan 27 20:32:21 an-c03n01 kernel: [19843.356432] drbd r0: Agreed to support TRIM on protocol level
> Jan 27 20:32:21 an-c03n01 kernel: [19843.356473] drbd r0: conn( WFConnection -> WFReportParams )
> Jan 27 20:32:21 an-c03n01 kernel: [19843.356475] drbd r0: Starting asender thread (from drbd_r_r0 [1829])
> Jan 27 20:32:21 an-c03n01 kernel: [19843.362052] block drbd0: drbd_sync_handshake:
> Jan 27 20:32:21 an-c03n01 kernel: [19843.362056] block drbd0: self 853E72BBF0C9260C:0000000000000000:AA976D5345E69DAA:AA966D5345E69DAA bits:0 flags:0
> Jan 27 20:32:21 an-c03n01 kernel: [19843.362057] block drbd0: peer FD6969A6E17CBA41:853E72BBF0C9260D:AA976D5345E69DAA:AA966D5345E69DAA bits:0 flags:0
> Jan 27 20:32:21 an-c03n01 kernel: [19843.362059] block drbd0: uuid_compare()=-1 by rule 50
> Jan 27 20:32:21 an-c03n01 kernel: [19843.362063] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
> Jan 27 20:32:21 an-c03n01 kernel: [19843.365473] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Jan 27 20:32:21 an-c03n01 kernel: [19843.365579] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Jan 27 20:32:21 an-c03n01 kernel: [19843.365583] block drbd0: conn( WFBitMapT -> WFSyncUUID )
> Jan 27 20:32:21 an-c03n01 kernel: [19843.367483] block drbd0: updated sync uuid 853F72BBF0C9260C:0000000000000000:AA976D5345E69DAA:AA966D5345E69DAA
> Jan 27 20:32:21 an-c03n01 kernel: [19843.367693] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
> Jan 27 20:32:21 an-c03n01 kernel: [19843.368877] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
> Jan 27 20:32:21 an-c03n01 kernel: [19843.368892] block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
> Jan 27 20:32:21 an-c03n01 kernel: [19843.368899] block drbd0: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
> Jan 27 20:32:21 an-c03n01 kernel: [19843.369304] block drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
> Jan 27 20:32:21 an-c03n01 kernel: [19843.369309] block drbd0: updated UUIDs FD6969A6E17CBA40:0000000000000000:853F72BBF0C9260C:853E72BBF0C9260D
> Jan 27 20:32:21 an-c03n01 kernel: [19843.369313] block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
> Jan 27 20:32:21 an-c03n01 kernel: [19843.369433] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
> Jan 27 20:32:21 an-c03n01 crm-unfence-peer.sh[1900]: invoked for r0
> Jan 27 20:32:21 an-c03n01 kernel: [19843.384987] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
> ====
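
A note on the block above: the resync finishes and crm-unfence-peer.sh drops the constraint at 20:32:05, yet the promote on an-c03n01 only runs at 20:32:20 (pcs status later shows it queued for ~15 s) and is then refused at the DRBD level with "Need access to UpToDate data" and drbdsetup exit code 17. Whenever a cycle like this goes wrong, it is worth checking whether a drbd-fence-by-handler constraint is still sitting in the CIB; a minimal sketch for listing it and removing it by hand, reusing the same cibadmin calls the handlers issue (the constraint id is taken from the log lines above):

    # dump the constraints section; look for drbd-fence-by-handler-r0-drbd_r0_Clone
    cibadmin -Q -o constraints
    # remove a leftover constraint by hand, the same call crm-unfence-peer.sh makes
    cibadmin -D -X '<rsc_location rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone"/>'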
> 
> Enable logs from an-c03n02:
> ====
> Jan 27 20:32:04 an-c03n02 cib[21128]: notice: cib:diff: Diff: --- 0.140.7
> Jan 27 20:32:04 an-c03n02 cib[21128]: notice: cib:diff: Diff: +++ 0.141.1 fcc6dc293b799186774cfb583055eb9f
> Jan 27 20:32:04 an-c03n02 cib[21128]: notice: cib:diff: -- <nvpair id="drbd_r0_Clone-meta_attributes-target-role" name="target-role" value="Stopped"/>
> Jan 27 20:32:04 an-c03n02 cib[21128]: notice: cib:diff: ++ <cib admin_epoch="0" cib-last-written="Mon Jan 27 20:32:04 2014" crm_feature_set="3.0.7" epoch="141" have-quorum="1" num_updates="1" update-client="crm_resource" update-origin="an-c03n01.alteeve.ca" validate-with="pacemaker-1.2" dc-uuid="2"/>
> Jan 27 20:32:04 an-c03n02 crmd[21133]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Jan 27 20:32:04 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jan 27 20:32:04 an-c03n02 pengine[21132]: notice: LogActions: Start drbd_r0:0	(an-c03n01.alteeve.ca)
> Jan 27 20:32:04 an-c03n02 pengine[21132]: notice: LogActions: Start drbd_r0:1	(an-c03n02.alteeve.ca)
> Jan 27 20:32:04 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 5: /var/lib/pacemaker/pengine/pe-input-169.bz2
> Jan 27 20:32:04 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 11: start drbd_r0_start_0 on an-c03n01.alteeve.ca
> Jan 27 20:32:04 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 13: start drbd_r0:1_start_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.451554] drbd r0: Starting worker thread (from drbdsetup [21714])
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452326] block drbd0: disk( Diskless -> Attaching )
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452402] drbd r0: Method to ensure write ordering: drain
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452404] block drbd0: max BIO size = 1048576
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452407] block drbd0: Adjusting my ra_pages to backing device's (32 -> 1024)
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452409] block drbd0: drbd_bm_resize called with capacity == 41937592
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452467] block drbd0: resync bitmap: bits=5242199 words=81910 pages=160
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452469] block drbd0: size = 20 GB (20968796 KB)
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.453954] block drbd0: bitmap READ of 160 pages took 1 jiffies
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.454067] block drbd0: recounting of set bits took additional 1 jiffies
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.454069] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.454073] block drbd0: disk( Attaching -> Consistent )
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.454076] block drbd0: attached to UUIDs AA966D5345E69DAA:0000000000000000:4F366962CD263E3C:4F356962CD263E3D
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.460539] drbd r0: conn( StandAlone -> Unconnected )
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.460598] drbd r0: Starting receiver thread (from drbd_w_r0 [21715])
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.461937] drbd r0: receiver (re)started
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.461957] drbd r0: conn( Unconnected -> WFConnection )
> Jan 27 20:32:05 an-c03n02 attrd[21131]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (5)
> Jan 27 20:32:05 an-c03n02 attrd[21131]: notice: attrd_perform_update: Sent update 24: master-drbd_r0=5
> Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_start_0 (call=27, rc=0, cib-update=40, confirmed=true) ok
> Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 44: notify drbd_r0_post_notify_start_0 on an-c03n01.alteeve.ca
> Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 45: notify drbd_r0:1_post_notify_start_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=28, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: run_graph: Transition 5 (Complete=10, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-169.bz2): Stopped
> Jan 27 20:32:05 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jan 27 20:32:05 an-c03n02 pengine[21132]: notice: LogActions: Promote drbd_r0:0	(Slave -> Master an-c03n02.alteeve.ca)
> Jan 27 20:32:05 an-c03n02 pengine[21132]: notice: LogActions: Promote drbd_r0:1	(Slave -> Master an-c03n01.alteeve.ca)
> Jan 27 20:32:05 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 6: /var/lib/pacemaker/pengine/pe-input-170.bz2
> Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 52: notify drbd_r0_pre_notify_promote_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 54: notify drbd_r0_pre_notify_promote_0 on an-c03n01.alteeve.ca
> Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=29, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 13: promote drbd_r0_promote_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 16: promote drbd_r0_promote_0 on an-c03n01.alteeve.ca
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.599706] drbd r0: helper command: /sbin/drbdadm fence-peer r0
> Jan 27 20:32:05 an-c03n02 crm-fence-peer.sh[21814]: invoked for r0
> Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: handle_request: Current ping state: S_TRANSITION_ENGINE
> Jan 27 20:32:05 an-c03n02 cibadmin[21846]: notice: crm_log_args: Invoked: cibadmin -C -o constraints -X <rsc_location rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone">
>  <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">
>    <expression attribute="#uname" operation="ne" value="an-c03n02.alteeve.ca" id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>
>  </rule>
> </rsc_location>
> Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: Diff: --- 0.141.5
> Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: Diff: +++ 0.142.1 c0646876db9897523b58236bb6890452
> Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: -- <cib admin_epoch="0" epoch="141" num_updates="5"/>
> Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: ++ <rsc_location rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone">
> Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: ++         <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">
> Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: ++ <expression attribute="#uname" operation="ne" value="an-c03n02.alteeve.ca" id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>
> Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: ++         </rule>
> Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: ++ </rsc_location>
> Jan 27 20:32:05 an-c03n02 stonith-ng[21129]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jan 27 20:32:05 an-c03n02 crm-fence-peer.sh[21814]: INFO peer is reachable, my disk is Consistent: placed constraint 'drbd-fence-by-handler-r0-drbd_r0_Clone'
> Jan 27 20:32:05 an-c03n02 cib[21128]: warning: update_results: Action cib_create failed: Name not unique on network (cde=-76)
> Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB Update failures   <failed>
> Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB Update failures     <failed_update id="drbd-fence-by-handler-r0-drbd_r0_Clone" object_type="rsc_location" operation="cib_create" reason="Name not unique on network">
> Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB Update failures       <rsc_location rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone">
> Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB Update failures         <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">
> Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB Update failures           <expression attribute="#uname" operation="ne" value="an-c03n01.alteeve.ca" id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>
> Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB Update failures         </rule>
> Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB Update failures       </rsc_location>
> Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB Update failures     </failed_update>
> Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB Update failures   </failed>
> Jan 27 20:32:05 an-c03n02 cib[21128]: warning: cib_process_request: Completed cib_create operation for section constraints: Name not unique on network (rc=-76, origin=an-c03n01.alteeve.ca/cibadmin/2, version=0.142.1)
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.651646] drbd r0: helper command: /sbin/drbdadm fence-peer r0 exit code 4 (0x400)
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.651650] drbd r0: fence-peer helper returned 4 (peer was fenced)
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.651660] drbd r0: pdsk( DUnknown -> Outdated )
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.651666] block drbd0: role( Secondary -> Primary ) disk( Consistent -> UpToDate )
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.651876] block drbd0: new current UUID 853E72BBF0C9260D:AA966D5345E69DAA:4F366962CD263E3C:4F356962CD263E3D
> Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_promote_0 (call=30, rc=0, cib-update=42, confirmed=true) ok
> Jan 27 20:32:05 an-c03n02 stonith-ng[21129]: notice: stonith_device_register: Added 'fence_n01_virsh' to the device list (2 active devices)
> Jan 27 20:32:05 an-c03n02 stonith-ng[21129]: notice: stonith_device_register: Device 'fence_n02_virsh' already existed in device list (2 active devices)
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.962021] drbd r0: Handshake successful: Agreed network protocol version 101
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.962023] drbd r0: Agreed to support TRIM on protocol level
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.962069] drbd r0: conn( WFConnection -> WFReportParams )
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.962072] drbd r0: Starting asender thread (from drbd_r_r0 [21724])
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.968085] block drbd0: drbd_sync_handshake:
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.968090] block drbd0: self 853E72BBF0C9260D:AA966D5345E69DAA:4F366962CD263E3C:4F356962CD263E3D bits:0 flags:0
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.968092] block drbd0: peer AA966D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D bits:0 flags:0
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.968094] block drbd0: uuid_compare()=1 by rule 70
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.968100] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> Consistent )
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.968256] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.971293] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.971299] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.972381] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.972395] block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.972402] block drbd0: Began resync as SyncSource (will sync 0 KB [0 bits set]).
> Jan 27 20:32:05 an-c03n02 kernel: [ 5235.972433] block drbd0: updated sync UUID 853E72BBF0C9260D:AA976D5345E69DAA:AA966D5345E69DAA:4F366962CD263E3C
> Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: Diff: --- 0.142.2
> Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: Diff: +++ 0.143.1 fbd603d69e81ccfe94726267b74d5322
> Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: -- <rsc_location rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone">
> Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: --         <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">
> Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: -- <expression attribute="#uname" operation="ne" value="an-c03n02.alteeve.ca" id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>
> Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: --         </rule>
> Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: -- </rsc_location>
> Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: ++ <cib admin_epoch="0" cib-last-written="Mon Jan 27 20:32:05 2014" crm_feature_set="3.0.7" epoch="143" have-quorum="1" num_updates="1" update-client="cibadmin" update-origin="an-c03n01.alteeve.ca" validate-with="pacemaker-1.2" dc-uuid="2"/>
> Jan 27 20:32:05 an-c03n02 stonith-ng[21129]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jan 27 20:32:05 an-c03n02 kernel: [ 5236.007605] block drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
> Jan 27 20:32:05 an-c03n02 kernel: [ 5236.007612] block drbd0: updated UUIDs 853E72BBF0C9260D:0000000000000000:AA976D5345E69DAA:AA966D5345E69DAA
> Jan 27 20:32:05 an-c03n02 kernel: [ 5236.007618] block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
> Jan 27 20:32:05 an-c03n02 stonith-ng[21129]: notice: stonith_device_register: Added 'fence_n01_virsh' to the device list (2 active devices)
> Jan 27 20:32:05 an-c03n02 stonith-ng[21129]: notice: stonith_device_register: Device 'fence_n02_virsh' already existed in device list (2 active devices)
> Jan 27 20:32:20 an-c03n02 crmd[21133]: warning: status_from_rc: Action 16 (drbd_r0_promote_0) on an-c03n01.alteeve.ca failed (target: 0 vs. rc: 1): Error
> Jan 27 20:32:20 an-c03n02 crmd[21133]: warning: update_failcount: Updating failcount for drbd_r0 on an-c03n01.alteeve.ca after failed promote: rc=1 (update=value++, time=1390872740)
> Jan 27 20:32:20 an-c03n02 crmd[21133]: warning: update_failcount: Updating failcount for drbd_r0 on an-c03n01.alteeve.ca after failed promote: rc=1 (update=value++, time=1390872740)
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 53: notify drbd_r0_post_notify_promote_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 55: notify drbd_r0_post_notify_promote_0 on an-c03n01.alteeve.ca
> Jan 27 20:32:20 an-c03n02 attrd[21131]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (10000)
> Jan 27 20:32:20 an-c03n02 attrd[21131]: notice: attrd_perform_update: Sent update 32: master-drbd_r0=10000
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=31, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: run_graph: Transition 6 (Complete=12, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-170.bz2): Complete
> Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jan 27 20:32:20 an-c03n02 pengine[21132]: warning: unpack_rsc_op: Processing failed op promote for drbd_r0:1 on an-c03n01.alteeve.ca: unknown error (1)
> Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: LogActions: Demote drbd_r0:1	(Master -> Slave an-c03n01.alteeve.ca)
> Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: LogActions: Recover drbd_r0:1	(Master an-c03n01.alteeve.ca)
> Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 7: /var/lib/pacemaker/pengine/pe-input-171.bz2
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 55: notify drbd_r0_pre_notify_demote_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 57: notify drbd_r0_pre_notify_demote_0 on an-c03n01.alteeve.ca
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=32, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 16: demote drbd_r0_demote_0 on an-c03n01.alteeve.ca
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 56: notify drbd_r0_post_notify_demote_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 58: notify drbd_r0_post_notify_demote_0 on an-c03n01.alteeve.ca
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=33, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 48: notify drbd_r0_pre_notify_stop_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 50: notify drbd_r0_pre_notify_stop_0 on an-c03n01.alteeve.ca
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=34, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 3: stop drbd_r0_stop_0 on an-c03n01.alteeve.ca
> Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969170] block drbd0: State change failed: Refusing to be Primary while peer is not outdated
> Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969190] block drbd0:   state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
> Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969196] block drbd0:  wanted = { cs:TearDown ro:Primary/Unknown ds:UpToDate/DUnknown s---F- }
> Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969201] drbd r0: State change failed: Refusing to be Primary while peer is not outdated
> Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969205] drbd r0:  mask = 0x1f0 val = 0x70
> Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969218] drbd r0: old_conn:WFReportParams wanted_conn:TearDown
> Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969396] drbd r0: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> Outdated )
> Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969407] drbd r0: asender terminated
> Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969408] drbd r0: Terminating drbd_a_r0
> Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969457] block drbd0: new current UUID FD6969A6E17CBA41:853E72BBF0C9260D:AA976D5345E69DAA:AA966D5345E69DAA
> Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969708] drbd r0: Connection closed
> Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969717] drbd r0: conn( TearDown -> Unconnected )
> Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969718] drbd r0: receiver terminated
> Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969719] drbd r0: Restarting receiver thread
> Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969720] drbd r0: receiver (re)started
> Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969725] drbd r0: conn( Unconnected -> WFConnection )
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 49: notify drbd_r0_post_notify_stop_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=35, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: run_graph: Transition 7 (Complete=21, Pending=0, Fired=0, Skipped=7, Incomplete=5, Source=/var/lib/pacemaker/pengine/pe-input-171.bz2): Stopped
> Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jan 27 20:32:20 an-c03n02 pengine[21132]: warning: unpack_rsc_op: Processing failed op promote for drbd_r0:1 on an-c03n01.alteeve.ca: unknown error (1)
> Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: LogActions: Start drbd_r0:1	(an-c03n01.alteeve.ca)
> Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 8: /var/lib/pacemaker/pengine/pe-input-172.bz2
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 44: notify drbd_r0_pre_notify_start_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=36, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 15: start drbd_r0_start_0 on an-c03n01.alteeve.ca
> Jan 27 20:32:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 45: notify drbd_r0_post_notify_start_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:32:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 46: notify drbd_r0_post_notify_start_0 on an-c03n01.alteeve.ca
> Jan 27 20:32:21 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=37, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:32:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 16: monitor drbd_r0_monitor_60000 on an-c03n01.alteeve.ca
> Jan 27 20:32:21 an-c03n02 crmd[21133]: notice: run_graph: Transition 8 (Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-172.bz2): Complete
> Jan 27 20:32:21 an-c03n02 crmd[21133]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jan 27 20:32:21 an-c03n02 kernel: [ 5251.721118] drbd r0: Handshake successful: Agreed network protocol version 101
> Jan 27 20:32:21 an-c03n02 kernel: [ 5251.721120] drbd r0: Agreed to support TRIM on protocol level
> Jan 27 20:32:21 an-c03n02 kernel: [ 5251.721145] drbd r0: conn( WFConnection -> WFReportParams )
> Jan 27 20:32:21 an-c03n02 kernel: [ 5251.721146] drbd r0: Starting asender thread (from drbd_r_r0 [21724])
> Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730101] block drbd0: drbd_sync_handshake:
> Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730104] block drbd0: self FD6969A6E17CBA41:853E72BBF0C9260D:AA976D5345E69DAA:AA966D5345E69DAA bits:0 flags:0
> Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730106] block drbd0: peer 853E72BBF0C9260C:0000000000000000:AA976D5345E69DAA:AA966D5345E69DAA bits:0 flags:0
> Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730107] block drbd0: uuid_compare()=1 by rule 70
> Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730111] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> Consistent )
> Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730229] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730496] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730499] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0
> Jan 27 20:32:21 an-c03n02 kernel: [ 5251.731835] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
> Jan 27 20:32:21 an-c03n02 kernel: [ 5251.731848] block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
> Jan 27 20:32:21 an-c03n02 kernel: [ 5251.731861] block drbd0: Began resync as SyncSource (will sync 0 KB [0 bits set]).
> Jan 27 20:32:21 an-c03n02 kernel: [ 5251.731888] block drbd0: updated sync UUID FD6969A6E17CBA41:853F72BBF0C9260D:853E72BBF0C9260D:AA976D5345E69DAA
> Jan 27 20:32:21 an-c03n02 kernel: [ 5251.750241] block drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
> Jan 27 20:32:21 an-c03n02 kernel: [ 5251.750248] block drbd0: updated UUIDs FD6969A6E17CBA41:0000000000000000:853F72BBF0C9260D:853E72BBF0C9260D
> Jan 27 20:32:21 an-c03n02 kernel: [ 5251.750253] block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
> ====
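
The kernel timestamps on an-c03n02 show the ordering: the fence-peer helper fires at 5235.599, during the promote, while the handshake with the peer only completes at 5235.962. At that point the peer's disk is still DUnknown, so crm-fence-peer.sh places the location constraint, and the matching constraint that an-c03n01's handler tries to create a moment later collides with it ("Name not unique on network"). A quick manual check while reproducing this shows whether the promote is racing the DRBD handshake; a minimal diagnostic sketch, not part of any agent:

    # run on either node while the clone is starting/promoting
    drbdadm cstate r0                 # want: Connected
    drbdadm dstate r0                 # want: UpToDate/UpToDate
    # or poll until the link is actually up
    until [ "$(drbdadm cstate r0)" = "Connected" ]; do sleep 1; done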
> 
> 
> What?
> 
> After a minute or so, things sort of clear up:
> 
> ====
> [root at an-c03n02 ~]# pcs status
> Cluster name: an-cluster-03
> Last updated: Mon Jan 27 20:34:37 2014
> Last change: Mon Jan 27 20:32:05 2014 via cibadmin on an-c03n01.alteeve.ca
> Stack: corosync
> Current DC: an-c03n02.alteeve.ca (2) - partition with quorum
> Version: 1.1.10-19.el7-368c726
> 2 Nodes configured
> 4 Resources configured
> 
> 
> Online: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
> 
> Full list of resources:
> 
> fence_n01_virsh	(stonith:fence_virsh):	Started an-c03n01.alteeve.ca
> fence_n02_virsh	(stonith:fence_virsh):	Started an-c03n02.alteeve.ca
> Master/Slave Set: drbd_r0_Clone [drbd_r0]
>     Masters: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
> 
> Failed actions:
>    drbd_r0_promote_0 on an-c03n01.alteeve.ca 'unknown error' (1): call=30, status=complete, last-rc-change='Mon Jan 27 20:32:05 2014', queued=15187ms, exec=0ms
> 
> 
> PCSD Status:
> an-c03n01.alteeve.ca:
>  an-c03n01.alteeve.ca: Online
> an-c03n02.alteeve.ca:
>  an-c03n02.alteeve.ca: Online
> 
> Daemon Status:
>  corosync: active/disabled
>  pacemaker: active/disabled
>  pcsd: active/enabled
> ====
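
The "Failed actions" entry (and the fail-count that was pushed to 2 in the attrd messages earlier) will keep appearing in pcs status until it is cleared. Once the underlying cause is understood, clearing it is a one-liner; a minimal sketch:

    # forget the recorded promote failure for drbd_r0
    pcs resource cleanup drbd_r0
    # equivalent lower-level form, limited to the node that failed
    crm_resource --cleanup --resource drbd_r0 --node an-c03n01.alteeve.ca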
> 
> 
> Post-enable logs from an-c03n01:
> ====
> Jan 27 20:33:21 an-c03n01 attrd[843]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (10000)
> Jan 27 20:33:21 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent update 48: master-drbd_r0=10000
> Jan 27 20:33:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=41, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:33:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=42, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:33:21 an-c03n01 kernel: [19903.079190] block drbd0: role( Secondary -> Primary )
> Jan 27 20:33:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_promote_0 (call=43, rc=0, cib-update=25, confirmed=true) ok
> Jan 27 20:33:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=44, rc=0, cib-update=0, confirmed=true) ok
> ====
> 
> Post-enable logs from an-c03n02:
> ====
> Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Jan 27 20:33:21 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jan 27 20:33:21 an-c03n02 pengine[21132]: warning: unpack_rsc_op: Processing failed op promote for drbd_r0:1 on an-c03n01.alteeve.ca: unknown error (1)
> Jan 27 20:33:21 an-c03n02 pengine[21132]: notice: LogActions: Promote drbd_r0:1	(Slave -> Master an-c03n01.alteeve.ca)
> Jan 27 20:33:21 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 9: /var/lib/pacemaker/pengine/pe-input-173.bz2
> Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 3: cancel drbd_r0_cancel_60000 on an-c03n01.alteeve.ca
> Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 53: notify drbd_r0_pre_notify_promote_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 55: notify drbd_r0_pre_notify_promote_0 on an-c03n01.alteeve.ca
> Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=38, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: run_graph: Transition 9 (Complete=4, Pending=0, Fired=0, Skipped=3, Incomplete=5, Source=/var/lib/pacemaker/pengine/pe-input-173.bz2): Stopped
> Jan 27 20:33:21 an-c03n02 pengine[21132]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jan 27 20:33:21 an-c03n02 pengine[21132]: warning: unpack_rsc_op: Processing failed op promote for drbd_r0:1 on an-c03n01.alteeve.ca: unknown error (1)
> Jan 27 20:33:21 an-c03n02 pengine[21132]: notice: LogActions: Promote drbd_r0:1	(Slave -> Master an-c03n01.alteeve.ca)
> Jan 27 20:33:21 an-c03n02 pengine[21132]: notice: process_pe_message: Calculated Transition 10: /var/lib/pacemaker/pengine/pe-input-174.bz2
> Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 52: notify drbd_r0_pre_notify_promote_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 54: notify drbd_r0_pre_notify_promote_0 on an-c03n01.alteeve.ca
> Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=39, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 17: promote drbd_r0_promote_0 on an-c03n01.alteeve.ca
> Jan 27 20:33:21 an-c03n02 kernel: [ 5311.444071] block drbd0: peer( Secondary -> Primary )
> Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 53: notify drbd_r0_post_notify_promote_0 on an-c03n02.alteeve.ca (local)
> Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command: Initiating action 55: notify drbd_r0_post_notify_promote_0 on an-c03n01.alteeve.ca
> Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=40, rc=0, cib-update=0, confirmed=true) ok
> Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: run_graph: Transition 10 (Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-174.bz2): Complete
> Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> ====
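
Also worth noting: the successful promote on an-c03n01 at 20:33:21 lands exactly one monitor interval (drbd_r0_monitor_60000) after the restart at 20:32:21, which appears to be when the master-drbd_r0 score is refreshed and the policy engine retries the promotion. If that delay matters, role-specific monitor operations with shorter, distinct intervals are a common arrangement for DRBD master/slave resources; an illustrative sketch only (the intervals are arbitrary, and any existing monitor op may need adjusting to keep intervals unique):

    # illustrative: add per-role monitors with shorter, distinct intervals
    pcs resource op add drbd_r0 monitor interval=29s role=Master
    pcs resource op add drbd_r0 monitor interval=31s role=Slave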
> 
> I have no idea what's going wrong here... I'd love for any insight/help.
> 
> -- 
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without access to education?
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
