[Pacemaker] Having trouble with DRBD 8.4.4 on RHEL 7 beta w/ Pacemaker 1.1.10 - calls crm-fence-peer.sh when restarting the drbd resource
Digimer
lists@alteeve.ca
Tue Jan 28 01:36:43 UTC 2014
Hi all,
I initially posted this to the DRBD mailing list, but it got
moderated for being too large, so I hope it's OK to cross-post it here
in the meantime.
I'm trying to get dual-primary DRBD working with Pacemaker 1.1.10 on
RHEL 7 (beta 1). It's mostly working, except for one really strange problem.
When I start pacemaker/corosync, DRBD starts and promotes to primary
on both nodes quickly and without issue. After that, if I disable the
DRBD resource, both nodes stop drbd just fine.
The problem comes when I try to re-enable the DRBD resource... One of
the nodes invokes crm-fence-peer.sh, which in turn adds a constraint
blocking DRBD from becoming primary on one of the nodes (which node
appears to be random; it has done this to both). This, of course, leads
to the resource entering a FAILED state on that node.
I tried adding: handlers { after-resync-target
"/usr/lib/drbd/crm-unfence-peer.sh"; }. With this in place,
crm-unfence-peer.sh was eventually called (about 60 seconds later) and
the constraint was removed. By then, however, the resource had already
entered a FAILED state.
Here is the current config:
====
[root@an-c03n01 ~]# drbdadm dump
# /etc/drbd.conf
global {
usage-count yes;
}
common {
net {
protocol C;
allow-two-primaries yes;
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
after-sb-2pri disconnect;
}
disk {
fencing resource-and-stonith;
}
handlers {
fence-peer /usr/lib/drbd/crm-fence-peer.sh;
after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
}
}
# resource r0 on an-c03n01.alteeve.ca: not ignored, not stacked
# defined at /etc/drbd.d/r0.res:3
resource r0 {
on an-c03n01.alteeve.ca {
volume 0 {
device /dev/drbd0 minor 0;
disk /dev/vdb1;
meta-disk internal;
}
address ipv4 10.10.30.1:7788;
}
on an-c03n02.alteeve.ca {
volume 0 {
device /dev/drbd0 minor 0;
disk /dev/vdb1;
meta-disk internal;
}
address ipv4 10.10.30.2:7788;
}
net {
verify-alg md5;
data-integrity-alg md5;
}
disk {
disk-flushes no;
md-flushes no;
}
}
====
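As an aside, crm-fence-peer.sh takes optional timeout arguments, so the
handler line could be tuned to tighten the fence/unfence window. The
fragment below is purely illustrative (I have not tested these values):

```
# Illustrative only -- not my current config. crm-fence-peer.sh accepts
# --timeout / --dc-timeout options controlling how long it waits on the
# CIB and the DC before giving up.
handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh --timeout 30 --dc-timeout 60";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
}
```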
I'll walk through the steps, showing the logs from both nodes as I go.
First, I start the cluster:
====
[root@an-c03n01 ~]# pcs cluster start --all
an-c03n01.alteeve.ca: Starting Cluster...
an-c03n02.alteeve.ca: Starting Cluster...
====
[root@an-c03n02 ~]# pcs status
Cluster name: an-cluster-03
Last updated: Mon Jan 27 20:26:38 2014
Last change: Mon Jan 27 20:25:06 2014 via crmd on an-c03n01.alteeve.ca
Stack: corosync
Current DC: an-c03n02.alteeve.ca (2) - partition with quorum
Version: 1.1.10-19.el7-368c726
2 Nodes configured
4 Resources configured
Online: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
Full list of resources:
fence_n01_virsh (stonith:fence_virsh): Started an-c03n01.alteeve.ca
fence_n02_virsh (stonith:fence_virsh): Started an-c03n02.alteeve.ca
Master/Slave Set: drbd_r0_Clone [drbd_r0]
Masters: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
PCSD Status:
an-c03n01.alteeve.ca:
an-c03n01.alteeve.ca: Online
an-c03n02.alteeve.ca:
an-c03n02.alteeve.ca: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
====
[root@an-c03n02 ~]# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 74402fecf24da8e5438171ee8c19e28627e1c98a build by
root@an-c03n02.alteeve.ca, 2014-01-26 16:48:51
0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
ns:0 nr:0 dw:0 dr:152 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
====
Startup logs from an-c03n01:
====
Jan 27 20:26:09 an-c03n01 systemd: Starting Corosync Cluster Engine...
Jan 27 20:26:09 an-c03n01 corosync[823]: [MAIN ] Corosync Cluster
Engine ('2.3.2'): started and ready to provide service.
Jan 27 20:26:09 an-c03n01 corosync[823]: [MAIN ] Corosync built-in
features: dbus systemd xmlconf snmp pie relro bindnow
Jan 27 20:26:09 an-c03n01 corosync[824]: [TOTEM ] Initializing transport
(UDP/IP Unicast).
Jan 27 20:26:09 an-c03n01 corosync[824]: [TOTEM ] Initializing
transmit/receive security (NSS) crypto: none hash: none
Jan 27 20:26:09 an-c03n01 corosync[824]: [TOTEM ] The network interface
[10.20.30.1] is now up.
Jan 27 20:26:09 an-c03n01 corosync[824]: [SERV ] Service engine loaded:
corosync configuration map access [0]
Jan 27 20:26:09 an-c03n01 corosync[824]: [QB ] server name: cmap
Jan 27 20:26:09 an-c03n01 corosync[824]: [SERV ] Service engine loaded:
corosync configuration service [1]
Jan 27 20:26:09 an-c03n01 corosync[824]: [QB ] server name: cfg
Jan 27 20:26:09 an-c03n01 corosync[824]: [SERV ] Service engine loaded:
corosync cluster closed process group service v1.01 [2]
Jan 27 20:26:09 an-c03n01 corosync[824]: [QB ] server name: cpg
Jan 27 20:26:09 an-c03n01 corosync[824]: [SERV ] Service engine loaded:
corosync profile loading service [4]
Jan 27 20:26:09 an-c03n01 corosync[824]: [QUORUM] Using quorum provider
corosync_votequorum
Jan 27 20:26:09 an-c03n01 corosync[824]: [QUORUM] Waiting for all
cluster members. Current votes: 1 expected_votes: 2
Jan 27 20:26:09 an-c03n01 corosync[824]: [SERV ] Service engine loaded:
corosync vote quorum service v1.0 [5]
Jan 27 20:26:09 an-c03n01 corosync[824]: [QB ] server name: votequorum
Jan 27 20:26:09 an-c03n01 corosync[824]: [SERV ] Service engine loaded:
corosync cluster quorum service v0.1 [3]
Jan 27 20:26:09 an-c03n01 corosync[824]: [QB ] server name: quorum
Jan 27 20:26:09 an-c03n01 corosync[824]: [TOTEM ] adding new UDPU member
{10.20.30.1}
Jan 27 20:26:09 an-c03n01 corosync[824]: [TOTEM ] adding new UDPU member
{10.20.30.2}
Jan 27 20:26:09 an-c03n01 corosync[824]: [TOTEM ] A new membership
(10.20.30.1:200) was formed. Members joined: 1
Jan 27 20:26:09 an-c03n01 corosync[824]: [QUORUM] Waiting for all
cluster members. Current votes: 1 expected_votes: 2
Jan 27 20:26:09 an-c03n01 corosync[824]: [QUORUM] Waiting for all
cluster members. Current votes: 1 expected_votes: 2
Jan 27 20:26:09 an-c03n01 corosync[824]: [QUORUM] Waiting for all
cluster members. Current votes: 1 expected_votes: 2
Jan 27 20:26:09 an-c03n01 corosync[824]: [QUORUM] Members[1]: 1
Jan 27 20:26:09 an-c03n01 corosync[824]: [MAIN ] Completed service
synchronization, ready to provide service.
Jan 27 20:26:10 an-c03n01 corosync[824]: [TOTEM ] A new membership
(10.20.30.1:208) was formed. Members joined: 2
Jan 27 20:26:10 an-c03n01 corosync[824]: [QUORUM] Waiting for all
cluster members. Current votes: 1 expected_votes: 2
Jan 27 20:26:10 an-c03n01 corosync[824]: [QUORUM] This node is within
the primary component and will provide service.
Jan 27 20:26:10 an-c03n01 corosync[824]: [QUORUM] Members[2]: 1 2
Jan 27 20:26:10 an-c03n01 corosync[824]: [MAIN ] Completed service
synchronization, ready to provide service.
Jan 27 20:26:10 an-c03n01 corosync: Starting Corosync Cluster Engine
(corosync): [ OK ]
Jan 27 20:26:10 an-c03n01 systemd: Started Corosync Cluster Engine.
Jan 27 20:26:10 an-c03n01 systemd: Starting Pacemaker High Availability
Cluster Manager...
Jan 27 20:26:10 an-c03n01 systemd: Started Pacemaker High Availability
Cluster Manager.
Jan 27 20:26:10 an-c03n01 pacemakerd: Could not establish pacemakerd
connection: Connection refused (111)
Jan 27 20:26:10 an-c03n01 pacemakerd[839]: notice: mcp_read_config:
Configured corosync to accept connections from group 189: OK (1)
Jan 27 20:26:10 an-c03n01 pacemakerd[839]: notice: main: Starting
Pacemaker 1.1.10-19.el7 (Build: 368c726): generated-manpages
agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc
upstart systemd nagios corosync-native
Jan 27 20:26:10 an-c03n01 pacemakerd[839]: notice:
cluster_connect_quorum: Quorum acquired
Jan 27 20:26:10 an-c03n01 pacemakerd[839]: notice:
crm_update_peer_state: pcmk_quorum_notification: Node
an-c03n01.alteeve.ca[1] - state is now member (was (null))
Jan 27 20:26:10 an-c03n01 pacemakerd[839]: notice:
crm_update_peer_state: pcmk_quorum_notification: Node
an-c03n02.alteeve.ca[2] - state is now member (was (null))
Jan 27 20:26:10 an-c03n01 attrd[843]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: corosync
Jan 27 20:26:10 an-c03n01 crmd[845]: notice: main: CRM Git Version: 368c726
Jan 27 20:26:10 an-c03n01 stonith-ng[841]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: corosync
Jan 27 20:26:10 an-c03n01 attrd[843]: notice: main: Starting mainloop...
Jan 27 20:26:10 an-c03n01 cib[840]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: corosync
Jan 27 20:26:11 an-c03n01 crmd[845]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: corosync
Jan 27 20:26:11 an-c03n01 crmd[845]: notice: cluster_connect_quorum:
Quorum acquired
Jan 27 20:26:11 an-c03n01 stonith-ng[841]: notice: setup_cib: Watching
for stonith topology changes
Jan 27 20:26:11 an-c03n01 stonith-ng[841]: notice: unpack_config: On
loss of CCM Quorum: Ignore
Jan 27 20:26:11 an-c03n01 crmd[845]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node an-c03n01.alteeve.ca[1] - state is now
member (was (null))
Jan 27 20:26:11 an-c03n01 crmd[845]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node an-c03n02.alteeve.ca[2] - state is now
member (was (null))
Jan 27 20:26:11 an-c03n01 crmd[845]: notice: do_started: The local CRM
is operational
Jan 27 20:26:11 an-c03n01 crmd[845]: notice: do_state_transition: State
transition S_STARTING -> S_PENDING [ input=I_PENDING
cause=C_FSA_INTERNAL origin=do_started ]
Jan 27 20:26:12 an-c03n01 stonith-ng[841]: notice:
stonith_device_register: Added 'fence_n01_virsh' to the device list (1
active devices)
Jan 27 20:26:13 an-c03n01 stonith-ng[841]: notice:
stonith_device_register: Added 'fence_n02_virsh' to the device list (2
active devices)
Jan 27 20:26:32 an-c03n01 crmd[845]: warning: do_log: FSA: Input
I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Jan 27 20:26:32 an-c03n01 crmd[845]: notice: do_state_transition: State
transition S_ELECTION -> S_PENDING [ input=I_PENDING
cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Jan 27 20:26:32 an-c03n01 crmd[845]: notice: do_state_transition: State
transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE
origin=do_cl_join_finalize_respond ]
Jan 27 20:26:32 an-c03n01 attrd[843]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Jan 27 20:26:33 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_monitor_0 (call=14, rc=7, cib-update=11,
confirmed=true) not running
Jan 27 20:26:33 an-c03n01 attrd[843]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Jan 27 20:26:33 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent
update 4: probe_complete=true
Jan 27 20:26:33 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent
update 7: probe_complete=true
Jan 27 20:26:34 an-c03n01 stonith-ng[841]: notice:
stonith_device_register: Device 'fence_n01_virsh' already existed in
device list (2 active devices)
Jan 27 20:26:34 an-c03n01 kernel: [19496.418912] drbd r0: Starting
worker thread (from drbdsetup [946])
Jan 27 20:26:34 an-c03n01 kernel: [19496.419207] block drbd0: disk(
Diskless -> Attaching )
Jan 27 20:26:34 an-c03n01 kernel: [19496.419268] drbd r0: Method to
ensure write ordering: drain
Jan 27 20:26:34 an-c03n01 kernel: [19496.419270] block drbd0: max BIO
size = 1048576
Jan 27 20:26:34 an-c03n01 kernel: [19496.419273] block drbd0: Adjusting
my ra_pages to backing device's (32 -> 1024)
Jan 27 20:26:34 an-c03n01 kernel: [19496.419275] block drbd0:
drbd_bm_resize called with capacity == 41937592
Jan 27 20:26:34 an-c03n01 kernel: [19496.419346] block drbd0: resync
bitmap: bits=5242199 words=81910 pages=160
Jan 27 20:26:34 an-c03n01 kernel: [19496.419348] block drbd0: size = 20
GB (20968796 KB)
Jan 27 20:26:34 an-c03n01 kernel: [19496.420788] block drbd0: bitmap
READ of 160 pages took 1 jiffies
Jan 27 20:26:34 an-c03n01 kernel: [19496.420892] block drbd0: recounting
of set bits took additional 0 jiffies
Jan 27 20:26:34 an-c03n01 kernel: [19496.420895] block drbd0: 0 KB (0
bits) marked out-of-sync by on disk bit-map.
Jan 27 20:26:34 an-c03n01 kernel: [19496.420900] block drbd0: disk(
Attaching -> Consistent )
Jan 27 20:26:34 an-c03n01 kernel: [19496.420904] block drbd0: attached
to UUIDs AA966D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D
Jan 27 20:26:34 an-c03n01 kernel: [19496.428933] drbd r0: conn(
StandAlone -> Unconnected )
Jan 27 20:26:34 an-c03n01 kernel: [19496.428949] drbd r0: Starting
receiver thread (from drbd_w_r0 [947])
Jan 27 20:26:34 an-c03n01 kernel: [19496.428970] drbd r0: receiver
(re)started
Jan 27 20:26:34 an-c03n01 kernel: [19496.428978] drbd r0: conn(
Unconnected -> WFConnection )
Jan 27 20:26:34 an-c03n01 attrd[843]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd_r0 (5)
Jan 27 20:26:34 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent
update 11: master-drbd_r0=5
Jan 27 20:26:34 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_start_0 (call=16, rc=0, cib-update=12, confirmed=true) ok
Jan 27 20:26:34 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=17, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:26:35 an-c03n01 kernel: [19496.930042] drbd r0: Handshake
successful: Agreed network protocol version 101
Jan 27 20:26:35 an-c03n01 kernel: [19496.930046] drbd r0: Agreed to
support TRIM on protocol level
Jan 27 20:26:35 an-c03n01 kernel: [19496.930093] drbd r0: conn(
WFConnection -> WFReportParams )
Jan 27 20:26:35 an-c03n01 kernel: [19496.930095] drbd r0: Starting
asender thread (from drbd_r_r0 [956])
Jan 27 20:26:35 an-c03n01 kernel: [19496.937081] block drbd0:
drbd_sync_handshake:
Jan 27 20:26:35 an-c03n01 kernel: [19496.937086] block drbd0: self
AA966D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D
bits:0 flags:0
Jan 27 20:26:35 an-c03n01 kernel: [19496.937088] block drbd0: peer
AA966D5345E69DAA:0000000000000000:4F366962CD263E3C:4F356962CD263E3D
bits:0 flags:0
Jan 27 20:26:35 an-c03n01 kernel: [19496.937091] block drbd0:
uuid_compare()=0 by rule 40
Jan 27 20:26:35 an-c03n01 kernel: [19496.937098] block drbd0: peer(
Unknown -> Secondary ) conn( WFReportParams -> Connected ) disk(
Consistent -> UpToDate ) pdsk( DUnknown -> UpToDate )
Jan 27 20:26:35 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation fence_n01_virsh_start_0 (call=15, rc=0, cib-update=13,
confirmed=true) ok
Jan 27 20:26:35 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=19, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:26:35 an-c03n01 kernel: [19497.258935] block drbd0: peer(
Secondary -> Primary )
Jan 27 20:26:35 an-c03n01 kernel: [19497.262592] block drbd0: role(
Secondary -> Primary )
Jan 27 20:26:35 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_promote_0 (call=20, rc=0, cib-update=14,
confirmed=true) ok
Jan 27 20:26:35 an-c03n01 attrd[843]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd_r0 (10000)
Jan 27 20:26:35 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent
update 13: master-drbd_r0=10000
Jan 27 20:26:35 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=21, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:26:35 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent
update 15: master-drbd_r0=10000
Jan 27 20:26:36 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation fence_n01_virsh_monitor_60000 (call=18, rc=0, cib-update=15,
confirmed=false) ok
====
Startup logs from an-c03n02:
====
Jan 27 20:26:09 an-c03n02 systemd: Starting Corosync Cluster Engine...
Jan 27 20:26:09 an-c03n02 corosync[21111]: [MAIN ] Corosync Cluster
Engine ('2.3.2'): started and ready to provide service.
Jan 27 20:26:09 an-c03n02 corosync[21111]: [MAIN ] Corosync built-in
features: dbus systemd xmlconf snmp pie relro bindnow
Jan 27 20:26:09 an-c03n02 corosync[21112]: [TOTEM ] Initializing
transport (UDP/IP Unicast).
Jan 27 20:26:09 an-c03n02 corosync[21112]: [TOTEM ] Initializing
transmit/receive security (NSS) crypto: none hash: none
Jan 27 20:26:09 an-c03n02 corosync[21112]: [TOTEM ] The network
interface [10.20.30.2] is now up.
Jan 27 20:26:09 an-c03n02 corosync[21112]: [SERV ] Service engine
loaded: corosync configuration map access [0]
Jan 27 20:26:09 an-c03n02 corosync[21112]: [QB ] server name: cmap
Jan 27 20:26:09 an-c03n02 corosync[21112]: [SERV ] Service engine
loaded: corosync configuration service [1]
Jan 27 20:26:09 an-c03n02 corosync[21112]: [QB ] server name: cfg
Jan 27 20:26:09 an-c03n02 corosync[21112]: [SERV ] Service engine
loaded: corosync cluster closed process group service v1.01 [2]
Jan 27 20:26:09 an-c03n02 corosync[21112]: [QB ] server name: cpg
Jan 27 20:26:09 an-c03n02 corosync[21112]: [SERV ] Service engine
loaded: corosync profile loading service [4]
Jan 27 20:26:09 an-c03n02 corosync[21112]: [QUORUM] Using quorum
provider corosync_votequorum
Jan 27 20:26:09 an-c03n02 corosync[21112]: [QUORUM] Waiting for all
cluster members. Current votes: 1 expected_votes: 2
Jan 27 20:26:09 an-c03n02 corosync[21112]: [SERV ] Service engine
loaded: corosync vote quorum service v1.0 [5]
Jan 27 20:26:09 an-c03n02 corosync[21112]: [QB ] server name: votequorum
Jan 27 20:26:09 an-c03n02 corosync[21112]: [SERV ] Service engine
loaded: corosync cluster quorum service v0.1 [3]
Jan 27 20:26:09 an-c03n02 corosync[21112]: [QB ] server name: quorum
Jan 27 20:26:09 an-c03n02 corosync[21112]: [TOTEM ] adding new UDPU
member {10.20.30.1}
Jan 27 20:26:09 an-c03n02 corosync[21112]: [TOTEM ] adding new UDPU
member {10.20.30.2}
Jan 27 20:26:09 an-c03n02 corosync[21112]: [TOTEM ] A new membership
(10.20.30.2:204) was formed. Members joined: 2
Jan 27 20:26:09 an-c03n02 corosync[21112]: [QUORUM] Waiting for all
cluster members. Current votes: 1 expected_votes: 2
Jan 27 20:26:09 an-c03n02 corosync[21112]: [QUORUM] Waiting for all
cluster members. Current votes: 1 expected_votes: 2
Jan 27 20:26:09 an-c03n02 corosync[21112]: [QUORUM] Waiting for all
cluster members. Current votes: 1 expected_votes: 2
Jan 27 20:26:09 an-c03n02 corosync[21112]: [QUORUM] Members[1]: 2
Jan 27 20:26:09 an-c03n02 corosync[21112]: [MAIN ] Completed service
synchronization, ready to provide service.
Jan 27 20:26:10 an-c03n02 corosync[21112]: [TOTEM ] A new membership
(10.20.30.1:208) was formed. Members joined: 1
Jan 27 20:26:10 an-c03n02 corosync[21112]: [QUORUM] This node is within
the primary component and will provide service.
Jan 27 20:26:10 an-c03n02 corosync[21112]: [QUORUM] Members[2]: 1 2
Jan 27 20:26:10 an-c03n02 corosync[21112]: [MAIN ] Completed service
synchronization, ready to provide service.
Jan 27 20:26:10 an-c03n02 corosync: Starting Corosync Cluster Engine
(corosync): [ OK ]
Jan 27 20:26:10 an-c03n02 systemd: Started Corosync Cluster Engine.
Jan 27 20:26:10 an-c03n02 systemd: Starting Pacemaker High Availability
Cluster Manager...
Jan 27 20:26:10 an-c03n02 systemd: Started Pacemaker High Availability
Cluster Manager.
Jan 27 20:26:10 an-c03n02 pacemakerd: Could not establish pacemakerd
connection: Connection refused (111)
Jan 27 20:26:10 an-c03n02 pacemakerd[21127]: notice: mcp_read_config:
Configured corosync to accept connections from group 189: OK (1)
Jan 27 20:26:10 an-c03n02 pacemakerd[21127]: notice: main: Starting
Pacemaker 1.1.10-19.el7 (Build: 368c726): generated-manpages
agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc
upstart systemd nagios corosync-native
Jan 27 20:26:10 an-c03n02 pacemakerd[21127]: notice:
cluster_connect_quorum: Quorum acquired
Jan 27 20:26:10 an-c03n02 pacemakerd[21127]: notice:
crm_update_peer_state: pcmk_quorum_notification: Node
an-c03n01.alteeve.ca[1] - state is now member (was (null))
Jan 27 20:26:10 an-c03n02 pacemakerd[21127]: notice:
crm_update_peer_state: pcmk_quorum_notification: Node
an-c03n02.alteeve.ca[2] - state is now member (was (null))
Jan 27 20:26:10 an-c03n02 stonith-ng[21129]: notice:
crm_cluster_connect: Connecting to cluster infrastructure: corosync
Jan 27 20:26:10 an-c03n02 cib[21128]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: corosync
Jan 27 20:26:10 an-c03n02 crmd[21133]: notice: main: CRM Git Version:
368c726
Jan 27 20:26:10 an-c03n02 attrd[21131]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: corosync
Jan 27 20:26:10 an-c03n02 attrd[21131]: notice: main: Starting mainloop...
Jan 27 20:26:11 an-c03n02 stonith-ng[21129]: notice: setup_cib: Watching
for stonith topology changes
Jan 27 20:26:11 an-c03n02 crmd[21133]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: corosync
Jan 27 20:26:11 an-c03n02 stonith-ng[21129]: notice: unpack_config: On
loss of CCM Quorum: Ignore
Jan 27 20:26:11 an-c03n02 crmd[21133]: notice: cluster_connect_quorum:
Quorum acquired
Jan 27 20:26:11 an-c03n02 crmd[21133]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node an-c03n01.alteeve.ca[1] - state is now
member (was (null))
Jan 27 20:26:11 an-c03n02 crmd[21133]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node an-c03n02.alteeve.ca[2] - state is now
member (was (null))
Jan 27 20:26:11 an-c03n02 crmd[21133]: notice: do_started: The local CRM
is operational
Jan 27 20:26:11 an-c03n02 crmd[21133]: notice: do_state_transition:
State transition S_STARTING -> S_PENDING [ input=I_PENDING
cause=C_FSA_INTERNAL origin=do_started ]
Jan 27 20:26:12 an-c03n02 stonith-ng[21129]: notice:
stonith_device_register: Added 'fence_n01_virsh' to the device list (1
active devices)
Jan 27 20:26:13 an-c03n02 stonith-ng[21129]: notice:
stonith_device_register: Added 'fence_n02_virsh' to the device list (2
active devices)
Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: do_state_transition:
State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Jan 27 20:26:32 an-c03n02 attrd[21131]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Jan 27 20:26:32 an-c03n02 pengine[21132]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Jan 27 20:26:32 an-c03n02 pengine[21132]: notice: LogActions: Start
fence_n01_virsh (an-c03n01.alteeve.ca)
Jan 27 20:26:32 an-c03n02 pengine[21132]: notice: LogActions: Start
fence_n02_virsh (an-c03n02.alteeve.ca)
Jan 27 20:26:32 an-c03n02 pengine[21132]: notice: LogActions: Start
drbd_r0:0 (an-c03n01.alteeve.ca)
Jan 27 20:26:32 an-c03n02 pengine[21132]: notice: LogActions: Start
drbd_r0:1 (an-c03n02.alteeve.ca)
Jan 27 20:26:32 an-c03n02 pengine[21132]: notice: process_pe_message:
Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-164.bz2
Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 8: monitor fence_n01_virsh_monitor_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 4: monitor fence_n01_virsh_monitor_0 on
an-c03n01.alteeve.ca
Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 9: monitor fence_n02_virsh_monitor_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 5: monitor fence_n02_virsh_monitor_0 on
an-c03n01.alteeve.ca
Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 6: monitor drbd_r0:0_monitor_0 on an-c03n01.alteeve.ca
Jan 27 20:26:32 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 10: monitor drbd_r0:1_monitor_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_monitor_0 (call=14, rc=7, cib-update=28,
confirmed=true) not running
Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 7: probe_complete probe_complete on
an-c03n02.alteeve.ca (local) - no waiting
Jan 27 20:26:33 an-c03n02 attrd[21131]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Jan 27 20:26:33 an-c03n02 attrd[21131]: notice: attrd_perform_update:
Sent update 4: probe_complete=true
Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 3: probe_complete probe_complete on
an-c03n01.alteeve.ca - no waiting
Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 11: start fence_n01_virsh_start_0 on an-c03n01.alteeve.ca
Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 13: start fence_n02_virsh_start_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 15: start drbd_r0:0_start_0 on an-c03n01.alteeve.ca
Jan 27 20:26:33 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 17: start drbd_r0:1_start_0 on an-c03n02.alteeve.ca
(local)
Jan 27 20:26:34 an-c03n02 stonith-ng[21129]: notice:
stonith_device_register: Device 'fence_n02_virsh' already existed in
device list (2 active devices)
Jan 27 20:26:34 an-c03n02 kernel: [ 4904.724683] drbd r0: Starting
worker thread (from drbdsetup [21238])
Jan 27 20:26:34 an-c03n02 kernel: [ 4904.724970] block drbd0: disk(
Diskless -> Attaching )
Jan 27 20:26:34 an-c03n02 kernel: [ 4904.725081] drbd r0: Method to
ensure write ordering: drain
Jan 27 20:26:34 an-c03n02 kernel: [ 4904.725084] block drbd0: max BIO
size = 1048576
Jan 27 20:26:34 an-c03n02 kernel: [ 4904.725087] block drbd0: Adjusting
my ra_pages to backing device's (32 -> 1024)
Jan 27 20:26:34 an-c03n02 kernel: [ 4904.725090] block drbd0:
drbd_bm_resize called with capacity == 41937592
Jan 27 20:26:34 an-c03n02 kernel: [ 4904.725180] block drbd0: resync
bitmap: bits=5242199 words=81910 pages=160
Jan 27 20:26:34 an-c03n02 kernel: [ 4904.725183] block drbd0: size = 20
GB (20968796 KB)
Jan 27 20:26:34 an-c03n02 kernel: [ 4904.727769] block drbd0: bitmap
READ of 160 pages took 2 jiffies
Jan 27 20:26:34 an-c03n02 kernel: [ 4904.727981] block drbd0: recounting
of set bits took additional 0 jiffies
Jan 27 20:26:34 an-c03n02 kernel: [ 4904.727985] block drbd0: 0 KB (0
bits) marked out-of-sync by on disk bit-map.
Jan 27 20:26:34 an-c03n02 kernel: [ 4904.728001] block drbd0: disk(
Attaching -> Consistent )
Jan 27 20:26:34 an-c03n02 kernel: [ 4904.728013] block drbd0: attached
to UUIDs AA966D5345E69DAA:0000000000000000:4F366962CD263E3C:4F356962CD263E3D
Jan 27 20:26:34 an-c03n02 kernel: [ 4904.738601] drbd r0: conn(
StandAlone -> Unconnected )
Jan 27 20:26:34 an-c03n02 kernel: [ 4904.738688] drbd r0: Starting
receiver thread (from drbd_w_r0 [21239])
Jan 27 20:26:34 an-c03n02 kernel: [ 4904.738709] drbd r0: receiver
(re)started
Jan 27 20:26:34 an-c03n02 kernel: [ 4904.738721] drbd r0: conn(
Unconnected -> WFConnection )
Jan 27 20:26:34 an-c03n02 attrd[21131]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd_r0 (5)
Jan 27 20:26:34 an-c03n02 attrd[21131]: notice: attrd_perform_update:
Sent update 9: master-drbd_r0=5
Jan 27 20:26:34 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_start_0 (call=16, rc=0, cib-update=29, confirmed=true) ok
Jan 27 20:26:34 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 48: notify drbd_r0:0_post_notify_start_0 on
an-c03n01.alteeve.ca
Jan 27 20:26:34 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 49: notify drbd_r0:1_post_notify_start_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:26:34 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=17, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:26:35 an-c03n02 kernel: [ 4905.294095] drbd r0: Handshake
successful: Agreed network protocol version 101
Jan 27 20:26:35 an-c03n02 kernel: [ 4905.294099] drbd r0: Agreed to
support TRIM on protocol level
Jan 27 20:26:35 an-c03n02 kernel: [ 4905.294132] drbd r0: conn(
WFConnection -> WFReportParams )
Jan 27 20:26:35 an-c03n02 kernel: [ 4905.294134] drbd r0: Starting
asender thread (from drbd_r_r0 [21248])
Jan 27 20:26:35 an-c03n02 kernel: [ 4905.303108] block drbd0:
drbd_sync_handshake:
Jan 27 20:26:35 an-c03n02 kernel: [ 4905.303112] block drbd0: self
AA966D5345E69DAA:0000000000000000:4F366962CD263E3C:4F356962CD263E3D
bits:0 flags:0
Jan 27 20:26:35 an-c03n02 kernel: [ 4905.303114] block drbd0: peer
AA966D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D
bits:0 flags:0
Jan 27 20:26:35 an-c03n02 kernel: [ 4905.303115] block drbd0:
uuid_compare()=0 by rule 40
Jan 27 20:26:35 an-c03n02 kernel: [ 4905.303120] block drbd0: peer(
Unknown -> Secondary ) conn( WFReportParams -> Connected ) disk(
Consistent -> UpToDate ) pdsk( DUnknown -> UpToDate )
Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation fence_n02_virsh_start_0 (call=15, rc=0, cib-update=30,
confirmed=true) ok
Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: run_graph: Transition 0
(Complete=21, Pending=0, Fired=0, Skipped=4, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-164.bz2): Stopped
Jan 27 20:26:35 an-c03n02 pengine[21132]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Jan 27 20:26:35 an-c03n02 pengine[21132]: notice: LogActions: Promote
drbd_r0:0 (Slave -> Master an-c03n02.alteeve.ca)
Jan 27 20:26:35 an-c03n02 pengine[21132]: notice: LogActions: Promote
drbd_r0:1 (Slave -> Master an-c03n01.alteeve.ca)
Jan 27 20:26:35 an-c03n02 pengine[21132]: notice: process_pe_message:
Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-165.bz2
Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 7: monitor fence_n01_virsh_monitor_60000 on
an-c03n01.alteeve.ca
Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 10: monitor fence_n02_virsh_monitor_60000 on
an-c03n02.alteeve.ca (local)
Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 52: notify drbd_r0_pre_notify_promote_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 54: notify drbd_r0_pre_notify_promote_0 on
an-c03n01.alteeve.ca
Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=19, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 13: promote drbd_r0_promote_0 on an-c03n02.alteeve.ca
(local)
Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 16: promote drbd_r0_promote_0 on an-c03n01.alteeve.ca
Jan 27 20:26:35 an-c03n02 kernel: [ 4905.623345] block drbd0: role(
Secondary -> Primary )
Jan 27 20:26:35 an-c03n02 kernel: [ 4905.626560] block drbd0: peer(
Secondary -> Primary )
Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_promote_0 (call=20, rc=0, cib-update=32,
confirmed=true) ok
Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 53: notify drbd_r0_post_notify_promote_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 55: notify drbd_r0_post_notify_promote_0 on
an-c03n01.alteeve.ca
Jan 27 20:26:35 an-c03n02 attrd[21131]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd_r0 (10000)
Jan 27 20:26:35 an-c03n02 attrd[21131]: notice: attrd_perform_update:
Sent update 13: master-drbd_r0=10000
Jan 27 20:26:35 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=21, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:26:35 an-c03n02 attrd[21131]: notice: attrd_perform_update:
Sent update 15: master-drbd_r0=10000
Jan 27 20:26:36 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation fence_n02_virsh_monitor_60000 (call=18, rc=0, cib-update=33,
confirmed=false) ok
Jan 27 20:26:36 an-c03n02 crmd[21133]: notice: run_graph: Transition 1
(Complete=14, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-165.bz2): Complete
Jan 27 20:26:36 an-c03n02 pengine[21132]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Jan 27 20:26:36 an-c03n02 pengine[21132]: notice: process_pe_message:
Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-166.bz2
Jan 27 20:26:36 an-c03n02 crmd[21133]: notice: run_graph: Transition 2
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-166.bz2): Complete
Jan 27 20:26:36 an-c03n02 crmd[21133]: notice: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
====
Everything looks good so far. Now I'll disable the DRBD resource:
====
[root@an-c03n01 ~]# pcs resource disable drbd_r0_Clone
[root@an-c03n01 ~]# pcs constraint
Location Constraints:
Ordering Constraints:
Colocation Constraints:
====
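For reference, this is how I sanity-check that no stale fence constraints survived the disable. It's a hypothetical sketch that greps the constraint listing captured above; on a live node the input would come straight from `pcs constraint` instead of the embedded copy:

```shell
# Hypothetical check for leftover DRBD fence constraints.
# 'listing' reproduces the 'pcs constraint' output shown above; on a live
# node you would pipe the real command instead:  pcs constraint | grep ...
listing='Location Constraints:
Ordering Constraints:
Colocation Constraints:'

if printf '%s\n' "$listing" | grep -q 'drbd-fence-by-handler'; then
    echo "stale fence constraint found"
else
    echo "no fence constraints"
fi
```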
[root@an-c03n02 ~]# pcs status
Cluster name: an-cluster-03
Last updated: Mon Jan 27 20:29:23 2014
Last change: Mon Jan 27 20:29:10 2014 via crm_resource on
an-c03n01.alteeve.ca
Stack: corosync
Current DC: an-c03n02.alteeve.ca (2) - partition with quorum
Version: 1.1.10-19.el7-368c726
2 Nodes configured
4 Resources configured
Online: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
Full list of resources:
fence_n01_virsh (stonith:fence_virsh): Started an-c03n01.alteeve.ca
fence_n02_virsh (stonith:fence_virsh): Started an-c03n02.alteeve.ca
Master/Slave Set: drbd_r0_Clone [drbd_r0]
Stopped: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
PCSD Status:
an-c03n01.alteeve.ca:
an-c03n01.alteeve.ca: Online
an-c03n02.alteeve.ca:
an-c03n02.alteeve.ca: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
====
Disable logs from an-c03n01:
====
Jan 27 20:29:10 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=22, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:29:10 an-c03n01 kernel: [19652.354342] block drbd0: role(
Primary -> Secondary )
Jan 27 20:29:10 an-c03n01 kernel: [19652.354362] block drbd0: bitmap
WRITE of 0 pages took 0 jiffies
Jan 27 20:29:10 an-c03n01 kernel: [19652.354364] block drbd0: 0 KB (0
bits) marked out-of-sync by on disk bit-map.
Jan 27 20:29:10 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_demote_0 (call=23, rc=0, cib-update=16, confirmed=true) ok
Jan 27 20:29:10 an-c03n01 kernel: [19652.363096] block drbd0: peer(
Primary -> Secondary )
Jan 27 20:29:10 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=24, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:29:10 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=25, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:29:10 an-c03n01 kernel: [19652.471517] drbd r0: peer(
Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate ->
DUnknown )
Jan 27 20:29:10 an-c03n01 kernel: [19652.471539] drbd r0: asender terminated
Jan 27 20:29:10 an-c03n01 kernel: [19652.471542] drbd r0: Terminating
drbd_a_r0
Jan 27 20:29:10 an-c03n01 kernel: [19652.472011] drbd r0: conn( TearDown
-> Disconnecting )
Jan 27 20:29:10 an-c03n01 kernel: [19652.472332] drbd r0: Connection closed
Jan 27 20:29:10 an-c03n01 kernel: [19652.472339] drbd r0: conn(
Disconnecting -> StandAlone )
Jan 27 20:29:10 an-c03n01 kernel: [19652.472340] drbd r0: receiver
terminated
Jan 27 20:29:10 an-c03n01 kernel: [19652.472351] drbd r0: Terminating
drbd_r_r0
Jan 27 20:29:10 an-c03n01 kernel: [19652.472377] block drbd0: disk(
UpToDate -> Failed )
Jan 27 20:29:10 an-c03n01 kernel: [19652.482181] block drbd0: bitmap
WRITE of 0 pages took 0 jiffies
Jan 27 20:29:10 an-c03n01 kernel: [19652.482186] block drbd0: 0 KB (0
bits) marked out-of-sync by on disk bit-map.
Jan 27 20:29:10 an-c03n01 kernel: [19652.482208] block drbd0: disk(
Failed -> Diskless )
Jan 27 20:29:10 an-c03n01 kernel: [19652.482288] block drbd0:
drbd_bm_resize called with capacity == 0
Jan 27 20:29:10 an-c03n01 kernel: [19652.482327] drbd r0: Terminating
drbd_w_r0
Jan 27 20:29:10 an-c03n01 attrd[843]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd_r0 (<null>)
Jan 27 20:29:10 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent
delete 17: node=1, attr=master-drbd_r0, id=<n/a>, set=(null), section=status
Jan 27 20:29:10 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_stop_0 (call=26, rc=0, cib-update=17, confirmed=true) ok
Jan 27 20:29:10 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent
delete 19: node=1, attr=master-drbd_r0, id=<n/a>, set=(null), section=status
====
Disable logs from an-c03n02:
====
Jan 27 20:29:10 an-c03n02 cib[21128]: notice: cib:diff: Diff: --- 0.139.23
Jan 27 20:29:10 an-c03n02 cib[21128]: notice: cib:diff: Diff: +++
0.140.1 ae30c6348ea7b6da2cce70635f3b0a29
Jan 27 20:29:10 an-c03n02 cib[21128]: notice: cib:diff: -- <cib
admin_epoch="0" epoch="139" num_updates="23"/>
Jan 27 20:29:10 an-c03n02 cib[21128]: notice: cib:diff: ++ <nvpair
id="drbd_r0_Clone-meta_attributes-target-role" name="target-role"
value="Stopped"/>
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: do_state_transition:
State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 27 20:29:10 an-c03n02 pengine[21132]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Jan 27 20:29:10 an-c03n02 pengine[21132]: notice: LogActions: Demote
drbd_r0:0 (Master -> Stopped an-c03n02.alteeve.ca)
Jan 27 20:29:10 an-c03n02 pengine[21132]: notice: LogActions: Demote
drbd_r0:1 (Master -> Stopped an-c03n01.alteeve.ca)
Jan 27 20:29:10 an-c03n02 pengine[21132]: notice: process_pe_message:
Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-167.bz2
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 46: notify drbd_r0_pre_notify_demote_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 48: notify drbd_r0_pre_notify_demote_0 on
an-c03n01.alteeve.ca
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=22, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 11: demote drbd_r0_demote_0 on an-c03n02.alteeve.ca
(local)
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 13: demote drbd_r0_demote_0 on an-c03n01.alteeve.ca
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.718998] block drbd0: role(
Primary -> Secondary )
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.719041] block drbd0: bitmap
WRITE of 0 pages took 0 jiffies
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.719043] block drbd0: 0 KB (0
bits) marked out-of-sync by on disk bit-map.
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.727041] block drbd0: peer(
Primary -> Secondary )
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_demote_0 (call=23, rc=0, cib-update=36, confirmed=true) ok
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 47: notify drbd_r0_post_notify_demote_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 49: notify drbd_r0_post_notify_demote_0 on
an-c03n01.alteeve.ca
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=24, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 44: notify drbd_r0_pre_notify_stop_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 45: notify drbd_r0_pre_notify_stop_0 on
an-c03n01.alteeve.ca
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=25, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 12: stop drbd_r0_stop_0 on an-c03n02.alteeve.ca (local)
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 14: stop drbd_r0_stop_0 on an-c03n01.alteeve.ca
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.835968] drbd r0: peer(
Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate
-> DUnknown )
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.835976] drbd r0: asender terminated
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.835977] drbd r0: Terminating
drbd_a_r0
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.836358] drbd r0: Connection closed
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.836368] drbd r0: conn(
Disconnecting -> StandAlone )
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.836369] drbd r0: receiver
terminated
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.836371] drbd r0: Terminating
drbd_r_r0
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.836435] block drbd0: disk(
UpToDate -> Failed )
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.846158] block drbd0: bitmap
WRITE of 0 pages took 0 jiffies
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.846161] block drbd0: 0 KB (0
bits) marked out-of-sync by on disk bit-map.
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.846165] block drbd0: disk(
Failed -> Diskless )
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.846249] block drbd0:
drbd_bm_resize called with capacity == 0
Jan 27 20:29:10 an-c03n02 kernel: [ 5060.846269] drbd r0: Terminating
drbd_w_r0
Jan 27 20:29:10 an-c03n02 attrd[21131]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd_r0 (<null>)
Jan 27 20:29:10 an-c03n02 attrd[21131]: notice: attrd_perform_update:
Sent delete 19: node=2, attr=master-drbd_r0, id=<n/a>, set=(null),
section=status
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_stop_0 (call=26, rc=0, cib-update=37, confirmed=true) ok
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: run_graph: Transition 3
(Complete=22, Pending=0, Fired=0, Skipped=1, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-167.bz2): Stopped
Jan 27 20:29:10 an-c03n02 pengine[21132]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Jan 27 20:29:10 an-c03n02 pengine[21132]: notice: process_pe_message:
Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-168.bz2
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: run_graph: Transition 4
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-168.bz2): Complete
Jan 27 20:29:10 an-c03n02 crmd[21133]: notice: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
====
Still looking good. Now here is where things go sideways...
====
[root@an-c03n01 ~]# pcs resource enable drbd_r0_Clone
====
[root@an-c03n02 ~]# pcs status
Cluster name: an-cluster-03
Last updated: Mon Jan 27 20:32:52 2014
Last change: Mon Jan 27 20:32:05 2014 via cibadmin on an-c03n01.alteeve.ca
Stack: corosync
Current DC: an-c03n02.alteeve.ca (2) - partition with quorum
Version: 1.1.10-19.el7-368c726
2 Nodes configured
4 Resources configured
Online: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
Full list of resources:
fence_n01_virsh (stonith:fence_virsh): Started an-c03n01.alteeve.ca
fence_n02_virsh (stonith:fence_virsh): Started an-c03n02.alteeve.ca
Master/Slave Set: drbd_r0_Clone [drbd_r0]
Masters: [ an-c03n02.alteeve.ca ]
Slaves: [ an-c03n01.alteeve.ca ]
Failed actions:
drbd_r0_promote_0 on an-c03n01.alteeve.ca 'unknown error' (1):
call=30, status=complete, last-rc-change='Mon Jan 27 20:32:05 2014',
queued=15187ms, exec=0ms
PCSD Status:
an-c03n01.alteeve.ca:
an-c03n01.alteeve.ca: Online
an-c03n02.alteeve.ca:
an-c03n02.alteeve.ca: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
====
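To recover from this FAILED state by hand, both the fence constraint and the failcount have to go. A hypothetical recovery sketch follows (the constraint id is the one crm-fence-peer.sh creates, per the logs; the commands are echoed so the sketch is safe to run anywhere — drop the leading `echo` on a real cluster node):

```shell
# Hypothetical manual recovery; ids taken from the surrounding logs.
# Commands are echoed so this sketch runs anywhere; on a live cluster
# node, drop the leading 'echo'.
echo pcs constraint remove drbd-fence-by-handler-r0-drbd_r0_Clone
echo pcs resource cleanup drbd_r0_Clone
```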
Enable logs from an-c03n01:
====
Jan 27 20:32:05 an-c03n01 kernel: [19827.078454] drbd r0: Starting
worker thread (from drbdsetup [1337])
Jan 27 20:32:05 an-c03n01 kernel: [19827.078587] block drbd0: disk(
Diskless -> Attaching )
Jan 27 20:32:05 an-c03n01 kernel: [19827.078655] drbd r0: Method to
ensure write ordering: drain
Jan 27 20:32:05 an-c03n01 kernel: [19827.078657] block drbd0: max BIO
size = 1048576
Jan 27 20:32:05 an-c03n01 kernel: [19827.078661] block drbd0: Adjusting
my ra_pages to backing device's (32 -> 1024)
Jan 27 20:32:05 an-c03n01 kernel: [19827.078664] block drbd0:
drbd_bm_resize called with capacity == 41937592
Jan 27 20:32:05 an-c03n01 kernel: [19827.078732] block drbd0: resync
bitmap: bits=5242199 words=81910 pages=160
Jan 27 20:32:05 an-c03n01 kernel: [19827.078734] block drbd0: size = 20
GB (20968796 KB)
Jan 27 20:32:05 an-c03n01 kernel: [19827.080475] block drbd0: bitmap
READ of 160 pages took 2 jiffies
Jan 27 20:32:05 an-c03n01 kernel: [19827.080566] block drbd0: recounting
of set bits took additional 0 jiffies
Jan 27 20:32:05 an-c03n01 kernel: [19827.080568] block drbd0: 0 KB (0
bits) marked out-of-sync by on disk bit-map.
Jan 27 20:32:05 an-c03n01 kernel: [19827.080575] block drbd0: disk(
Attaching -> Consistent )
Jan 27 20:32:05 an-c03n01 kernel: [19827.080577] block drbd0: attached
to UUIDs AA966D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D
Jan 27 20:32:05 an-c03n01 kernel: [19827.086606] drbd r0: conn(
StandAlone -> Unconnected )
Jan 27 20:32:05 an-c03n01 kernel: [19827.086663] drbd r0: Starting
receiver thread (from drbd_w_r0 [1338])
Jan 27 20:32:05 an-c03n01 kernel: [19827.086677] drbd r0: receiver
(re)started
Jan 27 20:32:05 an-c03n01 kernel: [19827.086682] drbd r0: conn(
Unconnected -> WFConnection )
Jan 27 20:32:05 an-c03n01 attrd[843]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd_r0 (5)
Jan 27 20:32:05 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent
update 23: master-drbd_r0=5
Jan 27 20:32:05 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_start_0 (call=27, rc=0, cib-update=18, confirmed=true) ok
Jan 27 20:32:05 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=28, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:05 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=29, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:05 an-c03n01 kernel: [19827.235110] drbd r0: helper
command: /sbin/drbdadm fence-peer r0
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: invoked for r0
Jan 27 20:32:05 an-c03n01 crmd[845]: notice: handle_request: Current
ping state: S_NOT_DC
Jan 27 20:32:05 an-c03n01 cibadmin[1469]: notice: crm_log_args: Invoked:
cibadmin -C -o constraints -X <rsc_location rsc="drbd_r0_Clone"
id="drbd-fence-by-handler-r0-drbd_r0_Clone">
<rule role="Master" score="-INFINITY"
id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">
<expression attribute="#uname" operation="ne"
value="an-c03n01.alteeve.ca"
id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>
</rule>
</rsc_location>
Jan 27 20:32:05 an-c03n01 stonith-ng[841]: notice: unpack_config: On
loss of CCM Quorum: Ignore
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: Call cib_create
failed (-76): Name not unique on network
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: <failed>
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: <failed_update
id="drbd-fence-by-handler-r0-drbd_r0_Clone" object_type="rsc_location"
operation="cib_create" reason="Name not unique on network">
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: <rsc_location
rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone">
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: <rule role="Master"
score="-INFINITY" id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: <expression
attribute="#uname" operation="ne" value="an-c03n01.alteeve.ca"
id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: </rule>
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: </rsc_location>
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: </failed_update>
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: </failed>
Jan 27 20:32:05 an-c03n01 kernel: [19827.302587] drbd r0: helper
command: /sbin/drbdadm fence-peer r0 exit code 1 (0x100)
Jan 27 20:32:05 an-c03n01 kernel: [19827.302590] drbd r0: fence-peer
helper broken, returned 1
Jan 27 20:32:05 an-c03n01 kernel: [19827.302607] drbd r0: helper
command: /sbin/drbdadm fence-peer r0
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1437]: WARNING DATA
INTEGRITY at RISK: could not place the fencing constraint!
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1484]: invoked for r0
Jan 27 20:32:05 an-c03n01 stonith-ng[841]: notice:
stonith_device_register: Device 'fence_n01_virsh' already existed in
device list (2 active devices)
Jan 27 20:32:05 an-c03n01 kernel: [19827.328528] drbd r0: helper
command: /sbin/drbdadm fence-peer r0 exit code 1 (0x100)
Jan 27 20:32:05 an-c03n01 kernel: [19827.328532] drbd r0: fence-peer
helper broken, returned 1
Jan 27 20:32:05 an-c03n01 kernel: [19827.328553] drbd r0: helper
command: /sbin/drbdadm fence-peer r0
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1513]: invoked for r0
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1484]: WARNING constraint
<expression attribute="#uname" <expression operation="ne" <expression
value="an-c03n02.alteeve.ca" <rsc_location rsc="drbd_r0_Clone" <rule
role="Master" <rule score="-INFINITY" already exists
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1484]: WARNING DATA
INTEGRITY at RISK: could not place the fencing constraint!
Jan 27 20:32:05 an-c03n01 kernel: [19827.359166] drbd r0: helper
command: /sbin/drbdadm fence-peer r0 exit code 1 (0x100)
Jan 27 20:32:05 an-c03n01 kernel: [19827.359170] drbd r0: fence-peer
helper broken, returned 1
Jan 27 20:32:05 an-c03n01 kernel: [19827.359193] drbd r0: helper
command: /sbin/drbdadm fence-peer r0
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1513]: WARNING constraint
<expression attribute="#uname" <expression operation="ne" <expression
value="an-c03n02.alteeve.ca" <rsc_location rsc="drbd_r0_Clone" <rule
role="Master" <rule score="-INFINITY" already exists
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1513]: WARNING DATA
INTEGRITY at RISK: could not place the fencing constraint!
Jan 27 20:32:05 an-c03n01 stonith-ng[841]: notice:
stonith_device_register: Added 'fence_n02_virsh' to the device list (2
active devices)
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1541]: invoked for r0
Jan 27 20:32:05 an-c03n01 kernel: [19827.379932] drbd r0: helper
command: /sbin/drbdadm fence-peer r0 exit code 1 (0x100)
Jan 27 20:32:05 an-c03n01 kernel: [19827.379935] drbd r0: fence-peer
helper broken, returned 1
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1541]: WARNING constraint
<expression attribute="#uname" <expression operation="ne" <expression
value="an-c03n02.alteeve.ca" <rsc_location rsc="drbd_r0_Clone" <rule
role="Master" <rule score="-INFINITY" already exists
Jan 27 20:32:05 an-c03n01 crm-fence-peer.sh[1541]: WARNING DATA
INTEGRITY at RISK: could not place the fencing constraint!
Jan 27 20:32:05 an-c03n01 drbd(drbd_r0)[1408]: ERROR: r0: Called drbdadm
-c /etc/drbd.conf primary r0
Jan 27 20:32:05 an-c03n01 drbd(drbd_r0)[1408]: ERROR: r0: Exit code 17
Jan 27 20:32:05 an-c03n01 drbd(drbd_r0)[1408]: ERROR: r0: Command output:
Jan 27 20:32:05 an-c03n01 drbd(drbd_r0)[1408]: CRIT: Refusing to be
promoted to Primary without UpToDate data
Jan 27 20:32:05 an-c03n01 drbd(drbd_r0)[1408]: WARNING: promotion
failed; sleep 15 # to prevent tight recovery loop
Jan 27 20:32:05 an-c03n01 kernel: [19827.597081] drbd r0: Handshake
successful: Agreed network protocol version 101
Jan 27 20:32:05 an-c03n01 kernel: [19827.597084] drbd r0: Agreed to
support TRIM on protocol level
Jan 27 20:32:05 an-c03n01 kernel: [19827.597142] drbd r0: conn(
WFConnection -> WFReportParams )
Jan 27 20:32:05 an-c03n01 kernel: [19827.597145] drbd r0: Starting
asender thread (from drbd_r_r0 [1347])
Jan 27 20:32:05 an-c03n01 kernel: [19827.606053] block drbd0:
drbd_sync_handshake:
Jan 27 20:32:05 an-c03n01 kernel: [19827.606057] block drbd0: self
AA966D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D
bits:0 flags:0
Jan 27 20:32:05 an-c03n01 kernel: [19827.606058] block drbd0: peer
853E72BBF0C9260D:AA966D5345E69DAA:4F366962CD263E3C:4F356962CD263E3D
bits:0 flags:0
Jan 27 20:32:05 an-c03n01 kernel: [19827.606060] block drbd0:
uuid_compare()=-1 by rule 50
Jan 27 20:32:05 an-c03n01 kernel: [19827.606065] block drbd0: peer(
Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk(
Consistent -> Outdated ) pdsk( DUnknown -> UpToDate )
Jan 27 20:32:05 an-c03n01 kernel: [19827.606296] block drbd0: receive
bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23;
compression: 100.0%
Jan 27 20:32:05 an-c03n01 kernel: [19827.606388] block drbd0: send
bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23;
compression: 100.0%
Jan 27 20:32:05 an-c03n01 kernel: [19827.606391] block drbd0: conn(
WFBitMapT -> WFSyncUUID )
Jan 27 20:32:05 an-c03n01 kernel: [19827.607961] block drbd0: updated
sync uuid
AA976D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D
Jan 27 20:32:05 an-c03n01 kernel: [19827.608137] block drbd0: helper
command: /sbin/drbdadm before-resync-target minor-0
Jan 27 20:32:05 an-c03n01 kernel: [19827.609229] block drbd0: helper
command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Jan 27 20:32:05 an-c03n01 kernel: [19827.609243] block drbd0: conn(
WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
Jan 27 20:32:05 an-c03n01 kernel: [19827.609251] block drbd0: Began
resync as SyncTarget (will sync 0 KB [0 bits set]).
Jan 27 20:32:05 an-c03n01 kernel: [19827.610184] block drbd0: Resync
done (total 1 sec; paused 0 sec; 0 K/sec)
Jan 27 20:32:05 an-c03n01 kernel: [19827.610188] block drbd0: updated
UUIDs 853E72BBF0C9260C:0000000000000000:AA976D5345E69DAA:AA966D5345E69DAA
Jan 27 20:32:05 an-c03n01 kernel: [19827.610191] block drbd0: conn(
SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Jan 27 20:32:05 an-c03n01 kernel: [19827.610627] block drbd0: helper
command: /sbin/drbdadm after-resync-target minor-0
Jan 27 20:32:05 an-c03n01 crm-unfence-peer.sh[1589]: invoked for r0
Jan 27 20:32:05 an-c03n01 cibadmin[1603]: notice: crm_log_args: Invoked:
cibadmin -D -X <rsc_location rsc="drbd_r0_Clone"
id="drbd-fence-by-handler-r0-drbd_r0_Clone"/>
Jan 27 20:32:05 an-c03n01 stonith-ng[841]: notice: unpack_config: On
loss of CCM Quorum: Ignore
Jan 27 20:32:05 an-c03n01 kernel: [19827.637304] block drbd0: helper
command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
Jan 27 20:32:05 an-c03n01 stonith-ng[841]: notice:
stonith_device_register: Device 'fence_n01_virsh' already existed in
device list (2 active devices)
Jan 27 20:32:05 an-c03n01 stonith-ng[841]: notice:
stonith_device_register: Added 'fence_n02_virsh' to the device list (2
active devices)
Jan 27 20:32:20 an-c03n01 lrmd[842]: notice: operation_finished:
drbd_r0_promote_0:1408:stderr [ 0: State change failed: (-2) Need access
to UpToDate data ]
Jan 27 20:32:20 an-c03n01 lrmd[842]: notice: operation_finished:
drbd_r0_promote_0:1408:stderr [ Command 'drbdsetup primary 0' terminated
with exit code 17 ]
Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_promote_0 (call=30, rc=1, cib-update=19,
confirmed=true) unknown error
Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event:
an-c03n01.alteeve.ca-drbd_r0_promote_0:30 [ \n ]
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_cs_dispatch: Update
relayed from an-c03n02.alteeve.ca
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_trigger_update:
Sending flush op to all hosts for: fail-count-drbd_r0 (1)
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent
update 28: fail-count-drbd_r0=1
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_cs_dispatch: Update
relayed from an-c03n02.alteeve.ca
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_trigger_update:
Sending flush op to all hosts for: last-failure-drbd_r0 (1390872740)
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent
update 31: last-failure-drbd_r0=1390872740
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_cs_dispatch: Update
relayed from an-c03n02.alteeve.ca
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_trigger_update:
Sending flush op to all hosts for: fail-count-drbd_r0 (2)
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent
update 34: fail-count-drbd_r0=2
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_cs_dispatch: Update
relayed from an-c03n02.alteeve.ca
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_trigger_update:
Sending flush op to all hosts for: last-failure-drbd_r0 (1390872740)
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent
update 37: last-failure-drbd_r0=1390872740
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd_r0 (10000)
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent
update 39: master-drbd_r0=10000
Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=31, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=32, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_demote_0 (call=33, rc=0, cib-update=20, confirmed=true) ok
Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=34, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=35, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n01 kernel: [19842.604453] drbd r0: Requested
state change failed by peer: Refusing to be Primary while peer is not
outdated (-7)
Jan 27 20:32:20 an-c03n01 kernel: [19842.605419] drbd r0: peer( Primary
-> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate ->
Outdated ) pdsk( UpToDate -> DUnknown )
Jan 27 20:32:20 an-c03n01 kernel: [19842.605458] drbd r0: asender terminated
Jan 27 20:32:20 an-c03n01 kernel: [19842.605460] drbd r0: Terminating
drbd_a_r0
Jan 27 20:32:20 an-c03n01 kernel: [19842.605841] drbd r0: Connection closed
Jan 27 20:32:20 an-c03n01 kernel: [19842.605849] drbd r0: conn(
Disconnecting -> StandAlone )
Jan 27 20:32:20 an-c03n01 kernel: [19842.605850] drbd r0: receiver
terminated
Jan 27 20:32:20 an-c03n01 kernel: [19842.605860] drbd r0: Terminating
drbd_r_r0
Jan 27 20:32:20 an-c03n01 kernel: [19842.605885] block drbd0: disk(
Outdated -> Failed )
Jan 27 20:32:20 an-c03n01 kernel: [19842.617080] block drbd0: bitmap
WRITE of 0 pages took 0 jiffies
Jan 27 20:32:20 an-c03n01 kernel: [19842.617085] block drbd0: 0 KB (0
bits) marked out-of-sync by on disk bit-map.
Jan 27 20:32:20 an-c03n01 kernel: [19842.617103] block drbd0: disk(
Failed -> Diskless )
Jan 27 20:32:20 an-c03n01 kernel: [19842.617174] block drbd0:
drbd_bm_resize called with capacity == 0
Jan 27 20:32:20 an-c03n01 kernel: [19842.617202] drbd r0: Terminating
drbd_w_r0
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd_r0 (<null>)
Jan 27 20:32:20 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_stop_0 (call=36, rc=0, cib-update=21, confirmed=true) ok
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent
delete 43: node=1, attr=master-drbd_r0, id=<n/a>, set=(null), section=status
Jan 27 20:32:20 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent
delete 45: node=1, attr=master-drbd_r0, id=<n/a>, set=(null), section=status
Jan 27 20:32:21 an-c03n01 kernel: [19842.840388] drbd r0: Starting
worker thread (from drbdsetup [1818])
Jan 27 20:32:21 an-c03n01 kernel: [19842.840614] block drbd0: disk(
Diskless -> Attaching )
Jan 27 20:32:21 an-c03n01 kernel: [19842.840687] drbd r0: Method to
ensure write ordering: drain
Jan 27 20:32:21 an-c03n01 kernel: [19842.840689] block drbd0: max BIO
size = 1048576
Jan 27 20:32:21 an-c03n01 kernel: [19842.840692] block drbd0: Adjusting
my ra_pages to backing device's (32 -> 1024)
Jan 27 20:32:21 an-c03n01 kernel: [19842.840694] block drbd0:
drbd_bm_resize called with capacity == 41937592
Jan 27 20:32:21 an-c03n01 kernel: [19842.840770] block drbd0: resync
bitmap: bits=5242199 words=81910 pages=160
Jan 27 20:32:21 an-c03n01 kernel: [19842.840772] block drbd0: size = 20
GB (20968796 KB)
Jan 27 20:32:21 an-c03n01 kernel: [19842.850197] block drbd0: bitmap
READ of 160 pages took 10 jiffies
Jan 27 20:32:21 an-c03n01 kernel: [19842.850288] block drbd0: recounting
of set bits took additional 0 jiffies
Jan 27 20:32:21 an-c03n01 kernel: [19842.850290] block drbd0: 0 KB (0
bits) marked out-of-sync by on disk bit-map.
Jan 27 20:32:21 an-c03n01 kernel: [19842.850295] block drbd0: disk(
Attaching -> Outdated )
Jan 27 20:32:21 an-c03n01 kernel: [19842.850297] block drbd0: attached
to UUIDs 853E72BBF0C9260C:0000000000000000:AA976D5345E69DAA:AA966D5345E69DAA
Jan 27 20:32:21 an-c03n01 kernel: [19842.856274] drbd r0: conn(
StandAlone -> Unconnected )
Jan 27 20:32:21 an-c03n01 kernel: [19842.856311] drbd r0: Starting
receiver thread (from drbd_w_r0 [1819])
Jan 27 20:32:21 an-c03n01 kernel: [19842.856332] drbd r0: receiver
(re)started
Jan 27 20:32:21 an-c03n01 kernel: [19842.856340] drbd r0: conn(
Unconnected -> WFConnection )
Jan 27 20:32:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_start_0 (call=37, rc=0, cib-update=22, confirmed=true) ok
Jan 27 20:32:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=38, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_monitor_60000 (call=39, rc=0, cib-update=23,
confirmed=false) ok
Jan 27 20:32:21 an-c03n01 kernel: [19843.356430] drbd r0: Handshake
successful: Agreed network protocol version 101
Jan 27 20:32:21 an-c03n01 kernel: [19843.356432] drbd r0: Agreed to
support TRIM on protocol level
Jan 27 20:32:21 an-c03n01 kernel: [19843.356473] drbd r0: conn(
WFConnection -> WFReportParams )
Jan 27 20:32:21 an-c03n01 kernel: [19843.356475] drbd r0: Starting
asender thread (from drbd_r_r0 [1829])
Jan 27 20:32:21 an-c03n01 kernel: [19843.362052] block drbd0:
drbd_sync_handshake:
Jan 27 20:32:21 an-c03n01 kernel: [19843.362056] block drbd0: self
853E72BBF0C9260C:0000000000000000:AA976D5345E69DAA:AA966D5345E69DAA
bits:0 flags:0
Jan 27 20:32:21 an-c03n01 kernel: [19843.362057] block drbd0: peer
FD6969A6E17CBA41:853E72BBF0C9260D:AA976D5345E69DAA:AA966D5345E69DAA
bits:0 flags:0
Jan 27 20:32:21 an-c03n01 kernel: [19843.362059] block drbd0:
uuid_compare()=-1 by rule 50
Jan 27 20:32:21 an-c03n01 kernel: [19843.362063] block drbd0: peer(
Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown
-> UpToDate )
Jan 27 20:32:21 an-c03n01 kernel: [19843.365473] block drbd0: receive
bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23;
compression: 100.0%
Jan 27 20:32:21 an-c03n01 kernel: [19843.365579] block drbd0: send
bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23;
compression: 100.0%
Jan 27 20:32:21 an-c03n01 kernel: [19843.365583] block drbd0: conn(
WFBitMapT -> WFSyncUUID )
Jan 27 20:32:21 an-c03n01 kernel: [19843.367483] block drbd0: updated
sync uuid
853F72BBF0C9260C:0000000000000000:AA976D5345E69DAA:AA966D5345E69DAA
Jan 27 20:32:21 an-c03n01 kernel: [19843.367693] block drbd0: helper
command: /sbin/drbdadm before-resync-target minor-0
Jan 27 20:32:21 an-c03n01 kernel: [19843.368877] block drbd0: helper
command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Jan 27 20:32:21 an-c03n01 kernel: [19843.368892] block drbd0: conn(
WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
Jan 27 20:32:21 an-c03n01 kernel: [19843.368899] block drbd0: Began
resync as SyncTarget (will sync 0 KB [0 bits set]).
Jan 27 20:32:21 an-c03n01 kernel: [19843.369304] block drbd0: Resync
done (total 1 sec; paused 0 sec; 0 K/sec)
Jan 27 20:32:21 an-c03n01 kernel: [19843.369309] block drbd0: updated
UUIDs FD6969A6E17CBA40:0000000000000000:853F72BBF0C9260C:853E72BBF0C9260D
Jan 27 20:32:21 an-c03n01 kernel: [19843.369313] block drbd0: conn(
SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Jan 27 20:32:21 an-c03n01 kernel: [19843.369433] block drbd0: helper
command: /sbin/drbdadm after-resync-target minor-0
Jan 27 20:32:21 an-c03n01 crm-unfence-peer.sh[1900]: invoked for r0
Jan 27 20:32:21 an-c03n01 kernel: [19843.384987] block drbd0: helper
command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
====
Enable logs from an-c03n02:
====
Jan 27 20:32:04 an-c03n02 cib[21128]: notice: cib:diff: Diff: --- 0.140.7
Jan 27 20:32:04 an-c03n02 cib[21128]: notice: cib:diff: Diff: +++
0.141.1 fcc6dc293b799186774cfb583055eb9f
Jan 27 20:32:04 an-c03n02 cib[21128]: notice: cib:diff: -- <nvpair
id="drbd_r0_Clone-meta_attributes-target-role" name="target-role"
value="Stopped"/>
Jan 27 20:32:04 an-c03n02 cib[21128]: notice: cib:diff: ++ <cib
admin_epoch="0" cib-last-written="Mon Jan 27 20:32:04 2014"
crm_feature_set="3.0.7" epoch="141" have-quorum="1" num_updates="1"
update-client="crm_resource" update-origin="an-c03n01.alteeve.ca"
validate-with="pacemaker-1.2" dc-uuid="2"/>
Jan 27 20:32:04 an-c03n02 crmd[21133]: notice: do_state_transition:
State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 27 20:32:04 an-c03n02 pengine[21132]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Jan 27 20:32:04 an-c03n02 pengine[21132]: notice: LogActions: Start
drbd_r0:0 (an-c03n01.alteeve.ca)
Jan 27 20:32:04 an-c03n02 pengine[21132]: notice: LogActions: Start
drbd_r0:1 (an-c03n02.alteeve.ca)
Jan 27 20:32:04 an-c03n02 pengine[21132]: notice: process_pe_message:
Calculated Transition 5: /var/lib/pacemaker/pengine/pe-input-169.bz2
Jan 27 20:32:04 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 11: start drbd_r0_start_0 on an-c03n01.alteeve.ca
Jan 27 20:32:04 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 13: start drbd_r0:1_start_0 on an-c03n02.alteeve.ca
(local)
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.451554] drbd r0: Starting
worker thread (from drbdsetup [21714])
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452326] block drbd0: disk(
Diskless -> Attaching )
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452402] drbd r0: Method to
ensure write ordering: drain
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452404] block drbd0: max BIO
size = 1048576
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452407] block drbd0: Adjusting
my ra_pages to backing device's (32 -> 1024)
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452409] block drbd0:
drbd_bm_resize called with capacity == 41937592
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452467] block drbd0: resync
bitmap: bits=5242199 words=81910 pages=160
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.452469] block drbd0: size = 20
GB (20968796 KB)
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.453954] block drbd0: bitmap
READ of 160 pages took 1 jiffies
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.454067] block drbd0: recounting
of set bits took additional 1 jiffies
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.454069] block drbd0: 0 KB (0
bits) marked out-of-sync by on disk bit-map.
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.454073] block drbd0: disk(
Attaching -> Consistent )
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.454076] block drbd0: attached
to UUIDs AA966D5345E69DAA:0000000000000000:4F366962CD263E3C:4F356962CD263E3D
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.460539] drbd r0: conn(
StandAlone -> Unconnected )
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.460598] drbd r0: Starting
receiver thread (from drbd_w_r0 [21715])
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.461937] drbd r0: receiver
(re)started
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.461957] drbd r0: conn(
Unconnected -> WFConnection )
Jan 27 20:32:05 an-c03n02 attrd[21131]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd_r0 (5)
Jan 27 20:32:05 an-c03n02 attrd[21131]: notice: attrd_perform_update:
Sent update 24: master-drbd_r0=5
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_start_0 (call=27, rc=0, cib-update=40, confirmed=true) ok
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 44: notify drbd_r0_post_notify_start_0 on
an-c03n01.alteeve.ca
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 45: notify drbd_r0:1_post_notify_start_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=28, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: run_graph: Transition 5
(Complete=10, Pending=0, Fired=0, Skipped=2, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-169.bz2): Stopped
Jan 27 20:32:05 an-c03n02 pengine[21132]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Jan 27 20:32:05 an-c03n02 pengine[21132]: notice: LogActions: Promote
drbd_r0:0 (Slave -> Master an-c03n02.alteeve.ca)
Jan 27 20:32:05 an-c03n02 pengine[21132]: notice: LogActions: Promote
drbd_r0:1 (Slave -> Master an-c03n01.alteeve.ca)
Jan 27 20:32:05 an-c03n02 pengine[21132]: notice: process_pe_message:
Calculated Transition 6: /var/lib/pacemaker/pengine/pe-input-170.bz2
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 52: notify drbd_r0_pre_notify_promote_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 54: notify drbd_r0_pre_notify_promote_0 on
an-c03n01.alteeve.ca
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=29, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 13: promote drbd_r0_promote_0 on an-c03n02.alteeve.ca
(local)
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 16: promote drbd_r0_promote_0 on an-c03n01.alteeve.ca
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.599706] drbd r0: helper
command: /sbin/drbdadm fence-peer r0
Jan 27 20:32:05 an-c03n02 crm-fence-peer.sh[21814]: invoked for r0
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: handle_request: Current
ping state: S_TRANSITION_ENGINE
Jan 27 20:32:05 an-c03n02 cibadmin[21846]: notice: crm_log_args:
Invoked: cibadmin -C -o constraints -X <rsc_location rsc="drbd_r0_Clone"
id="drbd-fence-by-handler-r0-drbd_r0_Clone">
<rule role="Master" score="-INFINITY"
id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">
<expression attribute="#uname" operation="ne"
value="an-c03n02.alteeve.ca"
id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>
</rule>
</rsc_location>
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: Diff: --- 0.141.5
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: Diff: +++
0.142.1 c0646876db9897523b58236bb6890452
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: -- <cib
admin_epoch="0" epoch="141" num_updates="5"/>
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: ++ <rsc_location
rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone">
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: ++ <rule
role="Master" score="-INFINITY"
id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: ++ <expression
attribute="#uname" operation="ne" value="an-c03n02.alteeve.ca"
id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: ++ </rule>
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: ++ </rsc_location>
Jan 27 20:32:05 an-c03n02 stonith-ng[21129]: notice: unpack_config: On
loss of CCM Quorum: Ignore
Jan 27 20:32:05 an-c03n02 crm-fence-peer.sh[21814]: INFO peer is
reachable, my disk is Consistent: placed constraint
'drbd-fence-by-handler-r0-drbd_r0_Clone'
Jan 27 20:32:05 an-c03n02 cib[21128]: warning: update_results: Action
cib_create failed: Name not unique on network (cde=-76)
Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB
Update failures <failed>
Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB
Update failures <failed_update
id="drbd-fence-by-handler-r0-drbd_r0_Clone" object_type="rsc_location"
operation="cib_create" reason="Name not unique on network">
Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB
Update failures <rsc_location rsc="drbd_r0_Clone"
id="drbd-fence-by-handler-r0-drbd_r0_Clone">
Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB
Update failures <rule role="Master" score="-INFINITY"
id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">
Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB
Update failures <expression attribute="#uname" operation="ne"
value="an-c03n01.alteeve.ca"
id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>
Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB
Update failures </rule>
Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB
Update failures </rsc_location>
Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB
Update failures </failed_update>
Jan 27 20:32:05 an-c03n02 cib[21128]: error: cib_process_create: CIB
Update failures </failed>
Jan 27 20:32:05 an-c03n02 cib[21128]: warning: cib_process_request:
Completed cib_create operation for section constraints: Name not unique
on network (rc=-76, origin=an-c03n01.alteeve.ca/cibadmin/2, version=0.142.1)
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.651646] drbd r0: helper
command: /sbin/drbdadm fence-peer r0 exit code 4 (0x400)
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.651650] drbd r0: fence-peer
helper returned 4 (peer was fenced)
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.651660] drbd r0: pdsk( DUnknown
-> Outdated )
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.651666] block drbd0: role(
Secondary -> Primary ) disk( Consistent -> UpToDate )
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.651876] block drbd0: new
current UUID
853E72BBF0C9260D:AA966D5345E69DAA:4F366962CD263E3C:4F356962CD263E3D
Jan 27 20:32:05 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_promote_0 (call=30, rc=0, cib-update=42,
confirmed=true) ok
Jan 27 20:32:05 an-c03n02 stonith-ng[21129]: notice:
stonith_device_register: Added 'fence_n01_virsh' to the device list (2
active devices)
Jan 27 20:32:05 an-c03n02 stonith-ng[21129]: notice:
stonith_device_register: Device 'fence_n02_virsh' already existed in
device list (2 active devices)
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.962021] drbd r0: Handshake
successful: Agreed network protocol version 101
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.962023] drbd r0: Agreed to
support TRIM on protocol level
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.962069] drbd r0: conn(
WFConnection -> WFReportParams )
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.962072] drbd r0: Starting
asender thread (from drbd_r_r0 [21724])
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.968085] block drbd0:
drbd_sync_handshake:
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.968090] block drbd0: self
853E72BBF0C9260D:AA966D5345E69DAA:4F366962CD263E3C:4F356962CD263E3D
bits:0 flags:0
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.968092] block drbd0: peer
AA966D5345E69DAA:0000000000000000:4F366962CD263E3D:4F356962CD263E3D
bits:0 flags:0
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.968094] block drbd0:
uuid_compare()=1 by rule 70
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.968100] block drbd0: peer(
Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk(
Outdated -> Consistent )
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.968256] block drbd0: send
bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23;
compression: 100.0%
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.971293] block drbd0: receive
bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23;
compression: 100.0%
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.971299] block drbd0: helper
command: /sbin/drbdadm before-resync-source minor-0
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.972381] block drbd0: helper
command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.972395] block drbd0: conn(
WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.972402] block drbd0: Began
resync as SyncSource (will sync 0 KB [0 bits set]).
Jan 27 20:32:05 an-c03n02 kernel: [ 5235.972433] block drbd0: updated
sync UUID
853E72BBF0C9260D:AA976D5345E69DAA:AA966D5345E69DAA:4F366962CD263E3C
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: Diff: --- 0.142.2
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: Diff: +++
0.143.1 fbd603d69e81ccfe94726267b74d5322
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: -- <rsc_location
rsc="drbd_r0_Clone" id="drbd-fence-by-handler-r0-drbd_r0_Clone">
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: -- <rule
role="Master" score="-INFINITY"
id="drbd-fence-by-handler-r0-rule-drbd_r0_Clone">
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: -- <expression
attribute="#uname" operation="ne" value="an-c03n02.alteeve.ca"
id="drbd-fence-by-handler-r0-expr-drbd_r0_Clone"/>
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: -- </rule>
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: -- </rsc_location>
Jan 27 20:32:05 an-c03n02 cib[21128]: notice: cib:diff: ++ <cib
admin_epoch="0" cib-last-written="Mon Jan 27 20:32:05 2014"
crm_feature_set="3.0.7" epoch="143" have-quorum="1" num_updates="1"
update-client="cibadmin" update-origin="an-c03n01.alteeve.ca"
validate-with="pacemaker-1.2" dc-uuid="2"/>
Jan 27 20:32:05 an-c03n02 stonith-ng[21129]: notice: unpack_config: On
loss of CCM Quorum: Ignore
Jan 27 20:32:05 an-c03n02 kernel: [ 5236.007605] block drbd0: Resync
done (total 1 sec; paused 0 sec; 0 K/sec)
Jan 27 20:32:05 an-c03n02 kernel: [ 5236.007612] block drbd0: updated
UUIDs 853E72BBF0C9260D:0000000000000000:AA976D5345E69DAA:AA966D5345E69DAA
Jan 27 20:32:05 an-c03n02 kernel: [ 5236.007618] block drbd0: conn(
SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Jan 27 20:32:05 an-c03n02 stonith-ng[21129]: notice:
stonith_device_register: Added 'fence_n01_virsh' to the device list (2
active devices)
Jan 27 20:32:05 an-c03n02 stonith-ng[21129]: notice:
stonith_device_register: Device 'fence_n02_virsh' already existed in
device list (2 active devices)
Jan 27 20:32:20 an-c03n02 crmd[21133]: warning: status_from_rc: Action
16 (drbd_r0_promote_0) on an-c03n01.alteeve.ca failed (target: 0 vs. rc:
1): Error
Jan 27 20:32:20 an-c03n02 crmd[21133]: warning: update_failcount:
Updating failcount for drbd_r0 on an-c03n01.alteeve.ca after failed
promote: rc=1 (update=value++, time=1390872740)
Jan 27 20:32:20 an-c03n02 crmd[21133]: warning: update_failcount:
Updating failcount for drbd_r0 on an-c03n01.alteeve.ca after failed
promote: rc=1 (update=value++, time=1390872740)
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 53: notify drbd_r0_post_notify_promote_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 55: notify drbd_r0_post_notify_promote_0 on
an-c03n01.alteeve.ca
Jan 27 20:32:20 an-c03n02 attrd[21131]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd_r0 (10000)
Jan 27 20:32:20 an-c03n02 attrd[21131]: notice: attrd_perform_update:
Sent update 32: master-drbd_r0=10000
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=31, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: run_graph: Transition 6
(Complete=12, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-170.bz2): Complete
Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Jan 27 20:32:20 an-c03n02 pengine[21132]: warning: unpack_rsc_op:
Processing failed op promote for drbd_r0:1 on an-c03n01.alteeve.ca:
unknown error (1)
Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: LogActions: Demote
drbd_r0:1 (Master -> Slave an-c03n01.alteeve.ca)
Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: LogActions: Recover
drbd_r0:1 (Master an-c03n01.alteeve.ca)
Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: process_pe_message:
Calculated Transition 7: /var/lib/pacemaker/pengine/pe-input-171.bz2
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 55: notify drbd_r0_pre_notify_demote_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 57: notify drbd_r0_pre_notify_demote_0 on
an-c03n01.alteeve.ca
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=32, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 16: demote drbd_r0_demote_0 on an-c03n01.alteeve.ca
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 56: notify drbd_r0_post_notify_demote_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 58: notify drbd_r0_post_notify_demote_0 on
an-c03n01.alteeve.ca
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=33, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 48: notify drbd_r0_pre_notify_stop_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 50: notify drbd_r0_pre_notify_stop_0 on
an-c03n01.alteeve.ca
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=34, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 3: stop drbd_r0_stop_0 on an-c03n01.alteeve.ca
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969170] block drbd0: State
change failed: Refusing to be Primary while peer is not outdated
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969190] block drbd0: state =
{ cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969196] block drbd0: wanted =
{ cs:TearDown ro:Primary/Unknown ds:UpToDate/DUnknown s---F- }
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969201] drbd r0: State change
failed: Refusing to be Primary while peer is not outdated
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969205] drbd r0: mask = 0x1f0
val = 0x70
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969218] drbd r0:
old_conn:WFReportParams wanted_conn:TearDown
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969396] drbd r0: peer(
Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate ->
Outdated )
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969407] drbd r0: asender terminated
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969408] drbd r0: Terminating
drbd_a_r0
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969457] block drbd0: new
current UUID
FD6969A6E17CBA41:853E72BBF0C9260D:AA976D5345E69DAA:AA966D5345E69DAA
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969708] drbd r0: Connection closed
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969717] drbd r0: conn( TearDown
-> Unconnected )
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969718] drbd r0: receiver
terminated
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969719] drbd r0: Restarting
receiver thread
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969720] drbd r0: receiver
(re)started
Jan 27 20:32:20 an-c03n02 kernel: [ 5250.969725] drbd r0: conn(
Unconnected -> WFConnection )
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 49: notify drbd_r0_post_notify_stop_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=35, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: run_graph: Transition 7
(Complete=21, Pending=0, Fired=0, Skipped=7, Incomplete=5,
Source=/var/lib/pacemaker/pengine/pe-input-171.bz2): Stopped
Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Jan 27 20:32:20 an-c03n02 pengine[21132]: warning: unpack_rsc_op:
Processing failed op promote for drbd_r0:1 on an-c03n01.alteeve.ca:
unknown error (1)
Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: LogActions: Start
drbd_r0:1 (an-c03n01.alteeve.ca)
Jan 27 20:32:20 an-c03n02 pengine[21132]: notice: process_pe_message:
Calculated Transition 8: /var/lib/pacemaker/pengine/pe-input-172.bz2
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 44: notify drbd_r0_pre_notify_start_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=36, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:20 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 15: start drbd_r0_start_0 on an-c03n01.alteeve.ca
Jan 27 20:32:21 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 45: notify drbd_r0_post_notify_start_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:32:21 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 46: notify drbd_r0_post_notify_start_0 on
an-c03n01.alteeve.ca
Jan 27 20:32:21 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=37, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:32:21 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 16: monitor drbd_r0_monitor_60000 on an-c03n01.alteeve.ca
Jan 27 20:32:21 an-c03n02 crmd[21133]: notice: run_graph: Transition 8
(Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-172.bz2): Complete
Jan 27 20:32:21 an-c03n02 crmd[21133]: notice: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.721118] drbd r0: Handshake
successful: Agreed network protocol version 101
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.721120] drbd r0: Agreed to
support TRIM on protocol level
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.721145] drbd r0: conn(
WFConnection -> WFReportParams )
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.721146] drbd r0: Starting
asender thread (from drbd_r_r0 [21724])
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730101] block drbd0:
drbd_sync_handshake:
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730104] block drbd0: self
FD6969A6E17CBA41:853E72BBF0C9260D:AA976D5345E69DAA:AA966D5345E69DAA
bits:0 flags:0
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730106] block drbd0: peer
853E72BBF0C9260C:0000000000000000:AA976D5345E69DAA:AA966D5345E69DAA
bits:0 flags:0
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730107] block drbd0:
uuid_compare()=1 by rule 70
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730111] block drbd0: peer(
Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk(
Outdated -> Consistent )
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730229] block drbd0: send
bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23;
compression: 100.0%
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730496] block drbd0: receive
bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23;
compression: 100.0%
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.730499] block drbd0: helper
command: /sbin/drbdadm before-resync-source minor-0
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.731835] block drbd0: helper
command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.731848] block drbd0: conn(
WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.731861] block drbd0: Began
resync as SyncSource (will sync 0 KB [0 bits set]).
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.731888] block drbd0: updated
sync UUID
FD6969A6E17CBA41:853F72BBF0C9260D:853E72BBF0C9260D:AA976D5345E69DAA
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.750241] block drbd0: Resync
done (total 1 sec; paused 0 sec; 0 K/sec)
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.750248] block drbd0: updated
UUIDs FD6969A6E17CBA41:0000000000000000:853F72BBF0C9260D:853E72BBF0C9260D
Jan 27 20:32:21 an-c03n02 kernel: [ 5251.750253] block drbd0: conn(
SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
====
What?
After a moment, things sort of clear up:
====
[root at an-c03n02 ~]# pcs status
Cluster name: an-cluster-03
Last updated: Mon Jan 27 20:34:37 2014
Last change: Mon Jan 27 20:32:05 2014 via cibadmin on an-c03n01.alteeve.ca
Stack: corosync
Current DC: an-c03n02.alteeve.ca (2) - partition with quorum
Version: 1.1.10-19.el7-368c726
2 Nodes configured
4 Resources configured
Online: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
Full list of resources:
fence_n01_virsh (stonith:fence_virsh): Started an-c03n01.alteeve.ca
fence_n02_virsh (stonith:fence_virsh): Started an-c03n02.alteeve.ca
Master/Slave Set: drbd_r0_Clone [drbd_r0]
Masters: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
Failed actions:
drbd_r0_promote_0 on an-c03n01.alteeve.ca 'unknown error' (1):
call=30, status=complete, last-rc-change='Mon Jan 27 20:32:05 2014',
queued=15187ms, exec=0ms
PCSD Status:
an-c03n01.alteeve.ca:
an-c03n01.alteeve.ca: Online
an-c03n02.alteeve.ca:
an-c03n02.alteeve.ca: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
====
Post-enable logs from an-c03n01:
====
Jan 27 20:33:21 an-c03n01 attrd[843]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd_r0 (10000)
Jan 27 20:33:21 an-c03n01 attrd[843]: notice: attrd_perform_update: Sent
update 48: master-drbd_r0=10000
Jan 27 20:33:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=41, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:33:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=42, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:33:21 an-c03n01 kernel: [19903.079190] block drbd0: role(
Secondary -> Primary )
Jan 27 20:33:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_promote_0 (call=43, rc=0, cib-update=25,
confirmed=true) ok
Jan 27 20:33:21 an-c03n01 crmd[845]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=44, rc=0, cib-update=0, confirmed=true) ok
====
Post-enable logs from an-c03n02:
====
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: do_state_transition:
State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 27 20:33:21 an-c03n02 pengine[21132]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Jan 27 20:33:21 an-c03n02 pengine[21132]: warning: unpack_rsc_op:
Processing failed op promote for drbd_r0:1 on an-c03n01.alteeve.ca:
unknown error (1)
Jan 27 20:33:21 an-c03n02 pengine[21132]: notice: LogActions: Promote
drbd_r0:1 (Slave -> Master an-c03n01.alteeve.ca)
Jan 27 20:33:21 an-c03n02 pengine[21132]: notice: process_pe_message:
Calculated Transition 9: /var/lib/pacemaker/pengine/pe-input-173.bz2
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 3: cancel drbd_r0_cancel_60000 on an-c03n01.alteeve.ca
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 53: notify drbd_r0_pre_notify_promote_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 55: notify drbd_r0_pre_notify_promote_0 on
an-c03n01.alteeve.ca
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=38, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: run_graph: Transition 9
(Complete=4, Pending=0, Fired=0, Skipped=3, Incomplete=5,
Source=/var/lib/pacemaker/pengine/pe-input-173.bz2): Stopped
Jan 27 20:33:21 an-c03n02 pengine[21132]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Jan 27 20:33:21 an-c03n02 pengine[21132]: warning: unpack_rsc_op:
Processing failed op promote for drbd_r0:1 on an-c03n01.alteeve.ca:
unknown error (1)
Jan 27 20:33:21 an-c03n02 pengine[21132]: notice: LogActions: Promote
drbd_r0:1 (Slave -> Master an-c03n01.alteeve.ca)
Jan 27 20:33:21 an-c03n02 pengine[21132]: notice: process_pe_message:
Calculated Transition 10: /var/lib/pacemaker/pengine/pe-input-174.bz2
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 52: notify drbd_r0_pre_notify_promote_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 54: notify drbd_r0_pre_notify_promote_0 on
an-c03n01.alteeve.ca
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=39, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 17: promote drbd_r0_promote_0 on an-c03n01.alteeve.ca
Jan 27 20:33:21 an-c03n02 kernel: [ 5311.444071] block drbd0: peer(
Secondary -> Primary )
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 53: notify drbd_r0_post_notify_promote_0 on
an-c03n02.alteeve.ca (local)
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: te_rsc_command:
Initiating action 55: notify drbd_r0_post_notify_promote_0 on
an-c03n01.alteeve.ca
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=40, rc=0, cib-update=0, confirmed=true) ok
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: run_graph: Transition 10
(Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-174.bz2): Complete
Jan 27 20:33:21 an-c03n02 crmd[21133]: notice: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
====
I have no idea what's going wrong here... I'd appreciate any insight or help.
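For what it's worth, while debugging this I've been clearing the leftover fencing constraint and failcount by hand between attempts. A rough sketch of the commands (the constraint ID is the one from the logs above; resource names match my config, adjust for yours):

```shell
# List all constraints with their IDs, so the leftover
# drbd-fence-by-handler-* location constraint is visible
pcs constraint --full

# Remove the constraint crm-fence-peer.sh left behind
pcs constraint remove drbd-fence-by-handler-r0-drbd_r0_Clone

# Clear the failed promote so Pacemaker will retry cleanly
crm_resource --cleanup --resource drbd_r0_Clone
```

This only clears the symptom, of course; the underlying question of why fence-peer fires during a normal re-enable remains.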
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?