[Pacemaker] Trying to figure out a constraint
Digimer
lists at alteeve.ca
Wed Jun 18 04:03:32 UTC 2014
Hi all,
I am trying to set up a basic Pacemaker 1.1.10 cluster on RHEL 6.5 with
DRBD 8.3.16.
I've set up DRBD and configured one clustered LVM volume group using
that DRBD resource as the PV. With DRBD configured alone, I can
stop/start pacemaker repeatedly without issue. However, when I add the
LVM VG using ocf:heartbeat:LVM and set up a constraint, subsequent
restarts of pacemaker almost always end with a node being fenced. I have
to think, then, that I am messing up my constraints...
Config:
====
Cluster Name: an-anvil-04
Corosync Nodes:
Pacemaker Nodes:
  an-a04n01.alteeve.ca an-a04n02.alteeve.ca
Resources:
  Master: drbd_r0_Clone
    Meta Attrs: master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
    Resource: drbd_r0 (class=ocf provider=linbit type=drbd)
      Attributes: drbd_resource=r0
      Operations: monitor interval=30s (drbd_r0-monitor-interval-30s)
  Master: lvm_n01_vg0_Clone
    Meta Attrs: master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
    Resource: lvm_n01_vg0 (class=ocf provider=heartbeat type=LVM)
      Attributes: volgrpname=an-a04n01_vg0
      Operations: monitor interval=30s (lvm_n01_vg0-monitor-interval-30s)
Stonith Devices:
  Resource: fence_n01_ipmi (class=stonith type=fence_ipmilan)
    Attributes: pcmk_host_list=an-a04n01.alteeve.ca ipaddr=an-a04n01.ipmi action=reboot login=admin passwd=Initial1 delay=15
    Operations: monitor interval=60s (fence_n01_ipmi-monitor-interval-60s)
  Resource: fence_n02_ipmi (class=stonith type=fence_ipmilan)
    Attributes: pcmk_host_list=an-a04n02.alteeve.ca ipaddr=an-a04n02.ipmi action=reboot login=admin passwd=Initial1
    Operations: monitor interval=60s (fence_n02_ipmi-monitor-interval-60s)
Fencing Levels:
Location Constraints:
Ordering Constraints:
  promote drbd_r0_Clone then start lvm_n01_vg0_Clone (Mandatory) (id:order-drbd_r0_Clone-lvm_n01_vg0_Clone-mandatory)
Colocation Constraints:
Cluster Properties:
  cluster-infrastructure: cman
  dc-version: 1.1.10-14.el6_5.3-368c726
  last-lrm-refresh: 1403062921
  no-quorum-policy: ignore
  stonith-enabled: true
====
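As a side note on the constraint sections of the config above: in
Pacemaker an ordering constraint only sequences actions and does not by
itself tie the two resources to the same node, so it can be useful to
list which ordered pairs have no matching colocation constraint. Below is
a minimal Python sketch of that check. It assumes the text layout of the
'pcs config' output shown above; the 'A with B' layout for colocation
lines and the helper names are assumptions made for illustration. Nothing
here establishes that a missing colocation is what causes the fencing.

```python
import re

# Constraint sections copied from the config above (wrapped lines joined).
CONFIG = """\
Location Constraints:
Ordering Constraints:
  promote drbd_r0_Clone then start lvm_n01_vg0_Clone (Mandatory)
Colocation Constraints:
"""


def sections(text):
    """Group non-empty lines under the most recent '... Constraints:' heading."""
    grouped, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("Constraints:"):
            current = line[:-1]
            grouped[current] = []
        elif current is not None:
            grouped[current].append(line)
    return grouped


def ordered_without_colocation(text):
    """Return ordered resource pairs that have no colocation line."""
    grouped = sections(text)
    ordered, colocated = set(), set()
    for line in grouped.get("Ordering Constraints", []):
        # e.g. "promote A then start B (Mandatory)"
        match = re.match(r"\w+ (\S+) then \w+ (\S+)", line)
        if match:
            ordered.add(frozenset(match.groups()))
    for line in grouped.get("Colocation Constraints", []):
        # Assumed layout: "A with B (score:INFINITY)"
        match = re.match(r"(\S+) with (\S+)", line)
        if match:
            colocated.add(frozenset(match.groups()))
    return sorted(tuple(sorted(pair)) for pair in ordered - colocated)


print(ordered_without_colocation(CONFIG))
```

With the sections above, the only pair reported is drbd_r0_Clone and
lvm_n01_vg0_Clone, because the Colocation Constraints section is empty.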
Constraint:
====
Location Constraints:
Ordering Constraints:
  promote drbd_r0_Clone then start lvm_n01_vg0_Clone (Mandatory) (id:order-drbd_r0_Clone-lvm_n01_vg0_Clone-mandatory)
Colocation Constraints:
====
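When reading the node log that follows, it may help to line up a few
timestamps: when the promote of drbd_r0 was initiated, when the fence
request went out and completed, when the promote timed out, and when the
DRBD fence-peer helper returned. The Python sketch below does that for
five lines copied from that log (mail wrapping removed). The year 2014 is
taken from the post date because syslog timestamps carry none, and the
helper names are made up for this example.

```python
from datetime import datetime

# Five lines copied from the an-a04n01 log in this post, unwrapped.
LOG = [
    "Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: te_rsc_command: "
    "Initiating action 13: promote drbd_r0_promote_0 on an-a04n01.alteeve.ca (local)",
    "Jun 17 23:56:07 an-a04n01 stonith_admin[28637]: notice: crm_log_args: "
    "Invoked: stonith_admin --fence an-a04n02.alteeve.ca",
    "Jun 17 23:56:23 an-a04n01 stonith-ng[28356]: notice: log_operation: "
    "Operation 'off' [28638] (call 2 from stonith_admin.28637) for host "
    "'an-a04n02.alteeve.ca' with device 'fence_n02_ipmi' returned: 0 (OK)",
    "Jun 17 23:56:26 an-a04n01 lrmd[28357]: warning: operation_finished: "
    "drbd_r0_promote_0:28604 - timed out after 20000ms",
    "Jun 17 23:56:27 an-a04n01 kernel: block drbd0: fence-peer helper "
    "returned 7 (peer was stonithed)",
]


def stamp(line, year=2014):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp (year assumed)."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def first(lines, needle):
    """Return the timestamp of the first line containing needle."""
    for line in lines:
        if needle in line:
            return stamp(line)
    raise ValueError(f"no line contains {needle!r}")


promote_start = first(LOG, "promote drbd_r0_promote_0")
fence_request = first(LOG, "stonith_admin --fence")
fence_done = first(LOG, "returned: 0 (OK)")
promote_timeout = first(LOG, "timed out after")
helper_done = first(LOG, "fence-peer helper returned")

fence_seconds = (fence_done - fence_request).total_seconds()
promote_seconds = (promote_timeout - promote_start).total_seconds()
helper_seconds = (helper_done - promote_start).total_seconds()

print(f"fence request to OK:       {fence_seconds:.0f}s")
print(f"promote start to timeout:  {promote_seconds:.0f}s")
print(f"promote start to helper:   {helper_seconds:.0f}s")
```

At one-second syslog resolution this gives 16s for the fence operation,
20s for the promote (matching the logged 20000ms timeout) and 21s until
the fence-peer helper returned, so the helper returned roughly a second
after the promote had already been marked as timed out.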
Logs from 'an-a04n01', starting with '/etc/init.d/pacemaker start'
(this node always survives and fences 'an-a04n02'):
====
Jun 17 23:55:32 an-a04n01 corosync[28088]: [MAIN ] Corosync Cluster
Engine ('1.4.1'): started and ready to provide service.
Jun 17 23:55:32 an-a04n01 corosync[28088]: [MAIN ] Corosync built-in
features: nss dbus rdma snmp
Jun 17 23:55:32 an-a04n01 corosync[28088]: [MAIN ] Successfully read
config from /etc/cluster/cluster.conf
Jun 17 23:55:32 an-a04n01 corosync[28088]: [MAIN ] Successfully
parsed cman config
Jun 17 23:55:32 an-a04n01 corosync[28088]: [TOTEM ] Initializing
transport (UDP/IP Multicast).
Jun 17 23:55:32 an-a04n01 corosync[28088]: [TOTEM ] Initializing
transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jun 17 23:55:32 an-a04n01 corosync[28088]: [TOTEM ] The network
interface [10.20.40.1] is now up.
Jun 17 23:55:32 an-a04n01 corosync[28088]: [QUORUM] Using quorum
provider quorum_cman
Jun 17 23:55:32 an-a04n01 corosync[28088]: [SERV ] Service engine
loaded: corosync cluster quorum service v0.1
Jun 17 23:55:32 an-a04n01 corosync[28088]: [CMAN ] CMAN 3.0.12.1
(built Apr 3 2014 05:12:26) started
Jun 17 23:55:32 an-a04n01 corosync[28088]: [SERV ] Service engine
loaded: corosync CMAN membership service 2.90
Jun 17 23:55:32 an-a04n01 corosync[28088]: [SERV ] Service engine
loaded: openais checkpoint service B.01.01
Jun 17 23:55:32 an-a04n01 corosync[28088]: [SERV ] Service engine
loaded: corosync extended virtual synchrony service
Jun 17 23:55:32 an-a04n01 corosync[28088]: [SERV ] Service engine
loaded: corosync configuration service
Jun 17 23:55:32 an-a04n01 corosync[28088]: [SERV ] Service engine
loaded: corosync cluster closed process group service v1.01
Jun 17 23:55:32 an-a04n01 corosync[28088]: [SERV ] Service engine
loaded: corosync cluster config database access v1.01
Jun 17 23:55:32 an-a04n01 corosync[28088]: [SERV ] Service engine
loaded: corosync profile loading service
Jun 17 23:55:32 an-a04n01 corosync[28088]: [QUORUM] Using quorum
provider quorum_cman
Jun 17 23:55:32 an-a04n01 corosync[28088]: [SERV ] Service engine
loaded: corosync cluster quorum service v0.1
Jun 17 23:55:32 an-a04n01 corosync[28088]: [MAIN ] Compatibility mode
set to whitetank. Using V1 and V2 of the synchronization engine.
Jun 17 23:55:32 an-a04n01 corosync[28088]: [TOTEM ] A processor joined
or left the membership and a new membership was formed.
Jun 17 23:55:32 an-a04n01 corosync[28088]: [CMAN ] quorum regained,
resuming activity
Jun 17 23:55:32 an-a04n01 corosync[28088]: [QUORUM] This node is
within the primary component and will provide service.
Jun 17 23:55:32 an-a04n01 corosync[28088]: [QUORUM] Members[1]: 1
Jun 17 23:55:32 an-a04n01 corosync[28088]: [QUORUM] Members[1]: 1
Jun 17 23:55:32 an-a04n01 corosync[28088]: [CPG ] chosen downlist:
sender r(0) ip(10.20.40.1) ; members(old:0 left:0)
Jun 17 23:55:32 an-a04n01 corosync[28088]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 17 23:55:33 an-a04n01 corosync[28088]: [TOTEM ] A processor joined
or left the membership and a new membership was formed.
Jun 17 23:55:33 an-a04n01 corosync[28088]: [QUORUM] Members[2]: 1 2
Jun 17 23:55:33 an-a04n01 corosync[28088]: [QUORUM] Members[2]: 1 2
Jun 17 23:55:33 an-a04n01 corosync[28088]: [CPG ] chosen downlist:
sender r(0) ip(10.20.40.1) ; members(old:1 left:0)
Jun 17 23:55:33 an-a04n01 corosync[28088]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 17 23:55:36 an-a04n01 fenced[28143]: fenced 3.0.12.1 started
Jun 17 23:55:36 an-a04n01 dlm_controld[28169]: dlm_controld 3.0.12.1 started
Jun 17 23:55:37 an-a04n01 gfs_controld[28218]: gfs_controld 3.0.12.1 started
Jun 17 23:55:38 an-a04n01 pacemaker: Attempting to start clvmd
Jun 17 23:55:39 an-a04n01 kernel: dlm: Using TCP for communications
Jun 17 23:55:40 an-a04n01 kernel: dlm: connecting to 2
Jun 17 23:55:40 an-a04n01 clvmd: Cluster LVM daemon started - connected
to CMAN
Jun 17 23:55:41 an-a04n01 pacemaker: Starting Pacemaker Cluster Manager
Jun 17 23:55:42 an-a04n01 pacemakerd[28349]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n01 pacemakerd[28349]: notice: main: Starting
Pacemaker 1.1.10-14.el6_5.3 (Build: 368c726): generated-manpages
agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc
nagios corosync-plugin cman
Jun 17 23:55:42 an-a04n01 cib[28355]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n01 lrmd[28357]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n01 attrd[28358]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n01 pengine[28359]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n01 attrd[28358]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: cman
Jun 17 23:55:42 an-a04n01 crmd[28360]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n01 crmd[28360]: notice: main: CRM Git Version:
368c726
Jun 17 23:55:42 an-a04n01 stonith-ng[28356]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n01 stonith-ng[28356]: notice:
crm_cluster_connect: Connecting to cluster infrastructure: cman
Jun 17 23:55:42 an-a04n01 attrd[28358]: notice: main: Starting mainloop...
Jun 17 23:55:42 an-a04n01 cib[28355]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: cman
Jun 17 23:55:43 an-a04n01 crmd[28360]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: cman
Jun 17 23:55:43 an-a04n01 crmd[28360]: notice: cman_event_callback:
Membership 276: quorum acquired
Jun 17 23:55:43 an-a04n01 crmd[28360]: notice: crm_update_peer_state:
cman_event_callback: Node an-a04n01.alteeve.ca[1] - state is now member
(was (null))
Jun 17 23:55:43 an-a04n01 crmd[28360]: notice: crm_update_peer_state:
cman_event_callback: Node an-a04n02.alteeve.ca[2] - state is now member
(was (null))
Jun 17 23:55:43 an-a04n01 stonith-ng[28356]: notice: setup_cib:
Watching for stonith topology changes
Jun 17 23:55:43 an-a04n01 crmd[28360]: notice: do_started: The local
CRM is operational
Jun 17 23:55:43 an-a04n01 crmd[28360]: notice: do_state_transition:
State transition S_STARTING -> S_PENDING [ input=I_PENDING
cause=C_FSA_INTERNAL origin=do_started ]
Jun 17 23:55:43 an-a04n01 stonith-ng[28356]: notice: unpack_config: On
loss of CCM Quorum: Ignore
Jun 17 23:55:44 an-a04n01 stonith-ng[28356]: notice:
stonith_device_register: Added 'fence_n01_ipmi' to the device list (1
active devices)
Jun 17 23:55:45 an-a04n01 stonith-ng[28356]: notice:
stonith_device_register: Added 'fence_n02_ipmi' to the device list (2
active devices)
Jun 17 23:56:04 an-a04n01 crmd[28360]: notice: do_state_transition:
State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Jun 17 23:56:04 an-a04n01 attrd[28358]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Jun 17 23:56:04 an-a04n01 pengine[28359]: notice: unpack_config: On
loss of CCM Quorum: Ignore
Jun 17 23:56:04 an-a04n01 pengine[28359]: notice: LogActions: Start
fence_n01_ipmi#011(an-a04n01.alteeve.ca)
Jun 17 23:56:04 an-a04n01 pengine[28359]: notice: LogActions: Start
fence_n02_ipmi#011(an-a04n02.alteeve.ca)
Jun 17 23:56:04 an-a04n01 pengine[28359]: notice: LogActions: Start
drbd_r0:0#011(an-a04n01.alteeve.ca)
Jun 17 23:56:04 an-a04n01 pengine[28359]: notice: LogActions: Start
drbd_r0:1#011(an-a04n02.alteeve.ca)
Jun 17 23:56:04 an-a04n01 pengine[28359]: notice: LogActions: Start
lvm_n01_vg0:0#011(an-a04n01.alteeve.ca - blocked)
Jun 17 23:56:04 an-a04n01 pengine[28359]: notice: LogActions: Start
lvm_n01_vg0:1#011(an-a04n02.alteeve.ca - blocked)
Jun 17 23:56:04 an-a04n01 pengine[28359]: notice: process_pe_message:
Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-152.bz2
Jun 17 23:56:04 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 9: monitor fence_n01_ipmi_monitor_0 on
an-a04n02.alteeve.ca
Jun 17 23:56:04 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 4: monitor fence_n01_ipmi_monitor_0 on
an-a04n01.alteeve.ca (local)
Jun 17 23:56:04 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 10: monitor fence_n02_ipmi_monitor_0 on
an-a04n02.alteeve.ca
Jun 17 23:56:04 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 5: monitor fence_n02_ipmi_monitor_0 on
an-a04n01.alteeve.ca (local)
Jun 17 23:56:04 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 6: monitor drbd_r0:0_monitor_0 on an-a04n01.alteeve.ca
(local)
Jun 17 23:56:04 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 11: monitor drbd_r0:1_monitor_0 on an-a04n02.alteeve.ca
Jun 17 23:56:04 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 7: monitor lvm_n01_vg0:0_monitor_0 on
an-a04n01.alteeve.ca (local)
Jun 17 23:56:04 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 12: monitor lvm_n01_vg0:1_monitor_0 on
an-a04n02.alteeve.ca
Jun 17 23:56:04 an-a04n01 LVM(lvm_n01_vg0)[28419]: WARNING: LVM Volume
an-a04n01_vg0 is not available (stopped)
Jun 17 23:56:05 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation lvm_n01_vg0_monitor_0 (call=19, rc=7, cib-update=28,
confirmed=true) not running
Jun 17 23:56:05 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation drbd_r0_monitor_0 (call=14, rc=7, cib-update=29,
confirmed=true) not running
Jun 17 23:56:05 an-a04n01 crmd[28360]: notice: process_lrm_event:
an-a04n01.alteeve.ca-drbd_r0_monitor_0:14 [ \n ]
Jun 17 23:56:05 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 3: probe_complete probe_complete on
an-a04n01.alteeve.ca (local) - no waiting
Jun 17 23:56:05 an-a04n01 attrd[28358]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Jun 17 23:56:05 an-a04n01 attrd[28358]: notice: attrd_perform_update:
Sent update 4: probe_complete=true
Jun 17 23:56:05 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 8: probe_complete probe_complete on
an-a04n02.alteeve.ca - no waiting
Jun 17 23:56:05 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 13: start fence_n01_ipmi_start_0 on
an-a04n01.alteeve.ca (local)
Jun 17 23:56:05 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 15: start fence_n02_ipmi_start_0 on an-a04n02.alteeve.ca
Jun 17 23:56:05 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 17: start drbd_r0:0_start_0 on an-a04n01.alteeve.ca
(local)
Jun 17 23:56:06 an-a04n01 stonith-ng[28356]: notice:
stonith_device_register: Device 'fence_n01_ipmi' already existed in
device list (2 active devices)
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 19: start drbd_r0:1_start_0 on an-a04n02.alteeve.ca
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation fence_n01_ipmi_start_0 (call=25, rc=0, cib-update=30,
confirmed=true) ok
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 14: monitor fence_n01_ipmi_monitor_60000 on
an-a04n01.alteeve.ca (local)
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 16: monitor fence_n02_ipmi_monitor_60000 on
an-a04n02.alteeve.ca
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation fence_n01_ipmi_monitor_60000 (call=30, rc=0, cib-update=31,
confirmed=false) ok
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: Starting worker thread
(from cqueue [3274])
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: disk( Diskless ->
Attaching )
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: Found 4 transactions (126
active extents) in activity log.
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: Method to ensure write
ordering: flush
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: drbd_bm_resize called
with capacity == 909525832
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: resync bitmap:
bits=113690729 words=1776418 pages=3470
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: size = 434 GB (454762916 KB)
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: bitmap READ of 3470 pages
took 9 jiffies
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: recounting of set bits
took additional 16 jiffies
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: 0 KB (0 bits) marked
out-of-sync by on disk bit-map.
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: disk( Attaching ->
Consistent )
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: attached to UUIDs
C71081B1CBAFC620:0000000000000000:F9F9DA52F6D93990:F9F8DA52F6D93991
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: conn( StandAlone ->
Unconnected )
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: Starting receiver thread
(from drbd0_worker [28524])
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: receiver (re)started
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: conn( Unconnected ->
WFConnection )
Jun 17 23:56:06 an-a04n01 attrd[28358]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd_r0 (5)
Jun 17 23:56:06 an-a04n01 attrd[28358]: notice: attrd_perform_update:
Sent update 9: master-drbd_r0=5
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation drbd_r0_start_0 (call=27, rc=0, cib-update=32, confirmed=true) ok
Jun 17 23:56:06 an-a04n01 attrd[28358]: notice: attrd_perform_update:
Sent update 11: master-drbd_r0=5
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 82: notify drbd_r0:0_post_notify_start_0 on
an-a04n01.alteeve.ca (local)
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 83: notify drbd_r0:1_post_notify_start_0 on
an-a04n02.alteeve.ca
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=34, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: run_graph: Transition 0
(Complete=25, Pending=0, Fired=0, Skipped=2, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-152.bz2): Stopped
Jun 17 23:56:06 an-a04n01 pengine[28359]: notice: unpack_config: On
loss of CCM Quorum: Ignore
Jun 17 23:56:06 an-a04n01 pengine[28359]: notice: LogActions: Promote
drbd_r0:0#011(Slave -> Master an-a04n01.alteeve.ca)
Jun 17 23:56:06 an-a04n01 pengine[28359]: notice: LogActions: Promote
drbd_r0:1#011(Slave -> Master an-a04n02.alteeve.ca)
Jun 17 23:56:06 an-a04n01 pengine[28359]: notice: LogActions: Start
lvm_n01_vg0:0#011(an-a04n01.alteeve.ca)
Jun 17 23:56:06 an-a04n01 pengine[28359]: notice: LogActions: Start
lvm_n01_vg0:1#011(an-a04n02.alteeve.ca)
Jun 17 23:56:06 an-a04n01 pengine[28359]: notice: process_pe_message:
Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-153.bz2
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 84: notify drbd_r0_pre_notify_promote_0 on
an-a04n01.alteeve.ca (local)
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 86: notify drbd_r0_pre_notify_promote_0 on
an-a04n02.alteeve.ca
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=37, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 13: promote drbd_r0_promote_0 on an-a04n01.alteeve.ca
(local)
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 16: promote drbd_r0_promote_0 on an-a04n02.alteeve.ca
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: helper command:
/sbin/drbdadm fence-peer minor-0
Jun 17 23:56:07 an-a04n01 kernel: block drbd0: Handshake successful:
Agreed network protocol version 97
Jun 17 23:56:07 an-a04n01 stonith_admin[28637]: notice: crm_log_args:
Invoked: stonith_admin --fence an-a04n02.alteeve.ca
Jun 17 23:56:07 an-a04n01 stonith-ng[28356]: notice: handle_request:
Client stonith_admin.28637.6ed13ba6 wants to fence (off)
'an-a04n02.alteeve.ca' with device '(any)'
Jun 17 23:56:07 an-a04n01 stonith-ng[28356]: notice:
initiate_remote_stonith_op: Initiating remote operation off for
an-a04n02.alteeve.ca: 382bfa3d-55da-4eed-ad8a-a1a883022a35 (0)
Jun 17 23:56:07 an-a04n01 stonith-ng[28356]: notice:
can_fence_host_with_device: fence_n02_ipmi can fence
an-a04n02.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n01 stonith-ng[28356]: notice:
can_fence_host_with_device: fence_n01_ipmi can not fence
an-a04n02.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n01 stonith-ng[28356]: notice:
can_fence_host_with_device: fence_n02_ipmi can fence
an-a04n02.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n01 stonith-ng[28356]: notice:
can_fence_host_with_device: fence_n01_ipmi can not fence
an-a04n02.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n01 stonith-ng[28356]: notice:
can_fence_host_with_device: fence_n02_ipmi can not fence
an-a04n01.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n01 stonith-ng[28356]: notice:
can_fence_host_with_device: fence_n01_ipmi can fence
an-a04n01.alteeve.ca: static-list
Jun 17 23:56:23 an-a04n01 stonith-ng[28356]: notice: log_operation:
Operation 'off' [28638] (call 2 from stonith_admin.28637) for host
'an-a04n02.alteeve.ca' with device 'fence_n02_ipmi' returned: 0 (OK)
Jun 17 23:56:25 an-a04n01 corosync[28088]: [TOTEM ] A processor
failed, forming new configuration.
Jun 17 23:56:26 an-a04n01 lrmd[28357]: warning: child_timeout_callback:
drbd_r0_promote_0 process (PID 28604) timed out
Jun 17 23:56:26 an-a04n01 lrmd[28357]: warning: operation_finished:
drbd_r0_promote_0:28604 - timed out after 20000ms
Jun 17 23:56:26 an-a04n01 crmd[28360]: error: process_lrm_event: LRM
operation drbd_r0_promote_0 (40) Timed Out (timeout=20000ms)
Jun 17 23:56:26 an-a04n01 crmd[28360]: notice: process_lrm_event:
an-a04n01.alteeve.ca-drbd_r0_promote_0:40 [ allow-two-primaries;\n ]
Jun 17 23:56:26 an-a04n01 crmd[28360]: warning: status_from_rc: Action
13 (drbd_r0_promote_0) on an-a04n01.alteeve.ca failed (target: 0 vs. rc:
1): Error
Jun 17 23:56:26 an-a04n01 crmd[28360]: warning: update_failcount:
Updating failcount for drbd_r0 on an-a04n01.alteeve.ca after failed
promote: rc=1 (update=value++, time=1403063786)
Jun 17 23:56:26 an-a04n01 crmd[28360]: warning: update_failcount:
Updating failcount for drbd_r0 on an-a04n01.alteeve.ca after failed
promote: rc=1 (update=value++, time=1403063786)
Jun 17 23:56:26 an-a04n01 attrd[28358]: notice: attrd_trigger_update:
Sending flush op to all hosts for: fail-count-drbd_r0 (1)
Jun 17 23:56:26 an-a04n01 attrd[28358]: notice: attrd_perform_update:
Sent update 14: fail-count-drbd_r0=1
Jun 17 23:56:26 an-a04n01 attrd[28358]: notice: attrd_trigger_update:
Sending flush op to all hosts for: last-failure-drbd_r0 (1403063786)
Jun 17 23:56:26 an-a04n01 attrd[28358]: notice: attrd_perform_update:
Sent update 17: last-failure-drbd_r0=1403063786
Jun 17 23:56:26 an-a04n01 attrd[28358]: notice: attrd_trigger_update:
Sending flush op to all hosts for: fail-count-drbd_r0 (2)
Jun 17 23:56:26 an-a04n01 attrd[28358]: notice: attrd_perform_update:
Sent update 19: fail-count-drbd_r0=2
Jun 17 23:56:26 an-a04n01 attrd[28358]: notice: attrd_trigger_update:
Sending flush op to all hosts for: last-failure-drbd_r0 (1403063786)
Jun 17 23:56:26 an-a04n01 attrd[28358]: notice: attrd_perform_update:
Sent update 21: last-failure-drbd_r0=1403063786
Jun 17 23:56:27 an-a04n01 corosync[28088]: [QUORUM] Members[1]: 1
Jun 17 23:56:27 an-a04n01 corosync[28088]: [TOTEM ] A processor joined
or left the membership and a new membership was formed.
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: crm_update_peer_state:
cman_event_callback: Node an-a04n02.alteeve.ca[2] - state is now lost
(was member)
Jun 17 23:56:27 an-a04n01 crmd[28360]: warning: match_down_event: No
match for shutdown action on an-a04n02.alteeve.ca
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: peer_update_callback:
Stonith/shutdown of an-a04n02.alteeve.ca not matched
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice:
fail_incompletable_actions: Action 87 (87) is scheduled for
an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice:
fail_incompletable_actions: Action 16 (16) was pending on
an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice:
fail_incompletable_actions: Action 89 (89) is scheduled for
an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice:
fail_incompletable_actions: Action 44 (44) is scheduled for
an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice:
fail_incompletable_actions: Action 43 (43) is scheduled for
an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: warning:
fail_incompletable_actions: Node an-a04n02.alteeve.ca shutdown resulted
in un-runnable actions
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice:
fail_incompletable_actions: Action 87 (87) is scheduled for
an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice:
fail_incompletable_actions: Action 16 (16) was pending on
an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice:
fail_incompletable_actions: Action 89 (89) is scheduled for
an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice:
fail_incompletable_actions: Action 44 (44) is scheduled for
an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice:
fail_incompletable_actions: Action 43 (43) is scheduled for
an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 kernel: dlm: closing connection to node 2
Jun 17 23:56:27 an-a04n01 crmd[28360]: warning:
fail_incompletable_actions: Node an-a04n02.alteeve.ca shutdown resulted
in un-runnable actions
Jun 17 23:56:27 an-a04n01 corosync[28088]: [CPG ] chosen downlist:
sender r(0) ip(10.20.40.1) ; members(old:2 left:1)
Jun 17 23:56:27 an-a04n01 corosync[28088]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 17 23:56:27 an-a04n01 stonith-ng[28356]: notice: remote_op_done:
Operation off of an-a04n02.alteeve.ca by an-a04n01.alteeve.ca for
stonith_admin.28637 at an-a04n01.alteeve.ca.382bfa3d: OK
Jun 17 23:56:27 an-a04n01 attrd[28358]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Jun 17 23:56:27 an-a04n01 attrd[28358]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd_r0 (5)
Jun 17 23:56:27 an-a04n01 crmd[28360]: warning: match_down_event: No
match for shutdown action on an-a04n02.alteeve.ca
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: peer_update_callback:
Stonith/shutdown of an-a04n02.alteeve.ca not matched
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice:
fail_incompletable_actions: Action 87 (87) is scheduled for
an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice:
fail_incompletable_actions: Action 16 (16) was pending on
an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice:
fail_incompletable_actions: Action 89 (89) is scheduled for
an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice:
fail_incompletable_actions: Action 44 (44) is scheduled for
an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice:
fail_incompletable_actions: Action 43 (43) is scheduled for
an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: warning:
fail_incompletable_actions: Node an-a04n02.alteeve.ca shutdown resulted
in un-runnable actions
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: tengine_stonith_notify:
Peer an-a04n02.alteeve.ca was terminated (off) by an-a04n01.alteeve.ca
for an-a04n01.alteeve.ca: OK (ref=382bfa3d-55da-4eed-ad8a-a1a883022a35)
by client stonith_admin.28637
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: tengine_stonith_notify:
Notified CMAN that 'an-a04n02.alteeve.ca' is now fenced
Jun 17 23:56:27 an-a04n01 fenced[28143]: fencing node an-a04n02.alteeve.ca
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 85: notify drbd_r0_post_notify_promote_0 on
an-a04n01.alteeve.ca (local)
Jun 17 23:56:27 an-a04n01 stonith_admin-fence-peer.sh[28708]:
stonith_admin successfully fenced peer an-a04n02.alteeve.ca.
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: helper command:
/sbin/drbdadm fence-peer minor-0 exit code 7 (0x700)
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: fence-peer helper
returned 7 (peer was stonithed)
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: role( Secondary ->
Primary ) disk( Consistent -> UpToDate ) pdsk( DUnknown -> Outdated )
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: new current UUID
B704B7175D09E91D:C71081B1CBAFC620:F9F9DA52F6D93990:F9F8DA52F6D93991
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: conn( WFConnection ->
WFReportParams )
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: Starting asender thread
(from drbd0_receiver [28542])
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: data-integrity-alg:
<not-used>
Jun 17 23:56:27 an-a04n01 attrd[28358]: notice: attrd_trigger_update:
Sending flush op to all hosts for: last-failure-drbd_r0 (1403063786)
Jun 17 23:56:27 an-a04n01 attrd[28358]: notice: attrd_trigger_update:
Sending flush op to all hosts for: fail-count-drbd_r0 (2)
Jun 17 23:56:27 an-a04n01 attrd[28358]: notice: attrd_perform_update:
Sent update 27: fail-count-drbd_r0=2
Jun 17 23:56:27 an-a04n01 attrd[28358]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Jun 17 23:56:27 an-a04n01 attrd[28358]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd_r0 (10000)
Jun 17 23:56:27 an-a04n01 attrd[28358]: notice: attrd_perform_update:
Sent update 31: master-drbd_r0=10000
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=43, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: run_graph: Transition 1
(Complete=12, Pending=0, Fired=0, Skipped=8, Incomplete=4,
Source=/var/lib/pacemaker/pengine/pe-input-153.bz2): Stopped
Jun 17 23:56:27 an-a04n01 pengine[28359]: notice: unpack_config: On
loss of CCM Quorum: Ignore
Jun 17 23:56:27 an-a04n01 pengine[28359]: warning: unpack_rsc_op:
Processing failed op promote for drbd_r0:0 on an-a04n01.alteeve.ca:
unknown error (1)
Jun 17 23:56:27 an-a04n01 pengine[28359]: notice: LogActions: Start
fence_n02_ipmi#011(an-a04n01.alteeve.ca)
Jun 17 23:56:27 an-a04n01 pengine[28359]: notice: LogActions: Demote
drbd_r0:0#011(Master -> Slave an-a04n01.alteeve.ca)
Jun 17 23:56:27 an-a04n01 pengine[28359]: notice: LogActions: Recover
drbd_r0:0#011(Master an-a04n01.alteeve.ca)
Jun 17 23:56:27 an-a04n01 pengine[28359]: notice: LogActions: Start
lvm_n01_vg0:0#011(an-a04n01.alteeve.ca - blocked)
Jun 17 23:56:27 an-a04n01 pengine[28359]: notice: process_pe_message:
Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-154.bz2
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 8: start fence_n02_ipmi_start_0 on
an-a04n01.alteeve.ca (local)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 75: notify drbd_r0_pre_notify_demote_0 on
an-a04n01.alteeve.ca (local)
Jun 17 23:56:27 an-a04n01 fence_pcmk[28761]: Requesting Pacemaker fence
an-a04n02.alteeve.ca (reset)
Jun 17 23:56:27 an-a04n01 stonith-ng[28356]: notice:
stonith_device_register: Device 'fence_n02_ipmi' already existed in
device list (2 active devices)
Jun 17 23:56:27 an-a04n01 stonith_admin[28763]: notice: crm_log_args:
Invoked: stonith_admin --reboot an-a04n02.alteeve.ca --tolerance 5s
--tag cman
Jun 17 23:56:27 an-a04n01 stonith-ng[28356]: notice: handle_request:
Client stonith_admin.cman.28763.4e2c3020 wants to fence (reboot)
'an-a04n02.alteeve.ca' with device '(any)'
Jun 17 23:56:27 an-a04n01 stonith-ng[28356]: notice:
initiate_remote_stonith_op: Initiating remote operation reboot for
an-a04n02.alteeve.ca: bbb6c5c4-d1a7-4df7-a8b0-e33f4ad74860 (0)
Jun 17 23:56:27 an-a04n01 stonith-ng[28356]: notice:
can_fence_host_with_device: fence_n02_ipmi can fence
an-a04n02.alteeve.ca: static-list
Jun 17 23:56:27 an-a04n01 stonith-ng[28356]: notice:
can_fence_host_with_device: fence_n01_ipmi can not fence
an-a04n02.alteeve.ca: static-list
Jun 17 23:56:27 an-a04n01 stonith-ng[28356]: notice:
can_fence_host_with_device: fence_n02_ipmi can fence
an-a04n02.alteeve.ca: static-list
Jun 17 23:56:27 an-a04n01 stonith-ng[28356]: notice:
can_fence_host_with_device: fence_n01_ipmi can not fence
an-a04n02.alteeve.ca: static-list
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: PingAck did not arrive in
time.
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: conn( WFReportParams ->
NetworkFailure )
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: asender terminated
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: Terminating drbd0_asender
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: Connection closed
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: conn( NetworkFailure ->
Unconnected )
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: receiver terminated
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: Restarting drbd0_receiver
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: receiver (re)started
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: conn( Unconnected ->
WFConnection )
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation fence_n02_ipmi_start_0 (call=46, rc=0, cib-update=48,
confirmed=true) ok
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=48, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 10: demote drbd_r0_demote_0 on an-a04n01.alteeve.ca
(local)
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 9: monitor fence_n02_ipmi_monitor_60000 on
an-a04n01.alteeve.ca (local)
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: role( Primary -> Secondary )
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: bitmap WRITE of 0 pages
took 0 jiffies
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: 0 KB (0 bits) marked
out-of-sync by on disk bit-map.
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation drbd_r0_demote_0 (call=52, rc=0, cib-update=49, confirmed=true) ok
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 76: notify drbd_r0_post_notify_demote_0 on
an-a04n01.alteeve.ca (local)
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=57, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 74: notify drbd_r0_pre_notify_stop_0 on
an-a04n01.alteeve.ca (local)
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=60, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 2: stop drbd_r0_stop_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: conn( WFConnection ->
Disconnecting )
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: Discarding network
configuration.
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: Connection closed
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: conn( Disconnecting ->
StandAlone )
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: receiver terminated
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: Terminating drbd0_receiver
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: disk( UpToDate -> Failed )
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: bitmap WRITE of 0 pages
took 0 jiffies
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: 0 KB (0 bits) marked
out-of-sync by on disk bit-map.
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: disk( Failed -> Diskless )
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: drbd_bm_resize called
with capacity == 0
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: worker terminated
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: Terminating drbd0_worker
Jun 17 23:56:28 an-a04n01 attrd[28358]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd_r0 (<null>)
Jun 17 23:56:28 an-a04n01 attrd[28358]: notice: attrd_perform_update:
Sent delete 33: node=an-a04n01.alteeve.ca, attr=master-drbd_r0,
id=<n/a>, set=(null), section=status
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation drbd_r0_stop_0 (call=63, rc=0, cib-update=50, confirmed=true) ok
Jun 17 23:56:43 an-a04n01 stonith-ng[28356]: notice: log_operation:
Operation 'reboot' [28771] (call 2 from stonith_admin.cman.28763) for
host 'an-a04n02.alteeve.ca' with device 'fence_n02_ipmi' returned: 0 (OK)
Jun 17 23:56:43 an-a04n01 stonith-ng[28356]: notice: remote_op_done:
Operation reboot of an-a04n02.alteeve.ca by an-a04n01.alteeve.ca for
stonith_admin.cman.28763 at an-a04n01.alteeve.ca.bbb6c5c4: OK
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: tengine_stonith_notify:
Peer an-a04n02.alteeve.ca was terminated (reboot) by
an-a04n01.alteeve.ca for an-a04n01.alteeve.ca: OK
(ref=bbb6c5c4-d1a7-4df7-a8b0-e33f4ad74860) by client
stonith_admin.cman.28763
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: tengine_stonith_notify:
Notified CMAN that 'an-a04n02.alteeve.ca' is now fenced
Jun 17 23:56:43 an-a04n01 fenced[28143]: fence an-a04n02.alteeve.ca success
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation fence_n02_ipmi_monitor_60000 (call=54, rc=0, cib-update=54,
confirmed=false) ok
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: run_graph: Transition 2
(Complete=19, Pending=0, Fired=0, Skipped=6, Incomplete=4,
Source=/var/lib/pacemaker/pengine/pe-input-154.bz2): Stopped
Jun 17 23:56:43 an-a04n01 pengine[28359]: notice: unpack_config: On
loss of CCM Quorum: Ignore
Jun 17 23:56:43 an-a04n01 pengine[28359]: warning: unpack_rsc_op:
Processing failed op promote for drbd_r0:0 on an-a04n01.alteeve.ca:
unknown error (1)
Jun 17 23:56:43 an-a04n01 pengine[28359]: notice: LogActions: Start
drbd_r0:0#011(an-a04n01.alteeve.ca)
Jun 17 23:56:43 an-a04n01 pengine[28359]: notice: LogActions: Start
lvm_n01_vg0:0#011(an-a04n01.alteeve.ca - blocked)
Jun 17 23:56:43 an-a04n01 pengine[28359]: notice: process_pe_message:
Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-155.bz2
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 10: start drbd_r0_start_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: Starting worker thread
(from cqueue [3274])
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: disk( Diskless ->
Attaching )
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: Found 4 transactions (126
active extents) in activity log.
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: Method to ensure write
ordering: flush
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: drbd_bm_resize called
with capacity == 909525832
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: resync bitmap:
bits=113690729 words=1776418 pages=3470
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: size = 434 GB (454762916 KB)
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: bitmap READ of 3470 pages
took 9 jiffies
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: recounting of set bits
took additional 16 jiffies
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: 0 KB (0 bits) marked
out-of-sync by on disk bit-map.
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: disk( Attaching ->
UpToDate ) pdsk( DUnknown -> Outdated )
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: attached to UUIDs
B704B7175D09E91D:C71081B1CBAFC620:F9F9DA52F6D93990:F9F8DA52F6D93991
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: conn( StandAlone ->
Unconnected )
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: Starting receiver thread
(from drbd0_worker [29023])
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: receiver (re)started
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: conn( Unconnected ->
WFConnection )
Jun 17 23:56:43 an-a04n01 attrd[28358]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd_r0 (10000)
Jun 17 23:56:43 an-a04n01 attrd[28358]: notice: attrd_perform_update:
Sent update 37: master-drbd_r0=10000
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation drbd_r0_start_0 (call=67, rc=0, cib-update=56, confirmed=true) ok
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 73: notify drbd_r0_post_notify_start_0 on
an-a04n01.alteeve.ca (local)
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=70, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: run_graph: Transition 3
(Complete=8, Pending=0, Fired=0, Skipped=1, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-155.bz2): Stopped
Jun 17 23:56:43 an-a04n01 pengine[28359]: notice: unpack_config: On
loss of CCM Quorum: Ignore
Jun 17 23:56:43 an-a04n01 pengine[28359]: warning: unpack_rsc_op:
Processing failed op promote for drbd_r0:0 on an-a04n01.alteeve.ca:
unknown error (1)
Jun 17 23:56:43 an-a04n01 pengine[28359]: notice: LogActions: Promote
drbd_r0:0#011(Slave -> Master an-a04n01.alteeve.ca)
Jun 17 23:56:43 an-a04n01 pengine[28359]: notice: LogActions: Start
lvm_n01_vg0:0#011(an-a04n01.alteeve.ca)
Jun 17 23:56:43 an-a04n01 pengine[28359]: notice: process_pe_message:
Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-156.bz2
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 77: notify drbd_r0_pre_notify_promote_0 on
an-a04n01.alteeve.ca (local)
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=73, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 12: promote drbd_r0_promote_0 on an-a04n01.alteeve.ca
(local)
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: role( Secondary -> Primary )
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation drbd_r0_promote_0 (call=76, rc=0, cib-update=58,
confirmed=true) ok
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 78: notify drbd_r0_post_notify_promote_0 on
an-a04n01.alteeve.ca (local)
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=79, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 37: start lvm_n01_vg0_start_0 on an-a04n01.alteeve.ca
(local)
Jun 17 23:56:44 an-a04n01 LVM(lvm_n01_vg0)[29173]: INFO: Activating
volume group an-a04n01_vg0
Jun 17 23:56:44 an-a04n01 LVM(lvm_n01_vg0)[29173]: INFO: Reading all
physical volumes. This may take a while... Found volume group
"an-a04n01_vg0" using metadata type lvm2
Jun 17 23:56:44 an-a04n01 LVM(lvm_n01_vg0)[29173]: INFO: 1 logical
volume(s) in volume group "an-a04n01_vg0" now active
Jun 17 23:56:44 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation lvm_n01_vg0_start_0 (call=82, rc=0, cib-update=59,
confirmed=true) ok
Jun 17 23:56:44 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 79: notify lvm_n01_vg0_post_notify_start_0 on
an-a04n01.alteeve.ca (local)
Jun 17 23:56:44 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation lvm_n01_vg0_notify_0 (call=85, rc=0, cib-update=0,
confirmed=true) ok
Jun 17 23:56:44 an-a04n01 crmd[28360]: notice: te_rsc_command:
Initiating action 38: monitor lvm_n01_vg0_monitor_30000 on
an-a04n01.alteeve.ca (local)
Jun 17 23:56:44 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM
operation lvm_n01_vg0_monitor_30000 (call=88, rc=0, cib-update=60,
confirmed=false) ok
Jun 17 23:56:44 an-a04n01 crmd[28360]: notice: run_graph: Transition 4
(Complete=18, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-156.bz2): Complete
Jun 17 23:56:44 an-a04n01 crmd[28360]: notice: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
====
Logs from the always-fenced 'an-a04n02', starting with
'/etc/init.d/pacemaker start':
====
Jun 17 23:55:32 an-a04n02 kernel: DLM (built Apr 11 2014 17:28:07) installed
Jun 17 23:55:33 an-a04n02 corosync[7176]: [MAIN ] Corosync Cluster
Engine ('1.4.1'): started and ready to provide service.
Jun 17 23:55:33 an-a04n02 corosync[7176]: [MAIN ] Corosync built-in
features: nss dbus rdma snmp
Jun 17 23:55:33 an-a04n02 corosync[7176]: [MAIN ] Successfully read
config from /etc/cluster/cluster.conf
Jun 17 23:55:33 an-a04n02 corosync[7176]: [MAIN ] Successfully parsed
cman config
Jun 17 23:55:33 an-a04n02 corosync[7176]: [TOTEM ] Initializing
transport (UDP/IP Multicast).
Jun 17 23:55:33 an-a04n02 corosync[7176]: [TOTEM ] Initializing
transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jun 17 23:55:33 an-a04n02 corosync[7176]: [TOTEM ] The network
interface [10.20.40.2] is now up.
Jun 17 23:55:33 an-a04n02 corosync[7176]: [QUORUM] Using quorum
provider quorum_cman
Jun 17 23:55:33 an-a04n02 corosync[7176]: [SERV ] Service engine
loaded: corosync cluster quorum service v0.1
Jun 17 23:55:33 an-a04n02 corosync[7176]: [CMAN ] CMAN 3.0.12.1
(built Apr 3 2014 05:12:26) started
Jun 17 23:55:33 an-a04n02 corosync[7176]: [SERV ] Service engine
loaded: corosync CMAN membership service 2.90
Jun 17 23:55:33 an-a04n02 corosync[7176]: [SERV ] Service engine
loaded: openais checkpoint service B.01.01
Jun 17 23:55:33 an-a04n02 corosync[7176]: [SERV ] Service engine
loaded: corosync extended virtual synchrony service
Jun 17 23:55:33 an-a04n02 corosync[7176]: [SERV ] Service engine
loaded: corosync configuration service
Jun 17 23:55:33 an-a04n02 corosync[7176]: [SERV ] Service engine
loaded: corosync cluster closed process group service v1.01
Jun 17 23:55:33 an-a04n02 corosync[7176]: [SERV ] Service engine
loaded: corosync cluster config database access v1.01
Jun 17 23:55:33 an-a04n02 corosync[7176]: [SERV ] Service engine
loaded: corosync profile loading service
Jun 17 23:55:33 an-a04n02 corosync[7176]: [QUORUM] Using quorum
provider quorum_cman
Jun 17 23:55:33 an-a04n02 corosync[7176]: [SERV ] Service engine
loaded: corosync cluster quorum service v0.1
Jun 17 23:55:33 an-a04n02 corosync[7176]: [MAIN ] Compatibility mode
set to whitetank. Using V1 and V2 of the synchronization engine.
Jun 17 23:55:33 an-a04n02 corosync[7176]: [TOTEM ] A processor joined
or left the membership and a new membership was formed.
Jun 17 23:55:33 an-a04n02 corosync[7176]: [TOTEM ] A processor joined
or left the membership and a new membership was formed.
Jun 17 23:55:33 an-a04n02 corosync[7176]: [CMAN ] quorum regained,
resuming activity
Jun 17 23:55:33 an-a04n02 corosync[7176]: [QUORUM] This node is within
the primary component and will provide service.
Jun 17 23:55:33 an-a04n02 corosync[7176]: [QUORUM] Members[1]: 2
Jun 17 23:55:33 an-a04n02 corosync[7176]: [QUORUM] Members[1]: 2
Jun 17 23:55:33 an-a04n02 corosync[7176]: [QUORUM] Members[2]: 1 2
Jun 17 23:55:33 an-a04n02 corosync[7176]: [QUORUM] Members[2]: 1 2
Jun 17 23:55:33 an-a04n02 corosync[7176]: [CPG ] chosen downlist:
sender r(0) ip(10.20.40.1) ; members(old:1 left:0)
Jun 17 23:55:33 an-a04n02 corosync[7176]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 17 23:55:37 an-a04n02 fenced[7231]: fenced 3.0.12.1 started
Jun 17 23:55:37 an-a04n02 dlm_controld[7254]: dlm_controld 3.0.12.1 started
Jun 17 23:55:38 an-a04n02 gfs_controld[7306]: gfs_controld 3.0.12.1 started
Jun 17 23:55:39 an-a04n02 pacemaker: Attempting to start clvmd
Jun 17 23:55:40 an-a04n02 kernel: dlm: Using TCP for communications
Jun 17 23:55:40 an-a04n02 kernel: dlm: got connection from 1
Jun 17 23:55:41 an-a04n02 clvmd: Cluster LVM daemon started - connected
to CMAN
Jun 17 23:55:41 an-a04n02 pacemaker: Starting Pacemaker Cluster Manager
Jun 17 23:55:42 an-a04n02 pacemakerd[7437]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n02 pacemakerd[7437]: notice: main: Starting
Pacemaker 1.1.10-14.el6_5.3 (Build: 368c726): generated-manpages
agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc
nagios corosync-plugin cman
Jun 17 23:55:42 an-a04n02 lrmd[7445]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n02 stonith-ng[7444]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n02 cib[7443]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n02 crmd[7448]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n02 pengine[7447]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n02 attrd[7446]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n02 stonith-ng[7444]: notice:
crm_cluster_connect: Connecting to cluster infrastructure: cman
Jun 17 23:55:42 an-a04n02 crmd[7448]: notice: main: CRM Git Version:
368c726
Jun 17 23:55:42 an-a04n02 attrd[7446]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: cman
Jun 17 23:55:42 an-a04n02 attrd[7446]: notice: main: Starting mainloop...
Jun 17 23:55:42 an-a04n02 cib[7443]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: cman
Jun 17 23:55:43 an-a04n02 crmd[7448]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: cman
Jun 17 23:55:43 an-a04n02 crmd[7448]: notice: cman_event_callback:
Membership 276: quorum acquired
Jun 17 23:55:43 an-a04n02 crmd[7448]: notice: crm_update_peer_state:
cman_event_callback: Node an-a04n01.alteeve.ca[1] - state is now member
(was (null))
Jun 17 23:55:43 an-a04n02 crmd[7448]: notice: crm_update_peer_state:
cman_event_callback: Node an-a04n02.alteeve.ca[2] - state is now member
(was (null))
Jun 17 23:55:43 an-a04n02 stonith-ng[7444]: notice: setup_cib:
Watching for stonith topology changes
Jun 17 23:55:43 an-a04n02 crmd[7448]: notice: do_started: The local
CRM is operational
Jun 17 23:55:43 an-a04n02 crmd[7448]: notice: do_state_transition:
State transition S_STARTING -> S_PENDING [ input=I_PENDING
cause=C_FSA_INTERNAL origin=do_started ]
Jun 17 23:55:43 an-a04n02 stonith-ng[7444]: notice: unpack_config: On
loss of CCM Quorum: Ignore
Jun 17 23:55:44 an-a04n02 stonith-ng[7444]: notice:
stonith_device_register: Added 'fence_n01_ipmi' to the device list (1
active devices)
Jun 17 23:55:45 an-a04n02 stonith-ng[7444]: notice:
stonith_device_register: Added 'fence_n02_ipmi' to the device list (2
active devices)
Jun 17 23:56:04 an-a04n02 crmd[7448]: warning: do_log: FSA: Input
I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Jun 17 23:56:04 an-a04n02 crmd[7448]: notice: do_state_transition:
State transition S_ELECTION -> S_PENDING [ input=I_PENDING
cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Jun 17 23:56:04 an-a04n02 attrd[7446]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Jun 17 23:56:04 an-a04n02 crmd[7448]: notice: do_state_transition:
State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC
cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Jun 17 23:56:05 an-a04n02 LVM(lvm_n01_vg0)[7509]: WARNING: LVM Volume
an-a04n01_vg0 is not available (stopped)
Jun 17 23:56:05 an-a04n02 crmd[7448]: notice: process_lrm_event: LRM
operation lvm_n01_vg0_monitor_0 (call=20, rc=7, cib-update=11,
confirmed=true) not running
Jun 17 23:56:05 an-a04n02 crmd[7448]: notice: process_lrm_event: LRM
operation drbd_r0_monitor_0 (call=15, rc=7, cib-update=12,
confirmed=true) not running
Jun 17 23:56:05 an-a04n02 crmd[7448]: notice: process_lrm_event:
an-a04n02.alteeve.ca-drbd_r0_monitor_0:15 [ \n ]
Jun 17 23:56:05 an-a04n02 attrd[7446]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Jun 17 23:56:05 an-a04n02 attrd[7446]: notice: attrd_perform_update:
Sent update 5: probe_complete=true
Jun 17 23:56:06 an-a04n02 stonith-ng[7444]: notice:
stonith_device_register: Device 'fence_n02_ipmi' already existed in
device list (2 active devices)
Jun 17 23:56:06 an-a04n02 crmd[7448]: notice: process_lrm_event: LRM
operation fence_n02_ipmi_start_0 (call=25, rc=0, cib-update=13,
confirmed=true) ok
Jun 17 23:56:06 an-a04n02 crmd[7448]: notice: process_lrm_event: LRM
operation fence_n02_ipmi_monitor_60000 (call=30, rc=0, cib-update=14,
confirmed=false) ok
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: Starting worker thread
(from cqueue [3220])
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: disk( Diskless ->
Attaching )
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: Found 3 transactions (3
active extents) in activity log.
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: Method to ensure write
ordering: flush
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: drbd_bm_resize called
with capacity == 909525832
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: resync bitmap:
bits=113690729 words=1776418 pages=3470
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: size = 434 GB (454762916 KB)
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: bitmap READ of 3470 pages
took 8 jiffies
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: recounting of set bits
took additional 17 jiffies
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: 0 KB (0 bits) marked
out-of-sync by on disk bit-map.
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: disk( Attaching ->
Consistent )
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: attached to UUIDs
C71081B1CBAFC620:0000000000000000:F9F9DA52F6D93991:F9F8DA52F6D93991
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: conn( StandAlone ->
Unconnected )
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: Starting receiver thread
(from drbd0_worker [7613])
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: receiver (re)started
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: conn( Unconnected ->
WFConnection )
Jun 17 23:56:06 an-a04n02 attrd[7446]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd_r0 (5)
Jun 17 23:56:06 an-a04n02 attrd[7446]: notice: attrd_perform_update:
Sent update 8: master-drbd_r0=5
Jun 17 23:56:06 an-a04n02 crmd[7448]: notice: process_lrm_event: LRM
operation drbd_r0_start_0 (call=27, rc=0, cib-update=15, confirmed=true) ok
Jun 17 23:56:06 an-a04n02 attrd[7446]: notice: attrd_perform_update:
Sent update 10: master-drbd_r0=5
Jun 17 23:56:06 an-a04n02 crmd[7448]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=34, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:06 an-a04n02 crmd[7448]: notice: process_lrm_event: LRM
operation drbd_r0_notify_0 (call=37, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: helper command:
/sbin/drbdadm fence-peer minor-0
Jun 17 23:56:07 an-a04n02 kernel: block drbd0: Handshake successful:
Agreed network protocol version 97
Jun 17 23:56:07 an-a04n02 stonith-ng[7444]: notice:
can_fence_host_with_device: fence_n02_ipmi can fence
an-a04n02.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n02 stonith-ng[7444]: notice:
can_fence_host_with_device: fence_n01_ipmi can not fence
an-a04n02.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n02 stonith_admin[7726]: notice: crm_log_args:
Invoked: stonith_admin --fence an-a04n01.alteeve.ca
Jun 17 23:56:07 an-a04n02 stonith-ng[7444]: notice: handle_request:
Client stonith_admin.7726.0f660392 wants to fence (off)
'an-a04n01.alteeve.ca' with device '(any)'
Jun 17 23:56:07 an-a04n02 stonith-ng[7444]: notice:
initiate_remote_stonith_op: Initiating remote operation off for
an-a04n01.alteeve.ca: fd2fafff-174a-4744-b83c-e762c88ed12b (0)
Jun 17 23:56:07 an-a04n02 stonith-ng[7444]: notice:
can_fence_host_with_device: fence_n02_ipmi can not fence
an-a04n01.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n02 stonith-ng[7444]: notice:
can_fence_host_with_device: fence_n01_ipmi can fence
an-a04n01.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n02 stonith-ng[7444]: notice:
can_fence_host_with_device: fence_n02_ipmi can not fence
an-a04n01.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n02 stonith-ng[7444]: notice:
can_fence_host_with_device: fence_n01_ipmi can fence
an-a04n01.alteeve.ca: static-list
Jun 17 23:56:08 an-a04n02 ntpd[2540]: 0.0.0.0 c612 02 freq_set kernel
16.841 PPM
Jun 17 23:56:08 an-a04n02 ntpd[2540]: 0.0.0.0 c615 05 clock_sync
====
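For anyone who wants to line up the fence requests from the two nodes, here is a minimal sketch (not part of the cluster tooling) that pulls the stonith-ng "wants to fence" records out of logs like the ones above. It assumes each record starts with a syslog timestamp and that any other line is a mail-wrapped continuation. The names `join_wrapped` and `fence_requests` are hypothetical helpers made up for this example.

```python
import re

# A syslog record starts with a timestamp such as "Jun 17 23:56:07".
# Anything else is assumed to be a continuation line from mail wrapping.
RECORD_START = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")

# Matches the stonith-ng handle_request message seen in the logs above.
FENCE_REQUEST = re.compile(
    r"^(?P<time>\w{3} [ \d]\d \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"stonith-ng\[\d+\]: .*Client (?P<client>\S+) wants to fence "
    r"\((?P<action>\w+)\) '(?P<target>[^']+)'"
)


def join_wrapped(lines):
    """Glue wrapped continuation lines back onto the record they belong to."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line.strip()
    return records


def fence_requests(lines):
    """Return (time, host, client, action, target) for each fence request."""
    found = []
    for record in join_wrapped(lines):
        match = FENCE_REQUEST.match(record)
        if match:
            found.append((
                match["time"],
                match["host"],
                match["client"],
                match["action"],
                match["target"],
            ))
    return found
```

Feeding it the lines of a saved log, for example `fence_requests(open("messages.txt"))` with a hypothetical file name, gives one tuple per request, which makes it easier to see which node asked to fence which peer and when.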
Cluestick beatins welcomed...
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?