[Pacemaker] pacemaker with cman and dbrd when primary node panics or poweroff

Gianluca Cecchi gianluca.cecchi at gmail.com
Fri Mar 7 19:31:39 EST 2014


So I fixed the problem regarding the hostname in drbd.conf and the node
name from the cluster's point of view.
I also configured and verified the fence_vmware agent and enabled STONITH.
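
For reference, the STONITH part boils down to something like this (the
fence_vmware options below are placeholders rather than my exact values;
"pcs stonith describe fence_vmware" lists the real option names):

# pcs property set stonith-enabled=true
# pcs stonith create Fencing fence_vmware \
    ipaddr=<esx-or-vcenter-host> login=<user> passwd=<password> \
    pcmk_host_map="ovirteng01.localdomain.local:ovirteng01;ovirteng02.localdomain.local:ovirteng02"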
The DRBD resource configuration now looks like this:

resource ovirt {
  disk {
    disk-flushes no;
    md-flushes no;
    fencing resource-and-stonith;
  }
  device minor 0;
  disk /dev/sdb;
  syncer {
    rate 30M;
    verify-alg md5;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
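
If it helps, a change like this can usually be applied and checked on
both nodes without a full DRBD restart, with something like:

# drbdadm dump ovirt     (syntax check of the resource definition)
# drbdadm adjust ovirt   (apply the changed options to the running resource)
# cat /proc/drbd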

I put this in cluster.conf:
<cman expected_votes="1" two_node="1"/>
and restarted pacemaker and cman on both nodes.
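
For completeness, that line sits in a two-node cluster.conf that looks
roughly like the one below; the cluster name and config_version are
placeholders, and the fence_pcmk redirection is the usual one for
cman+pacemaker setups:

<cluster config_version="XX" name="ovirtcluster">
  <cman expected_votes="1" two_node="1"/>
  <clusternodes>
    <clusternode name="ovirteng01.localdomain.local" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="ovirteng01.localdomain.local"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="ovirteng02.localdomain.local" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="ovirteng02.localdomain.local"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
</cluster>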

The service is active on ovirteng01.
I provoke a power off of ovirteng01. The fencing agent works OK on
ovirteng02 and reboots it.
I stop the boot of ovirteng01 at the GRUB prompt to simulate a problem
during boot (for example the system dropping to console mode due to a
filesystem problem).
In the meantime ovirteng02 becomes master of the DRBD resource, but
doesn't start the group.
This is in /var/log/messages:

Mar  8 01:08:00 ovirteng02 kernel: drbd ovirt: PingAck did not arrive in time.
Mar  8 01:08:00 ovirteng02 kernel: drbd ovirt: peer( Primary ->
Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate ->
DUnknown )
Mar  8 01:08:00 ovirteng02 kernel: drbd ovirt: asender terminated
Mar  8 01:08:00 ovirteng02 kernel: drbd ovirt: Terminating drbd_a_ovirt
Mar  8 01:08:00 ovirteng02 kernel: drbd ovirt: Connection closed
Mar  8 01:08:00 ovirteng02 kernel: drbd ovirt: conn( NetworkFailure ->
Unconnected )
Mar  8 01:08:00 ovirteng02 kernel: drbd ovirt: receiver terminated
Mar  8 01:08:00 ovirteng02 kernel: drbd ovirt: Restarting receiver thread
Mar  8 01:08:00 ovirteng02 kernel: drbd ovirt: receiver (re)started
Mar  8 01:08:00 ovirteng02 kernel: drbd ovirt: conn( Unconnected ->
WFConnection )
Mar  8 01:08:02 ovirteng02 corosync[12908]:   [TOTEM ] A processor
failed, forming new configuration.
Mar  8 01:08:04 ovirteng02 corosync[12908]:   [QUORUM] Members[1]: 2
Mar  8 01:08:04 ovirteng02 corosync[12908]:   [TOTEM ] A processor
joined or left the membership and a new membership was formed.
Mar  8 01:08:04 ovirteng02 corosync[12908]:   [CPG   ] chosen
downlist: sender r(0) ip(192.168.33.46) ; members(old:2 left:1)
Mar  8 01:08:04 ovirteng02 corosync[12908]:   [MAIN  ] Completed
service synchronization, ready to provide service.
Mar  8 01:08:04 ovirteng02 kernel: dlm: closing connection to node 1
Mar  8 01:08:04 ovirteng02 crmd[13168]:   notice:
crm_update_peer_state: cman_event_callback: Node
ovirteng01.localdomain.local[1] - state is now lost (was member)
Mar  8 01:08:04 ovirteng02 crmd[13168]:  warning: reap_dead_nodes: Our
DC node (ovirteng01.localdomain.local) left the cluster
Mar  8 01:08:04 ovirteng02 crmd[13168]:   notice: do_state_transition:
State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION
cause=C_FSA_INTERNAL origin=reap_dead_nodes ]
Mar  8 01:08:04 ovirteng02 crmd[13168]:   notice: do_state_transition:
State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Mar  8 01:08:04 ovirteng02 fenced[12962]: fencing node
ovirteng01.localdomain.local
Mar  8 01:08:04 ovirteng02 attrd[13166]:   notice:
attrd_local_callback: Sending full refresh (origin=crmd)
Mar  8 01:08:04 ovirteng02 attrd[13166]:   notice:
attrd_trigger_update: Sending flush op to all hosts for:
master-OvirtData (10000)
Mar  8 01:08:04 ovirteng02 attrd[13166]:   notice:
attrd_trigger_update: Sending flush op to all hosts for:
probe_complete (true)
Mar  8 01:08:04 ovirteng02 fence_pcmk[13733]: Requesting Pacemaker
fence ovirteng01.localdomain.local (reset)
Mar  8 01:08:04 ovirteng02 stonith_admin[13734]:   notice:
crm_log_args: Invoked: stonith_admin --reboot
ovirteng01.localdomain.local --tolerance 5s --tag cman
Mar  8 01:08:04 ovirteng02 stonith-ng[13164]:   notice:
handle_request: Client stonith_admin.cman.13734.5528351f wants to
fence (reboot) 'ovirteng01.localdomain.local' with device '(any)'
Mar  8 01:08:04 ovirteng02 stonith-ng[13164]:   notice:
initiate_remote_stonith_op: Initiating remote operation reboot for
ovirteng01.localdomain.local: 1e70a341-efbf-470a-bcaa-886a8acfa9d1 (0)
Mar  8 01:08:04 ovirteng02 stonith-ng[13164]:   notice:
can_fence_host_with_device: Fencing can fence
ovirteng01.localdomain.local (aka. 'ovirteng01'): static-list
Mar  8 01:08:04 ovirteng02 stonith-ng[13164]:   notice:
can_fence_host_with_device: Fencing can fence
ovirteng01.localdomain.local (aka. 'ovirteng01'): static-list
Mar  8 01:08:05 ovirteng02 pengine[13167]:   notice: unpack_config: On
loss of CCM Quorum: Ignore
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: pe_fence_node:
Node ovirteng01.localdomain.local will be fenced because the node is
no longer part of the cluster
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning:
determine_online_status: Node ovirteng01.localdomain.local is unclean
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action OvirtData:0_demote_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action OvirtData:0_stop_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action OvirtData:0_demote_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action OvirtData:0_stop_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action OvirtData:0_demote_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action OvirtData:0_stop_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action OvirtData:0_demote_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action OvirtData:0_stop_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action ip_OvirtData_stop_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action ip_OvirtData_stop_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action lvm_ovirt_stop_0 on ovirteng01.localdomain.local is unrunnable
(offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action lvm_ovirt_stop_0 on ovirteng01.localdomain.local is unrunnable
(offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action fs_OvirtData_stop_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action fs_OvirtData_stop_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action pgsql_OvirtData_stop_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action pgsql_OvirtData_stop_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action p_lsb_nfs_stop_0 on ovirteng01.localdomain.local is unrunnable
(offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action p_lsb_nfs_stop_0 on ovirteng01.localdomain.local is unrunnable
(offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action p_exportfs_root_stop_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action p_exportfs_root_stop_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action p_exportfs_iso_stop_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action p_exportfs_iso_stop_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action ovirt-engine_stop_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action ovirt-engine_stop_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action ovirt-websocket-proxy_stop_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action ovirt-websocket-proxy_stop_0 on ovirteng01.localdomain.local is
unrunnable (offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action httpd_stop_0 on ovirteng01.localdomain.local is unrunnable
(offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action httpd_stop_0 on ovirteng01.localdomain.local is unrunnable
(offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: custom_action:
Action Fencing_stop_0 on ovirteng01.localdomain.local is unrunnable
(offline)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning: stage6:
Scheduling Node ovirteng01.localdomain.local for STONITH
Mar  8 01:08:05 ovirteng02 pengine[13167]:   notice: LogActions:
Demote  OvirtData:0#011(Master -> Stopped
ovirteng01.localdomain.local)
Mar  8 01:08:05 ovirteng02 pengine[13167]:   notice: LogActions:
Promote OvirtData:1#011(Slave -> Master ovirteng02.localdomain.local)
Mar  8 01:08:05 ovirteng02 pengine[13167]:   notice: LogActions: Stop
  ip_OvirtData#011(ovirteng01.localdomain.local)
Mar  8 01:08:05 ovirteng02 pengine[13167]:   notice: LogActions: Stop
  lvm_ovirt#011(ovirteng01.localdomain.local)
Mar  8 01:08:05 ovirteng02 pengine[13167]:   notice: LogActions: Stop
  fs_OvirtData#011(ovirteng01.localdomain.local)
Mar  8 01:08:05 ovirteng02 pengine[13167]:   notice: LogActions: Stop
  pgsql_OvirtData#011(ovirteng01.localdomain.local)
Mar  8 01:08:05 ovirteng02 pengine[13167]:   notice: LogActions: Stop
  p_lsb_nfs#011(ovirteng01.localdomain.local)
Mar  8 01:08:05 ovirteng02 pengine[13167]:   notice: LogActions: Stop
  p_exportfs_root#011(ovirteng01.localdomain.local)
Mar  8 01:08:05 ovirteng02 pengine[13167]:   notice: LogActions: Stop
  p_exportfs_iso#011(ovirteng01.localdomain.local)
Mar  8 01:08:05 ovirteng02 pengine[13167]:   notice: LogActions: Stop
  ovirt-engine#011(ovirteng01.localdomain.local)
Mar  8 01:08:05 ovirteng02 pengine[13167]:   notice: LogActions: Stop
  ovirt-websocket-proxy#011(ovirteng01.localdomain.local)
Mar  8 01:08:05 ovirteng02 pengine[13167]:   notice: LogActions: Stop
  httpd#011(ovirteng01.localdomain.local)
Mar  8 01:08:05 ovirteng02 pengine[13167]:   notice: LogActions: Move
  Fencing#011(Started ovirteng01.localdomain.local ->
ovirteng02.localdomain.local)
Mar  8 01:08:05 ovirteng02 pengine[13167]:  warning:
process_pe_message: Calculated Transition 0:
/var/lib/pacemaker/pengine/pe-warn-5.bz2
Mar  8 01:08:05 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 1: cancel OvirtData_cancel_31000 on
ovirteng02.localdomain.local (local)
Mar  8 01:08:05 ovirteng02 crmd[13168]:   notice: te_fence_node:
Executing reboot fencing operation (53) on
ovirteng01.localdomain.local (timeout=60000)
Mar  8 01:08:05 ovirteng02 stonith-ng[13164]:   notice:
handle_request: Client crmd.13168.426620a0 wants to fence (reboot)
'ovirteng01.localdomain.local' with device '(any)'
Mar  8 01:08:05 ovirteng02 stonith-ng[13164]:   notice:
merge_duplicates: Merging stonith action reboot for node
ovirteng01.localdomain.local originating from client
crmd.13168.3f0b1143 with identical request from
stonith_admin.cman.13734@ovirteng02.localdomain.local.1e70a341 (144s)
Mar  8 01:08:05 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 75: notify OvirtData_pre_notify_demote_0 on
ovirteng02.localdomain.local (local)
Mar  8 01:08:05 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation OvirtData_notify_0 (call=82, rc=0, cib-update=0,
confirmed=true) ok
Mar  8 01:08:17 ovirteng02 stonith-ng[13164]:   notice: log_operation:
Operation 'reboot' [13736] (call 2 from stonith_admin.cman.13734) for
host 'ovirteng01.localdomain.local' with device 'Fencing' returned: 0
(OK)
Mar  8 01:08:17 ovirteng02 stonith-ng[13164]:   notice:
remote_op_done: Operation reboot of ovirteng01.localdomain.local by
ovirteng02.localdomain.local for
stonith_admin.cman.13734@ovirteng02.localdomain.local.1e70a341: OK
Mar  8 01:08:17 ovirteng02 stonith-ng[13164]:   notice:
remote_op_done: Operation reboot of ovirteng01.localdomain.local by
ovirteng02.localdomain.local for
crmd.13168@ovirteng02.localdomain.local.3f0b1143: OK
Mar  8 01:08:17 ovirteng02 crmd[13168]:   notice:
tengine_stonith_notify: Peer ovirteng01.localdomain.local was
terminated (reboot) by ovirteng02.localdomain.local for
ovirteng02.localdomain.local: OK
(ref=1e70a341-efbf-470a-bcaa-886a8acfa9d1) by client
stonith_admin.cman.13734
Mar  8 01:08:17 ovirteng02 crmd[13168]:   notice:
tengine_stonith_notify: Notified CMAN that
'ovirteng01.localdomain.local' is now fenced
Mar  8 01:08:17 ovirteng02 crmd[13168]:   notice:
tengine_stonith_callback: Stonith operation
2/53:0:0:c1041760-73fb-42e7-beda-7613fcf53fd6: OK (0)
Mar  8 01:08:17 ovirteng02 crmd[13168]:   notice:
tengine_stonith_notify: Peer ovirteng01.localdomain.local was
terminated (reboot) by ovirteng02.localdomain.local for
ovirteng02.localdomain.local: OK
(ref=3f0b1143-0250-45ca-ab28-0fb18394d124) by client crmd.13168
Mar  8 01:08:17 ovirteng02 crmd[13168]:   notice:
tengine_stonith_notify: Notified CMAN that
'ovirteng01.localdomain.local' is now fenced
Mar  8 01:08:17 ovirteng02 crmd[13168]:   notice: run_graph:
Transition 0 (Complete=5, Pending=0, Fired=0, Skipped=32,
Incomplete=13, Source=/var/lib/pacemaker/pengine/pe-warn-5.bz2):
Stopped
Mar  8 01:08:17 ovirteng02 fenced[12962]: fence
ovirteng01.localdomain.local success
Mar  8 01:08:17 ovirteng02 pengine[13167]:   notice: unpack_config: On
loss of CCM Quorum: Ignore
Mar  8 01:08:17 ovirteng02 pengine[13167]:   notice: LogActions:
Promote OvirtData:0#011(Slave -> Master ovirteng02.localdomain.local)
Mar  8 01:08:17 ovirteng02 pengine[13167]:   notice: LogActions: Start
  Fencing#011(ovirteng02.localdomain.local)
Mar  8 01:08:17 ovirteng02 pengine[13167]:   notice:
process_pe_message: Calculated Transition 1:
/var/lib/pacemaker/pengine/pe-input-1081.bz2
Mar  8 01:08:17 ovirteng02 crmd[13168]:  warning: destroy_action:
Cancelling timer for action 1 (src=50)
Mar  8 01:08:17 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 36: start Fencing_start_0 on
ovirteng02.localdomain.local (local)
Mar  8 01:08:17 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 57: notify OvirtData_pre_notify_promote_0 on
ovirteng02.localdomain.local (local)
Mar  8 01:08:17 ovirteng02 stonith-ng[13164]:   notice:
stonith_device_register: Device 'Fencing' already existed in device
list (1 active devices)
Mar  8 01:08:17 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation OvirtData_notify_0 (call=87, rc=0, cib-update=0,
confirmed=true) ok
Mar  8 01:08:17 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 6: promote OvirtData_promote_0 on
ovirteng02.localdomain.local (local)
Mar  8 01:08:17 ovirteng02 kernel: drbd ovirt: helper command:
/sbin/drbdadm fence-peer ovirt
Mar  8 01:08:17 ovirteng02 crm-fence-peer.sh[13817]: invoked for ovirt
Mar  8 01:08:17 ovirteng02 cibadmin[13848]:   notice: crm_log_args:
Invoked: cibadmin -C -o constraints -X <rsc_location
rsc="ms_OvirtData" id="drbd-fence-by-handler-ovirt-ms_OvirtData">#012
<rule role="Master" score="-INFINITY"
id="drbd-fence-by-handler-ovirt-rule-ms_OvirtData">#012    <expression
attribute="#uname" operation="ne" value="ovirteng02.localdomain.local"
id="drbd-fence-by-handler-ovirt-expr-ms_OvirtData"/>#012
</rule>#012</rsc_location>
Mar  8 01:08:17 ovirteng02 stonith-ng[13164]:   notice: unpack_config:
On loss of CCM Quorum: Ignore
Mar  8 01:08:17 ovirteng02 cib[13163]:   notice: cib:diff: Diff: --- 0.269.29
Mar  8 01:08:17 ovirteng02 cib[13163]:   notice: cib:diff: Diff: +++
0.270.1 128fad6a0899ee7020947394d4e75449
Mar  8 01:08:17 ovirteng02 cib[13163]:   notice: cib:diff: -- <cib
admin_epoch="0" epoch="269" num_updates="29"/>
Mar  8 01:08:17 ovirteng02 cib[13163]:   notice: cib:diff: ++
<rsc_location rsc="ms_OvirtData"
id="drbd-fence-by-handler-ovirt-ms_OvirtData">
Mar  8 01:08:17 ovirteng02 cib[13163]:   notice: cib:diff: ++
<rule role="Master" score="-INFINITY"
id="drbd-fence-by-handler-ovirt-rule-ms_OvirtData">
Mar  8 01:08:17 ovirteng02 cib[13163]:   notice: cib:diff: ++
 <expression attribute="#uname" operation="ne"
value="ovirteng02.localdomain.local"
id="drbd-fence-by-handler-ovirt-expr-ms_OvirtData"/>
Mar  8 01:08:17 ovirteng02 cib[13163]:   notice: cib:diff: ++         </rule>
Mar  8 01:08:17 ovirteng02 cib[13163]:   notice: cib:diff: ++
</rsc_location>
Mar  8 01:08:17 ovirteng02 crm-fence-peer.sh[13817]: INFO peer is
fenced, my disk is UpToDate: placed constraint
'drbd-fence-by-handler-ovirt-ms_OvirtData'
Mar  8 01:08:17 ovirteng02 kernel: drbd ovirt: helper command:
/sbin/drbdadm fence-peer ovirt exit code 7 (0x700)
Mar  8 01:08:17 ovirteng02 kernel: drbd ovirt: fence-peer helper
returned 7 (peer was stonithed)
Mar  8 01:08:17 ovirteng02 kernel: drbd ovirt: pdsk( DUnknown -> Outdated )
Mar  8 01:08:17 ovirteng02 kernel: block drbd0: role( Secondary -> Primary )
Mar  8 01:08:17 ovirteng02 kernel: block drbd0: new current UUID
588C3417F90691DB:8168E91059172F68:62F6D4ABA7053F86:62F5D4ABA7053F87
Mar  8 01:08:17 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation OvirtData_promote_0 (call=90, rc=0, cib-update=52,
confirmed=true) ok
Mar  8 01:08:17 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 58: notify OvirtData_post_notify_promote_0 on
ovirteng02.localdomain.local (local)
Mar  8 01:08:17 ovirteng02 stonith-ng[13164]:   notice:
stonith_device_register: Device 'Fencing' already existed in device
list (1 active devices)
Mar  8 01:08:17 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation OvirtData_notify_0 (call=93, rc=0, cib-update=0,
confirmed=true) ok
Mar  8 01:08:26 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation Fencing_start_0 (call=85, rc=0, cib-update=53,
confirmed=true) ok
Mar  8 01:08:26 ovirteng02 crmd[13168]:   notice: run_graph:
Transition 1 (Complete=10, Pending=0, Fired=0, Skipped=3,
Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1081.bz2):
Stopped
Mar  8 01:08:26 ovirteng02 pengine[13167]:   notice: unpack_config: On
loss of CCM Quorum: Ignore
Mar  8 01:08:26 ovirteng02 pengine[13167]:   notice:
process_pe_message: Calculated Transition 2:
/var/lib/pacemaker/pengine/pe-input-1082.bz2
Mar  8 01:08:26 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 8: monitor OvirtData_monitor_29000 on
ovirteng02.localdomain.local (local)
Mar  8 01:08:26 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 39: monitor Fencing_monitor_600000 on
ovirteng02.localdomain.local (local)
Mar  8 01:08:26 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation OvirtData_monitor_29000 (call=97, rc=8, cib-update=55,
confirmed=false) master
Mar  8 01:08:26 ovirteng02 crmd[13168]:   notice: process_lrm_event:
ovirteng02.localdomain.local-OvirtData_monitor_29000:97 [ \n ]
Mar  8 01:08:33 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation Fencing_monitor_600000 (call=99, rc=0, cib-update=56,
confirmed=false) ok
Mar  8 01:08:33 ovirteng02 crmd[13168]:   notice: run_graph:
Transition 2 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-1082.bz2): Complete
Mar  8 01:08:33 ovirteng02 crmd[13168]:   notice: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]

The situation remains:
Last updated: Sat Mar  8 01:08:33 2014
Last change: Sat Mar  8 01:08:17 2014 via cibadmin on
ovirteng02.localdomain.local
Stack: cman
Current DC: ovirteng02.localdomain.local - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured
13 Resources configured


Online: [ ovirteng02.localdomain.local ]
OFFLINE: [ ovirteng01.localdomain.local ]

 Master/Slave Set: ms_OvirtData [OvirtData]
     Masters: [ ovirteng02.localdomain.local ]
     Stopped: [ ovirteng01.localdomain.local ]
Fencing (stonith:fence_vmware): Started ovirteng02.localdomain.local


I have to manually run (ovirt is the name of my group):
# pcs resource clear ovirt

and then I immediately get:
# crm_mon -1
Last updated: Sat Mar  8 01:19:52 2014
Last change: Sat Mar  8 01:19:18 2014 via crm_resource on
ovirteng02.localdomain.local
Stack: cman
Current DC: ovirteng02.localdomain.local - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured
13 Resources configured


Online: [ ovirteng02.localdomain.local ]
OFFLINE: [ ovirteng01.localdomain.local ]

 Master/Slave Set: ms_OvirtData [OvirtData]
     Masters: [ ovirteng02.localdomain.local ]
     Stopped: [ ovirteng01.localdomain.local ]
 Resource Group: ovirt
     ip_OvirtData       (ocf::heartbeat:IPaddr2):       Started
ovirteng02.localdomain.local
     lvm_ovirt  (ocf::heartbeat:LVM):   Started ovirteng02.localdomain.local
     fs_OvirtData       (ocf::heartbeat:Filesystem):    Started
ovirteng02.localdomain.local
     pgsql_OvirtData    (lsb:postgresql):       Started
ovirteng02.localdomain.local
     p_lsb_nfs  (lsb:nfs):      Started ovirteng02.localdomain.local
     p_exportfs_root    (ocf::heartbeat:exportfs):      Started
ovirteng02.localdomain.local
     p_exportfs_iso     (ocf::heartbeat:exportfs):      Started
ovirteng02.localdomain.local
     ovirt-engine       (lsb:ovirt-engine):     Started
ovirteng02.localdomain.local
     ovirt-websocket-proxy      (lsb:ovirt-websocket-proxy):
Started ovirteng02.localdomain.local
     httpd      (ocf::heartbeat:apache):        Started
ovirteng02.localdomain.local
 Fencing        (stonith:fence_vmware): Started ovirteng02.localdomain.local

So where am I going wrong, such that the group doesn't start automatically on ovirteng02?
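
For what it's worth, the cib:diff further below shows the clear removing
a "cli-ban-ovirt-on-ovirteng02.localdomain.local" location constraint
with score -INFINITY; cli-ban-*/cli-prefer-* constraints are the ones
left behind by "pcs resource move" / "pcs resource ban". Constraints and
their ids can be listed with, for example:

# pcs constraint --full

or by querying the CIB directly:

# cibadmin --query -o constraints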

These are the lines on ovirteng02 right after the clear command
(ovirteng01 is still at the GRUB prompt).

Latest output before the clear:
Mar  8 01:08:33 ovirteng02 crmd[13168]:   notice: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ i
nput=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

After the clear:
Mar  8 01:19:18 ovirteng02 crmd[13168]:   notice: do_state_transition:
State transition S_IDLE -> S_POLICY_ENGINE [ input
=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Mar  8 01:19:18 ovirteng02 stonith-ng[13164]:   notice: unpack_config:
On loss of CCM Quorum: Ignore
Mar  8 01:19:18 ovirteng02 cib[13163]:   notice: cib:diff: Diff: --- 0.270.5
Mar  8 01:19:18 ovirteng02 cib[13163]:   notice: cib:diff: Diff: +++
0.271.1 936dd803304ac6abd83dd63717139bb7
Mar  8 01:19:18 ovirteng02 cib[13163]:   notice: cib:diff: --
<rsc_location id="cli-ban-ovirt-on-ovirteng02.localdo
main.local" rsc="ovirt" role="Started"
node="ovirteng02.localdomain.local" score="-INFINITY"/>
Mar  8 01:19:18 ovirteng02 cib[13163]:   notice: cib:diff: ++ <cib
admin_epoch="0" cib-last-written="Sat Mar  8 01:19:18
2014" crm_feature_set="3.0.7" epoch="271" have-quorum="1"
num_updates="1" update-client="crm_resource" update-origin="ovi
rteng02.localdomain.local" validate-with="pacemaker-1.2"
dc-uuid="ovirteng02.localdomain.local"/>
Mar  8 01:19:18 ovirteng02 pengine[13167]:   notice: unpack_config: On
loss of CCM Quorum: Ignore
Mar  8 01:19:18 ovirteng02 pengine[13167]:   notice: LogActions: Start
  ip_OvirtData#011(ovirteng02.localdomain.local)
Mar  8 01:19:18 ovirteng02 pengine[13167]:   notice: LogActions: Start
  lvm_ovirt#011(ovirteng02.localdomain.local)
Mar  8 01:19:18 ovirteng02 pengine[13167]:   notice: LogActions: Start
  fs_OvirtData#011(ovirteng02.localdomain.local)
Mar  8 01:19:18 ovirteng02 pengine[13167]:   notice: LogActions: Start
  pgsql_OvirtData#011(ovirteng02.localdomain.local
)
Mar  8 01:19:18 ovirteng02 pengine[13167]:   notice: LogActions: Start
  p_lsb_nfs#011(ovirteng02.localdomain.local)
Mar  8 01:19:18 ovirteng02 pengine[13167]:   notice: LogActions: Start
  p_exportfs_root#011(ovirteng02.localdomain.local)
Mar  8 01:19:18 ovirteng02 pengine[13167]:   notice: LogActions: Start
  p_exportfs_iso#011(ovirteng02.localdomain.local)
Mar  8 01:19:18 ovirteng02 pengine[13167]:   notice: LogActions: Start
  ovirt-engine#011(ovirteng02.localdomain.local)
Mar  8 01:19:18 ovirteng02 pengine[13167]:   notice: LogActions: Start
  ovirt-websocket-proxy#011(ovirteng02.localdomain.local)
Mar  8 01:19:18 ovirteng02 pengine[13167]:   notice: LogActions: Start
  httpd#011(ovirteng02.localdomain.local)
Mar  8 01:19:18 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 34: start ip_OvirtData_start_0 on
ovirteng02.localdomain.local (local)
Mar  8 01:19:18 ovirteng02 pengine[13167]:   notice:
process_pe_message: Calculated Transition 3:
/var/lib/pacemaker/pengine/pe-input-1083.bz2
Mar  8 01:19:18 ovirteng02 stonith-ng[13164]:   notice:
stonith_device_register: Device 'Fencing' already existed in device
list (1 active devices)
Mar  8 01:19:18 ovirteng02 IPaddr2(ip_OvirtData)[14528]: INFO: Adding
inet address 192.168.33.47/24 with broadcast address 192.168.33.255 to
device eth0
Mar  8 01:19:18 ovirteng02 IPaddr2(ip_OvirtData)[14528]: INFO:
Bringing device eth0 up
Mar  8 01:19:18 ovirteng02 IPaddr2(ip_OvirtData)[14528]: INFO:
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
/var/run/resource-agents/send_arp-192.168.33.47 eth0 192.168.33.47
auto not_used not_used
Mar  8 01:19:18 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation ip_OvirtData_start_0 (call=103, rc=0, cib-update=58,
confirmed=true) ok
Mar  8 01:19:18 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 35: monitor ip_OvirtData_monitor_60000 on
ovirteng02.localdomain.local (local)
Mar  8 01:19:18 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 36: start lvm_ovirt_start_0 on
ovirteng02.localdomain.local (local)
Mar  8 01:19:18 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation ip_OvirtData_monitor_60000 (call=106, rc=0,
cib-update=59, confirmed=false) ok
Mar  8 01:19:19 ovirteng02 LVM(lvm_ovirt)[14599]: INFO: Activating
volume group VG_OVIRT
Mar  8 01:19:19 ovirteng02 LVM(lvm_ovirt)[14599]: INFO: Reading all
physical volumes. This may take a while... Found volume group "rootvg"
using metadata type lvm2 Found volume group "VG_OVIRT" using metadata
type lvm2
Mar  8 01:19:19 ovirteng02 LVM(lvm_ovirt)[14599]: INFO: 1 logical
volume(s) in volume group "VG_OVIRT" now active
Mar  8 01:19:19 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation lvm_ovirt_start_0 (call=108, rc=0, cib-update=60,
confirmed=true) ok
Mar  8 01:19:19 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 37: monitor lvm_ovirt_monitor_60000 on
ovirteng02.localdomain.local (local)
Mar  8 01:19:19 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 38: start fs_OvirtData_start_0 on
ovirteng02.localdomain.local (local)
Mar  8 01:19:19 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation lvm_ovirt_monitor_60000 (call=112, rc=
0, cib-update=61, confirmed=false) ok
Mar  8 01:19:19 ovirteng02 Filesystem(fs_OvirtData)[14693]: INFO:
Running start for /dev/VG_OVIRT/LV_OVIRT on /shared
Mar  8 01:19:19 ovirteng02 kernel: EXT4-fs (dm-5): warning: maximal
mount count reached, running e2fsck is recommended
Mar  8 01:19:19 ovirteng02 kernel: EXT4-fs (dm-5): 1 orphan inode deleted
Mar  8 01:19:19 ovirteng02 kernel: EXT4-fs (dm-5): recovery complete
Mar  8 01:19:19 ovirteng02 kernel: EXT4-fs (dm-5): mounted filesystem
with ordered data mode. Opts:
Mar  8 01:19:19 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation fs_OvirtData_start_0 (call=114, rc=0, cib-update=62,
confirmed=true) ok
Mar  8 01:19:19 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 39: monitor fs_OvirtData_monitor_60000 on
ovirteng02.localdomain.local (local)
Mar  8 01:19:19 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 40: start pgsql_OvirtData_start_0 on
ovirteng02.localdomain.local (local)
Mar  8 01:19:19 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation fs_OvirtData_monitor_60000 (call=118, rc=0,
cib-update=63, confirmed=false) ok
Mar  8 01:19:21 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation pgsql_OvirtData_start_0 (call=120, rc=0, cib-update=64,
confirmed=true) ok
Mar  8 01:19:21 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 41: monitor pgsql_OvirtData_monitor_30000 on
ovirteng02.localdomain.local (local)
Mar  8 01:19:21 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 42: start p_lsb_nfs_start_0 on
ovirteng02.localdomain.local (local)
Mar  8 01:19:21 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation pgsql_OvirtData_monitor_30000 (call=124, rc=0,
cib-update=65, confirmed=false) ok
Mar  8 01:19:21 ovirteng02 rpc.mountd[14884]: Version 1.2.3 starting
Mar  8 01:19:22 ovirteng02 kernel: NFSD: Using /var/lib/nfs/v4recovery
as the NFSv4 state recovery directory
Mar  8 01:19:22 ovirteng02 kernel: NFSD: starting 90-second grace period
Mar  8 01:19:22 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation p_lsb_nfs_start_0 (call=126, rc=0, cib-update=66,
confirmed=true) ok
Mar  8 01:19:22 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 43: monitor p_lsb_nfs_monitor_30000 o
n ovirteng02.localdomain.local (local)
Mar  8 01:19:22 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 44: start p_exportfs_root_start_0 on
ovirteng02.localdomain.local (local)
Mar  8 01:19:22 ovirteng02 exportfs(p_exportfs_root)[14925]: INFO:
Directory /shared/var/lib/exports is not exported to 0.0.0.0/0.0.0.0
(stopped).
Mar  8 01:19:22 ovirteng02 exportfs(p_exportfs_root)[14925]: INFO:
Exporting file system ...
Mar  8 01:19:22 ovirteng02 exportfs(p_exportfs_root)[14925]: INFO:
exporting 0.0.0.0/0.0.0.0:/shared/var/lib/exports
Mar  8 01:19:22 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation p_lsb_nfs_monitor_30000 (call=130, rc=0, cib-update=67,
confirmed=false) ok
Mar  8 01:19:22 ovirteng02 exportfs(p_exportfs_root)[14925]: INFO:
File system exported
Mar  8 01:19:22 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation p_exportfs_root_start_0 (call=132, rc=0, cib-update=68,
confirmed=true) ok
Mar  8 01:19:22 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 45: monitor p_exportfs_root_monitor_30000 on
ovirteng02.localdomain.local (local)
Mar  8 01:19:22 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 46: start p_exportfs_iso_start_0 on
ovirteng02.localdomain.local (local)
Mar  8 01:19:22 ovirteng02 exportfs(p_exportfs_root)[14990]: INFO:
Directory /shared/var/lib/exports is exported to 0.0.0.0/0.0.0.0
(started).
Mar  8 01:19:22 ovirteng02 exportfs(p_exportfs_iso)[14991]: INFO:
Directory /shared/var/lib/exports/iso is not exported to
0.0.0.0/0.0.0.0 (stopped).
Mar  8 01:19:22 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation p_exportfs_root_monitor_30000 (call=136, rc=0,
cib-update=69, confirmed=false) ok
Mar  8 01:19:22 ovirteng02 exportfs(p_exportfs_iso)[14991]: INFO:
Exporting file system ...
Mar  8 01:19:22 ovirteng02 exportfs(p_exportfs_iso)[14991]: INFO:
exporting 0.0.0.0/0.0.0.0:/shared/var/lib/exports/iso
Mar  8 01:19:22 ovirteng02 exportfs(p_exportfs_iso)[14991]: INFO: File
system exported
Mar  8 01:19:22 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation p_exportfs_iso_start_0 (call=138, rc=0, cib-update=70,
confirmed=true) ok
Mar  8 01:19:22 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 47: monitor p_exportfs_iso_monitor_30
000 on ovirteng02.localdomain.local (local)
Mar  8 01:19:22 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 48: start ovirt-engine_start_0 on
ovirteng02.localdomain.local (local)
Mar  8 01:19:22 ovirteng02 exportfs(p_exportfs_iso)[15034]: INFO:
Directory /shared/var/lib/exports/iso is exported to 0.0.0.0/0.0.0.0
(started).
Mar  8 01:19:22 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation p_exportfs_iso_monitor_30000 (call=142, rc=0,
cib-update=71, confirmed=false) ok
Mar  8 01:19:27 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation ovirt-engine_start_0 (call=144, rc=0, cib-update=72,
confirmed=true) ok
Mar  8 01:19:27 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 49: monitor ovirt-engine_monitor_300000 on
ovirteng02.localdomain.local (local)
Mar  8 01:19:27 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 50: start ovirt-websocket-proxy_start_0 on
ovirteng02.localdomain.local (local)
Mar  8 01:19:27 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation ovirt-engine_monitor_300000 (call=148, rc=0,
cib-update=73, confirmed=false) ok
Mar  8 01:19:33 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation ovirt-websocket-proxy_start_0 (call=150, rc=0,
cib-update=74, confirmed=true) ok
Mar  8 01:19:33 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 51: monitor ovirt-websocket-proxy_monitor_30000 on
ovirteng02.localdomain.local (local)
Mar  8 01:19:33 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 52: start httpd_start_0 on
ovirteng02.localdomain.local (local)
Mar  8 01:19:33 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation ovirt-websocket-proxy_monitor_30000 (call=154, rc=0,
cib-update=75, confirmed=false) ok
Mar  8 01:19:33 ovirteng02 apache(httpd)[15268]: INFO: apache not running
Mar  8 01:19:33 ovirteng02 apache(httpd)[15268]: INFO: waiting for
apache /etc/httpd/conf/httpd.conf to come up
Mar  8 01:19:35 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation httpd_start_0 (call=156, rc=0, cib-update=76,
confirmed=true) ok
Mar  8 01:19:35 ovirteng02 crmd[13168]:   notice: te_rsc_command:
Initiating action 53: monitor httpd_monitor_5000 on ovi
rteng02.localdomain.local (local)
Mar  8 01:19:35 ovirteng02 crmd[13168]:   notice: process_lrm_event:
LRM operation httpd_monitor_5000 (call=160, rc=0, cib-update=77,
confirmed=false) ok
Mar  8 01:19:35 ovirteng02 crmd[13168]:   notice: run_graph:
Transition 3 (Complete=22, Pending=0, Fired=0, Skipped=0,
Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1083.bz2):
Complete
Mar  8 01:19:35 ovirteng02 crmd[13168]:   notice: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Mar  8 01:19:52 ovirteng02 exportfs(p_exportfs_root)[15621]: INFO:
Directory /shared/var/lib/exports is exported to 0.0.0.0/0.0.0.0
(started).
Mar  8 01:19:52 ovirteng02 exportfs(p_exportfs_iso)[15639]: INFO:
Directory /shared/var/lib/exports/iso is exported to 0.0.0.0/0.0.0.0
(started).
Mar  8 01:20:22 ovirteng02 exportfs(p_exportfs_root)[16072]: INFO:
Directory /shared/var/lib/exports is exported to 0.0.0.0/0.0.0.0
(started).
Mar  8 01:20:22 ovirteng02 exportfs(p_exportfs_iso)[16094]: INFO:
Directory /shared/var/lib/exports/iso is exported to 0.0.0.0/0.0.0.0
(started).
Mar  8 01:20:52 ovirteng02 exportfs(p_exportfs_root)[16444]: INFO:
Directory /shared/var/lib/exports is exported to 0.0.0.0/0.0.0.0
(started).
Mar  8 01:20:52 ovirteng02 exportfs(p_exportfs_iso)[16455]: INFO:
Directory /shared/var/lib/exports/iso is exported to 0.0.0.0/0.0.0.0
(started).

Thanks in advance for any clarification.

Gianluca



