[Pacemaker] Two node cluster and no hardware device for stonith.
Andrea
a.bacchi at codices.com
Fri Jan 30 11:38:20 UTC 2015
Andrea <a.bacchi at ...> writes:
>
> Sorry, I used the wrong device id.
> Now, with the correct device id, I see 2 registered keys:
>
> [ONE] sg_persist -n --read-keys --device=/dev/disk/by-id/scsi-36e843b60f3d0cc6d1a11d4ff0da95cd8
> PR generation=0x4, 2 registered reservation keys follow:
> 0x4d5a0001
> 0x4d5a0002
>
> Tomorrow I will run some fencing tests...
>
Some news.
If I try to fence serverHA2 with this command:
[ONE] pcs stonith fence serverHA2
the operation appears to succeed, but serverHA2 freezes.
Below are the logs from each node (serverHA2 freezes right after logging these lines).
The servers are two VMware virtual machines (I have asked for an account on the ESX
server to test fence_vmware and am waiting for a response).
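
One way to check whether the fence really took effect on the storage side is to compare the keys reported by `sg_persist --read-keys` before and after the fence. As far as I understand, the fence_scsi "off" action works by removing the fenced node's registration key, so a key should disappear. A minimal Python sketch of that comparison follows. The function names are made up for illustration, the "before" text is the output quoted above, and the "after" text is a hypothetical example, not real output from this cluster. Which key belongs to which node is not stated in this thread.

```python
import re


def parse_registered_keys(sg_persist_output):
    """Return the reservation keys listed in `sg_persist --read-keys` output."""
    keys = []
    for line in sg_persist_output.splitlines():
        # Drop mail quoting ("> ") and surrounding whitespace.
        token = line.lstrip("> ").strip()
        # Key lines consist of a single hex value, e.g. 0x4d5a0001.
        if re.fullmatch(r"0x[0-9a-fA-F]+", token):
            keys.append(token.lower())
    return keys


def keys_removed(before, after):
    """Return keys present in `before` but missing from `after`."""
    remaining = set(parse_registered_keys(after))
    return [k for k in parse_registered_keys(before) if k not in remaining]


# Output quoted earlier in this message.
BEFORE = """\
PR generation=0x4, 2 registered reservation keys follow:
0x4d5a0001
0x4d5a0002
"""

# HYPOTHETICAL output after a fence, for illustration only.
AFTER = """\
PR generation=0x5, 1 registered reservation keys follow:
0x4d5a0001
"""

if __name__ == "__main__":
    print(keys_removed(BEFORE, AFTER))
```

If the list of removed keys is empty right after a fence that reported OK, the key was either never removed or has already been registered again.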
Log from serverHA1:
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: notice: handle_request: Client
stonith_admin.1907.b13e0290 wants to fence (reboot) 'serverHA2' with device
'(any)'
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: notice:
initiate_remote_stonith_op: Initiating remote operation reboot for
serverHA2: 70b75107-8919-4510-9c6c-7cc65e6a00a6 (0)
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: notice:
can_fence_host_with_device: iscsi-stonith-device can fence (reboot)
serverHA2: static-list
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info:
process_remote_stonith_query: Query result 1 of 2 from serverHA1 for
serverHA2/reboot (1 devices) 70b75107-8919-4510-9c6c-7cc65e6a00a6
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info: call_remote_stonith:
Total remote op timeout set to 120 for fencing of node serverHA2 for
stonith_admin.1907.70b75107
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info: call_remote_stonith:
Requesting that serverHA1 perform op reboot serverHA2 for stonith_admin.1907
(144s)
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: notice:
can_fence_host_with_device: iscsi-stonith-device can fence (reboot)
serverHA2: static-list
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info:
stonith_fence_get_devices_cb: Found 1 matching devices for 'serverHA2'
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: warning: stonith_device_execute:
Agent 'fence_scsi' does not advertise support for 'reboot', performing 'off'
action instead
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info:
process_remote_stonith_query: Query result 2 of 2 from serverHA2 for
serverHA2/reboot (1 devices) 70b75107-8919-4510-9c6c-7cc65e6a00a6
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: log_operation:
Operation 'reboot' [1908] (call 2 from stonith_admin.1907) for host 'serverHA2'
with device 'iscsi-stonith-device' returned: 0 (OK)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: warning: get_xpath_object:
No match for //@st_delegate in /st-reply
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: remote_op_done:
Operation reboot of serverHA2 by serverHA1 for
stonith_admin.1907@serverHA1.70b75107: OK
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: tengine_stonith_notify:
Peer serverHA2 was terminated (reboot) by serverHA1 for serverHA1: OK
(ref=70b75107-8919-4510-9c6c-7cc65e6a00a6) by client stonith_admin.1907
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: tengine_stonith_notify:
Notified CMAN that 'serverHA2' is now fenced
Jan 30 12:13:03 [2514] serverHA1 crmd: info: crm_update_peer_join:
crmd_peer_down: Node serverHA2[2] - join-2 phase 4 -> 0
Jan 30 12:13:03 [2514] serverHA1 crmd: info:
crm_update_peer_expected: crmd_peer_down: Node serverHA2[2] - expected
state is now down (was member)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: erase_status_tag:
Deleting xpath: //node_state[@uname='serverHA2']/lrm
Jan 30 12:13:03 [2514] serverHA1 crmd: info: erase_status_tag:
Deleting xpath: //node_state[@uname='serverHA2']/transient_attributes
Jan 30 12:13:03 [2514] serverHA1 crmd: info: tengine_stonith_notify:
External fencing operation from stonith_admin.1907 fenced serverHA2
Jan 30 12:13:03 [2514] serverHA1 crmd: info: abort_transition_graph:
Transition aborted: External Fencing Operation
(source=tengine_stonith_notify:248, 1)
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: do_state_transition:
State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 30 12:13:03 [2514] serverHA1 crmd: warning: do_state_transition:
Only 1 of 2 cluster nodes are eligible to run resources - continue 0
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request:
Forwarding cib_modify operation for section status to master
(origin=local/crmd/333)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request:
Forwarding cib_delete operation for section
//node_state[@uname='serverHA2']/lrm to master (origin=local/crmd/334)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request:
Forwarding cib_delete operation for section
//node_state[@uname='serverHA2']/transient_attributes to master
(origin=local/crmd/335)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff:
--- 0.51.86 2
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff:
+++ 0.51.87 (null)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: +
/cib: @num_updates=87
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: +
/cib/status/node_state[@id='serverHA2']:
@crm-debug-origin=send_stonith_update, @join=down, @expected=down
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request:
Completed cib_modify operation for section status: OK (rc=0,
origin=serverHA1/crmd/333, version=0.51.87)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff:
--- 0.51.87 2
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff:
+++ 0.51.88 (null)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: --
/cib/status/node_state[@id='serverHA2']/lrm[@id='serverHA2']
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: +
/cib: @num_updates=88
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request:
Completed cib_delete operation for section
//node_state[@uname='serverHA2']/lrm: OK (rc=0, origin=serverHA1/crmd/334,
version=0.51.88)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff:
--- 0.51.88 2
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff:
+++ 0.51.89 (null)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: --
/cib/status/node_state[@id='serverHA2']/transient_attributes[@id='serverHA2']
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: +
/cib: @num_updates=89
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request:
Completed cib_delete operation for section
//node_state[@uname='serverHA2']/transient_attributes: OK (rc=0,
origin=serverHA1/crmd/335, version=0.51.89)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: cib_fencing_updated:
Fencing update 333 for serverHA2: complete
Jan 30 12:13:03 [2514] serverHA1 crmd: info: abort_transition_graph:
Transition aborted by deletion of lrm[@id='serverHA2']: Resource state removal
(cib=0.51.88, source=te_update_diff:429,
path=/cib/status/node_state[@id='serverHA2']/lrm[@id='serverHA2'], 1)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: abort_transition_graph:
Transition aborted by deletion of transient_attributes[@id='serverHA2']:
Transient attribute change (cib=0.51.89, source=te_update_diff:391,
path=/cib/status/node_state[@id='serverHA2']/transient_attributes[@id='serverHA2'], 1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: process_pe_message:
Input has not changed since last time, not saving to disk
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: unpack_config: On loss
of CCM Quorum: Ignore
Jan 30 12:13:03 [2513] serverHA1 pengine: info:
determine_online_status_fencing: Node serverHA2 is active
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status:
Node serverHA2 is online
Jan 30 12:13:03 [2513] serverHA1 pengine: info:
determine_online_status_fencing: Node serverHA1 is active
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status:
Node serverHA1 is online
Jan 30 12:13:03 [2513] serverHA1 pengine: info: clone_print: Clone
Set: ping-clone [ping]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print:
Started: [ serverHA1 serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: clone_print: Clone
Set: clusterfs-clone [clusterfs]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print:
Started: [ serverHA1 serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: native_print:
iscsi-stonith-device (stonith:fence_scsi): Started serverHA1
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
ping:0 (Started serverHA2)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
ping:1 (Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
clusterfs:0 (Started serverHA2)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
clusterfs:1 (Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
iscsi-stonith-device (Started serverHA1)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: handle_response:
pe_calc calculation pe_calc-dc-1422616383-286 is obsolete
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: process_pe_message:
Calculated Transition 189: /var/lib/pacemaker/pengine/pe-input-145.bz2
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: unpack_config: On loss
of CCM Quorum: Ignore
Jan 30 12:13:03 [2513] serverHA1 pengine: info:
determine_online_status_fencing: - Node serverHA2 is not ready to run
resources
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status:
Node serverHA2 is pending
Jan 30 12:13:03 [2513] serverHA1 pengine: info:
determine_online_status_fencing: Node serverHA1 is active
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status:
Node serverHA1 is online
Jan 30 12:13:03 [2513] serverHA1 pengine: info: clone_print: Clone
Set: ping-clone [ping]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print:
Started: [ serverHA1 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print:
Stopped: [ serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: clone_print: Clone
Set: clusterfs-clone [clusterfs]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print:
Started: [ serverHA1 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print:
Stopped: [ serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: native_print:
iscsi-stonith-device (stonith:fence_scsi): Started serverHA1
Jan 30 12:13:03 [2513] serverHA1 pengine: info: native_color:
Resource ping:1 cannot run anywhere
Jan 30 12:13:03 [2513] serverHA1 pengine: info: native_color:
Resource clusterfs:1 cannot run anywhere
Jan 30 12:13:03 [2513] serverHA1 pengine: info: probe_resources:
Action probe_complete-serverHA2 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1 pengine: warning: custom_action: Action
ping:0_monitor_0 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1 pengine: warning: custom_action: Action
clusterfs:0_monitor_0 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1 pengine: warning: custom_action: Action
iscsi-stonith-device_monitor_0 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: trigger_unfencing:
Unfencing serverHA2: node discovery
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
ping:0 (Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
ping:1 (Stopped)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
clusterfs:0 (Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
clusterfs:1 (Stopped)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
iscsi-stonith-device (Started serverHA1)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: do_state_transition:
State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Jan 30 12:13:03 [2514] serverHA1 crmd: info: do_te_invoke:
Processing graph 190 (ref=pe_calc-dc-1422616383-287) derived from
/var/lib/pacemaker/pengine/pe-input-146.bz2
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: process_pe_message:
Calculated Transition 190: /var/lib/pacemaker/pengine/pe-input-146.bz2
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: te_fence_node:
Executing on fencing operation (5) on serverHA2 (timeout=60000)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: handle_request: Client
crmd.2514.b5961dc1 wants to fence (on) 'serverHA2' with device '(any)'
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice:
initiate_remote_stonith_op: Initiating remote operation on for serverHA2:
e19629dc-bec3-4e63-baf6-a7ecd5ed44bb (0)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: info:
process_remote_stonith_query: Query result 2 of 2 from serverHA2 for
serverHA2/on (1 devices) e19629dc-bec3-4e63-baf6-a7ecd5ed44bb
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: info:
process_remote_stonith_query: All queries have arrived, continuing (2, 2, 2)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: info: call_remote_stonith:
Total remote op timeout set to 60 for fencing of node serverHA2 for
crmd.2514.e19629dc
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: info: call_remote_stonith:
Requesting that serverHA2 perform op on serverHA2 for crmd.2514 (72s)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: warning: get_xpath_object:
No match for //@st_delegate in /st-reply
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: remote_op_done:
Operation on of serverHA2 by serverHA2 for crmd.2514@serverHA1.e19629dc: OK
Jan 30 12:13:03 [2514] serverHA1 crmd: notice:
tengine_stonith_callback: Stonith operation
9/5:190:0:4e500b84-bb92-4406-8f9c-f4140dd40ec7: OK (0)
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: tengine_stonith_notify:
serverHA2 was successfully unfenced by serverHA2 (at the request of serverHA1)
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: run_graph:
Transition 190 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-146.bz2): Complete
Jan 30 12:13:03 [2514] serverHA1 crmd: info: do_log: FSA: Input
I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Log from serverHA2:
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice:
can_fence_host_with_device: iscsi-stonith-device can fence (reboot)
serverHA2: static-list
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice: remote_op_done:
Operation reboot of serverHA2 by serverHA1 for
stonith_admin.1907@serverHA1.70b75107: OK
Jan 30 12:13:11 [2631] serverHA2 crmd: crit: tengine_stonith_notify:
We were alegedly just fenced by serverHA1 for serverHA1!
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff:
--- 0.51.86 2
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff:
+++ 0.51.87 (null)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: +
/cib: @num_updates=87
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: +
/cib/status/node_state[@id='serverHA2']:
@crm-debug-origin=send_stonith_update, @join=down, @expected=down
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_process_request:
Completed cib_modify operation for section status: OK (rc=0,
origin=serverHA1/crmd/333, version=0.51.87)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff:
--- 0.51.87 2
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff:
+++ 0.51.88 (null)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: --
/cib/status/node_state[@id='serverHA2']/lrm[@id='serverHA2']
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: +
/cib: @num_updates=88
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_process_request:
Completed cib_delete operation for section
//node_state[@uname='serverHA2']/lrm: OK (rc=0, origin=serverHA1/crmd/334,
version=0.51.88)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff:
--- 0.51.88 2
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff:
+++ 0.51.89 (null)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: --
/cib/status/node_state[@id='serverHA2']/transient_attributes[@id='serverHA2']
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: +
/cib: @num_updates=89
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_process_request:
Completed cib_delete operation for section
//node_state[@uname='serverHA2']/transient_attributes: OK (rc=0,
origin=serverHA1/crmd/335, version=0.51.89)
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice:
can_fence_host_with_device: iscsi-stonith-device can fence (on) serverHA2:
static-list
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice:
can_fence_host_with_device: iscsi-stonith-device can fence (on) serverHA2:
static-list
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: info:
stonith_fence_get_devices_cb: Found 1 matching devices for 'serverHA2'
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice: log_operation:
Operation 'on' [3037] (call 9 from crmd.2514) for host 'serverHA2' with device
'iscsi-stonith-device' returned: 0 (OK)
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice: remote_op_done:
Operation on of serverHA2 by serverHA2 for crmd.2514@serverHA1.e19629dc: OK
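
Read together, the two logs show the "reboot" request (run as "off" by fence_scsi) completing with OK, followed within the same second on serverHA1 by an "on" (unfencing) of serverHA2 that also completes with OK. A minimal Python sketch for pulling that sequence out of the wrapped log text follows. The regular expression and function names are my own and were only checked against the `remote_op_done` messages shown above, so treat it as a starting point rather than a general Pacemaker log parser.

```python
import re

# Matches a stonith-ng "remote_op_done" message once wrapped lines are joined.
# The client is followed by "@" in the original log; the list archive may
# render it as " at ", so both forms are accepted.
OP_DONE = re.compile(
    r"(?P<time>\d\d:\d\d:\d\d) \[\d+\] (?P<host>\S+)\s+stonith-ng:\s+notice:\s+"
    r"remote_op_done:\s+Operation (?P<op>\S+) of (?P<target>\S+) "
    r"by (?P<executor>\S+) for (?P<client>\S+?)(?:@| at )\S+: (?P<result>\w+)"
)


def fencing_ops(log_text):
    """Return (time, op, target, executor, result) tuples in log order."""
    # Mail wrapping splits messages across lines; collapse all whitespace.
    joined = " ".join(log_text.split())
    return [
        (m["time"], m["op"], m["target"], m["executor"], m["result"])
        for m in OP_DONE.finditer(joined)
    ]


def fenced_then_unfenced(ops):
    """Return targets that were successfully fenced and later unfenced ('on')."""
    fenced, hits = set(), []
    for _time, op, target, _executor, result in ops:
        if result != "OK":
            continue
        if op == "on":
            if target in fenced:
                hits.append(target)
        else:
            fenced.add(target)
    return hits


# Two messages from the serverHA1 log above, still wrapped as in the mail.
SAMPLE = """\
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: remote_op_done:
Operation reboot of serverHA2 by serverHA1 for
stonith_admin.1907@serverHA1.70b75107: OK
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: remote_op_done:
Operation on of serverHA2 by serverHA2 for crmd.2514@serverHA1.e19629dc: OK
"""

if __name__ == "__main__":
    ops = fencing_ops(SAMPLE)
    for op in ops:
        print(op)
    print("fenced then unfenced:", fenced_then_unfenced(ops))
```

On the sample this reports serverHA2 as fenced and then unfenced, which matches the `trigger_unfencing` and `te_fence_node` messages in the serverHA1 log.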
I will continue testing.
Andrea