Hi again,

Should I open a bug report about this issue?

Thanks

Thomas

Thomas Börnert wrote on 02.03.2012 at 12:06:

> Hi List,
>
> my problem is that stonith executes the fence command on the remote,
> dead host and not on the local machine :-(. This ends in a timeout.
>
> Some facts:
> - 2-node cluster with 2 Dell servers
> - each server has its own DRAC card
> - pacemaker 1.1.6
> - heartbeat 3.0.4
> - corosync 1.4.1
>
> node1 should fence node2 if node2 is dead, and
> node2 should fence node1 if node1 is dead.
>
> It works fine manually with the stonith script fence_drac5 ....
> (see the example call at the end of this mail)
>
> My config:
> <---------------------------------- snip -------------------------------->
> node node1 \
>         attributes standby="off"
> node node2 \
>         attributes standby="off"
> primitive httpd ocf:heartbeat:apache \
>         params configfile="/etc/httpd/conf/httpd.conf" port="80" \
>         op start interval="0" timeout="60s" \
>         op monitor interval="5s" timeout="20s" \
>         op stop interval="0" timeout="60s"
> primitive node1-stonith stonith:fence_drac5 \
>         params ipaddr="192.168.1.101" login="root" passwd="1234" action="reboot" secure="true" cmd_prompt="admin1->" power_wait="300" pcmk_host_list="node1"
> primitive node2-stonith stonith:fence_drac5 \
>         params ipaddr="192.168.1.102" login="root" passwd="1234" action="reboot" secure="true" cmd_prompt="admin1->" power_wait="300" pcmk_host_list="node2"
> primitive nodeIP ocf:heartbeat:IPaddr2 \
>         op monitor interval="60" timeout="20" \
>         params ip="192.168.1.10" cidr_netmask="24" nic="eth0:0" broadcast="192.168.1.255"
> primitive nodeIParp ocf:heartbeat:SendArp \
>         params ip="192.168.1.10" nic="eth0:0"
> group WebServices nodeIP nodeIParp httpd
> location node1-stonith-log node1-stonith -inf: node1
> location node2-stonith-log node2-stonith -inf: node2
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="true" \
>         no-quorum-policy="ignore" \
>         last-lrm-refresh="1330685786"
> <---------------------------------- snip -------------------------------->
>
> [root@node2 ~]# stonith_admin -l node1
>  node1-stonith
> 1 devices found
>
> That seems OK.
>
> Now I try:
>
> [root@node2 ~]# stonith_admin -V -F node1
> stonith_admin[5685]: 2012/03/02_13:00:44 debug: main: Create
> stonith_admin[5685]: 2012/03/02_13:00:44 debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /var/run/crm/st_command
> stonith_admin[5685]: 2012/03/02_13:00:44 debug: get_stonith_token: Obtained registration token: 6258828b-4b19-472f-9256-8da36fe87962
> stonith_admin[5685]: 2012/03/02_13:00:44 debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /var/run/crm/st_callback
> stonith_admin[5685]: 2012/03/02_13:00:44 debug: get_stonith_token: Obtained registration token: 6266ebb8-2112-4378-a00c-3eaff47c9a9d
> stonith_admin[5685]: 2012/03/02_13:00:44 debug: stonith_api_signon: Connection to STONITH successful
> stonith_admin[5685]: 2012/03/02_13:00:44 debug: main: Connect: 0
> Command failed: Operation timed out
> stonith_admin[5685]: 2012/03/02_13:00:56 debug: stonith_api_signoff: Signing out of the STONITH Service
> stonith_admin[5685]: 2012/03/02_13:00:56 debug: main: Disconnect: -8
> stonith_admin[5685]: 2012/03/02_13:00:56 debug: main: Destroy
>
> The log on node2 shows:
>
> <----------------------------------------------- snip --------------------------------------->
> Mar 2 13:00:58 node2 crmd: [2665]: info: te_fence_node: Executing reboot fencing operation (21) on node1 (timeout=60000)
> Mar 2 13:00:58 node2 stonith-ng: [2660]: info: initiate_remote_stonith_op: Initiating remote operation reboot for node1: 3325df94-8d59-4c00-a37e-be31e79f7503
> Mar 2 13:00:58 node2 stonith-ng: [2638]: info: stonith_command: Processed st_query from node2: rc=0
> <----------------------------------------------- snip --------------------------------------->
>
> Why is this initiated as a remote operation, on the dead host?
>
> Thanks
>
> Thomas
>
> The complete log:
> <----------------------------------------------- snip --------------------------------------->
> Mar 2 13:00:44 node2 stonith_admin: [5685]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/root
> Mar 2 13:00:44 node2 stonith-ng: [2660]: info: initiate_remote_stonith_op: Initiating remote operation off for node1: 7d8beca4-1853-44fd-9bb2-4015b080c37b
> Mar 2 13:00:44 node2 stonith-ng: [2638]: info: stonith_command: Processed st_query from node2: rc=0
> Mar 2 13:00:46 node2 stonith-ng: [2660]: ERROR: remote_op_query_timeout: Query 561e89af-6f5a-45cb-adc2-45389940f1db for node1 timed out
> Mar 2 13:00:46 node2 stonith-ng: [2660]: ERROR: remote_op_timeout: Action reboot (561e89af-6f5a-45cb-adc2-45389940f1db) for node1 timed out
> Mar 2 13:00:46 node2 stonith-ng: [2660]: info: remote_op_done: Notifing clients of 561e89af-6f5a-45cb-adc2-45389940f1db (reboot of node1 from 8231841e-3537-44a9-8870-899d0d846c42 by (null)): 0, rc=-8
> Mar 2 13:00:46 node2 stonith-ng: [2660]: info: stonith_notify_client: Sending st_fence-notification to client 2665/ff16ec78-3634-444c-88a6-275ce79eec6b
> Mar 2 13:00:46 node2 crmd: [2665]: info: tengine_stonith_callback: StonithOp <remote-op state="0" st_target="node1" st_op="reboot" >
> Mar 2 13:00:46 node2 crmd: [2665]: info: tengine_stonith_callback: Stonith operation 798/21:815:0:d274c31a-571b-4e22-b453-1c151a8871b1: Operation timed out (-8)
> Mar 2 13:00:46 node2 crmd: [2665]: ERROR: tengine_stonith_callback: Stonith of node1 failed (-8)... aborting transition.
> Mar 2 13:00:46 node2 crmd: [2665]: info: abort_transition_graph: tengine_stonith_callback:454 - Triggered transition abort (complete=0) : Stonith failed
> Mar 2 13:00:46 node2 crmd: [2665]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
> Mar 2 13:00:46 node2 crmd: [2665]: info: update_abort_priority: Abort action done superceeded by restart
> Mar 2 13:00:46 node2 crmd: [2665]: ERROR: tengine_stonith_notify: Peer node1 could not be terminated (reboot) by <anyone> for node2 (ref=561e89af-6f5a-45cb-adc2-45389940f1db): Operation timed out
> Mar 2 13:00:46 node2 crmd: [2665]: info: run_graph: ====================================================
> Mar 2 13:00:46 node2 crmd: [2665]: notice: run_graph: Transition 815 (Complete=3, Pending=0, Fired=0, Skipped=14, Incomplete=0, Source=/var/lib/pengine/pe-warn-39.bz2): Stopped
> Mar 2 13:00:46 node2 crmd: [2665]: info: te_graph_trigger: Transition 815 is now complete
> Mar 2 13:00:46 node2 crmd: [2665]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> Mar 2 13:00:46 node2 crmd: [2665]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> Mar 2 13:00:46 node2 crmd: [2665]: info: do_pe_invoke: Query 1271: Requesting the current CIB: S_POLICY_ENGINE
> Mar 2 13:00:46 node2 crmd: [2665]: info: do_pe_invoke_callback: Invoking the PE: query=1271, ref=pe_calc-dc-1330689646-1028, seq=404, quorate=0
> Mar 2 13:00:46 node2 pengine: [2664]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: pe_fence_node: Node node1 will be fenced because it is un-expectedly down
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: determine_online_status: Node node1 is unclean
> Mar 2 13:00:46 node2 pengine: [2664]: notice: unpack_rsc_op: Operation nodeIParp_last_failure_0 found resource nodeIParp active on node2
> Mar 2 13:00:46 node2 pengine: [2664]: notice: unpack_rsc_op: Operation node1-stonith_last_failure_0 found resource node1-stonith active on node2
> Mar 2 13:00:46 node2 pengine: [2664]: notice: unpack_rsc_op: Operation nodeIParp_last_failure_0 found resource nodeIParp active on node1
> Mar 2 13:00:46 node2 pengine: [2664]: notice: unpack_rsc_op: Operation nodeIP_last_failure_0 found resource nodeIP active on node1
> Mar 2 13:00:46 node2 pengine: [2664]: notice: unpack_rsc_op: Operation httpd_last_failure_0 found resource httpd active on node1
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: custom_action: Action nodeIP_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:46 node2 pengine: [2664]: notice: RecurringOp: Start recurring monitor (60s) for nodeIP on node2
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: custom_action: Action nodeIParp_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: custom_action: Action httpd_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:46 node2 pengine: [2664]: notice: RecurringOp: Start recurring monitor (5s) for httpd on node2
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: custom_action: Action node2-stonith_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: stage6: Scheduling Node node1 for STONITH
> Mar 2 13:00:46 node2 pengine: [2664]: notice: LogActions: Move nodeIP#011(Started node1 -> node2)
> Mar 2 13:00:46 node2 pengine: [2664]: notice: LogActions: Move nodeIParp#011(Started node1 -> node2)
> Mar 2 13:00:46 node2 pengine: [2664]: notice: LogActions: Move httpd#011(Started node1 -> node2)
> Mar 2 13:00:46 node2 pengine: [2664]: notice: LogActions: Leave node1-stonith#011(Started node2)
> Mar 2 13:00:46 node2 pengine: [2664]: notice: LogActions: Stop node2-stonith#011(node1)
> Mar 2 13:00:46 node2 pengine: [2664]: WARN: process_pe_message: Transition 816: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/pengine/pe-warn-39.bz2
> Mar 2 13:00:46 node2 pengine: [2664]: notice: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
> Mar 2 13:00:46 node2 crmd: [2665]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Mar 2 13:00:46 node2 crmd: [2665]: info: unpack_graph: Unpacked transition 816: 17 actions in 17 synapses
> Mar 2 13:00:46 node2 crmd: [2665]: info: do_te_invoke: Processing graph 816 (ref=pe_calc-dc-1330689646-1028) derived from /var/lib/pengine/pe-warn-39.bz2
> Mar 2 13:00:46 node2 crmd: [2665]: info: te_pseudo_action: Pseudo action 18 fired and confirmed
> Mar 2 13:00:46 node2 crmd: [2665]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
> Mar 2 13:00:46 node2 crmd: [2665]: info: te_fence_node: Executing reboot fencing operation (21) on node1 (timeout=60000)
> Mar 2 13:00:46 node2 stonith-ng: [2660]: info: initiate_remote_stonith_op: Initiating remote operation reboot for node1: 07f10c9c-b33e-41b4-8781-fb32eb850bd2
> Mar 2 13:00:46 node2 stonith-ng: [2638]: info: stonith_command: Processed st_query from node2: rc=0
> Mar 2 13:00:52 node2 stonith-ng: [2660]: ERROR: remote_op_query_timeout: Query 07f10c9c-b33e-41b4-8781-fb32eb850bd2 for node1 timed out
> Mar 2 13:00:52 node2 stonith-ng: [2660]: ERROR: remote_op_timeout: Action reboot (07f10c9c-b33e-41b4-8781-fb32eb850bd2) for node1 timed out
> Mar 2 13:00:52 node2 stonith-ng: [2660]: info: remote_op_done: Notifing clients of 07f10c9c-b33e-41b4-8781-fb32eb850bd2 (reboot of node1 from 8231841e-3537-44a9-8870-899d0d846c42 by (null)): 0, rc=-8
> Mar 2 13:00:52 node2 stonith-ng: [2660]: info: stonith_notify_client: Sending st_fence-notification to client 2665/ff16ec78-3634-444c-88a6-275ce79eec6b
> Mar 2 13:00:52 node2 crmd: [2665]: info: tengine_stonith_callback: StonithOp <remote-op state="0" st_target="node1" st_op="reboot" >
> Mar 2 13:00:52 node2 crmd: [2665]: info: tengine_stonith_callback: Stonith operation 799/21:816:0:d274c31a-571b-4e22-b453-1c151a8871b1: Operation timed out (-8)
> Mar 2 13:00:52 node2 crmd: [2665]: ERROR: tengine_stonith_callback: Stonith of node1 failed (-8)... aborting transition.
> Mar 2 13:00:52 node2 crmd: [2665]: info: abort_transition_graph: tengine_stonith_callback:454 - Triggered transition abort (complete=0) : Stonith failed
> Mar 2 13:00:52 node2 crmd: [2665]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
> Mar 2 13:00:52 node2 crmd: [2665]: info: update_abort_priority: Abort action done superceeded by restart
> Mar 2 13:00:52 node2 crmd: [2665]: ERROR: tengine_stonith_notify: Peer node1 could not be terminated (reboot) by <anyone> for node2 (ref=07f10c9c-b33e-41b4-8781-fb32eb850bd2): Operation timed out
> Mar 2 13:00:52 node2 crmd: [2665]: info: run_graph: ====================================================
> Mar 2 13:00:52 node2 crmd: [2665]: notice: run_graph: Transition 816 (Complete=3, Pending=0, Fired=0, Skipped=14, Incomplete=0, Source=/var/lib/pengine/pe-warn-39.bz2): Stopped
> Mar 2 13:00:52 node2 crmd: [2665]: info: te_graph_trigger: Transition 816 is now complete
> Mar 2 13:00:52 node2 crmd: [2665]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> Mar 2 13:00:52 node2 crmd: [2665]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> Mar 2 13:00:52 node2 crmd: [2665]: info: do_pe_invoke: Query 1272: Requesting the current CIB: S_POLICY_ENGINE
> Mar 2 13:00:52 node2 crmd: [2665]: info: do_pe_invoke_callback: Invoking the PE: query=1272, ref=pe_calc-dc-1330689652-1029, seq=404, quorate=0
> Mar 2 13:00:52 node2 pengine: [2664]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: pe_fence_node: Node node1 will be fenced because it is un-expectedly down
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: determine_online_status: Node node1 is unclean
> Mar 2 13:00:52 node2 pengine: [2664]: notice: unpack_rsc_op: Operation nodeIParp_last_failure_0 found resource nodeIParp active on node2
> Mar 2 13:00:52 node2 pengine: [2664]: notice: unpack_rsc_op: Operation node1-stonith_last_failure_0 found resource node1-stonith active on node2
> Mar 2 13:00:52 node2 pengine: [2664]: notice: unpack_rsc_op: Operation nodeIParp_last_failure_0 found resource nodeIParp active on node1
> Mar 2 13:00:52 node2 pengine: [2664]: notice: unpack_rsc_op: Operation nodeIP_last_failure_0 found resource nodeIP active on node1
> Mar 2 13:00:52 node2 pengine: [2664]: notice: unpack_rsc_op: Operation httpd_last_failure_0 found resource httpd active on node1
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: custom_action: Action nodeIP_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:52 node2 pengine: [2664]: notice: RecurringOp: Start recurring monitor (60s) for nodeIP on node2
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: custom_action: Action nodeIParp_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: custom_action: Action httpd_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:52 node2 pengine: [2664]: notice: RecurringOp: Start recurring monitor (5s) for httpd on node2
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: custom_action: Action node2-stonith_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: stage6: Scheduling Node node1 for STONITH
> Mar 2 13:00:52 node2 pengine: [2664]: notice: LogActions: Move nodeIP#011(Started node1 -> node2)
> Mar 2 13:00:52 node2 pengine: [2664]: notice: LogActions: Move nodeIParp#011(Started node1 -> node2)
> Mar 2 13:00:52 node2 pengine: [2664]: notice: LogActions: Move httpd#011(Started node1 -> node2)
> Mar 2 13:00:52 node2 pengine: [2664]: notice: LogActions: Leave node1-stonith#011(Started node2)
> Mar 2 13:00:52 node2 pengine: [2664]: notice: LogActions: Stop node2-stonith#011(node1)
> Mar 2 13:00:52 node2 pengine: [2664]: WARN: process_pe_message: Transition 817: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/pengine/pe-warn-39.bz2
> Mar 2 13:00:52 node2 pengine: [2664]: notice: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
> Mar 2 13:00:52 node2 crmd: [2665]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Mar 2 13:00:52 node2 crmd: [2665]: info: unpack_graph: Unpacked transition 817: 17 actions in 17 synapses
> Mar 2 13:00:52 node2 crmd: [2665]: info: do_te_invoke: Processing graph 817 (ref=pe_calc-dc-1330689652-1029) derived from /var/lib/pengine/pe-warn-39.bz2
> Mar 2 13:00:52 node2 crmd: [2665]: info: te_pseudo_action: Pseudo action 18 fired and confirmed
> Mar 2 13:00:52 node2 crmd: [2665]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
> Mar 2 13:00:52 node2 crmd: [2665]: info: te_fence_node: Executing reboot fencing operation (21) on node1 (timeout=60000)
> Mar 2 13:00:52 node2 stonith-ng: [2660]: info: initiate_remote_stonith_op: Initiating remote operation reboot for node1: a4ebce93-0eee-43dd-b610-0115e62b0285
> Mar 2 13:00:52 node2 stonith-ng: [2638]: info: stonith_command: Processed st_query from node2: rc=0
> Mar 2 13:00:56 node2 stonith-ng: [2660]: ERROR: remote_op_query_timeout: Query 7d8beca4-1853-44fd-9bb2-4015b080c37b for node1 timed out
> Mar 2 13:00:56 node2 stonith-ng: [2660]: ERROR: remote_op_timeout: Action off (7d8beca4-1853-44fd-9bb2-4015b080c37b) for node1 timed out
> Mar 2 13:00:56 node2 stonith-ng: [2660]: info: remote_op_done: Notifing clients of 7d8beca4-1853-44fd-9bb2-4015b080c37b (off of node1 from 6258828b-4b19-472f-9256-8da36fe87962 by (null)): 0, rc=-8
> Mar 2 13:00:56 node2 stonith-ng: [2660]: info: stonith_notify_client: Sending st_fence-notification to client 2665/ff16ec78-3634-444c-88a6-275ce79eec6b
> Mar 2 13:00:56 node2 crmd: [2665]: ERROR: tengine_stonith_notify: Peer node1 could not be terminated (off) by <anyone> for node2 (ref=7d8beca4-1853-44fd-9bb2-4015b080c37b): Operation timed out
> Mar 2 13:00:58 node2 stonith-ng: [2660]: ERROR: remote_op_query_timeout: Query a4ebce93-0eee-43dd-b610-0115e62b0285 for node1 timed out
> Mar 2 13:00:58 node2 stonith-ng: [2660]: ERROR: remote_op_timeout: Action reboot (a4ebce93-0eee-43dd-b610-0115e62b0285) for node1 timed out
> Mar 2 13:00:58 node2 stonith-ng: [2660]: info: remote_op_done: Notifing clients of a4ebce93-0eee-43dd-b610-0115e62b0285 (reboot of node1 from 8231841e-3537-44a9-8870-899d0d846c42 by (null)): 0, rc=-8
> Mar 2 13:00:58 node2 stonith-ng: [2660]: info: stonith_notify_client: Sending st_fence-notification to client 2665/ff16ec78-3634-444c-88a6-275ce79eec6b
> Mar 2 13:00:58 node2 crmd: [2665]: info: tengine_stonith_callback: StonithOp <remote-op state="0" st_target="node1" st_op="reboot" >
> Mar 2 13:00:58 node2 crmd: [2665]: info: tengine_stonith_callback: Stonith operation 800/21:817:0:d274c31a-571b-4e22-b453-1c151a8871b1: Operation timed out (-8)
> Mar 2 13:00:58 node2 crmd: [2665]: ERROR: tengine_stonith_callback: Stonith of node1 failed (-8)... aborting transition.
> Mar 2 13:00:58 node2 crmd: [2665]: info: abort_transition_graph: tengine_stonith_callback:454 - Triggered transition abort (complete=0) : Stonith failed
> Mar 2 13:00:58 node2 crmd: [2665]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
> Mar 2 13:00:58 node2 crmd: [2665]: info: update_abort_priority: Abort action done superceeded by restart
> Mar 2 13:00:58 node2 crmd: [2665]: ERROR: tengine_stonith_notify: Peer node1 could not be terminated (reboot) by <anyone> for node2 (ref=a4ebce93-0eee-43dd-b610-0115e62b0285): Operation timed out
> Mar 2 13:00:58 node2 crmd: [2665]: info: run_graph: ====================================================
> Mar 2 13:00:58 node2 crmd: [2665]: notice: run_graph: Transition 817 (Complete=3, Pending=0, Fired=0, Skipped=14, Incomplete=0, Source=/var/lib/pengine/pe-warn-39.bz2): Stopped
> Mar 2 13:00:58 node2 crmd: [2665]: info: te_graph_trigger: Transition 817 is now complete
> Mar 2 13:00:58 node2 crmd: [2665]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> Mar 2 13:00:58 node2 crmd: [2665]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> Mar 2 13:00:58 node2 crmd: [2665]: info: do_pe_invoke: Query 1273: Requesting the current CIB: S_POLICY_ENGINE
> Mar 2 13:00:58 node2 crmd: [2665]: info: do_pe_invoke_callback: Invoking the PE: query=1273, ref=pe_calc-dc-1330689658-1030, seq=404, quorate=0
> Mar 2 13:00:58 node2 pengine: [2664]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: pe_fence_node: Node node1 will be fenced because it is un-expectedly down
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: determine_online_status: Node node1 is unclean
> Mar 2 13:00:58 node2 pengine: [2664]: notice: unpack_rsc_op: Operation nodeIParp_last_failure_0 found resource nodeIParp active on node2
> Mar 2 13:00:58 node2 pengine: [2664]: notice: unpack_rsc_op: Operation node1-stonith_last_failure_0 found resource node1-stonith active on node2
> Mar 2 13:00:58 node2 pengine: [2664]: notice: unpack_rsc_op: Operation nodeIParp_last_failure_0 found resource nodeIParp active on node1
> Mar 2 13:00:58 node2 pengine: [2664]: notice: unpack_rsc_op: Operation nodeIP_last_failure_0 found resource nodeIP active on node1
> Mar 2 13:00:58 node2 pengine: [2664]: notice: unpack_rsc_op: Operation httpd_last_failure_0 found resource httpd active on node1
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: custom_action: Action nodeIP_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:58 node2 pengine: [2664]: notice: RecurringOp: Start recurring monitor (60s) for nodeIP on node2
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: custom_action: Action nodeIParp_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: custom_action: Action httpd_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:58 node2 pengine: [2664]: notice: RecurringOp: Start recurring monitor (5s) for httpd on node2
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: custom_action: Action node2-stonith_stop_0 on node1 is unrunnable (offline)
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: custom_action: Marking node node1 unclean
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: stage6: Scheduling Node node1 for STONITH
> Mar 2 13:00:58 node2 pengine: [2664]: notice: LogActions: Move nodeIP#011(Started node1 -> node2)
> Mar 2 13:00:58 node2 pengine: [2664]: notice: LogActions: Move nodeIParp#011(Started node1 -> node2)
> Mar 2 13:00:58 node2 pengine: [2664]: notice: LogActions: Move httpd#011(Started node1 -> node2)
> Mar 2 13:00:58 node2 pengine: [2664]: notice: LogActions: Leave node1-stonith#011(Started node2)
> Mar 2 13:00:58 node2 pengine: [2664]: notice: LogActions: Stop node2-stonith#011(node1)
> Mar 2 13:00:58 node2 pengine: [2664]: WARN: process_pe_message: Transition 818: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/pengine/pe-warn-39.bz2
> Mar 2 13:00:58 node2 pengine: [2664]: notice: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
> Mar 2 13:00:58 node2 crmd: [2665]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Mar 2 13:00:58 node2 crmd: [2665]: info: unpack_graph: Unpacked transition 818: 17 actions in 17 synapses
> Mar 2 13:00:58 node2 crmd: [2665]: info: do_te_invoke: Processing graph 818 (ref=pe_calc-dc-1330689658-1030) derived from /var/lib/pengine/pe-warn-39.bz2
> Mar 2 13:00:58 node2 crmd: [2665]: info: te_pseudo_action: Pseudo action 18 fired and confirmed
> Mar 2 13:00:58 node2 crmd: [2665]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
> Mar 2 13:00:58 node2 crmd: [2665]: info: te_fence_node: Executing reboot fencing operation (21) on node1 (timeout=60000)
> Mar 2 13:00:58 node2 stonith-ng: [2660]: info: initiate_remote_stonith_op: Initiating remote operation reboot for node1: 3325df94-8d59-4c00-a37e-be31e79f7503
> Mar 2 13:00:58 node2 stonith-ng: [2638]: info: stonith_command: Processed st_query from node2: rc=0
>
> <----------------------------------------------- snip --------------------------------------->
<div id="mailSignature">--<br />Mit freundlichen Grüßen<br />Best regards<br /><br />Thomas Börnert<br />Gesellschafter Geschäftsführer<br />Senior IT Consultant & Manager<br />BSI lizenzierter Auditor für ISO 27001<br /><br />TBits.net GmbH, Seeweg 6, 73553 Alfdorf, Germany<br />phone: +49 (0)7172 18391-0, fax: +49 (0)7172 18391-99<br />Key fingerprint = 8602 2EF5 78FD 3C04 B148 2506 5D4F 6A49 E4E2 9D15<br />Geschäftsführer: Thomas Börnert, Amtsgericht Stuttgart HRB 281836<br />USt.-IdNr. DE 207 740 994<br /></div></anyone></remote-op></anyone></anyone></remote-op state="0" st_target="node1" st_op="reboot" ></anyone></remote-op state="0" st_target="node1" st_op="reboot" ><br />